Logo for Tableau Software

Software Review: Tableau as a Teaching Tool

Tableau is a unique and valuable teaching tool because it provides an easy interface for creating charts, graphs, and even maps. Students can explore data in sophisticated ways after only a short training session. Even better, students can get free licenses for the software, allowing faculty to use it in classes without incurring large financial commitments.

A map showing fatalities and event types of different violent events in Africa

What sets Tableau apart from other data visualization or business intelligence software is its intuitive, user-friendly drag-and-drop interface. For more sophisticated applications, this is supplemented by a variety of easy-to-understand menus. By using contextual menus and panels instead of typed code, Tableau shortens the learning curve for creating visualizations. For example, creating a line graph or a map is as easy as selecting the variables in question and then selecting the appropriate type of visualization.

Classic tables like the one below are easy to construct and can also be augmented with color-coded hotspot analyses.

A highlight table showing the number of violent events happening in Egypt, Libya, South Sudan, and Sudan broken down by Country and Event Type

Tableau provides the opportunity to construct data visualizations that are more complex than those generated by most traditional statistical packages. For example, the graphic below compares the number of conflicts over time for four North African countries in a fairly normal plot, but adds an additional variable, the number of fatalities, by varying line thickness.

A line graph showing the trend in the number of violent events in 4 African countries (Egypt, Libya, South Sudan, and Sudan) between 1997 and 2015. The thickness of each line represents the number of fatalities.

For classes working with data, Tableau presents a significant opportunity for instructors to integrate more data into the classroom, especially with students who might not have experience with more advanced statistical software. Making it easier for students to explore and understand data, as well as to ask their own questions through investigative learning, encourages them to gain a deeper appreciation for data as it relates to their discipline. In fact, at the time of writing, Tableau is being used successfully in several of our classes at Grinnell College.

However, Tableau does have its drawbacks. In particular, visualizations created with Tableau are not as customizable as those built with more flexible languages such as R or JavaScript. In addition, Tableau is not designed for data analysis. It is a data visualization tool, not a statistical package. Another small downside is that data entered into Tableau must be formatted in a specific way. While Tableau is able to do some data manipulation, spreadsheet programs like Excel make this much easier. So, Tableau’s role in classrooms or in research might be restricted to surface-level explorations of the data in question. Despite this limitation, Tableau remains a tool with great potential, especially in the possibilities it presents for creating quick and easy visualizations.

5 Things To Do with a Data Set

Clustering

Like prediction and classification, understanding the way the data is organized can help us analyze it. One way to tease out the structure of the data is to look for clusters. Based on the patterns in the data, we can group individual observational units into distinct clusters. Clusters are defined so that observations within each cluster have similar characteristics. We can then analyze each group further and compare groups with one another. For example, marketers may want to identify customer segments in order to develop targeted marketing strategies. A cluster analysis will group customers so that people in the same segment tend to have similar needs but differ from those in other segments. Popular clustering methods include k-means, hierarchical clustering, and latent class analysis; multidimensional scaling is often used alongside them to visualize how observations group.
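To make the grouping idea concrete, here is a minimal sketch of one common clustering method, k-means, written in plain Python. The customer figures (monthly visits and average basket size) are made up for illustration, and starting from the first k points is a simplifying assumption so that the run is repeatable; real analyses usually use a library implementation with random restarts.

```python
from math import dist  # requires Python 3.8+

def k_means(points, k, iterations=20):
    """Group points into k clusters with plain k-means (Lloyd's algorithm)."""
    # Assumption: use the first k points as starting centers so results are repeatable.
    centers = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centers

# Hypothetical customers: (visits per month, average basket in dollars)
customers = [(2, 15), (20, 90), (3, 20), (22, 85), (1, 10), (19, 95)]
labels, centers = k_means(customers, k=2)
print(labels)   # one cluster label per customer
print(centers)  # the "typical" customer of each segment
```

In this toy data the occasional small-basket shoppers and the frequent big-basket shoppers end up in separate segments, which is the kind of split a marketer would then examine group by group.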

An example picture of customer segmentation

Image source: http://www.dynamic-concepts.nl/en/segmentation/

Classification

The first step in classification is constructing a classification system. The categories can be created based on either theories or observed statistical patterns, such as those detected using clustering techniques. The next step is to identify the category or group to which a new observation belongs. For example, a new email can be put in the spam bin or the non-spam bin based on its contents. In statistics and machine learning, logistic regression, linear classifiers, support vector machines, and linear discriminant analysis are popular techniques for classification problems.
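The spam example can be sketched with logistic regression, one of the techniques listed above. Everything specific here is an assumption for illustration: the four-word vocabulary, the six training emails, and the learning rate are invented, and a real spam filter would use far more data and features.

```python
import math

VOCAB = ["free", "winner", "meeting", "report"]  # hypothetical feature words

def features(email):
    """Count how often each vocabulary word appears in the email."""
    words = email.lower().split()
    return [words.count(v) for v in VOCAB]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(emails, labels, lr=0.5, epochs=1000):
    """Fit logistic regression weights by batch gradient descent."""
    X = [features(e) for e in emails]
    w, b = [0.0] * len(VOCAB), 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(VOCAB), 0.0
        for x, y in zip(X, labels):
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            err = p - y  # gradient of the log loss with respect to the score
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def spam_probability(email, w, b):
    """Estimated probability that a new email belongs in the spam bin."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, features(email))))

# Made-up training set: 1 = spam, 0 = not spam
emails = ["free prize winner", "free free offer", "winner claim free",
          "meeting report attached", "project meeting tomorrow", "quarterly report"]
labels = [1, 1, 1, 0, 0, 0]
w, b = train(emails, labels)

p_spam = spam_probability("you are a winner claim your free gift", w, b)
p_ham = spam_probability("agenda for the meeting and the report", w, b)
```

A new email is then assigned to the spam bin when its estimated probability exceeds 0.5, which mirrors the two steps described above: build the categories from labeled examples, then place each new observation.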

Prediction

Predictive models can be built with the available data to tell you what is likely to happen. They rest on one of two assumptions: either that knowledge of past statistical patterns can be used to predict the future, or that some theoretical model is valid. For example, Netflix recommends movies to users based on the movies and shows those users have watched in the past.

Can we predict who will be the next president, Clinton or Trump? Yes, we can. Based on polling data or the candidates’ speeches, you can build a predictive model for the 2016 presidential election. Nate Silver is well-known for the accuracy of his predictions of both political and sporting events. Here is his prediction model for the 2016 presidential election:

A map of polls-only forecast of the 2016 presidential election by Nate Silver

Source: http://projects.fivethirtyeight.com/2016-election-forecast/

Predictive modeling utilizes regression analysis, including linear regression, multiple regression, and generalized linear models, as well as some machine learning algorithms, such as random forests and factor analysis. Time series analysis can be used to forecast the weather or next season’s sales of a product.
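As a small illustration of the simplest of these techniques, the sketch below fits a straight line by ordinary least squares and uses it to forecast a new value. The advertising and sales numbers are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Forecast y for a new x using the fitted line."""
    return intercept + slope * x

# Hypothetical data: advertising spend vs. units sold (both in thousands)
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

slope, intercept = fit_line(ad_spend, sales)
forecast = predict(6, slope, intercept)  # expected sales at a spend of 6
print(round(slope, 2), round(intercept, 2), round(forecast, 2))
```

The same pattern, fit on past data and then apply the fitted model to a new case, carries over to multiple regression and to the machine learning methods mentioned above.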

Anomaly Detection

Anomaly detection identifies unexpected or abnormal events. In other words, we seek to find deviations from expected patterns. Detecting credit card fraud provides an example. Credit card companies can analyze customers’ purchase behavior and history so that they can alert customers to possible fraud. Popular anomaly detection techniques include k-nearest neighbors, neural networks, support vector machines, and cluster analysis.
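Here is a minimal sketch of the k-nearest-neighbor idea applied to the credit card example: a purchase is suspicious when even its closest neighbors are far away. The purchase amounts, the choice of k = 2, and the cutoff of ten times the median score are all assumptions chosen for illustration, not an industry rule.

```python
from statistics import median

def knn_scores(values, k=2):
    """Score each value by its mean distance to its k nearest neighbors."""
    scores = []
    for i, v in enumerate(values):
        distances = sorted(abs(v - other) for j, other in enumerate(values) if j != i)
        scores.append(sum(distances[:k]) / k)
    return scores

def flag_anomalies(values, k=2, factor=10):
    """Return values whose score exceeds `factor` times the median score.

    The cutoff rule is a hypothetical choice; if the median score is 0,
    every value with a nonzero score would be flagged.
    """
    scores = knn_scores(values, k)
    cutoff = factor * median(scores)
    return [v for v, s in zip(values, scores) if s > cutoff]

# Hypothetical purchase amounts in dollars for one customer
purchases = [12, 15, 14, 13, 16, 15, 950, 14]
suspicious = flag_anomalies(purchases)
print(suspicious)  # → [950]
```

Ordinary purchases sit close to one another and get small scores, while the one unusually large purchase has no nearby neighbors and is flagged for review.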

Decision Making

One of the most common motivations for analyzing data is to drive better decision making. When a company needs to promote a new product, it can employ data analysis to set the price to maximize profit and avoid price wars with other competitors. Data analysis is so central to decision making that almost all analytic techniques – including not only the ones mentioned above but also geographical information systems, social network analysis, and qualitative analysis – can be applied.

Portraits of Donald Trump and Hillary Clinton

Clinton vs. Trump on Immigration: What Do Their Official Websites Reveal?

On her website, Clinton provides positions on over thirty-five issues, while Trump lists positions on just thirteen issues, a number that has grown from a mere seven positions a month ago. Trump and Clinton’s stances on immigration differ dramatically. While the Trump campaign frames immigration as a source of tremendous economic turmoil and a gateway for crime into the United States, Clinton devotes much more of her rhetoric towards demonstrating compassion for immigrants.

Word Cloud: The 30 Most Commonly Used Words in Clinton’s Position on Immigration

Word Cloud: The 30 Most Commonly Used Words in Trump’s Position on Immigration

After “immigration,” the most commonly used word on Clinton’s immigration webpage was “families” (16 uses), while for Trump it was “illegal” (18 uses). Other common Trump words include “visa,” “states,” “officers,” “aliens,” and “ICE” (Immigration and Customs Enforcement). All reflect his conceptualization of immigration as a legal issue that necessitates aggressive enforcement.
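Word counts like these are straightforward to reproduce. The sketch below shows one way to do it in Python; it is not necessarily the procedure used for this post. The stopword list and the sample sentence are placeholders, and in practice you would paste in the full text of each candidate’s webpage.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real analyses use a much longer one.
STOPWORDS = {"the", "and", "to", "of", "a", "in", "for", "our", "we", "is", "that"}

def top_words(text, n=30):
    """Return the n most frequent non-stopwords as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

# Placeholder text standing in for a candidate's position page
sample = "Families first. We will keep families together and protect immigrant families."
print(top_words(sample, n=1))  # → [('families', 3)]
```

The resulting list of words and counts is exactly what a word cloud generator needs as input, with font size scaled to each count.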

The immigration statement posted on Trump’s website has twelve references to the economy and seven references to crime. Simultaneously framing immigration as a cause for economic and criminal concern, Trump cited the “horrific crimes” border-crossing criminals have committed against Americans.

Screenshot of Donald Trump’s Immigration Reform Webpage

Trump attempts to strike fear in the hearts of everyday Americans by explicitly connecting unlawful immigration with infrequent and sensationalized violent crimes. His website graphically describes, “an illegal immigrant from Mexico, with a long arrest record, is charged with breaking into a 64 year old woman’s home [and] crushing her skull and eye sockets with a hammer.” He also links immigration to terroristic crime: “From the 9/11 hijackers, to the Boston Bombers, and many others, our immigration system is being used to attack us.”

For Trump, immigration is a cause of economic anxiety for ordinary citizens. He claims that “U.S. taxpayers have been asked to pick up hundreds of billions of healthcare costs, education costs, welfare costs, etc. Indeed the annual cost of free tax credits alone paid to illegal immigrants quadrupled to $4.2 billion in 201. The effects on jobseekers have also been disastrous, and black Americans have been particularly harmed.”

Many of his policy plans tie the economy to immigration. Beneath a heading that reads “Jobs program for inner city youth,” Trump explains that under his administration, “The J-1 visa jobs program for foreign youth will be terminated and replaced with a resume bank for inner city youth provided to all corporate subscribers to the J-1 visa program.”

“Us Versus Them” provides a consistent theme. Trump’s platform states, “Real immigration reform puts the needs of working people first – not wealthy globetrotting donors,” once again emphasizing his economic concerns regarding immigration while appealing to working-class Americans. He assures voters that “We will not be taken advantage of anymore” by Mexico.

In contrast, Clinton’s position on immigration reform (listed under the “Justice and Equality” section of her issues webpage) uses pro-immigrant and pro-family rhetoric.

Screenshot of Hillary Clinton’s Immigration Reform Webpage

Unlike her opponent, Clinton does not use the word “illegal” a single time on her immigration webpage.  Notably, she does not use the politically correct alternative “undocumented” either.   Clinton asserts that Americans must “stay true to our fundamental American values; that we are a nation of immigrants, and we treat those who come to our country with dignity and respect—and that we embrace immigrants, not denigrate them.”

Clinton refers to immigration as a crime only once. She claims that “Immigration enforcement must be humane, targeted, and effective,” and that she will “focus resources on detaining and deporting those individuals who pose a violent threat to public safety.” While this part of the statement does frame some immigrants as “violent threats,” it positions most as law-abiding members of families.

In stark contrast to Trump, Clinton places a premium on showing compassion for immigrants who face difficult circumstances and emphasizes keeping families together as a top priority of her immigration policy.  Clinton states that she would “Do everything possible under the law to protect families.” She “will end family detention for parents and children who arrive at our border in desperate situations and close private immigrant detention centers,” and even ensure health care to all families including those of immigrants.

Clinton’s page includes words like “heartbreaking” and “sympathetic” to describe the cases of immigrants who do not enjoy full legal status and claims that her plan for immigration reform will “bring millions of hardworking people into the formal economy.”

Portraits of Donald Trump and Hillary Clinton

Clinton vs. Trump: Who’s Winning on Twitter?

An analysis of 3000 tweets sent by the Clinton and Trump campaigns between March and early September this year (2016) reveals stark differences in both content and social media exposure.

As of September 2016, Trump boasted 11.6 million Twitter followers compared to 8.86 million for Clinton. The average number of retweets for Trump’s Twitter posts (5493) is also roughly twice as high as that for Clinton’s (2556).

Interestingly, as of July 2016, the number of daily tweets from Clinton’s account doubled to roughly 30, while the same figure for Trump hovered around 12 tweets a day. These statistics suggest that Trump is gaining more engagement from Twitter users, even though Clinton is also fighting hard to build a presence on social media.

Tweets’ content analysis

Word clouds featuring 100 most frequent words in Hillary Clinton's and Donald Trump's tweets

Clearly, both candidates refer to each other consistently in their tweets. Clinton mentions Trump primarily in terms of his disrespect for generals, immigration policies, tax breaks for the wealthy, and failure to release tax returns. Popular themes in Clinton’s tweets are “families”, “women” and “jobs”. She tends to use words that suggest the togetherness of the American community as well as a positive attitude towards good changes for America.

Similarly, Trump made many references to Clinton in his posts, although he tended to use her first name rather than her last. The word “Hillary” or the phrase “Crooked Hillary” was mentioned 547 times. Common topics that Trump addresses include the controversy around Clinton’s emails, media manipulation, and criticism of Clinton’s policies on foreign affairs and immigration.

Interestingly, Trump refers to Democrats, like Bernie Sanders and Barack Obama, as well as Republicans like his former opponents Ted Cruz and Marco Rubio. Unlike Clinton, Trump tends to make greater use of words with negative tones, which sketch a pretty bleak portrait of the U.S., perhaps to make the case for the need to, as his campaign slogan phrases it, “Make America great again”.

Who do they talk to? 

In her tweets, Clinton frequently mentions @realDonaldTrump, but Trump does not seem to tag Clinton or mention her Twitter handle, even though his tweets mention her name consistently. Clinton seems to employ the strategy of engaging with and directly mentioning her opponent’s Twitter account, whereas Trump chooses not to tag his rival at all. It remains to be seen which strategy is more effective in this presidential election.

Most common mentions in Trump's tweets

Pie chart featuring the most common mentions in Clinton's tweets

While Clinton tends to mention Twitter users who are figures in her political party, such as @BillClinton, Trump refers to various right-wing media shows, channels, and personalities, such as @FoxNews and @MegynKelly. Interestingly, Trump’s posts with mentions have on average four times as many retweets as those without mentions, and Clinton’s posts with mentions have three times as many. Thus, mentions appear to increase the number of retweets. Given that the average number of retweets for Trump’s posts is greater than that for Clinton’s, it seems that Trump’s way of using mentions may help him gain more attention from the Twitter user community, although there may be other explanations as well.
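The comparison between tweets with and without mentions can be sketched as follows. The four tweets and their retweet counts are invented, and the simple pattern used to detect handles is an assumption; the actual analysis worked with roughly 3000 real tweets.

```python
import re
from collections import Counter
from statistics import mean

MENTION = re.compile(r"@\w+")  # assumption: a handle is "@" followed by word characters

def mention_counts(tweets):
    """Count how often each handle is mentioned across all tweets."""
    return Counter(h for text, _ in tweets for h in MENTION.findall(text))

def retweet_ratio(tweets):
    """Mean retweets of tweets with mentions divided by mean retweets of those without."""
    with_mention = [rt for text, rt in tweets if MENTION.search(text)]
    without = [rt for text, rt in tweets if not MENTION.search(text)]
    return mean(with_mention) / mean(without)

# Hypothetical (text, retweet count) pairs
tweets = [
    ("Thank you @FoxNews!", 8000),
    ("Great rally tonight", 2000),
    ("Watching @MegynKelly on @FoxNews", 4000),
    ("Big announcement soon", 1000),
]

ratio = retweet_ratio(tweets)     # 6000 / 1500
handles = mention_counts(tweets)  # data for a "most common mentions" chart
print(ratio, handles.most_common(1))
```

A ratio well above 1 only shows that mentions and retweets go together in the data; as noted above, it does not rule out other explanations.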

5 Must-See TED Talks on Data Visualization!

Data visualization is crucial in understanding data and identifying hidden connections that matter. Below are 5 TED talks on data visualization you don’t want to miss!

1. Hans Rosling: The best stats you’ve ever seen

Hans Rosling, cofounder of the Gapminder Foundation, developed the Trendalyzer software that converts international statistics – such as life expectancy and child mortality rate – into innovative, interactive graphics. The statistics guru is a strong advocate for public access to data and the development of tools that make it accessible and usable for all. In this classic talk, Rosling highlights the importance of data in debunking myths about the gap between developed countries and the so-called “developing world.” Even though the talk was filmed 10 years ago, it still carries very important and relevant messages.

Watch more of Rosling’s TED talks here.

2. David McCandless: The beauty of data visualization

In this visually captivating talk, data journalist David McCandless suggests that data visualization is a quick solution to our current problem of information overload. Visualizations allow us to see the hidden patterns, identify connections that matter, and tell stories with data. To McCandless, “even when the information is terrible, the visual can be quite beautiful”; this is a controversial claim, however, since the main goal of data visualization should be to communicate information effectively through graphical means.

3. Dave Troy: Social maps that reveal a city’s intersections – and separations

A serial entrepreneur and data-viz fan, Dave Troy takes a people-focused approach to data visualization. Troy has been mapping tweets among city dwellers, revealing what connects communities and what separates them – above and beyond demographic factors such as race or ethnicity. He compares a city to a “giant high school cafeteria” and suggests that we see “how everybody arranged themselves in a seating chart”, arguing that “maybe it’s time to shake up the seating chart a little bit” to reshape our cities.

4. Eric Berlow & Sean Gourley: Mapping ideas worth spreading

An ecologist and a physicist, Eric Berlow and Sean Gourley, collaborate in this presentation to create stunning 3D visualizations demonstrating the interconnectedness of ideas. Taking 4,000 TEDx talks from 147 countries representing 50 languages, they explore their “meme-omes” – the mathematical structures that underlie the ideas behind these talks – and discover similarities between seemingly unconnected topics. Berlow and Gourley also broke down complex themes into multiple more specific ones, seeing what topics resonated with viewers and what kind of audience looked at what topic. To Gourley, mapping ideas in this way will help us “to see what’s being said, to see what’s not being said, and to be a little bit more human and, hopefully, a little smarter.”

5. Manuel Lima: A visual history of human knowledge

Founder of VisualComplexity.com Manuel Lima, described by Wired Magazine as “the man who turns data into art,” explains the visual metaphor shift from the tree to the network as “a new lens to understand the world around us.” Lima argues that the tree – an important tool to map everything from genealogy to systems of law to Darwin’s “Tree of Life” – is being replaced by a new metaphor – the network. Rigid structures are evolving into interdependent systems, and networks emerge to embody the nonlinearity, decentralization, interconnectedness, and multiplicity of ideas and knowledge. The shift in visual metaphor also represents a new way of thinking – one that is critical for us to solve many complex problems we are facing.
