# 5 Things To Do with a Data Set

Clustering

Like prediction and classification, understanding how a data set is organized can aid analysis. One way to tease out that structure is cluster analysis. Based on patterns in the data, we group individual observational units into distinct clusters, defined so that observations within a cluster share similar characteristics. We can then analyze each group on its own as well as compare groups. For example, marketers may want to identify customer segments in order to develop targeted marketing strategies. A cluster analysis groups customers so that people in the same segment tend to have similar needs that differ from those of people in other segments. Popular techniques include k-means clustering, hierarchical clustering, and latent class analysis; multi-dimensional scaling is often used alongside them to visualize how observations group together.
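To make the idea concrete, here is a minimal k-means sketch in plain Python. k-means is used here simply because it is one of the easiest clustering algorithms to show in a few lines. The customer data (annual spend, visits per month) are invented for illustration, and in practice you would usually standardize the features first so that spend does not dominate the distance.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Group points (tuples of numbers) into k clusters with Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct observations
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments can no longer change
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers: (annual spend in dollars, store visits per month).
customers = [(200, 1), (220, 2), (250, 1), (1200, 8), (1100, 9), (1300, 7)]
labels, centroids = kmeans(customers, k=2)
for customer, label in zip(customers, labels):
    print(customer, "-> segment", label)
```

With this toy data the algorithm separates the three low-spend customers from the three high-spend ones; a marketer would then profile each segment separately.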

Image source: http://www.dynamic-concepts.nl/en/segmentation/

Classification

The first step in classification is constructing a classification system. The categories can be based either on theory or on observed statistical patterns, such as those detected with clustering techniques. The next step is to identify the category or group to which a new observation belongs. For example, a new email can be put in the spam or non-spam bin based on its contents. In statistics and machine learning, logistic regression, other linear classifiers, support vector machines, and linear discriminant analysis are popular techniques for classification problems.
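As a rough illustration of the spam example, the sketch below trains a logistic regression classifier with plain batch gradient descent. The two features (how often the word "free" appears and how many links an email contains) and the six labeled emails are hypothetical; a real spam filter would use far more features and a well-tested library implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression weights by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Error for one email: predicted spam probability minus the true label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the average gradient.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    """Return 1 (spam) if the predicted probability reaches the threshold, else 0."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= threshold else 0

# Hypothetical emails: (times "free" appears, number of links); 1 = spam, 0 = not spam.
X = [(0, 0), (1, 0), (0, 1), (4, 3), (5, 2), (3, 4)]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print(predict(w, b, (0, 0)), predict(w, b, (6, 5)))
```

The learned weights define a linear decision boundary, which is why logistic regression is grouped with the linear classifiers.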

Prediction

Predictive models can be built from the available data to tell you what is likely to happen. They assume either that past statistical patterns can be used to predict the future or that some theoretical model is valid. For example, Netflix recommends movies to users based on the movies and shows those users have watched in the past.

Can we predict who will be the next president, Clinton or Trump? We can at least estimate each candidate’s chances. Using polling data or the candidates’ speeches, you can build a predictive model for the 2016 presidential election. Nate Silver is well known for the accuracy of his predictions of both political and sporting events. Here is his prediction model for the 2016 presidential election:

Source: http://projects.fivethirtyeight.com/2016-election-forecast/

Predictive modeling draws on regression analysis, including simple and multiple linear regression and generalized linear models, as well as machine learning algorithms such as random forests. Dimension-reduction methods such as factor analysis can help by condensing many correlated predictors into a few underlying factors. Time series analysis can be used to forecast the weather or next season’s sales of a product.
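A minimal sketch of the regression idea: fit a straight line to past seasonal sales by ordinary least squares and extend it one season ahead. The sales figures are made up, and a straight-line trend ignores seasonality and autocorrelation, which dedicated time series methods are designed to handle.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical unit sales for four past seasons.
seasons = [1, 2, 3, 4]
sales = [102, 108, 121, 129]
slope, intercept = fit_line(seasons, sales)
forecast = intercept + slope * 5  # extend the fitted trend one season ahead
print(f"slope={slope:.2f}, next-season forecast={forecast:.1f}")
```

Here the fitted line rises about 9.4 units per season, giving a forecast of about 138.5 units for season 5.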

Anomaly Detection

Anomaly detection identifies unexpected or abnormal events; in other words, it seeks deviations from expected patterns. Credit card fraud detection is one example: card companies analyze customers’ purchase behavior and history so they can alert customers to possible fraud. Popular anomaly detection techniques include k-nearest neighbors, neural networks, support vector machines, and cluster analysis.
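The k-nearest-neighbor idea can be sketched in a few lines: score each transaction by its distance to its k-th nearest other transaction, so that isolated points receive high scores. The purchases (amount in dollars, hour of day), k = 2, and the flagging threshold of 100 are all hypothetical; a real system would scale the features and tune the threshold on historical data.

```python
import math

def knn_anomaly_scores(points, k=2):
    """Score each point by the distance to its k-th nearest other point."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical card purchases: (amount in dollars, hour of day).
purchases = [(25.0, 12), (30.0, 13), (22.5, 18), (27.0, 12), (31.0, 19), (950.0, 3)]
scores = knn_anomaly_scores(purchases, k=2)
THRESHOLD = 100.0  # hypothetical cutoff; would be tuned on past data in practice
flagged = [p for p, s in zip(purchases, scores) if s > THRESHOLD]
print("possible fraud:", flagged)
```

Only the large 3 a.m. purchase sits far from every other transaction, so it is the one flagged for review.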

Decision Making

One of the most common motivations for analyzing data is to drive better decision making. When a company promotes a new product, for instance, it can use data analysis to set a price that maximizes profit while avoiding price wars with competitors. Data analysis is so central to decision making that almost any analytic technique can be applied, including not only the ones mentioned above but also geographic information systems, social network analysis, and qualitative analysis.
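As a worked example of the pricing decision, assume (hypothetically) that demand falls linearly with price, q = a - b*p, and that each unit costs c to produce. Profit is then (p - c)(a - b*p); setting its derivative a - 2*b*p + b*c to zero gives the optimal price p* = (a + b*c) / (2b). The sketch computes that closed form and checks it against a brute-force search. The numbers are invented, and in practice the demand curve itself would have to be estimated from data.

```python
def profit(price, a, b, unit_cost):
    """Profit under a hypothetical linear demand curve q = a - b * price."""
    quantity = max(a - b * price, 0.0)  # demand cannot go negative
    return (price - unit_cost) * quantity

def optimal_price(a, b, unit_cost):
    # d/dp [(p - c)(a - b p)] = a - 2 b p + b c = 0  =>  p* = (a + b c) / (2 b)
    return (a + b * unit_cost) / (2 * b)

a, b, unit_cost = 1000.0, 20.0, 10.0  # hypothetical demand and cost parameters
p_star = optimal_price(a, b, unit_cost)

# Sanity check: search every price from unit cost up to the zero-demand price in 1-cent steps.
steps = int((a / b - unit_cost) * 100) + 1
candidates = [unit_cost + i / 100 for i in range(steps)]
best = max(candidates, key=lambda p: profit(p, a, b, unit_cost))
print(f"closed form: {p_star:.2f}, grid search: {best:.2f}, profit: {profit(p_star, a, b, unit_cost):.2f}")
```

With these parameters both methods land on a price of 30, where profit is 8,000.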


# Software Review: NVivo as a Teaching Tool

For the past few weeks, DASIL has been publishing a series of blog posts comparing this year’s two presidential candidates, Hillary Clinton and Donald Trump, using NVivo, a text analysis program. Given the growing demand for qualitative data analysis in academic research and teaching, this post discusses the strengths and weaknesses of NVivo as a tool for teaching qualitative analysis.

Efficiency and reliability

Using software like NVivo for content analysis can add rigor to qualitative research. Word searches and coding done in NVivo produce more consistent results than manual work because the software applies the same rules to every document, reducing human error. NVivo is also especially useful with large data sets: coding hundreds of documents by hand with a highlighter pen would be extremely time-consuming.
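NVivo is operated through its own interface, so the following is only a Python analogue of rule-based coding, not NVivo’s method: a hypothetical codebook maps each theme to trigger words, and every document is coded by exactly the same rule, which is where the consistency comes from. The themes, keywords, and sample sentences are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical codebook: each theme (a "node" in NVivo terms) maps to trigger words.
CODEBOOK = {
    "economy": {"jobs", "wages", "taxpayers", "costs"},
    "crime": {"crime", "crimes", "criminal", "arrest"},
    "family": {"family", "families", "children", "parents"},
}

def code_text(text, codebook=CODEBOOK):
    """Count how many words in text match each theme's keywords."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for theme, keywords in codebook.items():
            if word in keywords:
                counts[theme] += 1
    return counts

documents = [
    "Taxpayers bear the costs while families and children wait.",
    "Parents want good jobs and safe streets free of crime.",
]
for doc in documents:
    print(dict(code_text(doc)))
```

Because the rule never changes, the hundredth document is coded exactly like the first; the catch, discussed under Limitations, is that the rule only finds the words it was given.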

Ease of use

NVivo is relatively simple to use. Users can import documents in various formats, including Word documents and PDFs, and code them on screen with a point-and-click interface. Teachers and students can quickly become proficient with the software.

NVivo and social media

NVivo allows users to import tweets, Facebook posts, and YouTube comments as part of their data. Given the rise of social media and growing interest in studying its impact on society, this capability is likely to see more use.

Segmenting and identifying patterns

NVivo allows users to create clusters of nodes and organize their data into categories and themes, making it easier for researchers to identify patterns. Word clouds and cluster analysis also provide insight into prevailing themes and topics across data sets.

Limitations

While NVivo can provide a reliable general picture of the data, it is important to be aware of its limitations. It may be tempting to limit the analysis to automatic word searches that yield a list of nodes and themes, but meaningful data analysis requires in-depth reading and critical thinking.

Although it is possible to search for particular words and their derivations, the many ways in which an idea can be expressed make it difficult to find every instance of a particular usage or idea. Manual searches, along with manual review of the automatic search results, help ensure that the data are in fact examined thoroughly.
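The limitation is easy to demonstrate. The sketch below approximates a stemmed search with a simple prefix match (an assumption; NVivo’s own matching options may work differently) and runs it on a made-up passage: it finds every word beginning with "immigra" but misses "aliens" and "newcomers," which refer to the same people.

```python
import re

def stem_search(text, stem):
    """Return every word in text that begins with the given stem (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(stem) + r"\w*", re.IGNORECASE)
    return pattern.findall(text)

# Hypothetical passage mixing derivations of one stem with synonyms.
passage = ("Immigration reform should respect immigrants. "
           "Aliens and newcomers who arrive at the border deserve due process.")
matches = stem_search(passage, "immigra")
print(matches)
```

Catching the synonyms would require either adding them to the search by hand or reading the text, which is exactly the manual review the paragraph above recommends.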

Once individual themes in a data set are found, NVivo offers limited support for mapping out how these themes relate to one another, which makes it difficult to visualize the interrelationships of nodes and topics across data sets. Users need to think critically about how these themes emerge and relate to each other to gain a deeper understanding of the data.


# Clinton vs. Trump on Immigration: What Do Their Official Websites Reveal?

On her website, Clinton provides positions on over thirty-five issues, while Trump lists positions on just thirteen issues, a number that has grown from a mere seven a month ago. Trump’s and Clinton’s stances on immigration differ dramatically. While the Trump campaign frames immigration as a source of tremendous economic turmoil and a gateway for crime into the United States, Clinton devotes much more of her rhetoric to demonstrating compassion for immigrants.

Word Cloud: The 30 Most Commonly Used Words in Clinton’s Position on Immigration

Word Cloud: The 30 Most Commonly Used Words in Trump’s Position on Immigration

After “immigration,” the most commonly used word on Clinton’s immigration webpage was “families” (16 uses), while for Trump it was “illegal” (18 uses). Other common Trump words include “visa,” “states,” “officers,” “aliens,” and “ICE” (Immigration and Customs Enforcement). All reflect his conception of immigration as a legal issue that requires aggressive enforcement.
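Counts like these come from a word frequency query. The sketch below shows the basic idea behind such a query and the word clouds built on it: lowercase the text, drop common stopwords, and count what remains. The stopword list and sample sentence are hypothetical, and this sketch does not reproduce the NVivo settings behind the figures reported above.

```python
import re
from collections import Counter

# Hypothetical stopword list; real tools ship much longer ones.
STOPWORDS = {"the", "and", "a", "of", "to", "in", "our", "we", "will", "for", "is", "that"}

def top_words(text, n=30, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword words as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)

sample = ("We will keep families together. Families come first, "
          "and immigration reform will protect families.")
print(top_words(sample, n=5))
```

A word cloud simply draws each of the top words at a size proportional to its count.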

The immigration statement posted on Trump’s website contains twelve references to the economy and seven references to crime. Framing immigration as a source of both economic and criminal concern, Trump cites the “horrific crimes” that border-crossing criminals have committed against Americans.

Screenshot of Donald Trump’s Immigration Reform Webpage

Trump attempts to strike fear in the hearts of everyday Americans by explicitly connecting unlawful immigration with infrequent and sensationalized violent crimes. His website graphically describes how “an illegal immigrant from Mexico, with a long arrest record, is charged with breaking into a 64 year old woman’s home [and] crushing her skull and eye sockets with a hammer.” He also links immigration to terrorism: “From the 9/11 hijackers, to the Boston Bombers, and many others, our immigration system is being used to attack us.”

For Trump, immigration is a source of economic anxiety for ordinary citizens. He claims that “U.S. taxpayers have been asked to pick up hundreds of billions of healthcare costs, education costs, welfare costs, etc. Indeed the annual cost of free tax credits alone paid to illegal immigrants quadrupled to \$4.2 billion in 201. The effects on jobseekers have also been disastrous, and black Americans have been particularly harmed.”

Many of his policy plans tie the economy to immigration. Beneath a heading that reads “Jobs program for inner city youth,” Trump explains that under his administration, “The J-1 visa jobs program for foreign youth will be terminated and replaced with a resume bank for inner city youth provided to all corporate subscribers to the J-1 visa program.”

“Us versus them” is a consistent theme. Trump’s platform states, “Real immigration reform puts the needs of working people first – not wealthy globetrotting donors,” once again emphasizing his economic concerns about immigration while appealing to working-class Americans. He assures voters that “We will not be taken advantage of anymore” by Mexico.

In contrast, Clinton’s position on immigration reform (listed under the “Justice and Equality” section of her issues webpage) uses pro-immigrant and pro-family rhetoric.

Screenshot of Hillary Clinton’s Immigration Reform Webpage

Unlike her opponent, Clinton does not use the word “illegal” a single time on her immigration webpage.  Notably, she does not use the politically correct alternative “undocumented” either.   Clinton asserts that Americans must “stay true to our fundamental American values; that we are a nation of immigrants, and we treat those who come to our country with dignity and respect—and that we embrace immigrants, not denigrate them.”

Clinton connects immigration with crime only once, stating that “Immigration enforcement must be humane, targeted, and effective,” and that she will “focus resources on detaining and deporting those individuals who pose a violent threat to public safety.” While this part of the statement does frame some immigrants as “violent threats,” it positions most as law-abiding members of families.

In stark contrast to Trump, Clinton places a premium on showing compassion for immigrants who face difficult circumstances and makes keeping families together a top priority of her immigration policy. Clinton states that she would “Do everything possible under the law to protect families.” She “will end family detention for parents and children who arrive at our border in desperate situations and close private immigrant detention centers,” and would even work to ensure that all families, including immigrant families, have access to health care.

Clinton’s page includes words like “heartbreaking” and “sympathetic” to describe the cases of immigrants who do not enjoy full legal status and claims that her plan for immigration reform will “bring millions of hardworking people into the formal economy.”