Sentiment Analysis of a Podcast

 

There has been an increase in the exploration of text as a rich data source. Quantifying textual data can reveal trends, patterns, and characteristics that may go unnoticed initially by human interpretation. Combining quantitative analyses with computing capabilities of modern technology allows for quick processing of substantially large amounts of text.

Here we present a sentiment analysis of an intriguing form of text – podcast transcripts – to provide a discussion on the process of text analysis.

Podcast transcripts are a unique form of text because they were originally meant to be heard, not read, which makes for a more intimate form of communication. The text used in this example is the set of transcripts from the NPR “Serial” podcast hosted by Sarah Koenig. “Serial” explores the investigation and trial of Adnan Syed, who was accused of the murder of his girlfriend in 1999. The podcast consists of 12 episodes averaging 43 minutes and 7,480 words each. Here we examine the 12 episodes together as a whole text.

Sentiment analysis involves processing text data to identify, quantify, and extract subjective information from the text. Using the tools available in the tidytext package in R, we can examine the polarity (positive or negative tone) and the emotional association (joy, anger, fear, trust, anticipation, surprise, disgust, and sadness) of the text. We present one method of sentiment analysis, which involves referencing a sentiment dictionary: a list of words coded according to the objective of the analysis. For polarity, each word is given a positive, negative, or neutral value; for emotions, each word is tagged with any associated emotions. As an example, the word “murder” is coded as negative and tagged with fear and sadness. We chose the NRC sentiment dictionary for this analysis because it is the only one of the available dictionaries that includes emotions, and because it was created as a general-purpose lexicon that works well for different types of text.
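The dictionary lookup described above can be sketched in a few lines. This is an illustration in Python rather than the R workflow used for the analysis. The lexicon below is a toy stand-in: only the entry for “murder” follows the example in the text, and the other entries, the function names, and the sample sentence are assumptions made for the sketch.

```python
import re
from collections import Counter

# Toy lexicon for illustration only. The entry for "murder" follows the
# example in the text; the other entries are hypothetical stand-ins and
# are NOT copied from the NRC dictionary.
TOY_LEXICON = {
    "murder": {"negative", "fear", "sadness"},
    "friend": {"positive", "trust", "joy"},
    "trial": {"negative", "anticipation"},
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def tag_counts(text, lexicon):
    """Count how many tokens carry each polarity or emotion tag."""
    counts = Counter()
    for token in tokenize(text):
        # Words missing from the lexicon are treated as neutral (no tags).
        for tag in lexicon.get(token, ()):
            counts[tag] += 1
    return counts

def tag_shares(counts, tags):
    """Share of each tag among the selected tags (e.g. only the polarities)."""
    total = sum(counts[t] for t in tags)
    return {t: counts[t] / total for t in tags} if total else {}

# Invented sample sentence, not a quotation from the podcast.
sample = "My friend was on trial for murder. A friend would not do that."
counts = tag_counts(sample, TOY_LEXICON)
polarity = tag_shares(counts, ["positive", "negative"])
print(counts)
print(polarity)
```

Passing the eight emotion names to `tag_shares` instead of the two polarities gives the kind of per-emotion percentages shown in a bar graph.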

Starting with an overall visualization of the emotions and polarity of the podcast, a bar graph (Figure 1) displays the percentage of the text characterized by each emotion. In examining the text in this way, a particularly intriguing discovery is that the most common emotion is trust, which may be surprising for a podcast about a murder investigation and trial. The next most common emotion is anticipation. This confirms what one may expect in the context of podcasts: hosts would want to keep their listeners interested in the story so anticipation would play a key role in getting people to listen regularly.

Figure 2 shows that the text is positive overall, as a larger percentage of the coded words are positive than negative. To look more closely at which words occur most often within a specific sentiment or emotion, a sorted word cloud lets one visually identify the most commonly used words coded as positive or negative.

The most frequently used negative words are crime, murder, kill, and calls. The most frequently used positive words are friend, talk, police, and pretty. It is important to examine the context of the most common words. Consider the word “pretty”: in the text, “pretty” was used as an adverb, not as an adjective (e.g. “I’m pretty sure I was in school. I think– no?”). All 53 instances of “pretty” in the text were used to show uncertainty. However, the NRC dictionary defines and codes “pretty” as an adjective describing something as attractive. This mismatch between usage within the text and the dictionary affects the sentiment analysis. One should carefully consider how to handle such words appropriately.
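One way to check how a frequent word is actually used is to list every occurrence with a few words of context on either side. The following is a minimal Python sketch of such a keyword-in-context listing, not the procedure used for this analysis. The function name and window size are assumptions, and the second sample sentence is invented for contrast.

```python
import re

def keyword_in_context(text, keyword, window=3):
    """Return each occurrence of `keyword` with up to `window` words on either side."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append(" ".join(left + [f"[{token}]"] + right))
    return hits

# The first sentence is the quotation used in the text; the second is
# invented to show the adjective sense.
sample = "I'm pretty sure I was in school. It was a pretty day."
for line in keyword_in_context(sample, "pretty"):
    print(line)
```

Reading the listing makes it easy to see whether a word is being used in the sense the dictionary assumes.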

Similarly, we can examine each emotion in more detail. These graphs allow one to see which words were most represented in each emotion.

 

This graph again illustrates the importance of critically examining the results. The word “don” appears as a top positive word because the NRC lexicon codes “don” in the sense of a gentleman or mentor. In this text, however, “Don” is the name of a person and, like the other names, should be coded as neutral. Similar concerns may arise for other words with multiple meanings. These words should be handled appropriately, particularly if they are among the most frequently used words in the text.
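One possible way to handle such words is to remove them from the lexicon before scoring, so that they count as neutral. The Python sketch below assumes a lexicon stored as a plain dictionary. The entries and the helper name `apply_overrides` are hypothetical and are not taken from the NRC dictionary or from the analysis itself.

```python
def apply_overrides(lexicon, neutral_words):
    """Return a copy of the lexicon without the words that should be treated as neutral."""
    return {word: tags for word, tags in lexicon.items() if word not in neutral_words}

# Hypothetical entries for illustration; the tags are not the NRC codings.
toy_lexicon = {
    "don": {"positive", "trust"},
    "pretty": {"positive", "joy"},
    "murder": {"negative", "fear", "sadness"},
}

# Words whose usage in this text does not match the dictionary sense.
neutral_words = {"don", "pretty"}

adjusted = apply_overrides(toy_lexicon, neutral_words)
print(sorted(adjusted))
```

Because the function returns a copy, the original lexicon stays available for comparing results with and without the overrides.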

These graphs show a few of the many ways to quantify and visualize text data through a sentiment analysis in order to understand a text more objectively. As text analyses become more prevalent, it is imperative to engage actively in the process and to examine results critically, paying attention not only to the numbers and graphs but also to the subject matter of the text.

 

Software Review: NVivo as a Teaching Tool

For the past few weeks, DASIL has been publishing a series of blog posts comparing the two presidential candidates this year – Hillary Clinton and Donald Trump – using NVivo, a text-analysis software package. Given the increasing demand for qualitative data analysis in academic research and teaching, this blog post will discuss the strengths and weaknesses of NVivo as a teaching tool in qualitative analysis.

Efficiency and reliability

Using software like NVivo in content analysis can add rigor to qualitative research. Running word searches or coding in NVivo tends to produce more consistent results than doing so manually, since the software reduces the human error that comes with searching and counting by hand. Furthermore, NVivo proves especially useful with large data sets – it would be extremely time-consuming to code hundreds of documents by hand with a highlighter pen.

Ease of use

NVivo is relatively simple to use. Users can import documents directly from word processing packages in various formats, including Word documents and PDFs, and code these documents easily on screen via the point-and-click system. Teachers and students can quickly become proficient in the use of this software.

NVivo and social media

NVivo allows users to import tweets, Facebook posts, and YouTube comments and incorporate them as part of their data. Given the rise of social media and increased interest in studying its impact on our society, this capability of NVivo may become more heavily employed.

Segmenting and identifying patterns 

NVivo allows users to create clusters of nodes and organize their data into categories and themes, making it easy for researchers to identify patterns. At the same time, the use of word clouds and cluster analysis also provides insight into prevailing themes and topics across data sets.

Limitations

While NVivo is a useful tool for getting a reliable, general picture of the data, it is important to be aware of its limitations. It may be tempting to limit the data analysis process to automatic word searches that yield a list of nodes and themes, but in-depth analysis and critical thinking skills are needed for meaningful data analysis.

Although it is possible to search for particular words and derivations of those words, various ways in which ideas are expressed make it difficult to find all instances of a particular usage of words or ideas. Manual searches and evaluation of automatic word searches help to ensure that the data are, in fact, thoroughly examined.

Once individual themes in a data set are found, NVivo does not provide tools to map out how these themes relate to one another, making it difficult to visualize the inter-relationships of the nodes and topics across data sets. Users need to think critically about the ways in which these themes emerge and relate to each other to gain a deeper understanding of the data.

Portraits of Donald Trump and Hillary Clinton

Clinton vs. Trump on Immigration: What Do Their Official Websites Reveal?

On her website, Clinton provides positions on over thirty-five issues, while Trump lists positions on just thirteen issues, a number that has grown from a mere seven positions a month ago. Trump and Clinton’s stances on immigration differ dramatically. While the Trump campaign frames immigration as a source of tremendous economic turmoil and a gateway for crime into the United States, Clinton devotes much more of her rhetoric towards demonstrating compassion for immigrants.

Word Cloud: 30 Most Commonly Used Words in Clinton’s Position on Immigration

Word Cloud: The 30 Most Commonly Used Words in Trump’s Position on Immigration

After “immigration,” the most commonly used word on Clinton’s immigration webpage was “families” (16 uses), while for Trump it was “illegal” (18 uses). Other common Trump words include “visa,” “states,” “officers,” “aliens,” and “ICE” (Immigration and Customs Enforcement). All reflect his conceptualization of immigration as a legal issue that necessitates aggressive enforcement.
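Word counts like these, and the word clouds built from them, come down to a frequency table of the words on a page. The posts in this series used NVivo for this step; the following is only a rough Python sketch of the idea. The stop-word list is a small hypothetical sample, and the sample text is invented rather than taken from either candidate’s website.

```python
import re
from collections import Counter

# Small hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "and", "of", "to", "in", "our", "we", "for", "is"}

def top_words(text, n=30, stop_words=STOP_WORDS):
    """Return the n most frequent words and their counts, ignoring stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(n)

# Invented sample text for illustration.
sample = "Families first. We protect families and keep families together in our communities."
top = top_words(sample, n=2)
print(top)
```

The choice of stop words matters: leaving in or taking out a very common word changes which words make the top of the list.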

The immigration statement posted on Trump’s website has twelve references to the economy and seven references to crime. Simultaneously framing immigration as a cause for economic and criminal concern, Trump cited the “horrific crimes” border-crossing criminals have committed against Americans.

Screenshot of Donald Trump’s Immigration Reform Webpage

Trump attempts to strike fear in the hearts of everyday Americans by explicitly connecting unlawful immigration with infrequent and sensationalized violent crimes. His website graphically describes, “an illegal immigrant from Mexico, with a long arrest record, is charged with breaking into a 64 year old woman’s home [and] crushing her skull and eye sockets with a hammer.” He also links immigration to terroristic crime: “From the 9/11 hijackers, to the Boston Bombers, and many others, our immigration system is being used to attack us.”

For Trump, immigration is a cause of economic anxiety for ordinary citizens. He claims that “U.S. taxpayers have been asked to pick up hundreds of billions of healthcare costs, education costs, welfare costs, etc. Indeed the annual cost of free tax credits alone paid to illegal immigrants quadrupled to $4.2 billion in 201. The effects on jobseekers have also been disastrous, and black Americans have been particularly harmed.”

Many of his policy plans tie the economy to immigration. Beneath a heading that reads “Jobs program for inner city youth,” Trump explains that under his administration, “The J-1 visa jobs program for foreign youth will be terminated and replaced with a resume bank for inner city youth provided to all corporate subscribers to the J-1 visa program.”

“Us versus them” is a consistent theme. Trump’s platform states, “Real immigration reform puts the needs of working people first – not wealthy globetrotting donors,” once again emphasizing his economic concerns regarding immigration while appealing to working-class Americans. He assures voters that “We will not be taken advantage of anymore” by Mexico.

In contrast, Clinton’s position on immigration reform (listed under the “Justice and Equality” section of her issues webpage) uses pro-immigrant and pro-family rhetoric.

Screenshot of Hillary Clinton’s Immigration Reform Webpage

Unlike her opponent, Clinton does not use the word “illegal” a single time on her immigration webpage.  Notably, she does not use the politically correct alternative “undocumented” either.   Clinton asserts that Americans must “stay true to our fundamental American values; that we are a nation of immigrants, and we treat those who come to our country with dignity and respect—and that we embrace immigrants, not denigrate them.”

Clinton refers to immigration as a crime only once. She claims that: “Immigration enforcement must be humane, targeted, and effective,”  and that she will “focus resources on detaining and deporting those individuals who pose a violent threat to public safety.” While this part of the statement does frame some immigrants as “violent threats,” it positions most as law-abiding members of families.

In stark contrast to Trump, Clinton places a premium on showing compassion for immigrants who face difficult circumstances and emphasizes keeping families together as a top priority of her immigration policy.  Clinton states that she would “Do everything possible under the law to protect families.” She “will end family detention for parents and children who arrive at our border in desperate situations and close private immigrant detention centers,” and even ensure health care to all families including those of immigrants.

Clinton’s page includes words like “heartbreaking” and “sympathetic” to describe the cases of immigrants who do not enjoy full legal status and claims that her plan for immigration reform will “bring millions of hardworking people into the formal economy.”

Clinton vs. Trump: Who’s Winning on Twitter?

An analysis of 3000 tweets sent by the Clinton and Trump campaigns between March and early September this year (2016) reveals stark differences in both content and social media exposure.

As of September 2016, Trump had 11.6 million Twitter followers compared to 8.86 million for Clinton. The average number of retweets for Trump’s Twitter posts (5493) is also roughly twice as high as that for Clinton’s (2556).

Interestingly, as of July 2016, the number of daily tweets from Clinton’s account doubled to roughly 30 tweets a day, while the same figure for Trump hovered around 12 tweets a day. These statistics suggest that Trump is gaining more engagement from Twitter users, even though Clinton is also fighting hard to gain a presence on social media.

Tweets’ content analysis

Word clouds featuring 100 most frequent words in Hillary Clinton's and Donald Trump's tweets

Clearly, both candidates refer to each other consistently in their tweets. Clinton mentions Trump primarily in terms of his disrespect for generals, immigration policies, tax breaks for the wealthy, and failure to release tax returns. Popular themes in Clinton’s tweets are “families”, “women” and “jobs”. She tends to use words that suggest the togetherness of the American community as well as a positive attitude towards good changes for America.

Similarly, Trump made many references to Clinton through his posts, although he tended to use her first name, rather than her last. The word “Hillary” or the phrase “Crooked Hillary” was mentioned 547 times. Common topics that Trump addresses include controversy around Clinton’s emails, media manipulation, and criticism towards Clinton’s policies on foreign affairs and immigration.

Interestingly, Trump refers to Democrats, like Bernie Sanders and Barack Obama, as well as Republicans, like his former opponents Ted Cruz and Marco Rubio. Unlike Clinton, Trump tends to make greater use of words with negative tones, which sketch a pretty bleak portrait of the U.S., perhaps to make the case for the need to, as his campaign slogan phrases it, “Make America great again”.

Who do they talk to? 

In her tweets, Clinton frequently mentions @realDonaldTrump, but Trump does not seem to tag Clinton or mention her Twitter handle, even though his tweets mention her name consistently. Clinton seems to employ the strategy of engaging with and directly mentioning her opponent’s Twitter account, whereas Trump chooses to simply ignore his rival’s account. It remains to be seen which strategy is more effective in this presidential election.

Most common mentions in Trump's tweets

Pie chart featuring the most common mentions in Clinton's tweets

While Clinton tends to mention Twitter users who are figures in her political party, such as @BillClinton, Trump refers to various right-wing media shows and personalities, such as @FoxNews and @MegynKelly. Interestingly, Trump’s posts with mentions have on average four times the number of retweets of those without mentions, and Clinton’s posts with mentions have three times the number. Thus, mentions appear to increase the likelihood of retweets. Given that the average number of retweets for Trump’s posts is greater than that for Clinton’s, it seems that Trump’s way of using mentions may help him gain more attention from the Twitter user community, although there may be other explanations as well.
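The comparison of retweets for posts with and without mentions is a ratio of two averages. Here is a hedged Python sketch of that arithmetic. The tweet records, their field names (`text`, `retweets`), and the retweet counts are all invented for illustration and are not the campaign data analyzed in this post.

```python
import re
from statistics import mean

# A mention is taken to be "@" followed by word characters (an assumption).
MENTION = re.compile(r"@\w+")

def retweet_ratio(tweets):
    """Average retweets of tweets with a mention divided by the average without.

    Raises statistics.StatisticsError if either group is empty.
    """
    with_mention = [t["retweets"] for t in tweets if MENTION.search(t["text"])]
    without_mention = [t["retweets"] for t in tweets if not MENTION.search(t["text"])]
    return mean(with_mention) / mean(without_mention)

# Invented records for illustration only.
tweets = [
    {"text": "Watching @FoxNews tonight", "retweets": 800},
    {"text": "Thanks @MegynKelly", "retweets": 1200},
    {"text": "Big rally today", "retweets": 200},
    {"text": "Thank you Ohio", "retweets": 300},
]
ratio = retweet_ratio(tweets)
print(ratio)
```

A ratio like this describes an association only; tweets with mentions may differ from other tweets in timing or topic as well.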

2016 RNC vs. DNC Convention: Night and Day

Using NVivo, a text-analysis software package, DASIL compared Clinton’s and Trump’s convention speeches to demonstrate the stark contrast between the two presidential candidates. The previous post briefly examined key themes in each candidate’s address using word clouds. This analysis expands on the previous post with a more in-depth comparison of the two candidates’ approaches to the following themes:

Immigration:

Table demonstrating the frequency of mention of the word “immigration” or “immigrant(s)” by count

Table demonstrating the frequency of mention of the word “immigration” or “immigrant(s)” as percentage of total number of words in each speech.

In Donald Trump’s speech, in 10 of the 13 instances in which “immigration” or “immigrant(s)” is mentioned, the word is accompanied by words with negative connotations such as “illegal”, “radical”, “dangerous”, or “uncontrolled”. Trump presents immigration as the cause of poverty, violence, drug issues, unemployment, and terrorism.

In contrast, Clinton presented herself as an advocate for comprehensive immigration integration, which is clearly demonstrated in her convention speech: in 2 of the 4 instances in which Clinton mentioned “immigration” or “immigrant(s)”, the word is accompanied by positive words and phrases. She described immigrants as “contributing to our economy” and “hardworking”.
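Counts such as “10 out of 13” can be approximated by checking whether a cue word appears within a few words of each mention. The post’s own counts came from NVivo; the Python sketch below is only one possible approach. The window size and function name are assumptions, the cue words are the four negative examples quoted above, and the sample text is invented.

```python
import re

def cooccurrence(text, targets, cue_words, window=2):
    """Count target mentions, and how many have a cue word within `window` tokens.

    Note: the window ignores sentence boundaries, which is a simplification.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    total = flagged = 0
    for i, token in enumerate(tokens):
        if token in targets:
            total += 1
            nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            if any(word in cue_words for word in nearby):
                flagged += 1
    return flagged, total

targets = {"immigration", "immigrant", "immigrants"}
# The negative cue words quoted in the text.
negative_cues = {"illegal", "radical", "dangerous", "uncontrolled"}

# Invented sample text, not a quotation from either speech.
sample = "Illegal immigration hurts wages. Immigrants built this country."
flagged, total = cooccurrence(sample, targets, negative_cues)
print(flagged, "of", total)
```

An automatic count like this is a starting point; reading each flagged passage is still needed to confirm that the cue word really applies to the mention.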

Jobs:

Table demonstrating the frequency of mention of the word “job(s)” by count

Table demonstrating the frequency of mention of the word “job(s)” as percentage of total number of words in each speech.

Given the long-standing lag in job growth, outlining a vision for job creation and income gains is among the top priorities on the two candidates’ agendas. As mentioned in a previous post, Trump held a pessimistic outlook on the American economy: 4 of the 13 mentions of “job(s)” by Trump are surrounded by words with negative connotations. The Republican nominee talked about the prospect of job and wage reductions under a Clinton administration and considered regulation “one of the greatest job-killers of them all.”

On the other hand, Hillary Clinton chose to deliver a more hopeful view of the matter. She highlighted the prospect of good-paying jobs and the effectiveness of her policy in job creation. In none of the 18 times she touched upon the subject of employment did she make a negative remark on the issue.

Patriotism:

Table demonstrating the frequency of mention of the word “America(ns)” by count and as percentage of total word count

The two presidential candidates frequently mentioned “America(ns)” in their speeches, and the word clouds visualize how often Clinton and Trump each used these words. In fact, Trump mentioned “America(ns)” almost three times as often as Clinton did – both in terms of count (number of times “America(ns)” is mentioned) and percentage (number of times “America(ns)” is mentioned as a percentage of total word count).

Even though both Trump and Clinton embraced patriotism in their convention speeches, they did so in two strikingly different ways. The Republican Party and its presidential nominee portrayed America as a country under attack by all things foreign; the country is in a dark place and Trump is the one to “make America great again.” In contrast to Trump’s nationalism, Clinton talked about America in optimistic tones, emphasizing the family values – faith, community, and togetherness – that middle-class Americans adhere to.
