Software Review: NVivo as a Teaching Tool

For the past few weeks, DASIL has been publishing a series of blog posts comparing this year's two presidential candidates – Hillary Clinton and Donald Trump – using NVivo, a text-analysis software package. Given the increasing demand for qualitative data analysis in academic research and teaching, this post discusses the strengths and weaknesses of NVivo as a teaching tool for qualitative analysis.

Efficiency and reliability

Using software like NVivo for content analysis can add rigor to qualitative research. Running word searches or coding in NVivo produces more reliable results than doing so manually, since the software eliminates much of the scope for human error. NVivo is also especially useful with large data sets – coding hundreds of documents by hand with a highlighter pen would be extremely time-consuming.
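The kind of word search NVivo automates can be sketched in a few lines of Python. The documents and search terms below are invented for illustration, but the point carries over: an automated count comes out the same every time, however many documents there are.

```python
# Counting how often each term of interest appears across a set of
# documents -- a bare-bones version of an automated word search.
# Note: simple substring matching will also count words that merely
# contain a term (e.g. "jobless" contains "job"), so real coding
# tools match on word boundaries.
documents = [
    "The candidate spoke about jobs and the economy.",
    "Jobs, jobs, jobs: the economy dominated the debate.",
    "Healthcare and the economy were the main themes.",
]
terms = ["jobs", "economy", "healthcare"]

counts = {term: 0 for term in terms}
for doc in documents:
    text = doc.lower()          # case-insensitive matching
    for term in terms:
        counts[term] += text.count(term)

print(counts)
```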

Ease of use

NVivo is relatively simple to use. Users can import documents directly from word-processing packages in various formats, including Word documents and PDFs, and code these documents easily on screen with a point-and-click interface. Teachers and students can quickly become proficient with the software.

NVivo and social media

NVivo allows users to import Tweets, Facebook posts, and YouTube comments and incorporate them as part of their data. Given the rise of social media and increased interest in studying its impact on our society, this capability of NVivo may become more heavily employed.

Segmenting and identifying patterns 

NVivo allows users to create clusters of nodes and organize their data into categories and themes, making it easy for researchers to identify patterns. At the same time, the use of word clouds and cluster analysis also provides insight into prevailing themes and topics across data sets.


Limitations

While NVivo is a useful tool for building a reliable, general picture of the data, it is important to be aware of its limitations. It may be tempting to reduce the analysis to automatic word searches that yield a list of nodes and themes, but meaningful data analysis still requires in-depth reading and critical thinking skills.

Although it is possible to search for particular words and derivations of those words, various ways in which ideas are expressed make it difficult to find all instances of a particular usage of words or ideas. Manual searches and evaluation of automatic word searches help to ensure that the data are, in fact, thoroughly examined.

Once individual themes in a data set are found, NVivo doesn't provide tools to map out how these themes relate to one another, making it difficult to visualize the inter-relationships of nodes and topics across data sets. Users need to think critically about the ways in which these themes emerge and relate to each other to gain a deeper understanding of the data.


A Tool for Visualizing Regression Models

Will sales of a good increase when its price goes down? Does the life expectancy of a country have anything to do with its GDP? To help answer questions like these, researchers and analysts often employ regression techniques.

Linear regression is a widely-used tool for quantifying the relationship between two or more quantitative variables. The underlying premise is simple: no more complicated than drawing a straight line through a scatterplot! This simple tool is nevertheless used for everything from market forecasting to economic models. Due to its pervasiveness in analytical fields, it is important to develop an intuition behind regression models and what they actually do. For this, I have developed a visualization tool that allows you to explore the way regressions work.

You can import your own dataset or choose from a selection of others; the default one is information on a selection of movies. Suppose you want to know the strategy for making the most money from a film. In regression terminology, you would ask: what variables (factors) might be good predictors of a film's box office gross?

The response variable is the measure you want to predict, which in this situation will be the box office gross (BoxOfficeGross). The attribute that you think might be a good predictor is the explanatory variable. The budget of the film might be a good explanatory variable to predict the revenue a film might earn, for example. Let’s change the explanatory variable of interest to Budget to explore this relationship. Do you see a clear pattern emerge from the scatterplot? Can you find a better predictor of BoxOfficeGross?
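Outside the tool, the same fit can be sketched in a few lines of Python. The budget and gross figures below are invented stand-ins for the app's movie dataset, chosen only to show the mechanics:

```python
import numpy as np
from scipy import stats

# Hypothetical stand-ins for the Budget and BoxOfficeGross columns,
# in millions of dollars (the real movie dataset ships with the app).
budget = np.array([10, 25, 40, 60, 90, 120, 150, 200], dtype=float)
gross = np.array([22, 48, 75, 130, 160, 250, 280, 410], dtype=float)

# Fit the line: gross = intercept + slope * budget
fit = stats.linregress(budget, gross)
print(f"slope = {fit.slope:.2f} (extra gross per extra $1M of budget)")
print(f"R^2   = {fit.rvalue ** 2:.3f}")
```

With these made-up numbers the fitted slope is about 2, i.e. each extra dollar of budget is associated with roughly two extra dollars of gross.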

If you want to control for the effects of other pesky variables without having to worry about them directly, you can include them in your model as control variables.
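Mechanically, adding a control variable just means giving the regression an extra column. A minimal sketch with simulated data (all numbers invented): here RunTime is correlated with Budget, and including it as a control lets the model isolate Budget's own effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Invented data: Budget drives gross, but RunTime (the "pesky" variable)
# is itself correlated with Budget. Units are arbitrary.
budget = rng.uniform(10, 200, n)
run_time = 90 + 0.2 * budget + rng.normal(0, 10, n)
gross = 2.0 * budget + 0.5 * run_time + rng.normal(0, 30, n)

# Multiple regression: a constant, the explanatory variable, and the
# control variable, side by side in the design matrix.
X = np.column_stack([np.ones(n), budget, run_time])
coefs, *_ = np.linalg.lstsq(X, gross, rcond=None)

# coefs[1] estimates Budget's effect *holding RunTime fixed*
print(f"[intercept, budget, run_time] = {np.round(coefs, 2)}")
```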

Below the scatterplot are two important measures used in evaluating regression models: the p-value and the R2 value. The p-value tells us the probability of getting a result at least as extreme as ours just by chance. In the context of a regression model, it suggests whether the specific combination of explanatory and control variables really does seem to affect the response variable: a lower p-value means that something appears to be going on in the data, as opposed to the points being scattered randomly. The R2 value, on the other hand, tells us what proportion of the variability in the response (predicted) variable is explained by the explanatory (predictor) variable – in other words, how good the model is. A model with a very low R2 value explains little of the variation in the response, and may not be such a good model after all.

[Figure: RottenTomatoesScore vs. RunTime scatterplot]

If you want to predict a movie's RottenTomatoesScore from its RunTime, for example, the incredibly small p-value might tempt you to conclude that, yes, longer movies do get better reviews! However, if you look at the scatterplot, you might get the feeling that something's not right. The R2 value tells the other side of the story: though RunTime does appear to be correlated with RottenTomatoesScore, the relationship is far too weak for us to do anything with!
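This "significant but useless" pattern is easy to reproduce with simulated data (invented numbers, not the app's dataset): with a large enough sample, even a trivially weak relationship earns a tiny p-value while R2 stays near zero.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 5000

# Invented stand-ins for RunTime and RottenTomatoesScore: the positive
# relationship is real but very weak, and the sample is large.
run_time = rng.normal(110, 20, n)
score = 50 + 0.08 * run_time + rng.normal(0, 15, n)

result = stats.linregress(run_time, score)
print(f"p-value = {result.pvalue:.3g}")     # tiny: the slope is "real"
print(f"R^2     = {result.rvalue ** 2:.4f}")  # also tiny: it explains almost nothing
```

The p-value answers "is there any relationship at all?", while R2 answers "how much does it matter?" – and only the second tells you whether the model is worth using.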

Play around with the default dataset provided, or use your own dataset by going to the Change Dataset tab on top of the page. This visualization tool can be used to develop an intuition for regression analysis, to get a feel of a new dataset, or even in classrooms for a visual introduction to linear regression techniques.


10 Suggestions for Making an Effective Poster


Written papers are the traditional way to share research results at professional meetings, but poster sessions have been gaining popularity in many fields. Posters are particularly effective for sharing quantitative data, as they provide a good format for presenting data visualizations and allow readers to peruse the information at leisure.  For students they are a great teaching tool, as preparing a good poster also requires clear and concise writing.

Making a poster is easy, but making a really good poster is hard.  I have found the guidelines below helpful to students.  The most important piece of advice, however, is the one true for all writing—write, read and revise; write, read and revise; write, read and revise!

  1. Make your poster using PowerPoint. This will allow you to put in text via text boxes as well as to paste in charts, graphs, tables, maps, and pictures. It is easy! To get your pictures and text boxes to line up consistently, use snap to grid: in the Format tab choose Arrange >> Align and then Grid Settings. Select to view the grid and to snap to the grid. You can set the grid size here as well.
  2. Use a single slide. In the Design tab pick Page Setup, select Custom, and then set the width and height to maximize your slide, given the locally-available paper size. At Grinnell the paper width available is 36”, so we set the width to 45” and the height to 36”. Use “landscape” for your orientation.
  3. As in a written paper, have a descriptive title. Put the title (in 68-point type or larger) at the top of the poster. Place your name and college affiliation in slightly smaller type immediately below it.
  4. The exact sections of the poster will vary somewhat depending on the project, but include an abstract placed either under the title or in the upper left column.
  5. As in a written paper, be sure you have a good thesis and present it early in the poster, support it with evidence, then remind your audience of it as you conclude. Finish with a minimum of citations and acknowledgements in the lower right-hand corner.
  6. Posters should read sequentially from the upper left, down the left column, then down the central column (if you have one), and finally down the right column. Alternative layouts are possible, but the order in which the poster is read must be obvious.
  7. Use a large font – a minimum of 28 point.
  8. Limit the number of words. Be concise and think of much of your text as captions for illustrations.
  9. Use lots of charts, graphs, maps, and other pictures. Be sure to label your figures and refer to them in the text.
  10. Make your poster attractive. Use color. Pay attention to layout. Do not have large empty areas.


Testing Weighted Data

In previous posts we discussed the challenges of accounting for weights in stratified random samples. While the calculation of population estimates is relatively standard, there is no universally accepted norm for statistical inference for weighted data. However, some methods are more appropriate than others. We will focus on examining three different methods for analyzing weighted data and discuss which is most appropriate to use, given the information available.

Three common methods for testing stratified random samples (weighted data) are:

  • The Simple Random Sample (SRS) Method assumes that the sample is an unweighted sample that is representative of the population, and does not include adjustments based on the weights that are assigned to each entry in the data set. This is the basic chi-square test taught in most introductory statistics classes.
  • The Raw Weight (RW) Method multiplies each entry by its respective weight and runs the analysis on this adjusted, weighted sample.
  • The Rao-Scott Method takes into account both sampling variability and variability among the assigned weights to adjust the chi-square statistic from the RW method.
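The difference between the first two methods can be seen directly with a toy contingency table (the counts below are invented): when every respondent carries the same weight, the RW method leaves the table's proportions unchanged but scales the chi-square statistic by the weight, shrinking the p-value without adding any information.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Invented 2x4 table: physical-therapy use (rows) by region (columns).
observed = np.array([
    [30, 45, 25, 40],   # used physical therapy
    [70, 55, 95, 60],   # did not
])

# SRS method: chi-square test on the raw counts
chi2_srs, p_srs, dof, _ = chi2_contingency(observed)

# RW method: suppose every respondent carries a weight of 3. The
# proportions are unchanged, but every count triples -- and so does
# the chi-square statistic, driving the p-value down.
chi2_rw, p_rw, _, _ = chi2_contingency(observed * 3)

print(f"SRS: chi2 = {chi2_srs:.2f}, p = {p_srs:.4g}")
print(f"RW:  chi2 = {chi2_rw:.2f}, p = {p_rw:.4g}")
```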

One example of a data set which incorporates a weight variable is the Complementary and Alternative Medicine (CAM) Survey, which was conducted by the National Center for Health Statistics (NCHS) in 2012. For the CAM survey, NCHS researchers gathered information on numerous variables such as race, sex, region, employment, marital status, and whether each individual surveyed used various types of CAM. In this dataset, weights were assigned based on race, sex, and age.

Among African Americans who used CAM for wellness, we conducted a chi-square test to determine whether there was a significant difference in the proportion of physical therapy users in each region. Below is a table comparing the test statistics and p-values for each of the three statistical tests:
[Table: chi-square statistics and p-values for the SRS, RW, and Rao-Scott methods]


The SRS method assumes that we are analyzing data collected from a simple random sample rather than a stratified random sample. Since the proportions in our sample do not represent the population, this method is inappropriate here. The RW method multiplies each entry by its weight, giving a slightly more representative sample. While this method is useful for estimating population quantities, multiplying by the weights tends to give p-values that are much too small. Thus, both the SRS and RW methods are inaccurate for testing this data set. The Rao-Scott method adjusts for the non-SRS sample design as well as accounting for the weights, resulting in a better representation of the population.
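A rough sketch of the idea behind a first-order Rao-Scott-style correction, with invented numbers and an assumed design effect: the raw-weight chi-square is deflated before the p-value is computed. This is a simplification for illustration only; real implementations (e.g. SAS's PROC SURVEYFREQ or R's survey package) estimate the design effect from the sample design itself rather than assuming it.

```python
import numpy as np
from scipy.stats import chi2_contingency, chi2

# Invented weighted counts (each raw count multiplied by its weight)
weighted_table = np.array([
    [90, 135, 75, 120],
    [210, 165, 285, 180],
])
chi2_rw, _, dof, _ = chi2_contingency(weighted_table)

deff = 3.0                      # assumed average design effect
chi2_adj = chi2_rw / deff       # deflate the raw-weight chi-square
p_adj = chi2.sf(chi2_adj, dof)  # recompute the p-value

print(f"adjusted chi2 = {chi2_adj:.2f} on {dof} df, p = {p_adj:.4g}")
```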
Try it on your own!
Through a summer MAP with Pam Fellers and Shonda Kuiper, we created a CAM Data shiny app. Go to this app and compare how population estimates and test statistics can change based upon the statistical method used. For example, set the X Axis Variable to Sex and the Color By variable to Surgery. Examine the chi-square values from each of the three types of tests. Which test gives the most extreme p-value? The least extreme? You can also find multiple datasets and student lab activities giving details on how to properly analyze weighted data here.


Data Across the Curriculum: Using Qualitative Data Analysis in Teaching Spanish

When Spanish Professor Pérez incorporates NVivo, a qualitative research tool, into her teaching of Spanish, she sees it as a way to prepare her students for their future careers. Based on the trajectory of the field, she believes that “the digital humanities are here to stay.” While she realizes that not every student that studies Spanish plans on a career in academia or as a Spanish teacher, she hopes that working with digital technology will prepare her students to adapt to a variety of digital research tools in a wide range of fields.

After learning about NVivo, Professor Pérez decided to try using the program in her own research on festival books. Her initial project included only a small number of texts; however, with NVivo’s capacity for large-scale comparison between digital texts, her project has expanded to include around 700 texts.

Once she was familiar with NVivo, Professor Pérez decided to include a short assignment using the program in her Spanish seminar focused on Miguel de Cervantes’ classic novel, Don Quijote.


Sara Sanders ’14, the 2014-15 DASIL Post-Baccalaureate Fellow, gave an introductory workshop in the class, and Professor Pérez assigned three chapters of the Quijote to each small group of students to analyze digitally. Students then produced reports that included their analytical findings and reflections on NVivo’s usefulness.

So far, Professor Pérez has noted differences in how students respond to NVivo: the majority of her science-major students critiqued the program, wishing that it included detailed quantitative analysis, while humanities majors were usually complimentary. Eventually, she hopes to share further observations about the connection between digital technology and pedagogy at conferences and in a published article. As one of the first professors in Grinnell’s Spanish department to utilize digital analysis in her classes, she also hopes that her experiences with the developing field of digital humanities will facilitate other professors’ explorations of new technologies.


This past summer, Professor Pérez received a Steven Elkes Grant to develop the use of technology in a new course.  With the help of her research assistant, Alex Claycomb ’18, she is in the process of designing a course entitled “Designing Empire: Plazas, Power and Urban Planning in Habsburg Spain and its Colonies,” which integrates two new NVivo assignments as well as work with GIS and mapping.
