The Mass Shooting Epidemic in the United States

An examination of Stanford University’s Mass Shootings of America (MSA) dataset shows why shootings have been making the headlines in the U.S. and gun violence has become a big issue addressed in the campaigns of presidential hopefuls. Stanford MSA defines a mass shooting as “3 or more shooting victims (not necessarily fatalities), not including the shooter. The shooting must not be identifiably gang or drug related” (Stanford Mass Shootings in America, courtesy of the Stanford Geospatial Center and Stanford Libraries).


The dramatic change in the number of mass shootings in the past two years is readily apparent. There were 121 mass shooting events from 1966 to 2009, but 116 in just the past 5 years. 2015 alone had 65 separate mass shootings. In terms of total fatalities, the past 7 years stand out noticeably from earlier years. Even years with few mass shootings could see a large number of fatalities: 1991, for example, had only 5 incidents but 47 fatalities.



The Southern states had the largest numbers of mass shootings in 2015. Florida led with 6. Even though Texas had fewer mass shootings (4), the state sustained the most fatalities, 20. North Dakota and New Hampshire are the only 2 states that have not experienced any mass shootings in the 49-year period covered by the data (not shown). In 2015, 39 mass shootings occurred in residential homes and neighborhoods, while 21 happened in public places. Back in the late 90s, schools were the primary target of mass shooters, with 3 incidents each in 1997 and 1999.


Most of the mass shootings in the past 8 years have stemmed from some form of altercation, be it domestic, legal, financial, or school-related. Of course, it can always be argued that all mass shooters have mental health issues, but contrary to popular belief, according to these classifications, the number of shootings with the shooter’s mental health issues as a direct motive has not increased in recent years, with only one incident in 2015 attributed to mental health issues. Perhaps most troubling is the high number of cases where a motive can’t be identified, 23 in 2015, suggesting the need for further, more comprehensive study of the underlying causes of these mass shootings.

Many pundits largely attribute this US-specific phenomenon to lax gun policy. However, any progress toward changing gun laws, or even funding research into the causes of gun violence, has been (and continues to be) stymied by the gun lobby, led by the National Rifle Association (NRA). Re-examining the nation’s access to guns is imperative, and those in Congress who are funded by the gun lobby need to be open to that re-examination. While the data available is informative, unfettered research is integral to truly understanding the nature of gun violence and to finding effective policy solutions.

Enter your e-mail address to receive notifications of new blog posts.
You can leave the list at any time. Removal instructions are included in each message.

Powered by WPNewsman

Please like & share:

A Tool for Visualizing Regression Models

Will sales of a good increase when its price goes down? Does the life expectancy of a country have anything to do with its GDP? To help answer questions like these about the relationship between different measures, researchers and analysts often use regression techniques.

Linear regression is a widely-used tool for quantifying the relationship between two or more quantitative variables. The underlying premise is simple: no more complicated than drawing a straight line through a scatterplot! This simple tool is nevertheless used for everything from market forecasting to economic models. Due to its pervasiveness in analytical fields, it is important to develop an intuition behind regression models and what they actually do. For this, I have developed a visualization tool that allows you to explore the way regressions work.

You can import your own dataset or choose from a selection of others, but the default one is information on a selection of movies. Suppose you want to know the strategy for making the most money from a film. In regression terminology, you are asking which variables (factors) might be good predictors of a film’s box office gross.

The response variable is the measure you want to predict, which in this situation will be the box office gross (BoxOfficeGross). The attribute that you think might be a good predictor is the explanatory variable. The budget of the film might be a good explanatory variable to predict the revenue a film might earn, for example. Let’s change the explanatory variable of interest to Budget to explore this relationship. Do you see a clear pattern emerge from the scatterplot? Can you find a better predictor of BoxOfficeGross?

If you want to control for the effects of other pesky variables without having to worry about them directly, you can include them in your model as control variables.

Below the scatterplot are two important measures that are used in evaluating regression models: the p-value and the R2 value. The p-value tells us how likely we would be to see a relationship at least as strong as the one in our data purely by chance, if the variables were in fact unrelated. In the context of a regression model, it suggests whether the specific combination of explanatory and control variables really does seem to be related to the response variable in some way: a lower p-value means that there seems to be something actually going on with the data, as opposed to the points being just scattered randomly. The R2 value, on the other hand, tells us what proportion of the variability in the response (predicted) variable is explained by the explanatory (predictor) variable; in other words, how well the model fits the data. If a model has a low R2 value and is incredibly bad at predicting our response, it might not be such a good model after all.
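To make the two measures concrete, here is a minimal sketch in Python that fits a straight line to a small, entirely made-up set of budget and box office figures (these numbers are hypothetical and are not from the tool's movie dataset). It computes R2 as the squared correlation, which holds for a regression with a single explanatory variable, and estimates a p-value with a permutation test: it shuffles the response values many times and counts how often chance alone produces a correlation at least as strong as the observed one. The visualization tool most likely reports a p-value from a standard t-test instead; the permutation approach is swapped in here only because it shows the "by chance" idea directly and needs no extra libraries.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def permutation_p_value(xs, ys, n_shuffles=2000, seed=0):
    """Share of random shuffles whose |correlation| matches or beats the observed one."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical data, in millions of dollars (explanatory: budget, response: gross).
budget = [10, 20, 35, 50, 80, 120, 150, 200]
gross = [15, 60, 40, 130, 150, 310, 280, 520]

slope, intercept = fit_line(budget, gross)
r_squared = pearson_r(budget, gross) ** 2  # valid for one explanatory variable
p_value = permutation_p_value(budget, gross)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"R2={r_squared:.3f}, permutation p-value={p_value:.4f}")
```

With data this strongly linear, both measures agree: R2 is high and the p-value is small. The more interesting cases are the ones where they disagree, such as a small p-value paired with a low R2.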

score vs runtime plot

If you want to predict a movie’s RottenTomatoesScore from its RunTime, for example, the incredibly small p-value might tempt you to conclude that, yes, longer movies do get better reviews! However, if you look at the scatterplot, you might get the feeling that something’s not right. The R2 value tells us this other side of the story: though RunTime does appear to be correlated to RottenTomatoesScore, the strength of that relationship is just too weak for us to do anything with!

Play around with the default dataset provided, or use your own dataset by going to the Change Dataset tab on top of the page. This visualization tool can be used to develop an intuition for regression analysis, to get a feel of a new dataset, or even in classrooms for a visual introduction to linear regression techniques.


Modeling Population Growth in Excel

The Malthus and Condorcet Equations, simple formulas that model relatively complex trends in population growth, are now accessible with an Excel calculator that allows the user full control over every component of the equations. Students can use the Excel file to model human population growth under the assumption that a human carrying capacity exists.

The Malthus Equation expresses the growth rate of a population as a function of the current population size and current carrying capacity. Specifically, the growth rate of a population is equal to a Malthusian parameter multiplied by the current population size multiplied by the difference between the current carrying capacity and the current population size. This relationship produces a high growth rate once a population is large enough to reproduce at its full potential, but a low growth rate when the population is very small or when it is nearing its carrying capacity and feeling the effect of constrained resources. The Malthusian parameter is almost invariably between zero and one because a negative Malthusian parameter would lead to a population’s gradual extinction while a Malthusian parameter greater than one would lead to explosive population growth that would greatly exceed the carrying capacity. In the latter situation, unrealistically rapid and extreme periods of growth and contraction would ensue.
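The shape of this relationship is easy to check numerically. The short Python sketch below evaluates the growth rate described above at several population sizes while holding the carrying capacity fixed; the capacity of 100 and the parameter value of 0.001 are arbitrary illustrative choices, not values from the calculator.

```python
def malthus_growth_rate(population, capacity, r):
    """Growth rate = r * population * (capacity - population)."""
    return r * population * (capacity - population)


capacity = 100.0  # hypothetical carrying capacity, held fixed here
r = 0.001         # hypothetical Malthusian parameter

# Evaluate the growth rate for small, mid-sized, and near-capacity populations.
rates = {p: malthus_growth_rate(p, capacity, r) for p in (1, 25, 50, 75, 99)}

for p, rate in rates.items():
    print(f"population={p:>3}  growth rate={rate:.3f}")
```

For a fixed carrying capacity, the growth rate is lowest at the two extremes and peaks when the population is at half of the carrying capacity.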

The Condorcet Equation expresses the growth rate of the carrying capacity of a population as equal to the growth rate of the population multiplied by a constant termed the Condorcet parameter. The logic behind this mathematical relationship is that the carrying capacity of a population increases or decreases proportionally with the growth rate of a population because an additional person in a population can have a positive or negative effect on the carrying capacity. This implies that a Condorcet parameter greater than one results from a society where an additional individual somehow increases the number of people that can be supported even when taking into account the resources that additional individual consumes; this could result from a situation where there are increasing returns to labor. If doctors cure diseases better when more of them work together, this is reflected by a Condorcet parameter greater than one. A Condorcet parameter between zero and one is most realistic for human populations because the contribution of another person will probably grow the carrying capacity but not by more than one. A negative parameter implies that an additional person would actually lower the carrying capacity; perhaps every additional person would consume natural resources at a rate greater than the previous individual’s rate.

As Cohen (1995 Science 269: 341-346) points out, the equations are not necessarily realistic models of human population growth. There is no consensus about whether or not a human carrying capacity exists. In theory, we as a species might be able to continually develop technology at such a rate that we are unable to approach a carrying capacity. A slowdown in overall human population growth is more likely due to a global increase in income per capita that leads to altered reproductive strategies.


Figure 1: with r=0.1 and c=0.1 as parameters, the population experiences a positive but steadily decreasing growth rate because the carrying capacity increases at 1/10th the rate of population growth, and since population growth slows as the population size approaches the carrying capacity, we observe almost asymptotic behavior. This is a realistic pattern for human population growth if a carrying capacity exists.

The calculator defines the Malthus Equation as dP(t)/dt = r P(t) [K(t) - P(t)] and the Condorcet Equation as dK(t)/dt = c dP(t)/dt (see Cohen 1995: 343). The user may enter values for r (the “Malthusian parameter”), c (the “Condorcet parameter”), the initial values of P(t) (population size) and K(t) (carrying capacity), t_0 (the starting time for the model), and dt (the length of one interval in time); together these determine all of the future changes in population size. The rates of change of population and carrying capacity at time t, dP(t)/dt and dK(t)/dt respectively, are determined by the equations. The Malthusian and Condorcet parameters are constant in a growth model provided that there are no exogenous shocks that affect the nature of population or carrying capacity growth. Because of this, they do not vary as a function of t.
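For readers who would like to see the two equations stepped forward outside of Excel, here is a minimal Python sketch. It assumes a simple forward (Euler) update, in which each quantity is advanced by its rate of change multiplied by dt; whether the spreadsheet uses exactly this scheme is an assumption. The parameters r=0.1 and c=0.1 match Figure 1, but the starting population of 1 and starting carrying capacity of 2 are hypothetical values chosen only for illustration.

```python
def simulate(r, c, p0, k0, t0=0.0, dt=1.0, steps=200):
    """Step the Malthus and Condorcet equations forward with Euler updates.

    dP/dt = r * P * (K - P)
    dK/dt = c * dP/dt
    Returns a list of (t, P, K) tuples, starting with the initial state.
    """
    t, p, k = t0, p0, k0
    history = [(t, p, k)]
    for _ in range(steps):
        dp_dt = r * p * (k - p)  # Malthus Equation
        dk_dt = c * dp_dt        # Condorcet Equation
        p += dp_dt * dt
        k += dk_dt * dt
        t += dt
        history.append((t, p, k))
    return history


# r and c as in Figure 1; p0 and k0 are hypothetical starting values.
history = simulate(r=0.1, c=0.1, p0=1.0, k0=2.0)

for t, p, k in history[::25]:
    print(f"t={t:>5.0f}  P={p:.4f}  K={k:.4f}")
```

Because the carrying capacity changes by c times every change in population, the two curves meet where K0 + c(P - P0) = P, that is, at P = (K0 - c P0) / (1 - c). With these starting values that is about 2.11, and the simulated population levels off there.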

To explore the Malthus-Condorcet calculator, please follow this link to an automatic download of the Excel spreadsheet containing the calculator.


Highlighting the Importance of Intersectionality in the Gender Pay Gap

The gender pay gap has again received much-needed publicity in recent years, both as a topic of debate between US presidential hopefuls for 2016 and through information uncovered from Sony’s email hack this time last year. While the phrase “women get paid 78% of what men are paid” is touted frequently in discussion, the 78% figure is one-dimensional. Do all women get paid 78% of what men are paid, or is it just a subset of the female working population?

There is a lot more to the 78% figure than meets the eye, and the intersection of race and gender is important to telling the fuller story behind the 78%, and the wider issue of gender parity in earnings.

Using DASIL’s Pay by Race & Gender visualization, we can see that race plays a significant role in the pay of a full-time working woman, revealing the nuances behind the widely cited 78% figure. Asian women working full-time in the US are, and historically have been, the subset of women paid closest to what all men are paid, at 86% of men’s pay in 2013. However, Asian women were paid only 75% of what Asian men were paid in 2013. On the other end of the spectrum, Hispanic women were paid only 60% of men’s wages in 2013, the lowest of all recorded races. Hispanic men also earned the least in comparison to all men, at 64% of what all men earned in 2013 (not shown). As the graph indicates, the disparities for Hispanic and Black women have remained relatively constant for the past twenty years.


With regard to part-time labor, however, there is virtually complete gender parity in 2013 when focusing on average figures, with “all women” receiving 99% of what a man earns. When filtering by race, part-time working White and Asian women were even paid more than the average man: White women received 106% of what a man earned, and Asian women 101%, in 2013. However, racial disparity still persists: both Black and Hispanic women in part-time labor received 85% of what part-time men were paid in 2013, and the closest Black and Hispanic women have come to achieving pay parity with the average man was in 1994.

As this infographic suggests, one reason for full-time and part-time pay disparity may be industry: Black women are more likely to work in less lucrative jobs (e.g. service, healthcare) than in more lucrative ones (e.g. STEM, management). Relatedly, education can be a contributing factor: Hispanic and Black women are less likely to graduate than Whites. Yet even when women of color have the same education levels as their White peers, they are still paid less; there is more contributing to pay disparity than the educational attainment of women of color.


While there is clear cause for more work to be done in bridging the pay gap between men and women, recognizing the multiple dimensions of the issue will be key to creating meaningful and effective policy changes.

Explore more trends with our Pay by Gender and Race visualization here.


Data Across the Curriculum: Integrating Data Analysis with Narrative in Political Science

From a pedagogical standpoint, Danielle Lussier, Assistant Professor of Political Science, stresses data as a tool for helping students approach problems from multiple perspectives. Working interactively with data allows them to better compare narratives and better understand the research process in both lower-level and upper-level material.

Political science is both a quantitative and qualitative field, so students at all levels of Lussier’s political science classes delve into both data types extensively and build data analytic skills as they progress in the major. Every class taught by Lussier involves data labs that draw both on cross-national data with countries as the unit of measure and on data with individuals as the unit of measure. The labs directly relate to readings, concepts, and/or countries that students study.

At the 100-level, students gain an introduction both to fundamental data concepts, such as the construction and measurement of variables, and to analytical computer programs like STATA, a statistical package, and ArcGIS, which analyzes spatial data. The image below is of a GIS map her introductory political science students make in a data lab.


At the 200-level, Lussier’s students delve into applied data analysis and write in-depth data reports that compare data analyses from the course readings to data analyses that students reconstruct and update from the readings.

At the 300-level, students get the opportunity to pose questions about class readings and use lab time to test their inquiries with actual data from the readings. In addition, Lussier assigns students research modules that allow them to create their own qualitative variables from cross-national data that they then transform into quantitative data, giving students the opportunity to apply the data skills they’ve accumulated in each course level.

The positive impact of incorporating data into classroom work is not lost on students. Students in all levels of her courses are widely receptive to data in coursework and have viewed working with data in her classes as an integral stepping stone to both academic and professional pursuits. Adam Lauretig ’13, the first Post-Baccalaureate Fellow for DASIL, was inspired by Lussier’s data-driven coursework to pursue more advanced courses in spatial statistics, and subsequently created visualizations like the interactive timeline map of historical coups d’etat. Additionally, many of her students have cited the research and data skills developed in her class work as marketable to employers and graduate programs.
