Data do not speak. As has famously been noted, data, and especially data displays (whether maps, statistics, or word clouds), can lie or at least deceive. Easy access to tools for generating visualizations and analyses may be as dangerous as it is liberating unless we are careful, both as producers and as consumers.
The following three maps all show exactly the same data, yet they look very different because of the choices made in displaying them.
The first map uses natural breaks in the data to separate categories. The second uses quartiles, which divide the ranked data at the median and at the medians of its lower and upper halves. The states are thus sorted into four groups of equal size, and the most densely populated states receive the darkest color. Note how much variation this top group exhibits: while the two least dense groups span only a narrow range, the range for the most densely populated group is huge.
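The contrast between the two classification schemes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code behind these maps: the densities are hypothetical values for twelve made-up regions, the quartile cut points come from the standard library's `statistics.quantiles` (Python 3.8+, default method), and "natural breaks" is implemented as a Fisher-Jenks-style optimal split that minimizes the within-class sum of squared deviations, one common definition; mapping software may use a different variant.

```python
from statistics import quantiles

# Hypothetical, right-skewed densities for 12 made-up regions (not real state data).
densities = [1, 2, 2, 3, 5, 6, 8, 9, 30, 60, 300, 1000]


def quartile_classes(values):
    """Split values into four equal-count classes at the quartile cut points."""
    cuts = quantiles(values, n=4)  # Q1, Q2 (the median), Q3
    groups = [[], [], [], []]
    for v in sorted(values):
        # first cut point the value does not exceed; values above Q3 go to class 3
        idx = next((i for i, c in enumerate(cuts) if v <= c), 3)
        groups[idx].append(v)
    return cuts, groups


def natural_breaks(values, k):
    """Fisher-Jenks-style natural breaks: split the sorted values into k contiguous
    classes that minimize the total within-class sum of squared deviations.
    Returns the upper bound of each class, in ascending order."""
    xs = sorted(values)
    n = len(xs)
    # prefix sums let us compute any slice's squared deviations in O(1)
    s, s2 = [0.0], [0.0]
    for x in xs:
        s.append(s[-1] + x)
        s2.append(s2[-1] + x * x)

    def ssd(i, j):  # sum of squared deviations of xs[i:j], with j > i
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: lowest cost of putting the first j sorted values into c classes
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]  # where the last class begins
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j], start[c][j] = candidate, i
    # walk back through the table to recover each class's upper bound
    uppers, j = [], n
    for c in range(k, 0, -1):
        uppers.append(xs[j - 1])
        j = start[c][j]
    return uppers[::-1]


cuts, groups = quartile_classes(densities)
print("quartile cut points:", cuts)
for g in groups:
    print(f"  {min(g)} to {max(g)} (range {max(g) - min(g)})")
print("natural-break upper bounds:", natural_breaks(densities, 4))
```

On these invented values the quartile classes have ranges of 1, 3, 22, and 940, echoing the pattern described above, while natural breaks give the two most extreme values classes of their own (upper bounds 9, 60, 300, and 1000).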
Cartogram of the world based on population size.
Cartograms are spatial depictions that size their units according to a quantitative attribute other than area. The most common cartograms show the world distorted by population or by wealth, but any geographic entity can be transformed into a cartogram. Because they force us to view the map in unfamiliar ways, cartograms provide dramatic visual portrayals of geographic, political, and socio-economic relationships. A cartogram of the world based on population (above) quickly shows the potentially dominant positions of India and China in Asia relative to Russia, as well as the significance of Korea and Japan.
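The core idea, sizing each unit by an attribute rather than by land area, can be sketched for the simplest case: a non-contiguous cartogram, in which each unit is shrunk or enlarged in place. This Python sketch is illustrative only and is not how the maps above were made. The unit names, areas, and populations are hypothetical, each unit is scaled about the mean of its vertices as a stand-in for its centroid, and contiguous cartograms (where neighboring countries stay joined) require more elaborate algorithms.

```python
import math


def cartogram_scale_factors(areas, values):
    """Linear scale factor for each unit of a non-contiguous cartogram.

    A unit's target area is its share of the attribute times the total map area,
    so the map's overall area stays the same. Scaling a shape's coordinates by s
    multiplies its area by s**2, hence the square root.
    """
    total_area = sum(areas.values())
    total_value = sum(values.values())
    factors = {}
    for name, area in areas.items():
        target_area = total_area * values[name] / total_value
        factors[name] = math.sqrt(target_area / area)
    return factors


def scale_polygon(points, factor):
    """Scale a polygon about the mean of its vertices (a simple stand-in for its centroid)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in points]


def polygon_area(points):
    """Shoelace formula for the area of a simple polygon."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


# Hypothetical units: land areas and populations in arbitrary units.
areas = {"A": 16.0, "B": 4.0, "C": 5.0}
populations = {"A": 4.0, "B": 16.0, "C": 5.0}

factors = cartogram_scale_factors(areas, populations)
print(factors)  # {'A': 0.5, 'B': 2.0, 'C': 1.0}

square_a = [(0, 0), (4, 0), (4, 4), (0, 4)]  # unit A drawn as a 4 x 4 square
print(polygon_area(scale_polygon(square_a, factors["A"])))  # 4.0
```

Large but sparsely populated unit A shrinks to a quarter of its area, while small, crowded unit B grows fourfold, which is the same kind of distortion that makes Russia look small and India look large on a population cartogram.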
The world sized by Gross Domestic Product (GDP).
Quite a different picture emerges when the world is sized by Gross Domestic Product (GDP). Here the U.S. and Europe dominate. China is still large, Russia is still small, and the importance of Korea and Japan is still evident.
No weighting variable: the estimate is that about 50% of the population knows that the Jewish Sabbath starts on Friday.
Appropriately weighted data: the estimate drops by about 5 percentage points, suggesting that only about 45% of the population knows that the Sabbath starts on Friday.
Full disclosure: I approach this topic both as a social scientist and as someone who has taught a traditional introductory statistics class for over twenty years. I am, thus, part of the problem myself. While I mainly follow the dictates of some of the most popular textbooks, it is fully within my power to diverge from them. When I do not, the fault is my own: I am a sheep following the sheepdogs.
Our worst failure as statistics teachers is to teach as if all or most of the data our students will work with in their future careers come from simple random samples.
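Why weighting matters for non-random samples can be made concrete with a short Python sketch. Everything in it is hypothetical: the respondents, their groups, and the population shares are invented, and the weights are simple post-stratification weights (a group's population share divided by its sample share), only one of several ways survey designers construct weights. The weighted estimate is the sum of w_i * y_i divided by the sum of w_i.

```python
from collections import Counter

# Hypothetical survey: each respondent has a demographic group and a 0/1 answer
# (1 = answered correctly). Group "X" is over-represented in this made-up sample.
respondents = [("X", 1)] * 6 + [("X", 0)] * 2 + [("Y", 1)] * 1 + [("Y", 0)] * 1

# Assumed population shares for each group (e.g. from a census); hypothetical.
population_share = {"X": 0.5, "Y": 0.5}


def poststratification_weights(sample, pop_share):
    """Weight each group by (population share) / (sample share)."""
    counts = Counter(group for group, _ in sample)
    n = len(sample)
    return {g: pop_share[g] / (counts[g] / n) for g in counts}


def unweighted_proportion(sample):
    """Share of respondents answering correctly, treating everyone equally."""
    return sum(y for _, y in sample) / len(sample)


def weighted_proportion(sample, weights):
    """Sum of w_i * y_i divided by the sum of w_i."""
    num = sum(weights[g] * y for g, y in sample)
    den = sum(weights[g] for g, _ in sample)
    return num / den


weights = poststratification_weights(respondents, population_share)
print(f"unweighted: {unweighted_proportion(respondents):.1%}")  # 70.0%
print(f"weighted:   {weighted_proportion(respondents, weights):.1%}")  # 62.5%
```

Because the over-represented group is also more likely to answer correctly, the unweighted estimate overstates the share who know; weighting pulls it down, the same direction of change as in the Sabbath example above.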
History may well give the early 21st century the moniker “The Age of Data”: not big data or little data, not quantitative or qualitative data, but data of all sizes and shapes. The vast proliferation of information, particularly information that is now digitally available, offers scholars, students, and the general public both exciting new opportunities for exploration and significant challenges.
Today Grinnell’s Data Analysis and Social Inquiry Lab (known as DASIL, which we pronounce DAZZLE) is inaugurating a resource to assist faculty, students, and anyone else fascinated with data in exploring the world. A blog will open discussions of ways to find data, appropriate methods for evaluating data quality, and ideas about the analysis and visualization of both quantitative and qualitative data. Resource pages will provide visualizations, primarily interactive ones, of a wide range of data and data types. Eventually these will be augmented with data sets and class exercises that Grinnell faculty have prepared and invite others to use and modify as desired.