Understanding Population Estimates Based Upon Stratified Random Samples

When a researcher is interested in examining distinct subgroups within a population, it is common to use a stratified random sample to better represent the entire population. This method involves dividing the population of interest into several subgroups (called strata) based on specific variables of interest and then taking a simple random sample from each stratum. Because the strata are often sampled at different rates, weights are then used to estimate population parameters more accurately.
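One common way to write a weighted estimate of a population proportion is sketched below. The notation is introduced here for illustration and is not from the original post: stratum h contains N_h members of the population, n_h of them are sampled, y_hi equals 1 if respondent i in stratum h has the attribute of interest (0 otherwise), and N is the sum of the N_h. The sketch assumes the weights reflect only the sampling rate in each stratum; published survey weights often include further adjustments.

```latex
w_h = \frac{N_h}{n_h},
\qquad
\hat{p}_w
  = \frac{\sum_{h=1}^{H} \sum_{i=1}^{n_h} w_h \, y_{hi}}{\sum_{h=1}^{H} n_h \, w_h}
  = \sum_{h=1}^{H} \frac{N_h}{N} \, \hat{p}_h,
\qquad
\hat{p}_h = \frac{1}{n_h} \sum_{i=1}^{n_h} y_{hi}
```

Under these assumptions, the weighted estimate is the average of the within-stratum sample proportions, with each stratum counted according to its share of the population rather than its share of the sample.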

Many people fail to recognize that data from a stratified random sample should not be treated as a simple random sample (SRS), as Kathy Kamp, Professor of Anthropology, mentions in an earlier blog post. The following example explains why it is important to treat stratified random samples and simple random samples differently.

In 2010, CBS and the New York Times conducted a national phone survey (a stratified random sample) of 1,087 subjects as part of “a continuing series of monthly surveys that solicit[ed] public opinion on a range of political and social issues” (ICPSR 33183, 2012 March 15). In addition to political preference, they gathered information on race, sex, age, and region of residence.

The figure below demonstrates how population estimates vary depending on the use of weights. The unweighted graph overestimates the number of females in the Democratic Party (52% Democrat and 40% Republican). This leads to an overestimate of the number of Democrats in the nation. However, when weights are properly incorporated into the analysis, we see that the two proportions are actually much closer (46% Democrat and 45% Republican).




As demonstrated above, the weighted and unweighted graphs produce different proportions. Specifically, the number and percent of Republican supporters increase when we take the weights into account. The weighted graph and proportions give a more accurate estimate of Political Preference by Sex in the population than the unweighted graph.
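The effect of weighting can be sketched in a few lines of Python. The respondents and weights below are invented for illustration and are not taken from the CBS/New York Times data. The sketch assumes a sample in which women are overrepresented (6 of 8 respondents), so each man is given a weight of 3.0 to restore an even split; `party_shares` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical respondents: (sex, party, survey weight).
# Illustrative values only, not from the CBS/NYT survey.
respondents = [
    ("F", "Democrat", 1.0),
    ("F", "Democrat", 1.0),
    ("F", "Democrat", 1.0),
    ("F", "Democrat", 1.0),
    ("F", "Republican", 1.0),
    ("F", "Republican", 1.0),
    ("M", "Democrat", 3.0),
    ("M", "Republican", 3.0),
]


def party_shares(rows, use_weights):
    """Return each party's share of respondents, optionally weighted."""
    totals = defaultdict(float)
    for _sex, party, weight in rows:
        # Unweighted analysis counts every respondent once;
        # weighted analysis counts each respondent by their weight.
        totals[party] += weight if use_weights else 1.0
    grand_total = sum(totals.values())
    return {party: total / grand_total for party, total in totals.items()}


unweighted = party_shares(respondents, use_weights=False)
weighted = party_shares(respondents, use_weights=True)

for party in sorted(unweighted):
    print(f"{party}: unweighted {unweighted[party]:.3f}, weighted {weighted[party]:.3f}")
```

In this toy sample the Democratic share is lower once the underrepresented group is weighted up, which is the same direction of change described for the survey above. In practice, survey software also adjusts standard errors for the sampling design, which this sketch does not attempt.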

Try it on your own!

Through a summer MAP with Pam Fellers and Shonda Kuiper, we have created a Political Data app using this dataset. Follow this link to view the influence of weights on the population estimates for all the subgroups within this dataset. For example, select the X Axis Variable to be “Region” and the Y Axis Variable to be “Political Preference”. What do you notice about the weighted graph in comparison to the unweighted graph? You can also find datasets and several student lab activities that give details on proper estimation and testing for survey (weighted) data at this website.


Improving Nutrition in Poweshiek County One Food Box at a Time

Today, we are sharing an example of community collaboration that emphasizes a practical application of data to produce real-world solutions to policy issues. Mid-Iowa Community Action (MICA), located in Grinnell, IA, partnered with DASIL to evaluate the quality of its food pantry services and determine ways to promote healthier eating among the families it serves. This partnership makes it possible to investigate the data and provides the concrete evidence needed to drive future changes in MICA’s food box policy. Seth Howard, the Grinnell Corps Fellow who conducted the survey, hopes that this will inaugurate a shift to more data-driven decision-making at MICA.

Obesity and Type II Diabetes disproportionately affect the lower-income Americans who are the clients of MICA. This has been largely attributed to financial constraints that leave families with no choice but to purchase the most inexpensive food they can, which is frequently less nutritious. Thus the food pantry is potentially an important part of the solution. To learn more about the influence of income on diabetes rates, take a look at this study by the Centers for Disease Control and Prevention or explore DASIL’s interactive visualization on factors correlating with diabetes.

Food boxes are distributed monthly to the families MICA serves, providing varying amounts of food based on family size. After a few weeks at MICA, Grinnell Corps Fellow Seth Howard approached his director about conducting a survey to evaluate the need for changes in the food boxes. The goal of the survey was twofold: to assess satisfaction with MICA services, as it had been years since the food services had been adequately evaluated, and to ascertain the demand for healthier foods, different foods, nutritional information, and cooking tips.

Seth surveyed every individual who used the food pantry in the month of July using a questionnaire that could be returned anonymously to a submission box. A total of 195 households took the survey, a response rate of 78.9% of the 247 households served that month. Using a 5-point Likert scale (1 - Strongly Negative, 2 - Somewhat Negative, 3 - Neutral, 4 - Somewhat Positive, 5 - Strongly Positive), survey takers reported how often they use common food box items and answered some questions about what they’d like to see in future food boxes.

As the graphic below shows, MICA households using the food pantry wanted to see healthier items overall, despite being generally satisfied with the food boxes (only 6.15% reported strong or slight dissatisfaction). Providing even better, healthier options could increase satisfaction and boost use of food box contents.

Would you like to receive healthier food items in the monthly box?  72% Yes, 28% No


How Traditional Introductory Statistics Textbooks Fail to Serve Social Science Undergraduates

No weighting variable: the estimate is that about 50% of the population knows that the Jewish Sabbath starts on Friday.

Appropriately weighted data: The estimate changes by about 5 percentage points, suggesting that only 45% of the population knows the correct start time.

Full disclosure: I approach this topic simultaneously as a social scientist and as someone who has taught a traditional introductory statistics class for over twenty years. I am, thus, myself part of the problem. While I am mainly following the dictates of some of the most popular textbooks, it is fully within my power to diverge from the book. When I do not do so, it is really my own fault: I am a sheep following the sheepdogs.

Our worst failure as statistics teachers is to teach as if all or most of the data that our students will engage with in their future careers are from simple random samples.