Testing Weighted Data

In previous posts we discussed the challenges of accounting for weights in stratified random samples. While the calculation of population estimates is relatively standard, there is no universally accepted norm for statistical inference for weighted data. However, some methods are more appropriate than others. We will focus on examining three different methods for analyzing weighted data and discuss which is most appropriate to use, given the information available.

Three common methods for testing stratified random samples (weighted data) are:

  • The Simple Random Sample (SRS) Method assumes that the sample is an unweighted sample that is representative of the population, and does not include adjustments based on the weights that are assigned to each entry in the data set. This is the basic chi-square test taught in most introductory statistics classes.
  • The Raw Weight (RW) Method multiplies each entry by its respective weight and runs the analysis on this weighted sample.
  • The Rao-Scott Method takes into account both sampling variability and variability among the assigned weights to adjust the chi-square from the RW method.
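To make the difference between the first two methods concrete, here is a minimal Python sketch. The records, region labels, and weights are hypothetical and are not taken from the CAM survey, and `chi_square_statistic` and `cross_tabulate` are illustrative helper names. The sketch computes the Pearson chi-square statistic once on raw counts (SRS) and once on weight-multiplied counts (RW). The Rao-Scott adjustment is not implemented here, since it requires design-based variance estimates.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a two-way table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


def cross_tabulate(records, use_weights):
    """Tabulate (row_label, col_label, weight) records.

    With use_weights=False every record counts once (SRS method);
    with use_weights=True every record counts as its weight (RW method).
    """
    rows = sorted({r for r, _, _ in records})
    cols = sorted({c for _, c, _ in records})
    table = [[0.0 for _ in cols] for _ in rows]
    for r, c, w in records:
        table[rows.index(r)][cols.index(c)] += w if use_weights else 1.0
    return table


# Hypothetical survey records: (region, uses physical therapy, weight).
records = (
    [("Northeast", "yes", 1500.0)] * 10 + [("Northeast", "no", 1500.0)] * 30
    + [("South", "yes", 2500.0)] * 20 + [("South", "no", 2500.0)] * 20
)

srs_stat = chi_square_statistic(cross_tabulate(records, use_weights=False))
rw_stat = chi_square_statistic(cross_tabulate(records, use_weights=True))

print(f"SRS chi-square: {srs_stat:.2f}")
print(f"RW chi-square:  {rw_stat:.2f}")
```

With these made-up numbers the SRS statistic is about 5.3, while the RW statistic is in the thousands, because the weighted table behaves as if it contained 160,000 observations instead of 80.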

One example of a data set which incorporates a weight variable is the Complementary and Alternative Medicine (CAM) Survey, which was conducted by the National Center for Health Statistics (NCHS) in 2012. For the CAM survey, NCHS researchers gathered information on numerous variables such as race, sex, region, employment, marital status, and whether each individual surveyed used various types of CAM. In this dataset, weights were assigned based on race, sex, and age.

Among African Americans who used CAM for wellness, we conducted a chi-square test to determine whether there was a significant difference in the proportion of physical therapy users in each region. Below is a table comparing the test statistics and p-values for each of the three statistical tests:

[Table: test statistics and p-values for the SRS, RW, and Rao-Scott methods]

The SRS method assumes that we are analyzing data collected from a simple random sample instead of a stratified random sample. Since the proportions in our sample do not represent the population, this method is inappropriate. The RW method multiplies each entry by its weight, giving a slightly more representative sample. While this method is useful for estimating population quantities, multiplying by the weights inflates the apparent sample size, so it tends to give p-values that are much too small. Thus, both the SRS and RW methods are inappropriate for testing this data set. The Rao-Scott method adjusts for the non-SRS sample design as well as for the weights, resulting in a better representation of the population.
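One way to see why the RW p-values come out too small is that the Pearson chi-square statistic grows in direct proportion to the table total. The Python sketch below uses hypothetical counts and weights to check that scaling, then applies a Kish-style effective sample size rescaling. That rescaling is only a rough stand-in chosen for illustration. It is not the Rao-Scott correction, which estimates design effects from the survey design itself.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a two-way table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


def kish_effective_sample_size(weights):
    """Kish approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Hypothetical unweighted counts; multiplying every cell by 2000
# leaves the proportions unchanged but inflates the statistic 2000-fold.
unweighted = [[30, 10], [20, 20]]
scaled = [[cell * 2000 for cell in row] for row in unweighted]
base = chi_square_statistic(unweighted)
inflated = chi_square_statistic(scaled)

# Hypothetical weights: 40 respondents at 1500 and 40 at 2500.
weights = [1500.0] * 40 + [2500.0] * 40
weighted_table = [[45000.0, 15000.0], [50000.0, 50000.0]]
rw_stat = chi_square_statistic(weighted_table)

# Rescale the RW statistic from the weight total down to the
# effective sample size (an assumption, not the Rao-Scott method).
n_eff = kish_effective_sample_size(weights)
rescaled = rw_stat * n_eff / sum(weights)

print(f"base={base:.2f} inflated={inflated:.2f}")
print(f"n_eff={n_eff:.1f} rw={rw_stat:.1f} rescaled={rescaled:.2f}")
```

The effective sample size here is a little under the 80 actual respondents, reflecting the variability among the weights, and the rescaled statistic returns to a single-digit magnitude.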

Try it on your own!

Through a summer MAP with Pam Fellers and Shonda Kuiper, we created a CAM Data Shiny app. Go to this app and compare how population estimates and test statistics can change based upon the statistical method that is used. For example, select the X Axis Variable to be Sex and the Color By variable to be Surgery. Examine the chi-square values from each of the three types of tests. Which test gives the most extreme p-value? The least extreme? You can also find multiple datasets and student lab activities giving details on how to properly analyze weighted data here.

Understanding Population Estimates Based Upon Stratified Random Samples

When a researcher is interested in examining distinct subgroups within a population, it is common to use a stratified random sample to better represent the entire population. This method involves dividing the population of interest into several smaller subgroups (called strata) based on specific variables of interest and then taking a simple random sample from each of these groups. Because strata are often sampled at different rates, weights are used to better estimate population parameters.
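As a minimal sketch of how such weights work, the Python example below assumes the common convention that each respondent's weight is the stratum's population size divided by the stratum's sample size. The strata, population sizes, and responses are hypothetical, and real survey weights often include further adjustments.

```python
# Hypothetical strata: population size and sampled yes(1)/no(0) responses.
strata = {
    "urban": {"population": 9000, "responses": [1] * 20 + [0] * 80},
    "rural": {"population": 1000, "responses": [1] * 80 + [0] * 20},
}


def stratum_weight(population, sample_size):
    """Number of people in the population that each respondent represents."""
    return population / sample_size


# Unweighted estimate: treat the pooled sample as a simple random sample.
unweighted_yes = sum(sum(s["responses"]) for s in strata.values())
sample_size = sum(len(s["responses"]) for s in strata.values())
unweighted_proportion = unweighted_yes / sample_size

# Weighted estimate: each response counts as many times as its weight.
weighted_yes = 0.0
total_weight = 0.0
for s in strata.values():
    w = stratum_weight(s["population"], len(s["responses"]))
    weighted_yes += w * sum(s["responses"])
    total_weight += w * len(s["responses"])
weighted_proportion = weighted_yes / total_weight

print(f"unweighted: {unweighted_proportion:.2f}")
print(f"weighted:   {weighted_proportion:.2f}")
```

In this made-up example both strata contribute 100 respondents, so the unweighted estimate is 0.50, but the urban stratum is nine times larger in the population, which pulls the weighted estimate down to 0.26.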

Many people fail to recognize that data from a stratified random sample should not be treated as a simple random sample (SRS), as Kathy Kamp, Professor of Anthropology, mentions in an earlier blog post. The following example explains why it is important to treat stratified random samples and SRS differently.

In 2010, CBS and the New York Times conducted a national phone survey (a stratified random sample) of 1,087 subjects as part of “a continuing series of monthly surveys that solicit[ed] public opinion on a range of political and social issues” (ICPSR 33183, 2012 March 15). In addition to political preference, they gathered information on race, sex, age, and region of residence.

The figure below demonstrates how population estimates vary depending on the use of weights. The unweighted graph overestimates the number of females in the Democratic Party (52% Democrat and 40% Republican), which in turn overestimates the number of Democrats in the nation. However, when weights are properly incorporated into the analysis, we see that the proportions are actually much closer (46% Democrat and 45% Republican).

[Figure: unweighted and weighted graphs of Political Preference by Sex]

As demonstrated above, there is a difference between the weighted and unweighted graphs and the resulting proportions. Specifically, the number and percent of Republican supporters increase when we take the weights into account. The weighted graph and proportions give a more accurate estimate of Political Preference by Sex in the population than the unweighted graph.
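The same comparison can be sketched for shares within a subgroup. The Python example below uses a handful of hypothetical records and weights, not the CBS/New York Times data, and `share_within_group` is an illustrative helper name. It computes party shares among one sex with and without weights.

```python
from collections import defaultdict


def share_within_group(records, group, use_weights):
    """Party shares among respondents of one sex.

    records holds (sex, party, weight) tuples. Without weights each
    respondent counts once; with weights each counts as its weight.
    """
    totals = defaultdict(float)
    for sex, party, weight in records:
        if sex == group:
            totals[party] += weight if use_weights else 1.0
    grand_total = sum(totals.values())
    return {party: total / grand_total for party, total in totals.items()}


# Hypothetical records: female Democrats are over-represented in the
# sample, so they carry smaller weights than female Republicans.
records = (
    [("Female", "Democrat", 0.8)] * 6
    + [("Female", "Republican", 1.2)] * 4
    + [("Male", "Democrat", 1.0)] * 5
    + [("Male", "Republican", 1.0)] * 5
)

unweighted = share_within_group(records, "Female", use_weights=False)
weighted = share_within_group(records, "Female", use_weights=True)

print("unweighted:", unweighted)
print("weighted:  ", weighted)
```

Here the unweighted shares among females are 60% and 40%, while the weighted shares are roughly even, which mirrors the direction of the shift described above without reproducing its numbers.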

Try it on your own!

Through a summer MAP with Pam Fellers and Shonda Kuiper, we have created a Political Data app using this dataset. Follow this link to view the influence of weights on the population estimates for all the subgroups within this dataset. For example, select the X Axis Variable to be “Region” and the Y Axis Variable to be “Political Preference”. What do you notice about the weighted graph in comparison to the unweighted graph? You can also find datasets and several student lab activities giving details for proper estimation and testing for survey (weighted) data at this website.

Data Across the Curriculum: Helping the Local and the International with Consulting Research

Students in Monty Roper’s Anthropology and Global Development Studies classes gain practical experience in fieldwork, data analysis, and ways to deal effectively with clients when they act as consultants both for local organizations in Grinnell and internationally for an agricultural village in Costa Rica. The clients they work with get free research, which is presented to them both as an oral consultation and as a written report.

From left: Roni Finkelstein ’15, Ellen Pinnette ’15, Liberty Britton ’14, Rosalie Curtain ’15, Emily Nucaro ’14, Ben Mothershead ’15, Zhaoyi Chen ’14, and M’tep Blount ’15, listen to Juan Carlos Bejarono explain the palm growing process.

For a Global Development Studies/Anthropology seminar, students prepare research plans during the first half of the semester and then travel to a rural agricultural community in Costa Rica, where they spend the two weeks of spring break collecting data. The data is then analyzed and written up during the remaining weeks of the semester. In the first year of the project, the class conducted an in-depth community development diagnostic. Since then, they have investigated a variety of rural development issues, mainly focusing on tourism, women’s empowerment, and the organizational issues and agricultural projects of the town’s two cooperatives.

From left: Chloe Griffin ’14 and Samanea Karrfalt ’14 present their research on “Professional Black Hair Care in Grinnell, IA”


From left: Irene Bruce ’15 and Matt Miller ’15 present their research and answer questions.

In Grinnell, Monty works with Susan Sanning, Director of Service and Social Innovation, to identify and explore possible collaborations with community partners who have research needs.  In the past, for example, Mid-Iowa Community Action (MICA) was interested in knowing why families dropped out of their Family Development and Self-Sufficiency Program (FaDSS) before their benefits were fully used, Drake Library was interested in what kinds of programming would best serve the town’s “tween” population, and a hair salon wanted to find out whether it was economically viable to invest in special hair care products and services for black customers.

Ideally, positive change occurs because of the class’s research. Grinnell students Dillon Fischer ’13 and Sarah Burnell ’13 interviewed graduates of Grinnell High School who had gone on to attend college about their preparedness for college academics. According to the GHS Principal, these findings led the school to revise its minimum writing standards, making them more challenging. The local after-school youth program, Galaxy, requested a study on donor perceptions and desires and subsequently used the results to write a successful grant proposal for support. This year’s class is planning to do more follow-ups on previous projects to ascertain longer-term results.

Teaching Basic Quantitative Concepts with Visualizations

Data do not speak. As has famously been noted, data and especially data displays (whether maps, statistics, or word clouds) can lie or at least be deceptive. Access to easy methods for generating visualizations and analyses may be as dangerous as it is liberating, unless we are careful as both producers and consumers.

The following three maps all show exactly the same data, but look very different because of the choices made in display.

[Figure: three maps of the same state population density data, each using a different classification scheme]

The first map uses natural breaks in the data to separate categories. The second uses quartiles, a measure based on medians. For this map, the states are separated into four equal groups and the most densely populated states are given the darkest color. Note how much variation this group exhibits. While the two least dense groups have only a small range, the range for the most densely populated group is huge.
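The quartile scheme is easy to sketch in Python. The density values below are hypothetical and are not the state data behind the maps, and `quartile_class` is an illustrative helper name. The example splits the values into four equal-sized groups with `statistics.quantiles` and reports the range inside each group. Natural breaks are not implemented here.

```python
import statistics

# Hypothetical population densities (people per square mile), sorted.
densities = [1, 6, 7, 11, 20, 25, 35, 44, 56, 68, 95, 110, 175, 230, 420, 1200]

# Three cut points that divide the data into four equal-sized groups.
cut_points = statistics.quantiles(densities, n=4)


def quartile_class(value, cuts):
    """Return 0 for the least dense quartile up to 3 for the densest."""
    return sum(value > cut for cut in cuts)


groups = {k: [] for k in range(4)}
for density in densities:
    groups[quartile_class(density, cut_points)].append(density)

# Range (max minus min) of the values that share one map color.
ranges = {k: max(values) - min(values) for k, values in groups.items()}

for k in range(4):
    print(f"quartile {k + 1}: n={len(groups[k])}, range={ranges[k]}")
```

Each group holds the same number of values, yet with skewed data like this the densest group spans a far wider range than the others, which is the pattern the quartile map hides behind a single dark color.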