The National Longitudinal Survey of Youth (1979 – 2012) is a longitudinal project that follows a sample of American youth born between 1957 and 1964, tracking various aspects of their lives from 1979 to 2012. The data set provided below is a subset of this database, focusing on variables in four main topics: socioeconomic status, employment, education, and marriage. Recommended statistical techniques for analyzing it include multiple regression, time series analysis, logistic regression, and ANOVA.
Dataset 1: Individual by Year Level
Download: CSV (39.1MB) STATA (39.1MB)
Dataset 2: Individual Level
Download: CSV (412KB) STATA (527KB)
These data from the National Incident-Based Reporting System (NIBRS) include the nature and types of specific offenses in each incident, characteristics of the victim(s) and offender(s), and the types and values of property stolen and recovered.
Dataset 1: Individual Incident Level
Download: CSV (1.03GB) STATA (1.04GB)
Dataset 2: State By Date Level
Download: CSV (8.84MB) STATA (12.0MB)
No weighting variable: The estimate is that about 50% of the population knows that the Jewish Sabbath starts on Friday.
Appropriately weighted data: The estimate changes by about 5 percentage points, suggesting that only 45% of the population knows the correct start time.
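To make the comparison concrete, here is a minimal Python sketch of how an unweighted and a weighted proportion can differ. The responses and weights are invented toy values chosen only to mirror the magnitudes described above; they are not taken from the actual survey, and the function names are hypothetical.

```python
# Toy illustration only: responses and weights are made up, not survey data.

def unweighted_proportion(responses):
    """Share of 1s, treating every respondent as equally representative."""
    return sum(responses) / len(responses)


def weighted_proportion(responses, weights):
    """Share of 1s, where each respondent counts in proportion to their weight."""
    if len(responses) != len(weights):
        raise ValueError("responses and weights must have the same length")
    total_weight = sum(weights)
    return sum(r * w for r, w in zip(responses, weights)) / total_weight


# 1 = answered correctly, 0 = did not (hypothetical respondents).
knows = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Hypothetical survey weights: respondents who answered correctly belong to
# groups that are over-represented in the sample, so each stands for fewer people.
weights = [9, 9, 9, 9, 9, 11, 11, 11, 11, 11]

print(f"Unweighted: {unweighted_proportion(knows):.2f}")
print(f"Weighted:   {weighted_proportion(knows, weights):.2f}")
```

With these toy numbers the unweighted estimate is 0.50 and the weighted estimate is 0.45. The gap arises only because the weights shift how much each respondent counts, which is the same mechanism behind the difference between the two survey estimates.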
Full disclosure: I approach this topic both as a social scientist and as someone who has taught a traditional introductory statistics class for over twenty years. I am thus part of the problem myself. While I mainly follow the dictates of some of the most popular textbooks, it is fully within my power to diverge from the book. When I do not, it is really my own fault: I am a sheep following the sheepdogs.
Our worst failure as statistics teachers is to teach as if all or most of the data that our students will engage with in their future careers are from simple random samples.