Meet Yujing Cao, DASIL’s new data scientist!

This year, DASIL welcomes a new member of our staff, Yujing Cao, who will serve as our new data scientist. In her position at DASIL, Yujing will bring her expertise in data analysis and visualization to further expand DASIL’s capacity to help students and faculty members integrate data analysis into research and classroom work. In today’s big data era, enormous quantities of data are available, and Yujing will help Grinnell students and faculty explore them.

Yujing Cao is excited about joining DASIL and bringing a new level of data analysis to faculty research and teaching!

Originally from China, Yujing earned her bachelor’s degree in Statistics from Anhui University. Her passion for data science led her to a PhD program in Statistics at the University of Texas at Dallas, where she obtained her degree in 2016. Her research focused on graphical modeling of biological pathways in genomic studies. She is also interested in network analysis, machine learning, and experimenting with different tools for data visualization. In her spare time, she enjoys reading, hiking, and exercising.

Yujing was excited about the position at Grinnell because of her strong interests in teaching and in data visualization. As she puts it:

“I wanted to look for a position which provides opportunities to create interesting data visualizations along with other data analysis work. I love using graphs to tell stories behind different data sets.

“The working environment is another factor that led to my decision to come to Grinnell. I strongly resonate with the core values of a liberal arts education. At Grinnell College, I can work in an academic environment, helping faculty and students while promoting the use of data in research and learning.”

Yujing also discussed a number of skills crucial to success in the field of data science: “Data science is an interdisciplinary field requiring knowledge from mathematics, statistics, data mining, and machine learning. Statistical knowledge and knowledge from other fields can help us form good questions and find direction, while programming skills (e.g., joining data sets and visualizing data) are needed for implementing our ideas. To be a good data scientist, you should possess strong programming and analytical skills.”

According to Yujing, “One of the most important qualities for any data scientist is curiosity. Curiosity encourages us to dig in and make interesting discoveries about data. Also, good communication skills can make a great data scientist. You should be able to clearly articulate your results and the implications of your findings to others, including other data scientists and people who don’t share a similar background.”

Her tip for students interested in a career in data science is to keep an open mind, learn from different disciplines, and sharpen their programming skills. “In addition, a student who is interested in being a data scientist should take advantage of any opportunities to work on hands-on projects that use real data.”

Faculty or students interested in meeting with Yujing should drop by DASIL (ARH 130) or her office (Goodnow 103), or contact her via email at caoyujin@grinnell.edu for an appointment.


Enter your e-mail address to receive notifications of new blog posts.
You can leave the list at any time. Removal instructions are included in each message.

Powered by WPNewsman

Visualizing the Production Function and Cost Curves

Single, static images of data trends aren’t the most effective way to communicate how the different elements of an equation or formula contribute to a trend. This is especially true for introductory economics concepts such as cost curves or the production function. Dynamic, interactive visualizations that let users manipulate the variables contributing to a relationship help the audience better understand how equations express trends.

Krit Petrachaianan ’17 of DASIL programmed a visualization in R that illustrates cost curves and the production function, two core concepts of introductory economics. DASIL’s visualization allows users to manipulate the different parts of the equations that define cost curves and the production function. For instance, users can adjust the cost per unit of each input (denoted r and w) and the amount of each input used (denoted K for capital and L for labor). Users can also define the productivity of the firm’s inputs.

Cost curves visualize the costs of producing different levels of output. The total cost of production for a business can be subdivided into fixed and variable costs. Some costs, such as raw materials and production supplies, change proportionally as more or less of the good or service is produced; these are known as variable costs. Other costs, such as annual rent or the salaries of workers, are independent of the quantity of goods or services a business produces; these are known as fixed costs.
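To make the fixed/variable split concrete, here is a minimal Python sketch (not the R code behind DASIL’s visualization) that builds a cost schedule. It assumes, in line with the description above, that variable cost rises by a constant amount per unit of output. The function name `cost_schedule` and the example figures (a fixed cost of 100 and a variable cost of 5 per unit) are hypothetical.

```python
def cost_schedule(quantities, fixed_cost, variable_cost_per_unit):
    """Build a simple cost table for each output level q.

    Hypothetical helper. Assumes variable cost is proportional to output
    (a constant cost per unit); real variable costs need not be linear.
    """
    schedule = []
    for q in quantities:
        variable = variable_cost_per_unit * q  # VC grows with output
        total = fixed_cost + variable          # TC = FC + VC
        row = {"q": q, "FC": fixed_cost, "VC": variable, "TC": total}
        if q > 0:  # averages are undefined at zero output
            row["AFC"] = fixed_cost / q
            row["AVC"] = variable / q
            row["ATC"] = total / q
        schedule.append(row)
    return schedule


for row in cost_schedule([1, 2, 4, 8], fixed_cost=100, variable_cost_per_unit=5):
    print(row)
```

Because the same fixed cost is spread over more units as output grows, average fixed cost (AFC) falls steadily, pulling average total cost (ATC) down toward the constant average variable cost (AVC).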

The production function shows the relationship between the amount of inputs (i.e., labor and capital) a firm uses and the output it produces. The productivity of inputs in producing output can vary in three ways: 1) with constant productivity, the additional output produced by each additional unit of input stays the same as more of the input is used; 2) with diminishing productivity, the additional output produced by each additional unit of input declines as more of the input is used; and 3) with increasing productivity, the additional output produced by each additional unit of input rises as more of the input is used.
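The three cases can be illustrated with a hypothetical production function in which capital is held fixed and output equals `scale * labor ** exponent`. This functional form is an assumption chosen for illustration, not necessarily the one DASIL’s visualization uses: an exponent of 1 gives constant productivity, an exponent below 1 gives diminishing productivity, and an exponent above 1 gives increasing productivity. The sketch prints the extra output from each additional unit of labor.

```python
def output(labor, scale=10.0, exponent=1.0):
    """Hypothetical production function: output = scale * labor ** exponent.

    Capital is treated as fixed and folded into `scale`.
    exponent == 1 -> constant, < 1 -> diminishing, > 1 -> increasing
    productivity of labor.
    """
    return scale * labor ** exponent


def marginal_products(max_labor, exponent, scale=10.0):
    """Extra output from each additional unit of labor, from 1 up to max_labor."""
    return [
        output(units, scale, exponent) - output(units - 1, scale, exponent)
        for units in range(1, max_labor + 1)
    ]


for label, exponent in [("constant", 1.0), ("diminishing", 0.5), ("increasing", 2.0)]:
    print(label, [round(mp, 2) for mp in marginal_products(4, exponent)])
```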

Explore DASIL’s latest R visualization below, as well as in the Graphs section of the Data Visualizations page and in the Economics tab of the DASIL website.

[Interactive visualization: production function and cost curves]


The Common Mistakes Made in Creating a Data Visualization

Often the best way to learn how to do something right is to learn what not to do, and that is especially true of making good data visualizations. WTF Visualizations is a website that compiles poorly crafted data visualizations from across the web and media. Below is a sampling of the featured visualizations that illustrate some of the most common data visualization mistakes:

  • Absence of Proper Scaling

Proper scaling is essential to representing your data accurately. In the example below, the differences between values are misrepresented because the chart lacks a clear scale: the 52% bar does not appear as large as it should relative to the other bars, and the 13% bar looks far more than 3 percentage points larger than the two 10% bars.

[Image: example of a bar chart without proper scaling]
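Proper scaling simply means that each bar’s length is proportional to its value, measured from a zero baseline. The minimal Python sketch below draws a plain-text bar chart that way. The values reuse the percentages mentioned above purely for illustration; the labels `A` to `D` are placeholders, since the original chart’s categories aren’t described here, and the sketch assumes non-negative values.

```python
def bar_lengths(values, max_width=50):
    """Scale non-negative values to bar lengths measured from a zero baseline.

    Each bar's length is proportional to its value, so the largest value
    gets max_width characters and every other bar is scaled relative to it.
    """
    largest = max(values)
    if largest <= 0:
        raise ValueError("expects at least one positive value")
    return [round(v / largest * max_width) for v in values]


def text_bar_chart(labels, values, max_width=50):
    """Render a plain-text bar chart with proportionally scaled bars."""
    lines = []
    for label, value, length in zip(labels, values, bar_lengths(values, max_width)):
        lines.append(f"{label:>3} | {'#' * length} {value}%")
    return "\n".join(lines)


print(text_bar_chart(["A", "B", "C", "D"], [52, 13, 10, 10], max_width=52))
```

With a width of 52 characters, the 52% bar is exactly four times as long as the 13% bar, and the 13% bar is only slightly longer than the two 10% bars.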

  • Too Much Information

While the inclination is to include as much information as possible in a visualization, too much information often detracts from the clarity and concision that are essential to good data visualization. The example below perfectly illustrates how including a myriad of categories can muddle a visualization, and it also shows the importance of clear axis labels and descriptive titles.

Ensure that the data you do decide to visualize is comprehensible to your audience: recode categories when there are too many, don’t include multiple measures that illustrate the same phenomenon, and don’t include 10 different variables when 3 will do. If need be, include more than one visualization to highlight different subcategories or variables.

[Image: example of a chart with too many categories]
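Recoding too many categories often means keeping the largest few and pooling the rest. Here is a minimal Python sketch of that approach; the function `collapse_categories`, the `Other` label, and the pet-ownership counts are all hypothetical.

```python
def collapse_categories(counts, keep=5, other_label="Other"):
    """Keep the `keep` largest categories and pool the rest under one label."""
    # Sort categories from largest to smallest count (ties keep input order).
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    collapsed = dict(ranked[:keep])
    remainder = sum(value for _, value in ranked[keep:])
    if remainder:
        collapsed[other_label] = collapsed.get(other_label, 0) + remainder
    return collapsed


# Hypothetical survey counts with more categories than a chart needs.
pets = {"Cats": 40, "Dogs": 35, "Fish": 10, "Birds": 8, "Lizards": 4, "Snakes": 3}
print(collapse_categories(pets, keep=3))
# {'Cats': 40, 'Dogs': 35, 'Fish': 10, 'Other': 15}
```

Pooling preserves the overall total, so shares computed from the collapsed table still add up correctly.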

  • Bad Math

Always double-check your math before sharing your visualization with the public. Otherwise, you run the risk of misrepresenting your data, as well as appearing incapable of simple arithmetic. The example below illustrates this point perfectly: the creator uses a pie chart, yet its sections add up to 128% rather than 100%. The sections also do not accurately reflect the values they supposedly represent: the “51% Today” section, for instance, should take up a little more than half of the pie.

[Image: pie chart whose sections add up to 128%]
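A quick arithmetic check catches both problems before a pie chart is published: the shares must sum to 100%, and each slice’s angle should be its share of 360 degrees, so a 51% slice spans about 183.6 degrees, just over half the circle. The Python sketch below is a minimal version of such a check; the function name, the rounding tolerance, and the example shares are hypothetical.

```python
import math


def pie_slice_angles(percentages, tolerance=0.5):
    """Convert percentage shares to slice angles in degrees.

    Raises ValueError if the shares do not sum to 100 (within `tolerance`,
    to allow for rounding), since a pie chart's slices must make up a whole.
    """
    total = sum(percentages)
    if not math.isclose(total, 100, abs_tol=tolerance):
        raise ValueError(f"shares sum to {total}%, not 100%")
    return [p / 100 * 360 for p in percentages]


print(pie_slice_angles([51, 49]))  # the 51% slice is a bit over 180 degrees

try:
    pie_slice_angles([51, 40, 37])  # hypothetical shares totalling 128%
except ValueError as err:
    print(err)  # shares sum to 128%, not 100%
```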


What Makes a Good Data Visualization?

Being able to represent data in a clear, concise, and engaging way is an essential skill. While an effective poster is key, data visualizations are a tool that enhances the communication of the narratives underlying the data. David McCandless, a world-renowned data visualization designer and the creator of Information is Beautiful, constructed a Venn diagram that depicts the essential elements of a successful data visualization.

[Image: David McCandless’s Venn diagram of the elements of a good data visualization]

The irony of this data visualization, which aims to serve as an aesthetically pleasing vehicle for explaining what makes a good visualization, is that several of its elements keep it from being a successful one. First, the information McCandless wants to communicate is not immediately obvious. Figuring out how each of the circle categories and their intersections relate to the associated examples (e.g., information x goal = plot) takes time and is distracting. In addition, some of the examples he gives aren’t very descriptive. What does he mean by “pure data viz” at the intersection of visual form and information? What about “proof of concept” at the intersection of goal and story? There isn’t enough context to make sense of these examples and categories. While the visualization is accessible to colorblind audiences (a very important element of a good data visualization), the point McCandless wants to communicate is lost due to its lack of description and over-complicated use of the Venn diagram model.


Does Marriage Affect Earning Potential?

Using DASIL’s United States Income Data by Marital Status, Race, and Sex visualization, one can see that the effect of marriage on a person’s earnings is multifaceted: it depends on which group we focus on and on other factors at play. However, some general trends do prevail.

[Image: earnings by marital status, all races, both sexes]

Married people overall have higher earnings, although the gap between married and divorced people is smaller than the gap between married and single people. Married people with a spouse present earned over $33,000 annually, while single people earned on average well over $10,000 less than married people with a spouse present. While it may appear that being single leads to lower earnings, interrelated variables may explain some of the earnings discrepancies observed.

[Image: earnings by marital status]

One important variable to consider is age. As we discuss in another blog post, workers ages 15-24 earn less than those in other age brackets. Studies suggest that people in the 15-24 age bracket are less likely to be married, so some of the earnings trends shown may not be strictly due to marriage. In addition, as illustrated in the aforementioned blog post, 25-34 year-olds and those 65 and older earn about the same and are the next-lowest-earning age groups (about $25,000 more, in 2013 dollars), while 35-64 year-olds earn about $20,000 more on average. The 35-64 year-olds are more likely to be established in their careers, reaching their highest-paying years within this age bracket. So, some earnings trends may be attributed to the pace of a career’s trajectory.

Broken down by gender, the general trend persists: married men earn considerably more than divorced and single men of all races, at $44K, $33K, and $20K, respectively. Married women have been earning more than single men in recent years, averaging about $2K more in 2006, a gap that persisted into 2010. While single women earned more than married women in the 1980s, that trend has reversed in recent years.

[Image: earnings of married women]

[Image: earnings of married men]

Broken down by race, Asian single men and women both earn more than single people of any other race, each averaging about $21K in 2010. Hispanic single women earn the least of any group of men or women, at $15.1K, with Black single men a close second. Earnings of Black single men peaked in 1998, when they trailed those of white men by only about $200. Studies attribute this peak to the economic boom of the 1990s and the transition of Black men into higher-skilled service-industry jobs.

[Image: earnings of single women]

[Image: earnings of single men]

Married Hispanic women still earn less than married women of other races, at $19.1K, but substantially more than single Hispanic women. Black women earn the most among married women of any race, at $26.6K, with a trend that moves roughly in step with that of married Asian women.

