A Closer Look at Misleading Data Visualizations
There are multiple ways that data visualizations can mislead readers or viewers. This page summarizes those ways and provides some examples.
READ THE FOLLOWING INFORMATION AND THEN CHECK OUT THIS CLASSROOM ACTIVITY ON MISLEADING DATA VISUALIZATIONS!
Data visualizations can serve as efficient and effective means to communicate information, but they can also spread misinformation by misrepresenting, distorting, or obfuscating data. Sometimes misleading data visualizations are created with good intentions, simply by people who are inexperienced or unskilled at visualizing data. Other times, the creator may have lacked relevant information because of the limitations of the time or place in which they were working, or because of poor research methods. And, unfortunately, misleading data visualizations can be the result of purposeful manipulation, created and/or disseminated by people who want to sway beliefs or opinions. In any case, it is important to be aware of the ways that data visualizations can confuse or misinform learners.
Common ways that data visualizations can mislead learners include:
- Hiding or omitting relevant data that provides context
- Distorting the presentation of data, such as by
  - Omitting the baseline
  - Manipulating the x- and/or y-axis
  - Using the wrong visualization
  - Going against conventions
- Describing the data inaccurately in annotations, the title, or within the visualization itself
- Rendering a data visualization useless, including by
  - Presenting the wrong data
  - Presenting too much data
Hiding or omitting data that provides context
In 2015, this pie chart on federal spending circulated as a meme on social media. The problem with the chart, as Politifact pointed out in an August 2015 article, is that it shows only the government's discretionary spending and leaves out mandatory spending on programs like Medicare, Medicaid, and Social Security. These mandatory programs, of which food stamps are a part, account for 60% of all government spending. A Quartz article showed what the chart would look like if mandatory spending were included.
It is difficult to say if the person who created the pie chart-turned-meme purposely left out relevant data on government spending, or if they had a fundamental misunderstanding of the concept. Either way, readers with an incomplete understanding of government spending would be misinformed as a result of the incomplete picture the pie chart presented.
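To see why the choice of denominator matters so much, here is a minimal sketch in Python. The dollar figures are made-up round numbers for illustration only, not actual federal budget data.

```python
# Hypothetical spending figures, in billions of dollars (not real budget data).
discretionary = 1_200   # spending set each year through appropriations
mandatory = 2_500       # Medicare, Medicaid, Social Security, food stamps, etc.
defense = 600           # one large discretionary program

share_of_discretionary = defense / discretionary
share_of_total = defense / (discretionary + mandatory)

print(f"Defense as a share of discretionary spending: {share_of_discretionary:.0%}")
print(f"Defense as a share of all spending:           {share_of_total:.0%}")
# With these invented numbers, the same slice looks roughly three times larger
# when mandatory spending is dropped from the denominator, which mirrors the
# distortion in the meme.
```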
Another way that data visualization creators can manipulate information is to hide data from the data set so that only trends or patterns that suit their narrative are included. One good example appeared in an article illustrating the impact of "cherry picking" data. If you look at the following graph, it appears that UK national debt is abnormally high, which might lead you to believe that the government needs to implement policies to reduce the debt.
However, if you add earlier years' data on the national debt, you are presented with a much different picture of how current debt compares with the past.
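A minimal Python sketch of the same cherry-picking effect is below. The debt figures are invented placeholders shaped to show the idea, not actual UK data.

```python
import matplotlib.pyplot as plt

years = list(range(1950, 2021))
# Invented series: a long post-war decline that flattens out, then rises after 2008.
debt = [max(40.0, 250.0 - 4.0 * (y - 1950)) for y in years]
debt = [d + 5.0 * (y - 2008) if y >= 2008 else d for y, d in zip(years, debt)]

fig, (ax_cherry, ax_full) = plt.subplots(1, 2, figsize=(10, 4))

# Cherry-picked window: only the recent run-up is visible, so debt looks
# abnormally high.
ax_cherry.plot(years[-13:], debt[-13:])
ax_cherry.set_title("2008 onward only")

# Full record: the same recent values sit well below the historical peak.
ax_full.plot(years, debt)
ax_full.set_title("Full record, 1950-2020")

for ax in (ax_cherry, ax_full):
    ax.set_xlabel("Year")
    ax.set_ylabel("Debt (illustrative % of GDP)")

plt.tight_layout()
plt.show()
```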
Distorting the presentation of data
While the examples above illustrate how data visualization creators can leave out data points, there are also ways that they can include data points but distort their presentation in such a way that people come away with wholly or partially inaccurate conclusions. One way to manipulate a visualization is by omitting the baseline. For example, as pointed out on the site Statistics How To, this USA Today graph makes it look as though welfare enrollment between 2009 and 2011 was spiraling out of control. However, notice that the y-axis starts at 95,000,000.

Why does this matter? Omitting the baseline can make small changes look incredibly large. Venngage contributor Ryan McCready illustrates this phenomenon by showing how different a graph with an omitted baseline would look if it started at zero. The first graph below is similar to one used by CNN to show who agreed with the decision to remove Terri Schiavo from life support after a years-long court battle. Notice where the y-axis begins. The second graph is the one Venngage created to provide a more accurate comparison of party-affiliated viewpoints.
This is not to say, however, that graphs should always start at zero. In graphs that show changes or trends over time, such as line graphs, it may be that small changes can only be shown by starting the y-axis at a higher number so that you can include smaller increments within a reasonable space. And sometimes small changes can be incredibly impactful and therefore important to capture. Think about how seemingly small changes in body temperature, sodium levels, or white blood cell count can lead doctors to evaluate you as healthy or unhealthy. Small changes are also important with regard to climate, where every degree of difference in average temperatures over a period of time can have a devastating impact on ecosystems. In this case, starting at zero on a graph could diminish or downplay the problem, as these graphs on global climate change illustrate. Notice how the first graph barely shows any change at all, while the second graph from a Quartz article illustrates the changes that catch climate scientists' attention because they know they matter.
Rather than thinking only, "The y-axis should always start at zero," which is a common maxim, it might be better to think, "What difference would this make in the overall message if the baseline were different?"
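The baseline effect described above is easy to reproduce. Here is a minimal sketch in Python; the three poll percentages are placeholders, not the actual CNN figures.

```python
import matplotlib.pyplot as plt

groups = ["Democrats", "Republicans", "Independents"]
agree = [62, 54, 58]  # hypothetical "% who agree" values

fig, (ax_cut, ax_zero) = plt.subplots(1, 2, figsize=(9, 4))

# Truncated baseline: starting the y-axis at 50 makes an 8-point gap look huge.
ax_cut.bar(groups, agree)
ax_cut.set_ylim(50, 65)
ax_cut.set_title("Baseline at 50")

# Zero baseline: the same data, shown in proportion.
ax_zero.bar(groups, agree)
ax_zero.set_ylim(0, 100)
ax_zero.set_title("Baseline at 0")

for ax in (ax_cut, ax_zero):
    ax.set_ylabel("Percent who agree")

plt.tight_layout()
plt.show()
```

The only difference between the two panels is the `set_ylim` call, which is exactly the question suggested above: what difference would a different baseline make to the overall message?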
In addition to adjusting the baseline, data visualization creators can manipulate the x- and/or y-axis to distort the data and the message it conveys. Two particularly egregious examples appeared amid the COVID-19 crisis. The first, from Fox News, claims to show COVID-19 cases in the United States from late February through March. Clearly the numbers are rising, but the graph downplays the steepness of the rise by manipulating both axes: the x-axis by choosing seemingly randomly spaced dates, and the y-axis by jumping from zero to 5,000 and then from 5,000 to 20,000. A more accurate graph, again created by Venngage contributor Ryan McCready, would look like the one below it.
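The x-axis half of that distortion comes from treating irregularly sampled dates as if they were evenly spaced. A minimal sketch, with invented dates and counts rather than the real figures:

```python
import matplotlib.pyplot as plt
from datetime import date

# Invented, irregularly sampled data points (not the actual case counts).
dates = [date(2020, 2, 25), date(2020, 3, 5), date(2020, 3, 8),
         date(2020, 3, 18), date(2020, 3, 20), date(2020, 3, 22)]
cases = [50, 200, 500, 6000, 12000, 20000]

fig, (ax_bad, ax_good) = plt.subplots(1, 2, figsize=(10, 4))

# Misleading: dates plotted as evenly spaced category labels, which flattens
# the late-March surge.
ax_bad.plot([d.strftime("%b %d") for d in dates], cases, marker="o")
ax_bad.set_title("Dates evenly spaced (misleading)")

# More accurate: a true date axis preserves the real time gaps between points.
ax_good.plot(dates, cases, marker="o")
ax_good.set_title("Dates at true positions")

for ax in (ax_bad, ax_good):
    ax.set_ylabel("Cumulative cases")
    ax.tick_params(axis="x", rotation=45)

plt.tight_layout()
plt.show()
```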
Another COVID-19 graph, released by the Georgia Department of Public Health, circulated quickly on social media when people noticed that the dates on the x-axis were out of order (you can click on the graph to see them clearly, but they are April 28, 27, 29, May 1, April 30, May 4, 6, 5, 2, 7, April 26, May 3, 8, 9). It is hard to imagine that this wasn't done purposely, since the reordering made it appear as though cases across counties were consistently declining; the reality is shown more accurately in the subsequent graph, created by Joey Devilla.
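The fix here is mechanical: sort by the actual date before plotting. A minimal sketch, using invented counts rather than the real Georgia figures:

```python
from datetime import date

# (date, case count) pairs listed in a scrambled order, as in the Georgia
# chart; the counts themselves are invented placeholders.
reported = [
    (date(2020, 4, 28), 95), (date(2020, 4, 27), 110),
    (date(2020, 4, 29), 102), (date(2020, 5, 1), 98),
    (date(2020, 4, 30), 104), (date(2020, 5, 2), 101),
]

# Sorting by the date itself restores chronological order, so any apparent
# decline reflects the data rather than the ordering of the x-axis labels.
reported.sort(key=lambda pair: pair[0])
dates = [d.isoformat() for d, _ in reported]
counts = [c for _, c in reported]
print(dates)
print(counts)
```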
Using the wrong visualization is another way that data visualization creators can distort data. Sometimes this is an innocent mistake. For example, the following set of graphs was shared by a contributor to The Economist. She pointed out that the original graph she created for an article on opinions about the wisdom of Brexit gave the unintended impression that views were wildly erratic. She admits that a better graph would have been the one on the right.
The next graph, from the Pew Research Center, uses lines to connect data points on social media use. This is a curious choice: there is no need to draw lines between discrete categories, and lines typically signal that you are looking at trends over time rather than comparisons among categories. A multi-set bar graph would have been a better choice here.

Finally, readers come to expect that data visualizations follow certain conventions in the display of data, so they make assumptions about what particular colors, lines, or symbols represent. Going against conventions is therefore another way to distort data and mislead readers. For example, choropleth maps that use a spectrum of colors typically reserve the most eye-catching color for the extreme end of the scale. That is not the case with the choropleth map below. The map has other issues as well, such as throwing in purple and moving from a subdued color to a vivid one and then back to a subdued one.
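For the chart-type point above, a grouped bar chart keeps discrete categories separate instead of implying a trend between them. A minimal sketch, with made-up percentages rather than the Pew Research Center figures:

```python
import numpy as np
import matplotlib.pyplot as plt

platforms = ["YouTube", "Facebook", "Instagram", "Twitter"]
teens = [85, 51, 72, 32]    # hypothetical "% who use" values for one group
adults = [73, 69, 37, 22]   # hypothetical values for a second group

x = np.arange(len(platforms))
width = 0.35

fig, ax = plt.subplots(figsize=(7, 4))
# Side-by-side bars compare groups within each platform; no line suggests a
# trend running from one platform to the next.
ax.bar(x - width / 2, teens, width, label="Teens")
ax.bar(x + width / 2, adults, width, label="Adults")

ax.set_xticks(x)
ax.set_xticklabels(platforms)
ax.set_ylabel("Percent who use the platform")
ax.legend()

plt.tight_layout()
plt.show()
```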
Describing data inaccurately
Sometimes authors or data visualization creators present accurate data, but they describe the data inaccurately in annotation, surrounding text, the title or caption, or within the data visualization itself. Unless readers take the time to read the data carefully, they will never know that they have been misled, and most readers are apt to trust the author's or creator's interpretation. Consider the data visualization below. Looking at the large title, one might be led to believe that the percentages indicate the percentage of children in the population who suffer each kind of injury. Spinal cord injuries among the general youth population would be alarming to any parent. However, as the smaller print indicates, the chart shows percentages among all reported injuries to children, not among children themselves. The percentage in the general population is actually a fraction of a percent.
In this graphical display, which appeared in a British tabloid newspaper, the author used accurate data about temperature but did not say what kind of temperature is being shown. In fact, the graph shows air temperature, which is a poor indicator of global warming, and, in any case, scientists use several indicators to measure global warming. Note that the graph also shows only short-term trends, thereby committing another sin of data visualization: omitting data.
Rendering a data visualization useless
Sometimes data visualizations show and describe data accurately, but the creator does something else to render the data visualization basically useless. One way is to show data that does not really fit the intent of the visualization. Here, for example, the data visualization is intended to illustrate the extent and severity of COVID-19, but the creator has used the wrong data. The map shows raw numbers of cases, but rates relative to each region's population would be much more useful for understanding how the virus is impacting communities. In addition, the map uses several bins for ranges under 101, yet lumps every number over 101 into a single bin. However, 102 cases is very different from 502, and that difference certainly matters when you are considering the severity of the virus.
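The normalization point is easy to demonstrate with a few made-up counties. This is a minimal sketch with invented numbers, not actual case data:

```python
# Hypothetical counties: raw counts versus cases per 100,000 residents.
counties = {
    "County A": {"cases": 102, "population": 15_000},
    "County B": {"cases": 502, "population": 600_000},
    "County C": {"cases": 95,  "population": 8_000},
}

for name, data in counties.items():
    rate = data["cases"] / data["population"] * 100_000
    print(f"{name}: {data['cases']} cases, {rate:.0f} per 100,000")

# With these invented numbers, County A and County C have far fewer raw cases
# than County B but much higher per-capita rates, which a raw-count map with a
# single ">101" bin would hide entirely.
```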
An even worse case of using the wrong data is taking mismatched data points and using them for comparison. In the following graph, the data on welfare recipients adds up the number of people living in households where at least one member receives welfare, and compares it to the number of individuals who are employed. In other words, if an adult family member in a household of five receives welfare, the number on the left includes all five people; if an adult family member in a household of five is employed, the number on the right includes only that one person.
Other data visualizations simply show too much information, overwhelming readers to the point that they will probably take away little from the display. Here are a few examples.
There are numerous ways that data visualization creators and authors can use data to mislead learners. As educators, we have a unique opportunity to help students understand how data can be used to spread misinformation, and to help them develop a critical and discerning eye.