TycheManual:UnbiasedStorytelling

From Tyche Insights

Introduction

The purpose of this article is to provide guidance on neutral, unbiased storytelling.  Thankfully we can draw on resources from the biggest collaborative system - Wikipedia.  Wikipedia has an article focused on Neutral point of view; we can mine this content and combine it with our personal experience and how we work at Tyche Insights. 

It is impossible to remove all bias from all stages of creating data stories, from envisioning the stories through publishing.  Bias awareness and removal are everyone’s responsibility - the author, the editor, the publisher and anyone else who is assisting, such as someone creating a data visualization for the article.

Draw on Wikipedia’s Neutral POV

When researching unbiased data stories we used Wikipedia’s Neutral Point of View as a source for what is important.  For guidance that goes beyond what this article covers, we can draw on the relevant points from Wikipedia’s Neutral POV.

Remember bias takes many forms

“Cognitive biases can lead us to poor decisions even when good data is available. Confirmation bias focuses us to only look at data that supports our assumptions. Anchoring bias encourages us to stick with our initial conclusion, even if data emerges that refutes it. And availability bias favors familiar, top-of-mind options over better, lesser known alternatives. These biases are unconscious, and require us to explicitly challenge ourselves, and our assumptions, to overcome them.” -  Dr. Paul Barth

Building on Dr. Barth’s comments, we want everyone involved in creating, editing and publishing a data story to be aware of how bias enters the storytelling process.  Bias can occur in the framing of a question, the acquisition and analysis of data, the headline and salient points of an article, the presentation of visualizations and maps, and so much more.  

Crafting research questions

Questions that your story will answer (and typically the headlines that accompany them) are the entry point for your analysis and for the reader.  Our foremost suggestion is to not use leading words and phrases that suggest or imply an answer.  For example, compare “Should a responsible city be investing in sidewalks?”, where the word “responsible” implies an answer, with a question framed around the data, such as “What does the data suggest about how happy citizens are with Police Department performance?”

[We may include these links as well for more practical advice:

  • What are the best practices for ensuring research questions are neutral and unbiased?
  • How to Write a Research Question for 2025: Types, Steps, and Examples | Research.com]

Using primary data sources

Every Tyche Insights story uses public data - open data or data obtained through FOIA (USA, UK), ATIA (CAN), or similar laws.  Public data should be treated as fact; however, we know through experience that data can be wrong, that the collection of data can introduce bias, and that data changes and may not reflect the current state.

  1. Data sources should be cited, with vintage noted.  If more recent or better data exists but is not being used, an explanation is appropriate
  2. Any filtering or removal of data should be explained - e.g. removing the first months of data from a multiyear dataset where the data owner was clearly facing process startup challenges
  3. Any data hygiene efforts (e.g. cleaning up a postcode field, extracting values from a dirty alphanumeric field) should be explained
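
One way to keep these explanations from getting lost is to record each step as it is performed.  The sketch below is illustrative only and is not an established Tyche Insights tool: the `PreparationLog` class, the helper functions, the field names and the sample rows are all hypothetical.  It assumes months are stored as ISO "YYYY-MM" strings so that they can be compared as text.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PreparationLog:
    """Collects the source citation and every change made to the data."""
    source: str   # citation for the dataset
    vintage: str  # when the data was published or retrieved
    steps: list = field(default_factory=list)

    def record(self, description, rows_before, rows_after):
        self.steps.append(f"{description} (rows: {rows_before} -> {rows_after})")

    def summary(self):
        lines = [f"Source: {self.source} (vintage: {self.vintage})"]
        lines += [f"{i}. {step}" for i, step in enumerate(self.steps, start=1)]
        return "\n".join(lines)


def drop_startup_months(rows, first_good_month, log):
    """Filter out rows dated before first_good_month and log the removal."""
    kept = [r for r in rows if r["month"] >= first_good_month]
    log.record(
        f"Removed months before {first_good_month} (process startup issues)",
        len(rows), len(kept),
    )
    return kept


def clean_postcodes(rows, log):
    """Strip non-alphanumeric characters from postcodes and uppercase them."""
    cleaned = [
        {**r, "postcode": re.sub(r"[^A-Za-z0-9]", "", r["postcode"]).upper()}
        for r in rows
    ]
    log.record("Normalized postcode field (removed spaces and punctuation, uppercased)",
               len(rows), len(cleaned))
    return cleaned


# Made-up sample rows, for illustration only.
rows = [
    {"month": "2020-01", "postcode": "k1a 0b1", "requests": 3},
    {"month": "2020-02", "postcode": "K1A-0B1", "requests": 41},
    {"month": "2020-03", "postcode": " k1p 1j1 ", "requests": 39},
]
log = PreparationLog(source="Hypothetical city service-request open data",
                     vintage="2020-03")
rows = drop_startup_months(rows, "2020-02", log)
rows = clean_postcodes(rows, log)
print(log.summary())
```

The printed summary can be pasted into a methodology note so that the citation, the vintage and each filtering or hygiene step appear together.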

Using secondary sources

Any data story could make use of secondary sources that are not public data.  For example, this could include data from an industry trade group, a journalistic program or organization, an academic research project that collected data or conducted analysis, or a commercial data company.

Secondary sources can assist in providing additional context or fill in gaps in the story.  For example, you may find that the city you’re studying spends X% of its budget on Y.  If a secondary source has analyzed the median amount that cities spend on Y, you could use that figure as a point of comparison.

You may also use a secondary source as a contrasting and potentially erroneous data point.  For example, an industry trade group might make a data-based claim; your analysis might support, partially agree with or debunk that claim.

When using secondary sources:

  1. Secondary sources should be used for contrast or measurement but should not be the focal point of the data story.
  2. Examine secondary sources to ensure credibility.  A good secondary source should have a methodology statement and/or an explanation of the data and analysis.  
  3. Secondary sources must be credited.
  4. Secondary sources should be examined to ensure you are complying with any terms and conditions for using the data source.

Using quotes and official comments

Quotes and official comments can provide important context to a data story.  A data story about a city’s budget can benefit from a quote from the mayor about his or her budgetary goals.  An article about the increasing costs of road repair can benefit from an economist speaking about increased raw material costs.

Our guidance on quotes includes:

  • Direct quotes only; no indirect quotes or claims about what a person said
  • Credit the source and link the quote.  If the quote is behind a paywall, state that
  • Quotes should inform and provide context to the article

Conveying opinions

Wikipedia’s Neutral POV has an excellent, succinct section on presenting opinions that we will use.  

Choosing words

Wikipedia’s “Words to Watch” is our starting point for word choice guidance.  In addition, when telling data stories we will pay attention to words that are subjective superlatives and characterizations of quantities (“lots”, “huge”, “many”).
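
A simple script can help an editor spot such words in a draft.  The following is a minimal sketch, not an established tool: the function name `flag_subjective_words` and the sample sentence are hypothetical, and the word list contains only the three examples quoted above, so a real list would need to be agreed on editorially.

```python
import re

# Starter list taken from the three examples in this article; extend as needed.
WATCH_WORDS = ("lots", "huge", "many")


def flag_subjective_words(text, watch_words=WATCH_WORDS):
    """Return a (word, character offset) pair for each watch word in text."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in watch_words) + r")\b",
        re.IGNORECASE,
    )
    return [(m.group(0).lower(), m.start()) for m in pattern.finditer(text)]


# Made-up draft sentence, for illustration only.
draft = "Many residents filed complaints, and the city spent a huge amount on repairs."
for word, offset in flag_subjective_words(draft):
    print(f"'{word}' at character {offset}: consider stating the number instead")
```

Matching on whole words only avoids flagging longer words that merely contain a watch word.  A flagged word is a prompt for the author to consider replacing it with the underlying figure, not an automatic error.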