
Data Research for Labor Economics: Data Evaluation

Doubting Data or Statistics

D. Huff's famous book "How to Lie with Statistics" (1954) describes seven common tactics for misleading with statistics, a practice he called ‘statisticulation’:

  • Biased sampling: polling a non-representative group.
  • Small sample sizes
  • Poorly chosen averages: quoting a single average (mean, median, or mode) for a non-uniform population.
  • Results falling within the standard error: a survey is only as precise as its standard error allows, so smaller differences may be noise.
  • Using graphs to create an impression: distorting accuracy and scale.
  • “The semi-attached figure”: stating one thing as proof of something else.
  • “Post-hoc fallacy”: incorrectly asserting that one finding causes another because the two are correlated or occur in sequence.

What was true in 1954 is just as true today.
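
Two of the tactics above, poorly chosen averages and results falling within the standard error, can be illustrated with a short Python sketch. The wage figures are invented for illustration, and the helper names (`standard_error`, `difference_is_within_error`) are hypothetical, not taken from Huff's book.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

def difference_is_within_error(estimate_a, estimate_b, se):
    """True when two estimates differ by no more than one standard error."""
    return abs(estimate_a - estimate_b) <= se

# Hypothetical annual wages (in thousands); one very high earner skews the mean.
wages = [32, 35, 36, 38, 40, 41, 43, 45, 48, 400]

mean_wage = statistics.mean(wages)      # pulled upward by the single outlier
median_wage = statistics.median(wages)  # closer to the typical worker

print(f"mean = {mean_wage}, median = {median_wage}")
print(f"standard error of the mean = {standard_error(wages):.1f}")

# A small sample from the same group has a much less stable mean,
# so compare its standard error before trusting its average.
small_sample = wages[:3]
print(f"standard error with n=3: {standard_error(small_sample):.1f}")
```

A report that quotes only the mean here would suggest a typical wage nearly twice the median. One rough way to read two survey estimates is to ask whether they differ by more than the standard error before treating the gap as real.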

Additional reading on data/statistical skepticism

Best, J. (2004). More Damned Lies and Statistics: How Numbers Confuse Public Issues. University of California Press.  

Best, J. (2012). Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists. University of California Press. 

Best, J. (2013). Stat-Spotting: A Field Guide to Identifying Dubious Data. University of California Press. 

Herzog, M., Francis, G., & Clarke, A. (2019). Understanding Statistics and Experimental Design: How to Not Lie with Statistics (1st ed., Learning Materials in Biosciences). Springer.

Zuberi, T. (2001). Thicker than Blood: How Racial Statistics Lie. University of Minnesota Press. 

What's in the Data?

Starting with Data

 

What's in the data?
  • What is a question I can potentially answer with this data?
  • What are the variables?
  • Are there other uses of this data in papers?
  • Can we find at least one paper that uses this data to support its research?
  • Don't forget to evaluate the quality!
  • Is this data cited by others? Remember to cite the data if you use it!

 

Assessing Data Quality

Four Metrics of Data Quality
  1. Completeness
  2. Validity
  3. Timeliness
  4. Consistency 
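
The guide names these four metrics without defining them, so the sketch below shows one possible way to put numbers on each. The records, field names, validity rule, and the choice to measure timeliness as days since the latest observation are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical survey records; every field name and value is invented.
records = [
    {"id": 1, "wage": 18.5, "hours": 40, "weekly_pay": 740.0, "survey_date": date(2023, 3, 1)},
    {"id": 2, "wage": None, "hours": 35, "weekly_pay": None, "survey_date": date(2023, 3, 1)},
    {"id": 3, "wage": -4.0, "hours": 20, "weekly_pay": -80.0, "survey_date": date(2023, 6, 1)},
    {"id": 4, "wage": 22.0, "hours": 30, "weekly_pay": 700.0, "survey_date": date(2023, 6, 1)},
]

def completeness(rows, field):
    """Share of rows in which the field is not missing."""
    return sum(r[field] is not None for r in rows) / len(rows)

def validity(rows, field, is_valid):
    """Share of non-missing values that pass a caller-supplied rule."""
    present = [r[field] for r in rows if r[field] is not None]
    return sum(is_valid(v) for v in present) / len(present) if present else 0.0

def timeliness_days(rows, field, as_of):
    """Days between the most recent observation and a reference date."""
    latest = max(r[field] for r in rows)
    return (as_of - latest).days

def consistency(rows):
    """Share of fully populated rows where wage * hours matches weekly_pay."""
    checked = [r for r in rows if None not in (r["wage"], r["hours"], r["weekly_pay"])]
    ok = sum(abs(r["wage"] * r["hours"] - r["weekly_pay"]) < 0.01 for r in checked)
    return ok / len(checked) if checked else 0.0

print("completeness:", completeness(records, "wage"))
print("validity:", validity(records, "wage", lambda w: w >= 0))
print("timeliness (days):", timeliness_days(records, "survey_date", date(2023, 7, 1)))
print("consistency:", consistency(records))
```

Each function returns a single number, which makes it easy to compare datasets or to track one dataset over time; real data would need rules suited to its own variables.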

Evaluating Economic Data

Reliability of Economic Data
 
Parameters

How is the data collected? 

How selective is the data?

What is included in the data?

 
Presentation

In what context is the data presented?

How is the data used?

Is the data being used to support a particular opinion?

 
Provenance

Who publishes the data? 

What is the political angle?

Who should be credited for the data?