Understanding your research goals

In computational text analysis, text is your data, and the text corpus is your dataset. The most important part of any data analysis is knowing the data you are working with: the context in which it was collected, its strengths, its limitations, why it has value and how you relate to it.

Maintain a healthy skepticism as you move through the iterative process of building corpora and applying different text analysis methods. The analysis will always be shaped by which texts you choose to include (or exclude) and by the perspectives that you and your team bring to the research process.

As you begin to build your corpus, consider:

  • What is the main goal of your research? What texts do you anticipate needing for this project?
  • What kinds of patterns are you interested in exploring, and why?
  • Whose perspectives are incorporated into this text corpus? What historical and social contexts informed the creation of the texts? How might this impact the analysis? 
  • Positionality: How do you (or the research team) relate to the concepts reflected in these texts?
  • What assumptions do you have about the texts and the computational methodologies you'd like to use for analysis?

Further reading