ProQuest TDM Studio

Cornell University Library students, staff, and faculty have access to ProQuest TDM Studio, a cloud-based tool enabling text and data mining on content that Cornell University Library licenses from ProQuest. This includes thousands of newspapers, journals, reports, and press releases, as well as Web of Science data and the text of Congressional Hearings. To see what content is available to text mine, please refer to the list of ProQuest databases to which Cornell University Library subscribes.

TDM Studio allows researchers to analyze large textual datasets and offers two levels of working with the data: Visualizations and Workbench. Learn more about ProQuest TDM Studio's features in ProQuest's product information guide.

Sign up for TDM Studio

Sign up to use ProQuest TDM Studio. A Cornell University Library staff member will follow up to get you started.

Product feedback

Please contact Iliana Burgos (itb23) or your liaison librarian with any feedback on the product.

Potential benefits and limitations of ProQuest TDM Studio

Potential benefits of this product include:

  • Onboarding training provided by ProQuest staff. Upon registration, you can sign up for a brief introduction session to learn about text data mining with TDM Studio. This makes getting started with text data mining more accessible, no matter how much previous experience you have.
  • Extensive text data mining access to ProQuest content. Our current ProQuest database licenses prohibit text data mining. ProQuest TDM Studio allows researchers to build and computationally analyze large textual datasets of ProQuest database content that is otherwise unavailable for text data mining. The Visualization tools can analyze up to 10,000 documents at once, while the Workbench interface allows analyses of up to 2 million documents.

Potential limitations include:

  • Content import and export limitations. While ProQuest TDM Studio allows researchers to analyze large textual datasets of ProQuest-provided content, you cannot import text data from other sources to facilitate analyses. You also cannot download the text dataset for analysis outside of the platform.
  • Performance in languages other than English. Because ProQuest TDM Studio's Visualization tools seem to be optimized for English-language and romanized texts, researchers working in languages that do not use Roman script, or that do not separate words with white space, may encounter limitations.
  • Quality control. The extent of quality control (e.g., OCR corrections and deduplication) on the ProQuest corpus is unclear. This may lead to redundancies or errors in the datasets and analyses.
  • Visualization tools are not transparent. Because ProQuest TDM Studio's Visualization tools are closed, proprietary software, the exact methods behind them are unclear. This can significantly impact research reproducibility and model explainability. See Jenkins' (2021) tweet thread for an example of a potentially misleading visualization created with ProQuest TDM Studio's "Geographic Analysis" method. Researchers should interpret visualizations carefully and take extra steps to validate findings.
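
To illustrate the point above about languages that do not split words on white space, here is a minimal Python sketch of a naive whitespace tokenizer. It is a generic illustration only: the methods behind TDM Studio's Visualization tools are not documented, so this does not represent how the product works. The function name and sample sentences are hypothetical.

```python
def whitespace_tokens(text: str) -> list[str]:
    """Split text on runs of whitespace, as a naive tokenizer would.

    Hypothetical helper for illustration; not part of TDM Studio.
    """
    return text.split()


# An English sentence: words are separated by spaces.
english = "Text mining finds patterns in newspapers"

# A Japanese sentence with a similar meaning: no spaces between words.
japanese = "テキストマイニングは新聞からパターンを見つける"

english_tokens = whitespace_tokens(english)
japanese_tokens = whitespace_tokens(japanese)

print(len(english_tokens))   # 6 separate words
print(len(japanese_tokens))  # 1 token: the whole sentence is treated as one "word"
```

A tool that counts or maps words this way would treat the entire Japanese sentence as a single term, so word-frequency or topic results for such texts could be misleading. Checking how a small, known sample is handled is one way to validate findings before relying on them.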

Source

Jenkins, K. [@kgjenkins]. (2021, July 22). I just tested ProQuest’s TDM Studio, which has a “Geographic Analysis” visualization method that automatically extracts placenames from a set of articles and gives you a map. [Tweet]. Twitter. https://twitter.com/kgjenkins/status/1418262546289729537

Related reading

Kramer, W., Burgos, I., Muzzall, E., Esten, E., Karajgikar, J., Turnator, E., Shaw, W., Champagne, A., & Truslow, H. (2023). Ivy+ Text Data Mining Education for Advocacy (TEA) Task Force Phase One Report: Actions and Interventions to Address Concerns with Text Data Mining Platforms [Unpublished Manuscript]. Text Education for Advocacy Task Force, IvyPlus Digital Scholarship Affinity Group.