Linguistics and Language: A Research Guide: Text Analysis

This is an extensive, annotated list of the print and online resources available for research in linguistics. Click on the TABS to access each Section in this guide.

TEXT ANALYSIS: CORPORA

ARTFL
A cooperative project of the Centre National de la Recherche Scientifique and the University of Chicago, ARTFL is a research tool for scholars and students in all areas of French Studies. It evolved from the construction of the dictionary Trésor de la Langue Française: Dictionnaire de la langue du XIXe et du XXe siècle, 1789-1960, Paul Imbs, ed., Paris: Éditions du Centre national de la recherche scientifique, 1971-1994. 16 volumes. (Olin Reference ++ PC 2625 .I32)

At present, ARTFL's main corpus, ARTFL-FRANTEXT, consists of nearly 3,000 texts, ranging from classic works of French literature to various kinds of non-fiction prose and technical writing. The eighteenth, nineteenth, and twentieth centuries are about equally represented, with a smaller selection of seventeenth-century texts as well as some medieval and Renaissance texts. In addition to FRANTEXT, ARTFL has built hundreds of databases for researchers and students working in specialized disciplines and languages other than French.

Corpas Na Gaeilge, see the Electronic Text Center information below.
Corpora. Brigham Young University.
Free online searching of thirteen large corpora--collections of words--from Spanish, Portuguese, and various dialects of English. A table lists the titles of the corpora, the number of words in each, and the dates covered.
Cornell NLP Linguistic Data Resources.
"Creates, collects, and distributes speech and text databases, lexicons, and other resources" for research use. Develops tools to collect and organize linguistic data. A membership organization of universities and research laboratories, located at the University of Pennsylvania. Cornell is a member, and data sets can be obtained through the NLP Group. More information is on this wiki page: "This page is meant to list what corpora, software, and other resources we [NLP] have available and where they are." Cornell only.
Words on Words: Quotations about Language and Languages. Chicago: University of Chicago Press, 2000.
(Olin Reference P 106 .C765 2000)
A collection of several thousand quotations on language and linguistics. Extensively indexed.

THE ELECTRONIC TEXT CENTER (ETC)

The Electronic Text Center, on the first floor of Olin Library, is a laboratory for the use of full-text primary sources in electronic form (such as CD-ROMs and DVDs). The ETC provides access to the library's electronic texts for scholarly textual analysis and editing from a set of dedicated workstations. The Text Center is open during the hours the library is open. Ask at the reference or information desks for assistance. Example of a corpus available on CD-ROM in the ETC:

Corpas Na Gaeilge, 1600-1882 = The Irish Language Corpus. Baile Átha Cliath : Acadamh Ríoga na hÉireann, 2004.
(Olin Reference Disk PB 1345 .C67 2004) Shelved in the Electronic Text Center.
A searchable collection of printed texts in Irish, 1600-1882. Includes 705 texts consisting of prose, poetry, folklore, religious works, historical documents, translations, etc. Also includes an index with frequencies, a reverse index, a custom search facility, and an index nominum of 270,000 place and personal names. (With accompanying user's guide in Irish and English).
