Finding Electronic Texts

The best source for e-texts depends on the nature of the text you seek and on who you are. For University of Michigan people, a superb place to look for high-quality texts for research in the humanities is the Humanities Text Initiative (http://www.hti.umich.edu/), part of the U-M Digital Library, which in some senses is the world's largest digital library. HTI not only helps you locate texts by title but also allows sophisticated searches both within and among texts. You can, say, set up proximity searches (example: find places in the King James Bible where "heart" is within 80 characters of "hand") or Boolean searches (example: find English-language works published between 1750 and 1780 that use the words "heart" and "prejudice" within the same paragraph). First-time visitors to HTI should be sure to read the opening page, which explains that some texts are publicly accessible while additional texts are available only to subscribers, a group that includes all members of the U-M community.
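
For readers who have a plain-text copy of a work on their own machine, the two kinds of search just described can be approximated in a few lines of Python. This is only a rough sketch, not a description of how HTI works: the function names are invented for illustration, "within 80 characters" is assumed to mean the gap between the end of one word and the start of the other, and paragraphs are assumed to be separated by blank lines.

```python
import re


def word_spans(text, word):
    """Return (start, end) offsets of each whole-word, case-insensitive match."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [m.span() for m in pattern.finditer(text)]


def proximity_search(text, first, second, max_chars=80):
    """Return passages where the two words lie within max_chars of each other.

    Assumption: distance is the number of characters between the end of
    one word and the start of the other, in either order.
    """
    second_spans = word_spans(text, second)
    passages = []
    for a_start, a_end in word_spans(text, first):
        for b_start, b_end in second_spans:
            if a_end <= b_start:
                gap = b_start - a_end
            elif b_end <= a_start:
                gap = a_start - b_end
            else:
                continue  # the two matches overlap, so skip this pair
            if gap <= max_chars:
                start, end = min(a_start, b_start), max(a_end, b_end)
                passages.append(text[start:end])
    return passages


def paragraphs_with_all(text, words):
    """Boolean AND: return the paragraphs that contain every one of the words."""
    paragraphs = re.split(r"\n\s*\n", text)
    return [p for p in paragraphs if all(word_spans(p, w) for w in words)]
```

Because the sketch matches whole words only, a search for "hand" will not find "hands"; a real search service may well treat word forms differently.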

(If you are a member of the U-M community, you can also arrange for datasets through HathiTrust. For details, see the HathiTrust website datasets page: http://www.hathitrust.org/hathitrust_datasets.)

If you're after a particular book or journal title or a particular author, you can also use Mirlyn (http://mirlyn.lib.umich.edu), the general U-M Library online catalog. A lightning bolt icon on the left of an entry indicates a live link to an electronic resource. By the way, the Find Articles and Find Databases links at the top of the Mirlyn page take you to search pages that open up many other resources, including EEBO (Early English Books Online), ARTFL for French-language texts, and JSTOR for thousands of academic articles.

An excellent off-campus place to look for high-quality humanities e-texts is the Electronic Text Center at the University of Virginia (http://etext.lib.virginia.edu/), but, again, there is a difference between what is available to the public and what is available to subscribers, so I always start with U-M.

If U-M fails me, that may be because although a public text is available, our Library can't vouch for its quality. However, sometimes even an unverifiable text will serve our needs of the moment. A huge source of public e-texts is Project Gutenberg (http://www.gutenberg.org/), but, because the texts are keyed in by volunteers, they are not as editorially reliable as the HTI texts.

The Alex Catalog (http://www.infomotions.com/alex/) is a good search engine specifically for e-texts and its newer version allows searching within the texts.

There are also many special archives and repositories, such as the Rossetti Archive (http://www.rossettiarchive.org/) at the University of Virginia and American Memory (http://memory.loc.gov/ammem/), a collection of materials for American history and culture at the Library of Congress (http://loc.gov), which also provides access to other electronic collections.

Perhaps the best foreign English-language source is the Oxford Text Archive (http://ota.ahds.ac.uk/). However, I rarely consult it because our Library, in addition to being a leading producer of e-texts, buys or licenses texts from OTA. If OTA has something non-public that I want, I usually find I can already get it from U-M.

If none of this works, you can also do a general Internet search for "electronic text sources" and you'll find many other useful guides, such as the University of Chicago Library's EFTS (Electronic Full-Text Sources). Many of these, like the University of Virginia's, have both public and restricted access. However, once you know a text exists, if you can't access it, you can request that our Library acquire it and sometimes you can buy it yourself.

If you want an e-text because it is highly manipulable (for example, so that you can do simple word searches) and none seems to exist, you can, of course, scan in a paper copy. Groundworks (http://www.dc.umich.edu/groundworks/) in the Duderstadt Center on North Campus has equipment accessible twenty-four hours a day for that purpose and consultants who can help you master the equipment many hours of the week. Readiris, a program available on all of U-M's public sites, offers excellent OCR (Optical Character Recognition) for turning many types of sources (text images, scanned pages, PDFs, and so on) into digital text. If you are not on campus, Readiris can be accessed through virtualsites.umich.edu.

Some text analysis software, such as WordSmith, includes text-gathering features. WordSmith, for example, has a WebGetter tool that searches for and saves web pages, strips out their unwanted mark-up, and creates a text file from them. Many other programs, including high-end word processors and stand-alone text editors, can be used with wild card characters to find and replace mark-up, thus leaving nothing but pure digital text.
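
The wild card find-and-replace idea can also be tried in a short Python script. What follows is a minimal sketch under stated assumptions, not a description of what WebGetter or any particular editor does: the function name is made up, and the pattern simply treats anything between "<" and ">" as mark-up, which is too crude for pages that contain scripts, style sheets, or comments.

```python
import html
import re

# Wild card pattern: "<", then any run of characters other than ">", then ">".
TAG_PATTERN = re.compile(r"<[^>]*>")


def strip_markup(marked_up):
    """Remove tags, decode character entities, and tidy the white space."""
    # Delete the tags first, so that an entity such as &lt; is not
    # mistaken for the start of a tag once it has been decoded.
    without_tags = TAG_PATTERN.sub("", marked_up)
    decoded = html.unescape(without_tags)
    # Collapse runs of spaces, tabs, and line breaks into single spaces.
    return re.sub(r"\s+", " ", decoded).strip()
```

Since the tags are replaced with nothing at all, words separated only by tags (as in adjacent paragraphs with no white space between them) will run together; replacing each tag with a space is a reasonable alternative when that matters.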

Copyright © 2005-2009 Eric S. Rabkin