3550 Rackham Building University of Michigan Ann Arbor, MI 48109-1070
Text Mining with Common Digital Documents
Wednesday and Thursday, May 18 and 19, 2011
Ed Rothman, Eric Rabkin, Giselle Kolenic, TBA
This workshop introduces the possibilities for using quantitative methods to study documents that are typically treated only qualitatively, such as novels, newspaper articles, judicial opinions, and web pages. These statistical methods will facilitate the formulation and exploration of research questions arising from studies of digitized but untagged texts. No prerequisites are required, and no previous experience with text analysis or statistical analysis is assumed.
Research in business, law, policy, and indeed all humanistic and social scientific fields relies ever more on quantitative analyses of digitized texts, be they historical archives, legal documents, customer surveys, or online poetry collections. Often the amount of text is too large for analysts to review by eye or code by hand. Computer-aided methods of processing and analyzing textual data address this problem and make possible discoveries that were previously out of reach. Yet these methods are unfamiliar to most people who could make good use of text mining.
This workshop introduces concepts and teaches skills for mining and analyzing texts. It begins with key statistical concepts and graphical tools commonly used in analyzing textual data, such as word frequencies, the standardized type/token ratio, and statistics describing word co-occurrences. The workshop also covers text processing skills, for example, preparing texts for analysis, developing stop-word lists and lemma (head-word) lists, and exporting results for further processing in other programs.
We will explore WordSmith (http://www.lexically.net/wordsmith/), a program offering a wide array of text study techniques, for example, producing interactive concordances, constructing concept sets for document and corpus analyses, and applying statistics to understand definitions, styles, and thematic concerns both at a single point in time and over time. Participants will have a chance to practice these skills using digitized texts provided by the workshop. The workshop will explore and entertain new questions about the Bible, the United States Constitution and its Amendments, and speeches from the 2008 Democratic and Republican National Conventions. An important local example, using radiology reports, will illustrate how text mining can be used as an expert system.
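For readers new to these measures, the following minimal Python sketch shows a few of the statistics and processing steps named above. It covers word frequencies with a stop-word list, the type/token ratio and a standardized version averaged over fixed-size chunks, simple co-occurrence counts, and a key-word-in-context concordance. It is an illustration only and does not reproduce how WordSmith computes these measures. The tokenizer, the tiny stop-word list, the 1,000-token chunk size and the co-occurrence window are assumptions chosen for the example, and lemma lists are not shown.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption); real studies build their own.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "that", "is", "it", "be"}

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(tokens, stop_words=STOP_WORDS):
    """Count tokens, skipping any that appear in the stop-word list."""
    return Counter(t for t in tokens if t not in stop_words)

def type_token_ratio(tokens):
    """Distinct word forms (types) divided by running words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def standardized_ttr(tokens, chunk=1000):
    """Average the TTR over consecutive chunks of `chunk` tokens.

    A trailing chunk shorter than `chunk` is ignored, so every ratio is
    computed over the same number of tokens. Returns None when the text is
    shorter than one chunk.
    """
    ratios = [type_token_ratio(tokens[i:i + chunk])
              for i in range(0, len(tokens) - chunk + 1, chunk)]
    return sum(ratios) / len(ratios) if ratios else None

def cooccurrences(tokens, window=4):
    """Count unordered word pairs occurring within `window` tokens of each other."""
    pairs = Counter()
    for i, t in enumerate(tokens):
        for u in tokens[i + 1:i + 1 + window]:
            pairs[tuple(sorted((t, u)))] += 1
    return pairs

def concordance(tokens, node, width=3):
    """Key-word-in-context lines: up to `width` tokens on each side of every hit."""
    lines = []
    for i, t in enumerate(tokens):
        if t == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{t}] {right}")
    return lines

if __name__ == "__main__":
    preamble = "We the People of the United States, in Order to form a more perfect Union"
    tokens = tokenize(preamble)
    print(word_frequencies(tokens).most_common(3))
    print(round(type_token_ratio(tokens), 3))  # 14 types / 15 tokens -> 0.933
    print(concordance(tokens, "the"))
    print(standardized_ttr(tokens, chunk=5))   # mean of three 5-token ratios
```

The raw type/token ratio tends to fall as a text grows longer, so raw ratios from texts of different lengths are not directly comparable. Averaging over equal-sized chunks is one common way to standardize it. With the default 1,000-token chunk, the 15-token example above is too short, and `standardized_ttr` returns None.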
- Instructors
- Ed Rothman is the Director of CSCAR and a professor in the Department of Statistics at the University of Michigan. He has extensive experience as a statistical consultant in studies both on and off campus.
Eric Rabkin is Arthur F. Thurnau Professor of English Language & Literature and co-director of the Genre Evolution Project (GEP: www.umich.edu/~genreevo).
Giselle Kolenic is a statistical consultant at CSCAR.
TBA is a statistical consultant at CSCAR.
- Audience
- Architects, artists, city planners, humanists, lawyers, linguists, social scientists, or any qualitative scholar interested in doing quantitative analyses of common digitized documents.
Prerequisites: None. No previous experience with text analysis or statistical analysis is assumed.
Provisions: Enrollees will receive lecture notes and a glossary of concepts for future reference. Refreshments will be served.
Dates & Times: Wednesday and Thursday, May 18 and 19, 2011, 1:00 PM - 5:00 PM
Location: Modern Language Building, Room 2001A
- Fees
- Registrations on or before May 4, 2011:
  - $180 for University of Michigan affiliated faculty, staff and students
  - $400 for others
- Registrations after May 4, 2011:
  - $215 for University of Michigan affiliated faculty, staff and students
  - $500 for others
Please make checks payable to CSCAR-University of Michigan, or provide the University of Michigan Project/Grant or short code to be billed. Send checks to CSCAR, 3550 Rackham Bldg., University of Michigan, 915 E. Washington St., Ann Arbor, MI 48109-1070.
- Registration
- Call CSCAR at 734-764-7828. Enrollment is limited.