3550 Rackham Building University of Michigan Ann Arbor, MI 48109-1070 cscar@umich.edu
Text Mining with Common Digital Documents
May 21, 2009
Ed Rothman, Eric Rabkin, Danielle Gwinn, Heidi Reichert, Lingling Zhang
The purpose of the workshop is to introduce the possibilities of using quantitative methods to study documents that are typically treated only qualitatively. These statistical methods will facilitate the formulation and exploration of research questions arising from studies of digitized but untagged texts. No prerequisites are required. No prior experience with text analysis or statistical analysis is assumed.
Research in business, law, policy, and indeed all humanistic and social scientific fields relies ever more on quantitative analyses of digitized texts, be they historical archives, legal documents, customer surveys, or online poetry collections. Often the volume of material is too large for analysts to review by eye or code by hand. Computer-aided methods of processing and analyzing textual data address this problem and make possible discoveries that were previously out of reach. But these methods are unfamiliar to much of the audience that needs text mining.
This workshop introduces concepts and teaches skills for text analysis. It begins with key statistical concepts and graphical tools commonly used in analyzing textual data, such as word frequencies, the standardized type/token ratio, and relationship statistics on word co-occurrences. The workshop also covers text processing skills, for example preparing texts for analysis, developing stop-word lists and lemma lists, and exporting results for further processing in other programs.
We will explore WordSmith (http://www.lexically.net/wordsmith/), a program offering a wide array of text study techniques, for example producing interactive concordances, constructing concept sets for document analyses, and applying statistics to understand definitions, styles, and thematic concerns both at one point in time and over time.
Participants will have a chance to practice these skills using digitized texts provided by the workshop. The workshop will explore new questions about the Bible, the United States Constitution and its Amendments, and the 2008 Democratic and Republican Convention speeches. An important local example, using radiology reports, will illustrate how text mining can be used as an expert system.
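For readers who want a feel for the statistics named above before the workshop, here is a minimal Python sketch of word frequencies with a stop-word list and of a standardized ratio of distinct words (types) to running words (tokens). It is an illustration only and does not reproduce WordSmith's implementation. The function names, the simple tokenizer, the tiny stop-word list, and the default chunk size of 1000 tokens are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens, stop_words=frozenset()):
    """Count tokens, skipping any that appear in the stop-word list."""
    return Counter(t for t in tokens if t not in stop_words)


def type_token_ratio(tokens):
    """Number of distinct words (types) divided by number of running words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def standardized_ttr(tokens, chunk_size=1000):
    """Mean type/token ratio over consecutive full chunks of chunk_size tokens.

    Averaging over equal-sized chunks keeps long and short texts comparable.
    Returns None when the text is shorter than one chunk.
    """
    chunks = [tokens[i:i + chunk_size]
              for i in range(0, len(tokens) - chunk_size + 1, chunk_size)]
    if not chunks:
        return None
    return sum(type_token_ratio(c) for c in chunks) / len(chunks)


if __name__ == "__main__":
    # Opening words of the United States Constitution, used as a toy text.
    sample = ("We the People of the United States, "
              "in Order to form a more perfect Union")
    tokens = tokenize(sample)
    stop_words = frozenset({"we", "the", "of", "in", "to", "a"})  # illustrative list

    print(word_frequencies(tokens, stop_words).most_common(3))
    print(type_token_ratio(tokens))
    print(standardized_ttr(tokens, chunk_size=5))  # small chunks for a short text
```

On real documents a larger chunk size is more meaningful; the value of 5 above is used only because the sample is a single sentence.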
- Instructors
- Ed Rothman is the Director of CSCAR and a professor in the Department of Statistics at the University of Michigan. He has extensive experience as a statistical consultant in studies both on and off campus.
Eric Rabkin is Arthur F. Thurnau Professor of English Language & Literature and co-director of the Genre Evolution Project (GEP: www.umich.edu/~genreevo).
Danielle Gwinn is a GIS consultant at CSCAR.
Heidi Reichert is a statistical consultant at CSCAR.
Lingling Zhang is a lead statistician and consultant at CSCAR.
- Audience
- Architects, artists, city planners, humanists, lawyers, linguists, social scientists, or any qualitative scholar who is interested in doing quantitative analyses of common digitized documents.
- Prerequisite
- No prerequisites are required. No prior experience with text analysis or statistical analysis is assumed.
- Provisions
- Each enrollee will receive lecture notes and a glossary of concepts for future reference. Morning refreshments will be served. There will be a break for lunch (lunch is not provided).
- Dates & Times
- Thursday, May 21, 2009, 9:00 AM - 5:00 PM
- Location
- Rackham Bldg, 3rd floor, East Seminar Room.
- Fees
- Registrations on or before May 7, 2009: $150 for University of Michigan affiliated faculty, staff, and students; $325 for others
- Registrations after May 7, 2009: $180 for University of Michigan affiliated faculty, staff, and students; $390 for others
- Please make checks payable to CSCAR-University of Michigan, or give the University of Michigan Project/Grant or shortcode to be billed. Send checks to CSCAR, 3550 Rackham Bldg., University of Michigan, 915 E. Washington St., Ann Arbor, MI 48109-1070.
- Registration
- Call CSCAR at 734-764-7828. Enrollment is limited.