
The Center For Statistical Consultation and Research
3550 Rackham Building
University of Michigan
Ann Arbor, MI 48109-1070
cscar@umich.edu


Text Mining with Common Digital Documents


October 21, 2008

Ed Rothman, Eric Rabkin, Danielle Gwinn, Heidi Reichert, Lingling Zhang


The purpose of the workshop is to introduce quantitative methods for studying documents that are typically treated only qualitatively. These statistical methods facilitate the formulation and exploration of research questions arising from studies of digitized but untagged texts. No prerequisites are required, and no previous experience with text analysis or statistical analysis is assumed.

Research in business, law, policy, and indeed all humanistic and social scientific fields relies ever more on quantitative analyses of digitized texts, be they historical archives, legal documents, customer surveys, or online poetry collections. Often the amount of material is too large for analysts to review by eye or code by hand. Computer-aided methods of processing and analyzing textual data address this problem and make possible discoveries that were previously out of reach. However, these methods are unfamiliar to many of the people who need text mining.

This workshop introduces concepts and teaches skills for text analysis. It begins with key statistical concepts and graphical tools commonly used in analyzing textual data, such as word frequencies, the standardized type/token ratio, and relationship statistics on word co-occurrences. The workshop also covers text-processing skills, such as preparing texts for analysis, developing stop-word lists and lemma lists, and exporting results for further processing in other programs. We will explore WordSmith (http://www.lexically.net/wordsmith/), a program offering a wide array of text-study techniques, including producing interactive concordances, constructing concept sets for document analyses, and applying statistics to understand definitions, styles, and thematic concerns both at one point in time and over time. Participants will have a chance to practice these skills using digitized texts provided by the workshop. The workshop will explore new questions about the Bible, the United States Constitution and its Amendments, and Lewis Carroll's Alice books.
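For readers who want a concrete picture of the statistics named above, here is a minimal Python sketch of word frequencies with a stop-word list, the type/token ratio (distinct words divided by total words), a standardized version of that ratio, and co-occurrence counts. It is not WordSmith's implementation and is not part of the workshop materials. The function names, the simple regular-expression tokenizer, the sample sentence, and the definition of the standardized ratio as the mean ratio over consecutive fixed-size chunks are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens, stop_words=frozenset()):
    """Count each token, skipping any token that appears in the stop-word list."""
    return Counter(t for t in tokens if t not in stop_words)


def type_token_ratio(tokens):
    """Number of distinct tokens (types) divided by the total number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def standardized_ttr(tokens, chunk_size=1000):
    """Mean type/token ratio over consecutive full chunks of chunk_size tokens.

    Assumed definition for illustration; returns None if the text is
    shorter than one chunk.
    """
    ratios = [
        type_token_ratio(tokens[i:i + chunk_size])
        for i in range(0, len(tokens) - chunk_size + 1, chunk_size)
    ]
    return sum(ratios) / len(ratios) if ratios else None


def cooccurrences(tokens, window=5):
    """Count unordered pairs of different words at most `window` positions apart."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1:i + 1 + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


if __name__ == "__main__":
    # Hypothetical sample input; any plain-text document could be used instead.
    sample = ("Alice was beginning to get very tired of sitting by her sister "
              "on the bank, and of having nothing to do")
    tokens = tokenize(sample)
    stop_words = frozenset({"the", "of", "to", "and", "was", "by", "on"})
    print(word_frequencies(tokens, stop_words).most_common(5))
    print(type_token_ratio(tokens))
    print(standardized_ttr(tokens, chunk_size=10))
    print(cooccurrences(tokens, window=2).most_common(5))
```

Averaging the ratio over equal-sized chunks is one common way to make texts of different lengths comparable, since the raw type/token ratio tends to fall as a text gets longer.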


Instructors:

Ed Rothman is the Director of CSCAR and a professor in the Department of Statistics at the University of Michigan. He has extensive experience as a statistical consultant in studies both on and off campus.

Eric Rabkin is Arthur F. Thurnau Professor of English Language & Literature and co-director of the Genre Evolution Project (GEP: www.umich.edu/~genreevo).

Danielle Gwinn is a GIS consultant at CSCAR.

Heidi Reichert is a statistical consultant at CSCAR.

Lingling Zhang is a lead statistician and consultant at CSCAR.

Audience:

        Architects, artists, city planners, humanists, lawyers, linguists, social scientists, or any qualitative scholar interested in doing quantitative analyses of common digitized documents.

Prerequisite:

        No prerequisites are required. No previous experience with text analysis or statistical analysis is assumed.

Provisions:

        Enrollees will receive lecture notes and a glossary of concepts for future reference.

        Morning refreshments will be served. There will be a break for lunch (lunch is not provided).

Date:

Tuesday, October 21, 2008

Time:

9:00 a.m. - 5:00 p.m.

Location:

Rackham Bldg, 2nd floor, North Alcove in the West Study Hall.

Fee:

Registration until October 7, 2008:

$150 for University of Michigan affiliated faculty, staff and students; $325 for others

Registration after October 7, 2008:

$180 for University of Michigan affiliated faculty, staff and students; $390 for others

 

Registration:

Call CSCAR at 734-764-7828. Enrollment is limited. Make checks payable to CSCAR-University of Michigan, or provide the University of Michigan Project/Grant or short code to be billed. Send checks to CSCAR, 3550 Rackham Bldg., 915 E. Washington St., Ann Arbor, MI 48109-1070.


Copyright © 1998 - 2001 The Regents of the University of Michigan, Ann Arbor