An intro to text mining

Candace, 11 October 2010, 3 comments
Categories: code, Uncategorized

I started my THATcamp Bay Area weekend in a bootcamp session on Text Mining with Aditi Muralidharan (@silverasm, http://mininghumanities.com), a graduate student at UC Berkeley. Links to the slides from the session are here. The session was geared toward people who collect data and then ask “what do I do with all this stuff?!?” This definitely describes me. I have hours and hours of collected oral histories plus a few diaries and log books I’d love to analyze.

I’ve never done anything remotely close to text mining, which is why I attended this session. Here’s what I learned:

A variety of tools were suggested:

An example of text-mining an historical diary done by Cameron Blevins @historying at StanfordU:

Some limitations of text mining:

I’ll be working through this list (just as soon as I get my text in a digital format that can be processed).

Comments

3 Responses
  1. […] have notes for Text mining, Organizing an Unconference, Augmented Reality 4 Poets, Google Fusion Tables, and some […]

  2. […] Candace Nast […]

  3. Don Sawtelle
    12 October 2010, 2:32 pm

    A different but complementary tool is the Winnow classifier. You can try it by using http://winnowtag.org, which uses Winnow to create smart tags to find related items even when the amount of content is very large. You can create and share your own smart tags at winnowtag.org, and you can use Winnow directly in your own projects.

    winnowTag.org downloads and tags 7,500 feeds daily and keeps the items for three months, so it currently holds about 700,000 items on a huge variety of topics. This makes winnowTag.org a working demonstration of the accuracy and performance of the recommendations made by the Winnow classifier (a naive Bayesian variant). Here are a couple of illustrative tags:

    entomology: http://winnowtag.org/#mode=all&tag_ids=468

    space: http://winnowtag.org/#mode=all&tag_ids=11

    A higher number at the left of an item means the Winnow classifier is more certain that the item is a correct match for the selected tag. The features of winnowtag.org are explained in the Help tab of http://doc.winnowtag.org, and http://doc.winnowtag.org/open-source has info on Winnow.
