Week 10: Data Mining/Distant Reading

I think this week’s readings have left me feeling more outside the circle than any other course readings so far, and whether or not that’s an apt spatial analogy according to Franco Moretti, I have no idea.

I read Graphs, Maps, Trees. I thought about it. I even think I agree with parts of it. For example, I think I was most plugged into the section on maps. Graphs always leave me at a loss, whether presented in typical or novel usage. I see numbers and something in my brain (or my psyche) just refuses to participate. But visually mapping concepts in a novel, then comparing these maps to show change in literature over time, was pretty impressive. Similarly, I appreciated the way Moretti used “genetic” or “evolutionary” trees to investigate how genre conventions are determined over time. Working in a used bookstore, I got real up close and personal with the unbelievable horde of books that didn’t pass whatever test is set for success. After you finish crying because you have to alphabetize and shelve ten flats (roughly 400-500 mass market paperbacks) in 2 hours, you wonder what it is exactly that allows so few books to pass the test. What Moretti has done here is try to answer that question and, in doing so, try to nail down successful genre conventions. It was interesting that Moretti chose the mystery genre to put into a tree, because it was one of the genres we were always most bloated with.

In my view, Moretti’s model doesn’t necessarily provide explanations for those changes, but simply tracks and displays them. The afterword got into this a little bit, and I suppose this is where the historian comes in: our job is to contextualize and interpret the data. What Moretti is doing is extracting it and presenting it in a way that makes it easier to find, read, and study.

As I write this evaluation of Moretti’s book I’m trying to decide how it can be tied to our other course readings and class discussions as a whole. For this week, the other readings discuss the creation of tools that can aggregate and display data in ways useful to historians, as Moretti has done. Graphs, maps, and trees are all tools (that he proposes), and similarly Dan or Google are working on ways to collect, mine, and display existing data in a way that makes it more accessible to researchers.

My question then becomes: what are the limitations of these presentations? The authors of our readings themselves struggle with this question. Dan Cohen breaks down the pros and cons of the Google N-Gram Viewer. I think he suggests that this tool is struggling with quantity vs. quality: it is a great way to find the frequency of terms, but it doesn’t contextualize them for us. When I was browsing the tools and sites for this week I was dismayed by how difficult it was to figure out how to use some of the tools (though in many cases that might be more of a reflection on me). Toolmakers then also struggle with usability, and in this brave new world of digital humanities, user knowledge probably hasn’t yet caught up to the level of digital literacy necessary to operate even a usable tool.

So our goal as researchers and explorers, as we use these new tools, is always to remember the limitations of the forms the tools come to us in and the way they choose to present data, because this presentation will shape our research and conclusions just as much as the data itself.

About The Author

Claire

31

10 2011
