For NYCDH Week, I went to a workshop on literary cluster analysis at NYU. Never having had the opportunity to do cluster visualizations with literary texts (I have only experimented with some language data from undergrad), I was rather interested in some of the projects that the workshop surveyed. Some particular standouts were: stylistic similarities between Fielding’s parodies and Richardson’s novels, language preferences across ghost writers in children’s literature, and deviations in style when James switched to dictating his novels. My curiosity was also piqued when we discussed analytic techniques for comparing the topic preferences of two groups, as there seems to be an interesting possibility of incorporating the technique into the Newbery project as a stretch goal. I’m interested in seeing how topic preferences vary between novels that feature diverse characters and those that trend towards white-only characters. While we will likely not have the time to gather data from every nominee under discussion, I’m wondering if we can find the time to look at just the summaries provided by the Newbery group for award winners.
Another point that occurred to me upon reflection is the question of proprietary software in project design. Strangely, while the workshop presented a host of open source tools that could be used, we only practiced with proprietary software (of IBM origin, to be specific). While the results were fun to produce, I couldn’t get the bad taste out of my mouth: participants who lack reliable access to this software would need to learn these techniques all over again on a different platform. As well, I kept reflecting on how ugly some of the platforms we were using were, in spite of their proprietary nature. It appeared as though we were working in a sphere where industry dominance leads to an immediate disregard for intuitiveness and design, despite such aspects being the ideal selling point over open source alternatives.