Category Archives: Personal Blogs

Meg Williams Bio

Meg Williams is a student in the CUNY Graduate Center's Digital Humanities program and holds an MFA in Poetry from Hunter College, where she worked as an adjunct lecturer and a substitute administrative coordinator. With this background, Meg explores the intersection of art and technology through digital poetics. She currently works as a project coordinator for the New York Public Interest Research Group at Queens College, where she focuses on issues such as higher education affordability and campus sustainability.

Her main role in the Newbery project is outreach. She is particularly interested in issues around the importance of minority representation, the concept of whiteness, and the economy of prizes and their ancient origins. Meg hopes her explorations of these topics will contextualize her group's findings in a critical analysis of the Newbery Prize.

Christofer’s Bio

Christofer Gass is a full-time student in the Digital Humanities Master's Program at the CUNY Graduate Center. His capstone project is a text-based game about the 1964-1965 World's Fair. As an undergraduate at Columbus State University, he received his B.A. in Art History with a minor in Geography. While attending CSU, Christofer was deeply involved with the arts in Columbus. As an intern, he worked with the Bo Bartlett Center, the Columbus Museum, and CSU's Illges Gallery. As an employee, he worked for Bo Bartlett Studios and Alan Rothschild's Do Good Fund, a collection of contemporary photography of the American South. As a volunteer, Christofer directed Bartlett's Home Is Where The Art Is, an art program for the homeless community, and assisted Bartlett with Art in Jails, an art program for inmates in the Muscogee County Jail. He also served on the board of the Historic District Preservation Society.

Christofer's role within VRD (Virtual Reconstruction Database) is developer and UX designer. In addition to these roles, he will assist with research and wherever else he is needed within the group.

Georgette’s Bio

Georgette Keane is the Library and Archives Curator at the American Irish Historical Society. She received her Master of Library Science and a Certificate in Archives and Preservation of Cultural Materials from CUNY Queens College in 2015, and she is currently enrolled in the M.A. Digital Humanities Program at The Graduate Center, CUNY. Her research background is in digital palimpsests and in promoting access to archival collections at historical societies and museums. She has experience with WordPress, Omeka, ArchivesSpace, and CONTENTdm.

Georgette serves as the Project Manager, a role that includes overseeing the Newbery project and creating a detailed project schedule to ensure completion by the established deadline. She is also responsible for providing support to the other team members, including assisting with data collection and establishing connections with children's literature professionals and librarians.

Kelly’s Bio

Kelly Hammond is pursuing a master's in Digital Humanities at CUNY's Graduate Center. She has spent the last two decades integrating technology into humanities curricula in secondary schools, first at Cincinnati Country Day School, where she served as Dean of Studies, and more recently at the Chapin School in New York City as head of the Humanities Department. While at CUNY, she has focused on coding and data visualization. Her HTML and CSS work includes a Twine game on the publication history of Charlotte Perkins Gilman's "The Yellow Wall-Paper" and a network-independent website intended to teach digital humanities to incarcerated citizens pursuing a college degree. Her data visualization projects include an interactive Tableau piece investigating the authors and publishers behind the last decade of New York Times hardcover bestsellers. As an undergraduate at Amherst College, she investigated the works of W.E.B. Du Bois, initiating a lifetime of interest in the story of race in America, especially since Reconstruction, a passion that drew her to this project in the first place.

Kelly’s responsibilities for this project include automating data collection where possible with Python, cleaning data for analysis, and creating an interactive visualization of the data in Tableau. As an educator, she is keenly invested in this project, as her own diverse students and their parents are often steered to award-winning books.

Kelly’s Reflections: Week of 2/18 – 2/25

This week started off with a bang: I successfully built a program in Python that scraped Newbery Honors winners from a website and dumped them into a .csv file that I could, in turn, dump into Google Sheets for the team to investigate.

As has been the case with much of my work in DH so far, the breakthrough was made possible by a combination of online tutorials, trial and error, and logic. While I’d consulted tutorials last week, the majority of them walked me through sample scrapes that were unique to the site being scraped. Monday, though, I found a good introductory tutorial that focused on the big picture of web scraping as it worked through its sample rather than merely giving me code. The tutorial stressed how essential it is to study the architecture of the HTML you want to scrape first—to be able to articulate the sequence of the patterns of code that can isolate the data you need.

In the case of the site I found, the HTML was fairly basic and ambiguous. Instead of using helpful designations such as consistent and unique spans and classes to indicate titles, authors, and dates, the code organized all of the information on the page—introduction, visuals, and Honor book data alike—in rows in a table. So, I first had to use trial and error to isolate the rows that contained the book data, which turned out to be the range of cells [-323:-2]. (Yes, I realize that my range could have been positive, but I thought it easiest to work backward, since the book data extended to nearly the bottom of the page.) The dates of the Honors were tucked, fortunately, in separate cells in the table with a class designation of <td class="order">, so that was easy enough to scrape. But the title and the author for each book were nestled together in one table cell per book, each with its own anchor tag, which made for tricky disambiguation.

Fortunately, I'd seen Patrick Smyth in a workshop last year parse text by a specific word—maybe it was bylines in newspaper articles, though I don't really recall. So, employing a bit of logic, I extracted the text of the <td> cells without the class of "order," and then split that text at the " by " juncture, assigning the text prior to it ([0]) as the title and the text after it ([-1]) as the author. Worked like a charm. The rest of the code, which was easy to adapt from the tutorial, turned all of that into a dataframe (a new term for me!) suitable for export to a .csv file.
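A rough sketch of this kind of scrape (not the program itself), using requests, BeautifulSoup, and pandas, might look like the following; the URL is a placeholder, and it assumes the date cells and the title/author cells alternate, so details would need adjusting against the real page:

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd

    # Fetch the page and collect its table cells (placeholder URL).
    page = requests.get("https://example.com/newbery-honor-books")
    soup = BeautifulSoup(page.text, "html.parser")

    # The book data sat in the last stretch of table cells.
    cells = soup.find_all("td")[-323:-2]

    years, titles, authors = [], [], []
    for cell in cells:
        text = cell.get_text(strip=True)
        if "order" in cell.get("class", []):
            # Date cells carry the class "order".
            years.append(text)
        else:
            # Title and author share one cell, separated by " by ".
            titles.append(text.split(" by ")[0])
            authors.append(text.split(" by ")[-1])

    # Turn the lists into a dataframe and export it for cleaning.
    df = pd.DataFrame({"year": years, "title": titles, "author": authors})
    df.to_csv("newbery_honors.csv", index=False)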

I’m particularly proud of the program for two reasons. First, it is only 17 lines of actual code—far more efficient than any other Python program I’ve written. Second, I employed a trick I learned in Patrick’s software lab last spring: writing comments as a way to solidify my own understanding of my code. So, I feel I am ready to continue to scrape sites as our crew looks at other award winners. (This practice proved to have extra benefits, as our team is interested in learning from each other, and Emily happily mentioned in Tuesday’s class that the comments helped her understand the code too.)

After generating the .csv file, I cleaned the data as I had done with Georgette’s set last week. Again, I saw and documented a slew of potential issues that might prove tricky as we bring this data into Tableau, such as accent marks in author names and even an obviously erroneous attribution of an author as “See and Read” rather than Miska Miles. With both data sets on the Medal winners and Honors winners cleaned, I popped them into Google Sheets for us all to explore and expand. The rest of the week was spent adding to the data set beyond what scraping can do, as Emily and I began to investigate the identities of the books’ protagonists as well as the identities of the Honor authors.

As has been my experience with all data gathering, the realities of the data revealed limitations in our spreadsheet. For example, we didn’t have a uniform approach to protagonists who weren’t human (some were animals, some animals’ nationalities actually mattered in the story, one protagonist was a steam engine, and one story revolved around a family of dolls). Nor did we have a way to indicate multiple protagonists, as is the case for collections of stories or novels with sibling pairs or families in main roles.

As our team discussed in both of our Skype sessions so far, the most important task this upcoming week is to resolve how to deduce, ascribe, and name the identity markers we have already begun to record. As my own students have been studying the Harlem Renaissance this winter, I am reminded that the inaugural year of the Newbery Awards came on the heels of the two-year experiment of the Brownies' Book—a magazine created by W.E.B. Du Bois and Jessie Fauset to give black American kids a way to see themselves in print, countering both the predominantly white faces normalized in children's literature of the time and the stereotypes of nonwhite children that abounded. (In fact, I found a 1919 letter in which Du Bois, responding to an inquiry about the use of racial terms in the Brownies' Book, writes, "So long as the masses of educated people are agreed upon the significance of a word, it is impossible for you or me to ignore it, simply because we do not like it.") So, Georgette will reach out to a host of experts in addition to the help she's already found from the founder of the Diverse Book Finder, and Meg and I will scour the online literature in hopes of additional clarity as we move forward.

As I’ve learned that the Brownies’ Book is due for rerelease this year, in honor of the 100 years since its initial publication, our work seems very timely.


Inside cover of the 1920 edition of The Brownies' Book. From https://www.loc.gov/item/22001351/

Emily's Bio

Emily Maanum is currently a student in the M.A. Digital Humanities (DH) program at the CUNY Graduate Center. She has a B.A. in History, with minors in Anthropology and Film Studies, from the University of Richmond. She recently began her journey into the field of DH. Her research interests include GIS mapping with historical maps; British history, specifically the creation of the British welfare state; and learning new DH tools. For the Newbery book award project, Emily serves as Designer/User Experience. In conjunction with the Outreach Coordinator, as well as the rest of the Newbery team, Emily will develop a website that is useful and informative for the project's audience.

Kelly’s Reflections: Week of 2/11 – 2/18

This week for me was all about orienting myself in greater detail to the scope of the project and my role in it. As our team edited the proposal via Google Docs, I gained a greater appreciation for the potential good our project could do. Georgette had referenced in the proposal a great data repository that could expand the scope of our work: The Database of Award-Winning Children’s Literature (DAWCL). Her find reminds me of one of the many benefits of teamwork. As a non-librarian, I easily might have overlooked the site myself, as its design is a bit dated and amateurish by today’s standards. Yet, a little research confirmed that the information is thorough, accurate, and up-to-date—a real treasure.

As the programmer for our team, I wanted to figure out this week how we might automate data gathering with Python. Georgette had already gathered some initial data on the 98 Newbery Medal winners themselves. DAWCL offers additional data we hope to investigate such as the 400+ titles designated as Newbery Honorees, as well as avenues for broader analysis, such as the other awards Newbery books have won and award-winning titles that haven’t received Newbery recognition.

New to scraping websites with Python, I failed spectacularly with the DAWCL. But I learned a ton in the process, and I hope to make some breakthroughs in the coming week. The DAWCL makes it easy to sift through thousands of children's books by award. Yet it returns its search results via an ASP form, so Python can't simply request the results page by its URL. After combing the web for help, I learned to leverage Chrome's developer tools to dig beyond the first layer of code usually revealed by the Inspect command. I was able, ultimately, to follow the network requests made by the search form as I performed the search for Newbery winners (very cool), and I finally found the HTML behind the displayed search results. That HTML is not terribly sophisticated, which actually isn't great as far as scraping goes. I'd have preferred designated classes for titles, authors, and other awards, rather than just paragraph and break tags. So, this week, I'll need to get creative, either treating the code I found like text and using Python to parse it or changing tools to scrape from the ASP.
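A rough sketch of that first option, mimicking the form request traced in Chrome's Network tab and then treating the returned markup as text, might look like this; the endpoint and form fields below are placeholders, not the DAWCL's actual ones:

    import requests
    from bs4 import BeautifulSoup

    # Placeholder endpoint and form fields; the real values come from the
    # request traced in Chrome's developer tools, not from the page URL.
    response = requests.post(
        "https://example.com/search_results.asp",
        data={"award": "Newbery Medal"},
    )

    # With no designated classes to hook onto, grab each paragraph's text
    # as one record and parse it further from there.
    soup = BeautifulSoup(response.text, "html.parser")
    records = [p.get_text(" ", strip=True) for p in soup.find_all("p")]

    for record in records[:5]:
        print(record)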

Lest I hold the project up with that exploration, I turned to data cleaning, making some tweaks to the original set that Georgette posted, in hopes of showing up to our first Skype session with something concrete to offer. First, I split author names using Excel's text-to-columns feature, so that ultimately our users could investigate by an author's first or last name. Second, using what I learned from a data viz investigation of authors of New York Times bestsellers, I made a note of data that may cause trouble down the road, such as the tilde in Matt de la Peña's name or the grave accent in William Pène du Bois's. I also noted ethical issues we'll need to tackle: keeping track of our data sources and having discussions about what constitutes ethnicity. Not only are these decisions essential to make in the early stages, but they are important to document and convey to our ultimate users.
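Although I did this round of cleaning in Excel, roughly the same checks could be scripted in pandas; a sketch (with file and column names assumed) might look like this:

    import pandas as pd

    # Load the winners data (file and column names are assumed here).
    df = pd.read_csv("newbery_medal_winners.csv")

    # Split author names at the last space, mirroring Excel's text-to-columns.
    df[["author_first", "author_last"]] = df["author"].str.rsplit(" ", n=1, expand=True)

    # Flag names containing non-ASCII characters (tildes, accents) that may
    # need special handling when the data moves into Tableau; missing names
    # are flagged for review as well.
    df["needs_review"] = ~df["author"].str.match(r"^[\x00-\x7F]+$", na=False)

    print(df.loc[df["needs_review"], "author"])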

Having wrapped up our first Skype session earlier this evening, I’m heading into the second week with great optimism. Each member of our team is elevating my thinking about our approach and is spurring me on to find an expedient way to get our data!

NYCDH Week – Tome Collaborative Course Publications

Friday morning, I attended the Tome Collaborative Course Publication workshop, part of NYCDH Week in New York City. The workshop was led by Alexei Taylor, a digital creative and instructor at NYU. Tome is a digital publication platform created for academic publishing that can serve as a personal blog or as a collaborative workspace for academic projects. Tome was built as an easier-to-use WordPress platform and is often used by academics for publishing and course development.

For graduate and undergraduate students, one of the biggest challenges of the usual pedagogical model is that classroom projects are often created without thinking about a potential audience beyond the classroom. This model incentivizes students to work in isolation, having real academic interaction only with the professor and not with their peers. Another challenge is that students often don't have a well-kept record of their academic accomplishments other than the course credit they receive. Alexei built Tome with the student in mind. Tome really helps students think of themselves as public content creators, writers, and project builders. Students can use Tome to assemble a personal portfolio of their work or as a collaborative platform for building a project with peers. One of the example projects I was shown was a creative writing course published as a digital anthology: every student had their own page and their own page design, but all were part of the same Tome project, which could be navigated from page to page.

Tome has many features that make it very useful for people interested in publishing who may not have the skills to build their own platform. When you register for a Tome account, you get a link to the front page, which has a default minimal look with a small menu: a welcome page, a gallery, a syllabus, and a bibliography. Tome assumes you may need certain pages for publishing, but you can delete any of the pages you don't want through the back end. Through the back end you can also add users to your project, create new pages, add content, edit code to customize the look of your Tome, view your analytics, and use many more features. Tome makes it very easy to annotate and cite your work, offering many ways to add endnotes, links, captions, and descriptions for borrowed materials. In addition, there are many formatting tools to tailor your work to the look you want it to have.

As I was learning about Tome and all of its publishing features, I thought about the project I am currently a part of and how Tome could potentially be used to host it. This is a new tool that I will definitely be sharing with my project team. It is super easy to learn and super easy to use.

NYCDH Week- SpatioScholar Workshop

On Friday afternoon, I attended the SpatioScholar (Unity) workshop at NYU. SpatioScholar is an application, built in Unity, developed by a group of academics for scholarly work that requires 3-D and time-based processing and visualization. The tool lets scholars explore spatial and temporal datasets with a unique set of functionalities. One of its most useful features is a timeline slider, which demonstrates change over time by showing how a certain building or location evolved over a set period. Additionally, through the simulation feature, viewers can experience the space in first person: they are dropped into the middle of the model and can control which direction to walk in order to explore it. Another beneficial feature is the ability to connect primary materials to the model. Viewers can browse materials like photos, drawings, or textual documents that relate to a specific location at a specific time. Finally, SpatioScholar lets viewers and users leave notes and view others' notes.

I have never worked with 3-D models before, so it was very cool to see how they can be rendered and manipulated in Unity. During the workshop we imported a generic model that the instructors created. We then added a timeline bar to the model; as you slid along the timeline, buildings appeared and disappeared over time. Next, we explored the different ways to view the model. When you are working in edit mode, you can zoom in and out of the model and spin it around to view it from different angles. Entering the simulation mode, we were dropped into the model itself and had the ability to walk around and view it in first person. While in the first-person view, you can use the timeline bar to move across time and see how the space changes. We then learned how to view and add notes to the model. You can pin a note to a specific place or building as well as to a specific time. Finally, we imported some primary material. Like the notes, you can pin photos or documents to specific buildings or to a specific time period. We pinned a few photos to a specific building, and as we slid the timeline bar along, the photos appeared and disappeared with the building.

Overall, SpatioScholar was pretty easy to use once I understood the type of data we were importing and how it corresponded to the data fields within Unity. The instructors also showed us some finished models created with this tool, which helped solidify my sense of the possibilities and capabilities of the application.

Getting Started with TEI

For DH week, I attended the 'Getting Started with TEI' workshop hosted by Filipa Calado at the Grad Center. TEI, or the Text Encoding Initiative, is a markup schema for representing the structural, renditional, and conceptual features of texts. For anyone familiar with HTML, the two look similar. However, HTML encodes how a text should appear on a page, while TEI encodes the context of a text. Filipa gave a brief introduction to TEI and the guidelines for using it, and then we practiced encoding with pages from The Picture of Dorian Gray manuscript.

Since TEI has its roots in XML, many of the same rules from XML apply when encoding in TEI, such as proper nesting: you must always close the last tag you opened before closing any tag you opened earlier, e.g. <sentence> <emphasis> </emphasis> </sentence>. Structurally, every TEI document begins with an XML declaration (sometimes along with a DTD, or Document Type Definition); this declaration is necessary for a computer to read the TEI. TEI documents consist of two parts, the header and the body. The header describes the source text's metadata and includes elements such as <TEI>, <teiHeader>, <fileDesc>, <titleStmt>, <title>, <publicationStmt>, and <sourceDesc>. The body is the main section of the document and is what you will see when the TEI document is transformed. The body section begins with <sourceDoc> and features elements such as <add>, <del>, and <line>.
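To make that structure concrete, here is a rough sketch of the skeleton built with Python's standard ElementTree library; the element names follow the list above, though a real TEI file would also declare the TEI namespace and carry much fuller metadata:

    import xml.etree.ElementTree as ET

    # Build the skeleton described above: a header for metadata and a
    # sourceDoc for the transcription itself.
    tei = ET.Element("TEI")

    header = ET.SubElement(tei, "teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = "The Picture of Dorian Gray (manuscript pages)"
    ET.SubElement(file_desc, "publicationStmt").text = "Workshop exercise"
    ET.SubElement(file_desc, "sourceDesc").text = "Manuscript pages 20 and 21"

    # The body: a documentary transcription with a deletion and an addition.
    source_doc = ET.SubElement(tei, "sourceDoc")
    line = ET.SubElement(source_doc, "line")
    ET.SubElement(line, "del").text = "a crossed-out phrase"
    ET.SubElement(line, "add").text = "its revision"

    # Write the file with an XML declaration at the top (indent needs Python 3.9+).
    ET.indent(tei)
    ET.ElementTree(tei).write("dorian_gray.xml", xml_declaration=True, encoding="utf-8")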

Although it does sound like a lot of work, the end product is definitely worth it. Filipa shared pages from Mary Shelley’s Frankenstein manuscript from the Shelley-Godwin Archive. Using TEI, the project team was able to decipher Mary Shelley’s draft, while including Percy Shelley’s revisions in red. 

Before starting an encoding project, Filipa advises that it is important to think about the goals of the project and to consider not just the document, but your audience, your research goals, and how you wish to represent textual data. She offered some guiding questions from the Women Writers Project (WWP) for preliminary document and project analysis:

  1. As far as you can tell, how is the document structured?
  2. Are there any kinds of regularization or editorial amendment you will perform as you transcribe the text?
  3. How much information about the appearance of the document do you need to capture?

Keeping these questions in mind, we then evaluated pages from The Picture of Dorian Gray manuscript and tried to encode pages 20 and 21 in a text editor. We uploaded our files to a Google Drive folder and compared our results. I really enjoyed this process, but found myself getting stuck on the crossed-out sections, trying to decipher the writing underneath. Something Filipa recommended to me was to use the <gap reason="illegible"/> element when encoding. Filipa then showed us the section she has been working on so we could check our work. There were a number of revisions on the page, and Filipa was able to capture most of them. Some writing is still illegible and may need to undergo a process similar to the one used on palimpsests to decipher the text.

If anyone is interested in TEI, Filipa shared a number of links, like the DARC wiki on Text Editing, and The Letters of Vincent van Gogh project.