Kelly’s Bio

Kelly Hammond is pursuing a master’s in Digital Humanities at CUNY’s Graduate Center. She has spent the last two decades integrating technology into humanities curricula in secondary schools, first at the Cincinnati Country Day School, where she served as the Dean of Studies, and more recently at the Chapin School in New York City as head of the Humanities Department. While at CUNY, she has focused on coding and data visualization. Her HTML and CSS work includes a TWINE game focused on the publication history of Charlotte Perkins Gilman’s “The Yellow Wall-Paper” and a network-independent website intended to teach digital humanities to incarcerated citizens pursuing a college degree. Her data visualization projects include an interactive in Tableau investigating the authors and publishers behind the last decade of New York Times hardcover bestsellers. While an undergraduate at Amherst College, she investigated the works of W.E.B. Du Bois, initiating a lifetime of interest in the story of race in America, especially since Reconstruction, a passion that drew her to this project in the first place.

Kelly’s responsibilities for this project include automating data collection where possible with Python, cleaning data for analysis, and creating an interactive visualization of the data in Tableau. As an educator, she is keenly invested in this project, as her own diverse students and their parents are often steered to award-winning books.

Kelly’s Reflections: Week of 2/18 – 2/25

This week started off with a bang: I successfully built a program in Python that scraped Newbery Honors winners from a website and dumped them into a .csv file that I could, in turn, dump into Google Sheets for the team to investigate.

As has been the case with much of my work in DH so far, the breakthrough was made possible by a combination of online tutorials, trial and error, and logic. While I’d consulted tutorials last week, the majority of them walked me through sample scrapes that were unique to the site being scraped. Monday, though, I found a good introductory tutorial that focused on the big picture of web scraping as it worked through its sample rather than merely giving me code. The tutorial stressed how essential it is to study the architecture of the HTML you want to scrape first—to be able to articulate the sequence of the patterns of code that can isolate the data you need.

In the case of the site I found, the HTML was fairly basic and ambiguous. Instead of using helpful designations such as consistent and unique spans and classes to indicate titles, authors, and dates, the code organized all of the information on the page (introduction, visuals, and Honor book data alike) in rows in a table. So, I first had to use trial and error to isolate the rows that contained the book data, which turned out to be the range of cells [-323:-2]. (Yes, I realize that my range could have been positive, but I thought it easiest to work backward, since the book data extended to nearly the bottom of the page.) The dates of the Honors were tucked, fortunately, in separate cells in the table with a class designation of <td class="order">, so that was easy enough to scrape. But both the title and the author for each book were nestled in one table cell for each book, and each had its own anchor tag, which made for tricky disambiguation.

Fortunately, I’d seen Patrick Smyth in a workshop last year parse text by a specific word; maybe it was bylines in newspaper articles, though I don’t really recall. So, employing a bit of logic, I extracted the text of the <td> cells without the class of “order,” and then split that text at the “ by ” juncture, assigning the text prior to it ([0]) as the title and the text after it ([-1]) as the author. Worked like a charm. The rest of the code, which was easy to adapt from the tutorial, turned all of that into a dataframe (a new term for me!) suitable for exportation to a .csv file.
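The approach described above can be sketched roughly as follows. This is a minimal illustration, not the program from this post: it uses only Python’s standard library (html.parser and csv) rather than whatever scraping and dataframe libraries the tutorial used, and the sample markup, the placeholder years, titles, and authors, the slice bounds, and the names RowParser, extract_books, and to_csv are all assumptions made for the example.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample markup; the real page's HTML is not reproduced in this post.
SAMPLE_HTML = """
<table>
  <tr><td>Introduction text</td></tr>
  <tr><td class="order">1950</td>
      <td><a href="#">Example Title One</a> by <a href="#">Author One</a></td></tr>
  <tr><td class="order">1951</td>
      <td><a href="#">Example Title Two</a> by <a href="#">Author Two</a></td></tr>
  <tr><td>Footer text</td></tr>
</table>
"""


class RowParser(HTMLParser):
    """Collect each table row as a list of (class_name, text) cells."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            # Remember the cell's class (or None) and start gathering its text.
            self._cell = [dict(attrs).get("class"), ""]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[1] += data

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append((self._cell[0], self._cell[1].strip()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_books(html, start=None, stop=None):
    """Return year/title/author records from the rows in rows[start:stop]."""
    parser = RowParser()
    parser.feed(html)
    books = []
    for row in parser.rows[start:stop]:
        year = next((text for cls, text in row if cls == "order"), None)
        others = [text for cls, text in row if cls != "order"]
        if year is None or not others:
            continue  # not a book row
        parts = others[0].split(" by ")
        books.append({"year": year, "title": parts[0], "author": parts[-1]})
    return books


def to_csv(books):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["year", "title", "author"])
    writer.writeheader()
    writer.writerows(books)
    return buffer.getvalue()


books = extract_books(SAMPLE_HTML, start=1, stop=-1)
print(to_csv(books))
```

Splitting on “ by ” and taking [0] and [-1] mirrors the description above. A title that itself contains “ by ” would be cut short this way, so str.rpartition(" by ") might be a safer choice on messier data.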

I’m particularly proud of the program for two reasons. First, it is only 17 lines of actual code, far more efficient than any other Python program I’ve written. Second, I employed a trick I learned in Patrick’s software lab last spring: writing comments as a way to solidify my own understanding of my code. (This practice proved to have extra benefits, as our team is interested in learning from each other, and Emily happily mentioned in Tuesday’s class that the comments helped her understand the code too.) So, I feel I am ready to continue to scrape sites as our crew looks at other award winners.

After generating the .csv file, I cleaned the data as I had done with Georgette’s set last week. Again, I saw and documented a slew of potential issues that might prove tricky as we bring this data into Tableau, such as accent marks in author names and even an obviously erroneous attribution of an author as “See and Read” rather than Miska Miles. With both data sets on the Medal winners and Honors winners cleaned, I popped them into Google Sheets for us all to explore and expand. The rest of the week was spent adding to the data set beyond what scraping can do, as Emily and I began to investigate the identities of the books’ protagonists as well as the identities of the Honor authors.

As has been my experience with all data gathering, the realities of the data revealed limitations in our spreadsheet. For example, we didn’t have a uniform approach to protagonists who weren’t human (some were animals, some animals’ nationalities actually mattered in the story, one protagonist was a steam engine, and one story revolved around a family of dolls). Nor did we have a way to indicate multiple protagonists, as is the case for collections of stories or novels with sibling pairs or families in main roles.

As our team discussed in both of our Skype sessions so far, the most important task this upcoming week is to resolve how to deduce, ascribe, and name the identity markers we have already begun to record. As my own students have been studying the Harlem Renaissance this winter, I am reminded that the inaugural year of the Newbery Awards came on the heels of the two-year experiment of the Brownies’ Book, a magazine created by W.E.B. Du Bois and Jessie Fauset to provide black American kids a way to see themselves in print, countering either the predominant white faces normalized in children’s literature of the time or the stereotypes of nonwhite children that abounded. (In fact, I found a 1919 letter from Du Bois responding himself to an inquiry about the use of racial terms in the Brownies’ Book in which he writes, “So long as the masses of educated people are agreed upon the significance of a word, it is impossible for you or me to ignore it, simply because we do not like it.”) So, Georgette will reach out to a host of experts in addition to the help she’s already found from the founder of the Diverse Book Finder, and Meg and I will scour the online literature in hopes of additional clarity as we move forward.

As I’ve learned that the Brownies’ Book is due for rerelease this year, in honor of the 100 years since its initial publication, our work seems very timely.

Inside cover of the 1920s edition of The Brownies’ Book. From https://www.loc.gov/item/22001351/

Bio – Emily

Emily Maanum is currently a student in the M.A. Digital Humanities (DH) program at the CUNY Graduate Center. She has a B.A. in History, with minors in Anthropology and Film Studies, from the University of Richmond. She recently began her journey into the field of DH. Her research interests include GIS mapping with historical maps, British history, specifically the creation of the British welfare state, and learning new DH tools. For the Newbery book award project, Emily serves as Designer/User Experience. In conjunction with the Outreach Coordinator, as well as the rest of the Newbery team, Emily will develop a website that is useful and informative for the project’s audience.

Newbery Work Plan

The goal of this project is to collect the biographical data and subject matter of the Newbery Medal and Honor Books to determine if there is an accurate representation of diversity amongst the honorees. The project will consist of four stages: gathering the data, organizing the data into the pre-approved format, analyzing the data using the visualization software Tableau Public, and then disseminating the results by creating a website dedicated to displaying our research and findings. Gathering the data from the four hundred and fifteen Newbery books will take the longest and involve the entire team. With guidance from the project director, the team will organize the data into the following categories: Year; Winner/Honor; Title; Author; Author’s Gender, Race, and Ethnicity; Protagonist’s Gender, Race and Ethnicity; and Themes. The first four categories are available on the ALA’s Newbery site. The team will find the author’s gender, race, and ethnicity on the authors’ websites, publishers’ sites, or through an internet search (author interviews, etc.). The books’ protagonists and themes will be found in the Library of Congress’ and New York Public Library’s bibliographic records. The programmer will expand her understanding of Python to scrape data where possible.

The team will then organize and input the data using a pre-approved format into an Excel spreadsheet. The third stage is to enter all data into Tableau Public, create visualizations, and analyze the findings. The final stage will include the creation of a user-friendly website to display an interactive visualization for those who wish to explore, along with recommended readings to help users think more critically about the role of awards in children’s literature. In addition, the designer and outreach specialist will lead the team in creating a shareable graphic for teachers and librarians that captures the most provocative of their findings. The project team will also submit proposals to present their findings at the American Library Association’s Annual Conference and the Association for Library Service to Children’s Midwinter Meetings and National Institute.

Revised Proposal-Diversity in Newbery Honorees

Abstract
Literacy is essential to a child’s development. Through reading, children expand not just their vocabulary but their understanding of the world around them. But can children really learn from books if a majority of groups and topics are misrepresented or ignored? Recent studies have shown that there is a lack of diversity in children’s books. And while there have been initiatives created to address this issue, the fact that children do not have access to all of these books is something to consider. But what about the books they do have access to?
This project will explore diversity in the most popular children’s literature books, the Newbery Medal and Honor Books. Data collected from the four hundred and fifteen Newbery Books will be used to answer the following questions: Do the Newbery Medal and Honor Books provide an accurate representation of diverse backgrounds and subject matter? If so, has this been a recent development? And are there any trends of note in the honorees? The project team will attempt to answer these questions by collecting the biographical data and subject matter of all four hundred and fifteen ‘Newbery Honorees’ (both Medal Winners and Honor books) and using Tableau Public to create a digital visualization of their findings to share with the project’s intended audience of librarians, educators and the DH community.

List of Participants
Project Manager/Researcher: Georgette Keane, CUNY Graduate Center
Developer/Researcher: Kelly Hammond, CUNY Graduate Center
Designer/User Experience: Emily Maanum, CUNY Graduate Center
Outreach: Meaghann Williams, CUNY Graduate Center

Narrative
Literacy is essential to a child’s development. Through reading, children expand not just their vocabulary but their understanding of the world around them. But can children use books to expand their understanding of the world if a majority of groups and topics are misrepresented or completely ignored? Recent studies have shown that there is a lack of diversity in the children’s books being published. In a study performed by the Cooperative Children’s Book Center (CCBC) of three thousand books published in 2018, fifty percent of the books featured a white main character. Twenty-seven percent of books featured an animal, and African/African American, Asian Pacific Islander/Asian Pacific American, Latinx, and American Indians/First Nations were the least represented. Sarah Park Dahlen and David Huyck, who presented these findings in an infographic, argue that children’s literature continues to misrepresent underrepresented communities. But their hope is that their findings push conversations about this issue and lead to a change in publishing. And while there have been initiatives created by the American Library Association (ALA) and children’s book publishers to address this issue, the fact that children do not have access to all of these books is something to consider. But what about the books they do have access to?
School and public libraries offer children (and their caregivers) access to a vast number of books that they would never be able to purchase for themselves. Libraries also feature carefully curated sub-collections that allow children to see themselves in a story and can help them understand and deal with difficult topics. And more people are going to public libraries each year. According to the 2016 Public Libraries Survey Report by the Institute for Museum and Library Services, more than 171 million registered users visited public libraries over 1.35 billion times in 2016. Even with this increase in patrons, librarians often deal with limited budgets and shelf space, so books must be carefully chosen. Librarians will often rely on book lists and reviews for guidance on purchasing, and the books usually topping these lists are the Newbery Medal and Honor books.
First awarded in 1922 to encourage original creative work in the field of books for children, the Newbery Medal is awarded to the author of the most distinguished contribution to American literature for children. The author must be a citizen or resident of the United States, and the book must be published by an American publisher in the United States in English during the preceding year. The Newbery Medal is the most distinguished award presented to children’s books, and studies have shown that after the winners are announced, book sales can increase up to 1,000%. Not only is the general public purchasing, but so are public and school libraries. Honorees are highlighted on ALA websites and accompanying book lists, and librarians will often feature honorees in their display areas and programming. Children (and their caregivers) become exposed to these works that may or may not help them to understand and handle situations that deal with diversity in religion, race, gender, etc. And these books, for better or worse, usually stay on library shelves much longer than other books due to their status as honorees. As one head of children’s services states, “I don’t weed Newbery and Caldecott winners…I feel like if you win the Newbery or Caldecott, you kind of have immortality as a book. I just won’t do it.”
Since the Newbery Medal and Honor Books are so popular amongst the public and librarians, this project hopes to answer the following questions: Do these books provide an accurate representation of diverse backgrounds and subject matter? If so, has this been a recent development? And are there any trends of note in the honorees? The project team will attempt to answer these questions by collecting the biographical data and subject matter of all four hundred and fifteen ‘Newbery Honorees’ (both Medal Winners and Honor books) and using Tableau Public to create a digital visualization of their findings. Hopefully, librarians and educators can use the visualizations to argue for more funding to purchase a wider array of books that fully encompass the experience of their patrons, if in fact the selections of these awards are found lacking in diversity of representation and subject matter.
The project team will begin by gathering all relevant data from all of the Newbery Honorees. The data will be organized into eight categories: Year; Winner/Honor; Title; Author; Author’s Gender; Author’s Race; Main Character(s); Themes. Half of the data can be found on the ALA’s Newbery Medal Homepage. The researchers will use authors’ and publishers’ websites and the Library of Congress’ and New York Public Library’s bibliographic records to find the author’s gender, race, and a summary of the books and relevant themes. The programmer will experiment with Python to scrape data from these sources where possible. It is important to note that the team will pre-approve the correct format for the data before it is entered into an Excel spreadsheet so as to avoid any errors in the final report. For example, if an author is African-American or Cuban-American, the terms will be entered as ‘African-American’ and ‘Cuban-American.’ In regards to a book’s themes, the team will make sure to use the Library of Congress Subject Headings’ format. If a book’s themes include the relationship between grandparent and child, the Library of Congress Subject Headings (LCSH) format is ‘Grandparents and child’ and the theme will be added to the spreadsheet exactly like that. While librarians are familiar with LCSH, other members of the team may not be. The project director will be responsible for instructing the other members about LCSH.
Gathering the data will be the most time-consuming part of the project; therefore, the project team will use existing software to display their results. Once the team has organized the data, they will use Tableau Public to create a data visualization of their findings. Tableau Public is a free service that allows users to create and publish data visualizations. Tableau Public users do not need programming experience, and there are many tutorials and a dedicated community available to assist the project team. Published visualizations are available to the public and can easily be shared through email, social media and on websites. Once the visualization is completed, the project team will analyze the findings. The designer and outreach specialist will lead the team in creating a model to share with the public, striving specifically to inform librarians and educators. We also hope to approach the Association for Library Service to Children, the organization that grants the annual award.

Environmental Scan
Finding similar projects has been difficult, as projects tend to focus on analyzing diversity in the most recent children’s books published or creating a book list that focuses on a particular group or theme (gender or race for example). This visualization project will be unique in that it analyzes all four hundred and fifteen Newbery Honorees and that it will be an interactive visualization where users can search for specific information on authors, themes, and main characters. It is important to note these projects because they will provide guidance on how the project team will design the visualizations.
As mentioned above, the lack of diversity in children’s books has been revealed by individuals like Sarah Park Dahlen and David Huyck in their article for the School Library Journal and organizations such as the Cooperative Children’s Book Center. There are journals, both online and in print, that investigate diversity, such as Research on Diversity in Youth Literature (RDYL), a peer-reviewed online journal hosted by St. Catherine University’s Master of Library and Information Science Program and University Library. Librarians are also aware of the lack of diversity in literature and will often create public programming to highlight books on diversity or create LibGuides, as Michigan State University Libraries has done.
The publishing community has also recognized the general lack of diversity and has started new initiatives to tackle the issue. Scholastic created the catalog The Power of Story that offers recommendations for books representing diversity of race, sexual orientation, gender identity, and physical and mental abilities. By creating the catalog, Scholastic hopes that young people will have the opportunity to “see themselves and their communities reflected, to read widely, and to understand and expand their world.” Book publisher Lee & Low created The Open Book Blog, a blog on race and diversity in children’s books. The blog will often have guest contributors discussing current issues, as well as promotions of books published by Lee & Low.
In regard to digital projects focusing on diversity in children’s books, the Diverse Book Finder is a site that collects information on picture books that feature black and indigenous people and people of color (BIPOC) from 2002 to the present. The themes given on the site are Genre; Categories; Settings; Tribal Affiliation/Homelands; Immigration; Gender; and Race/Culture. A limitation of the site is that it tracks only fiction and narrative nonfiction picture books from 2002 onward, and only books with suggested reading levels of kindergarten through grade three.
The only digital project found that features diversity and the Newbery Award Books is Lisa Bartle’s Database of Award-Winning Children’s Literature. The database has over 14,000 records from 158 awards worldwide. Bartle is a reference librarian who researches award winners and regularly adds them to the database. The page that lists database updates also includes how many of the books Bartle has read. As of November 8, 2019, there were 14,397 records, of which Bartle had read 3,373. Visitors can search for books by keyword or by certain fields like award won or author’s gender.
While there are many people and organizations focusing on the issue of diversity in children’s literature, there is not an interactive data visualization project that focuses on diversity in Newbery Medal and Honor Books.

Work Plan
The project will consist of three stages: gathering the data, organizing the data into the pre-approved format, and then analyzing the data using the visualization software Tableau Public. Gathering the data from the four hundred and fifteen Newbery books will take the longest and involve the entire team. With guidance from the project director, the team will organize the data into the following eight categories: Year; Winner/Honor; Title; Author; Author’s Gender; Author’s Race; Main Character(s); and Themes. The first four categories are available on the ALA’s Newbery site. The team will have to find the author’s gender and race either on the authors’ websites, publishers’ sites, or an internet search (author interviews, etc.). The books’ main characters and themes will be found with the Library of Congress’ and New York Public Library’s bibliographic records. The programmer will expand her understanding of Python to scrape data where possible.
The team will then organize and input the data using a pre-approved format into an Excel spreadsheet. The third stage is to enter all data into Tableau Public, which is a free service that allows users to create and publish data visualizations. The team will experiment with Tableau Public, create visualizations that answer the research questions, and share them with the project’s intended audience of librarians, educators, the DH community, and the Association for Library Service to Children.

Final Product and dissemination
This project will produce a website and digital visualization that explores diversity in the Newbery Medal and Honor Books. The website will outline the issues with other projects investigating diversity, the need for this project, and the overall results. The website and visualization, created with Tableau Public, will be shared throughout the library and information science, education and digital humanities communities. Tableau offers easy sharing of the visualization through social media, web pages, blogs, and emails, so it will be easily accessible to all potential viewers. The hope is that the project will be published in the online publications of the American Library Association (and its subdivisions), peer-reviewed journals hosted by MLS programs like RDYL, and independent publications like the School Library Journal. The project team also plans to submit proposals to conferences like the annual ALA conference and the Association for Library Service to Children’s Midwinter Meetings and/or National Institute, to present their findings.

Kelly’s Reflections: Week of 2/11 – 2/18

This week for me was all about orienting myself in greater detail to the scope of the project and my role in it. As our team edited the proposal via Google Docs, I gained a greater appreciation for the potential good our project could do. Georgette had referenced in the proposal a great data repository that could expand the scope of our work: The Database of Award-Winning Children’s Literature (DAWCL). Her find reminds me of one of the many benefits of teamwork. As a non-librarian, I easily might have overlooked the site myself, as its design is a bit dated and amateurish by today’s standards. Yet, a little research confirmed that the information is thorough, accurate, and up-to-date—a real treasure.

As the programmer for our team, I wanted to figure out this week how we might automate data gathering with Python. Georgette had already gathered some initial data on the 98 Newbery Medal winners themselves. DAWCL offers additional data we hope to investigate such as the 400+ titles designated as Newbery Honorees, as well as avenues for broader analysis, such as the other awards Newbery books have won and award-winning titles that haven’t received Newbery recognition.

New to scraping websites with Python, I failed with the DAWCL spectacularly. But I learned a ton in the process, and I hope to make some breakthroughs in the coming week. The DAWCL makes it easy to sift through thousands of children’s books by award. Yet it returns its search results via an ASP, so Python can’t simply request the page contents by using the URL. After combing the web for help, I learned to leverage Chrome’s developer tools to dig beyond the first layer of code usually revealed by the Inspect command. I was able, ultimately, to follow the network requests made by the search form as I performed the search for Newbery winners (very cool), and I finally found the HTML behind the displayed search results. The HTML is not terribly sophisticated, which actually isn’t great as far as scraping goes. I’d have preferred designated classes for titles, authors, and other awards, rather than just paragraph and break tags. So, this week, I’ll need to get creative, either treating the code I found like text and using Python to parse it or changing tools to scrape from the ASP.
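One way to picture the problem: if the search form submits its fields in a POST request (an assumption; I haven’t confirmed it here), then fetching the results means sending the same fields the browser sends, not just asking for a URL. The sketch below only builds such a request with Python’s standard library and does not send it. The URL and the form field names are hypothetical placeholders, not DAWCL’s actual endpoint or parameters; the real values would have to be read from the Network tab in Chrome’s developer tools.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and field names, for illustration only.
SEARCH_URL = "https://example.org/search_results.asp"
FORM_FIELDS = {"award": "Newbery", "level": "Honor"}


def build_search_request(url, fields):
    """Encode form fields as a POST body, the way a browser form submission would."""
    body = urlencode(fields).encode("utf-8")
    return Request(
        url,
        data=body,  # supplying data makes urllib treat this as a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_search_request(SEARCH_URL, FORM_FIELDS)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending it would look like: urllib.request.urlopen(request).read()
```

The response body would then be the same HTML found through the developer tools, ready to be parsed as text.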

Lest I hold the project up with that exploration, I turned to data cleaning, making some tweaks to the original set that Georgette posted, in hopes of showing up to our first Skype session with something concrete to offer. First, I split author names using Excel’s text-to-columns feature, so that ultimately our users could investigate by an author’s first or last name. Second, using what I learned from a data viz investigation of authors of New York Times bestsellers, I made a note of data that may cause trouble down the road, such as the tilde in Matt de la Peña’s name or the grave accent in William Pène du Bois’s. I also noted ethical issues we’ll need to tackle: keeping track of our data sources and having discussions about what constitutes ethnicity. Not only are these decisions essential to make in the early stages, but they are important to document and convey to our ultimate users.
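The two cleaning steps above could also be scripted. The following is a small sketch under stated assumptions, not the Excel workflow actually used: it splits each name at the first space only (a simplification that will not suit every name), and it flags any character outside the ASCII range for later review. The function names are made up for the example; the author names come from these posts.

```python
def split_name(full_name):
    """Split at the first space; everything after it is treated as the surname."""
    first, _, rest = full_name.partition(" ")
    return first, rest


def non_ascii_chars(text):
    """Return the characters that may need attention when moving between tools."""
    return [ch for ch in text if ord(ch) > 127]


# \u00f1 is the n with tilde, \u00e8 is the e with grave accent.
authors = ["Matt de la Pe\u00f1a", "William P\u00e8ne du Bois", "Miska Miles"]

for name in authors:
    first, last = split_name(name)
    flagged = non_ascii_chars(name)
    note = ("check: " + "".join(flagged)) if flagged else "ok"
    print(first, "|", last, "|", note)
```

Keeping the flagged characters alongside each record, rather than stripping them, leaves the original spelling intact while still documenting where trouble might appear.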

Having wrapped up our first Skype session earlier this evening, I’m heading into the second week with great optimism. Each member of our team is elevating my thinking about our approach and is spurring me on to find an expedient way to get our data!

NYCDH Week – Tome Collaborative Course Publications

Friday morning, I attended the Tome Collaborative Course Publication workshop, which was part of NYCDH Week in New York City. The workshop was led by Alexei Taylor, a digital creative and instructor at NYU. Tome is a digital publication platform created for academic publishing that can be designated as a personal blog or a collaborative workspace for academic projects. Tome was built as an easier-to-use WordPress platform and is often used by academics for publishing and course development.

For graduate and undergraduate students, one of the biggest challenges posed by the pedagogical model is that classroom projects are often created without thinking about the potential audience beyond the classroom. This model incentivizes students to work in isolation, with real academic interaction only with the professor and not with their peers. Another challenge is that students often don’t have a well-kept record of their academic accomplishments other than the course credit they receive. Alexei built Tome with the student in mind. Tome really helps students think of themselves as public content creators, writers, and project builders. Students can use Tome for assembling a personal portfolio of their work and use it as a collaborative platform for building a project with peers. One of the projects that I was shown as an example was a creative writing course that was published as a digital anthology. Every student had their own page and their own page design, but they were part of the same Tome project, which could be navigated from page to page.

Tome has many features that make it very useful for people interested in publishing who may not have the skills to build their own platform. When you register for a Tome account, you get a link to the front page, which has a default minimal look with a small menu of a welcome page, a gallery, syllabus, and bibliography. Tome assumes you may need certain pages for publishing, but you have the ability to delete any of the pages you don’t want through the back page. Through the back page you also have the ability to add users to your project, create new pages, add content, edit code to customize the look of your Tome, control your analytics, and use many more features. Tome makes it very easy to annotate and cite your work, offering many ways to add endnotes, links, captions and descriptions for borrowed materials. In addition, there are many formatting tools to tailor your work to the look you want it to have.

As I was learning about Tome and all the different publishing features it has, I thought about the project I am currently a part of and how Tome could potentially be used to host our project. This is a new tool that I will definitely be sharing with my project team. It is super easy to learn and super easy to use.

NYCDH Week – SpatioScholar Workshop

On Friday afternoon, I attended the SpatioScholar (Unity) workshop at NYU. SpatioScholar is an application, built in Unity, developed by a group of academics for scholarly work that requires 3-D and time-based processing and visualization. This tool allows scholars to explore spatial and temporal datasets with a unique set of functionalities. One of its most useful features is a timeline slider; it demonstrates change over time, showing how a certain building or location evolved over a set period of time. Additionally, through the simulation feature, the viewer can experience the space in first-person. They are dropped into the middle of the model and can control which direction to walk in order to explore the model. Another beneficial feature is the ability to connect primary materials to the model. Viewers can browse information like photos, drawings or textual documents that relate to a specific location in time. Finally, SpatioScholar lets viewers and users leave notes and view others’ notes.

I have never worked with 3-D models before, so it was very cool to see how they can be rendered and manipulated in Unity. During the workshop we imported a generic model that the instructors created. We then added a timeline bar to the model; as you slid along the timeline, buildings appeared and disappeared over time. Next, we explored the different ways to view the model. When you are working in edit mode, you can zoom in and out of the model and spin it around to view it from different angles. Entering simulation mode, we were dropped into the model itself and could walk around and view the model in first-person. While in the first-person view, you can use the timeline bar to move across time to see how the space changes. We then learned how to view and add notes to the model. You can pin a note to a specific place or building as well as to a specific time. Finally, we imported some primary material. Like the notes, photos or documents can be pinned to specific buildings or to a specific time period. We pinned a few photos to a specific building, and as we slid the timeline bar along, the photos appeared and disappeared with the building.
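To make the timeline behavior concrete for myself, here is a minimal Python sketch of the idea, not SpatioScholar's actual code (the workshop worked inside Unity). The class name, field names, buildings, years, and file name are all hypothetical and invented for illustration. Each item carries a start and end year, and anything pinned to it is only shown while the item itself is visible.

```python
from dataclasses import dataclass, field


@dataclass
class TimedItem:
    """A building (or other model element) that exists for a span of years."""
    name: str
    start: int  # first year the item is visible (hypothetical data)
    end: int    # last year the item is visible (hypothetical data)
    attachments: list = field(default_factory=list)  # pinned photos or notes

    def visible_at(self, year):
        # The item is shown when the slider year falls inside its span.
        return self.start <= year <= self.end


def visible_items(items, year):
    """Names of the items that should be drawn at the given slider year."""
    return [item.name for item in items if item.visible_at(year)]


def visible_attachments(items, year):
    """Pinned materials appear and disappear together with their item."""
    found = []
    for item in items:
        if item.visible_at(year):
            found.extend(item.attachments)
    return found


if __name__ == "__main__":
    # Invented sample model, for illustration only.
    model = [
        TimedItem("Hall", 1850, 1920, attachments=["hall_photo.jpg"]),
        TimedItem("Market", 1900, 1960),
        TimedItem("Tower", 1950, 2000),
    ]
    for slider_year in (1910, 1955):
        print(slider_year, visible_items(model, slider_year))
```

Moving the slider is then just a matter of calling the same functions again with a new year.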

Overall SpatioScholar was pretty easy to use once I understood the type of data we were importing and how it corresponded to the data fields within Unity. The instructors also showed us some finished models created with this tool and that helped solidify the possibilities and capabilities available with this application.

Getting Started with TEI

For DH week, I attended the ‘Getting Started with TEI’ workshop hosted by Filipa Calado at the Grad Center. TEI, or the Text Encoding Initiative, is a markup schema for representing the structural, renditional, and conceptual features of texts. For anyone familiar with HTML, the two look similar. However, HTML encodes how a text should appear on a page, while TEI encodes the context of a text. Filipa gave a brief introduction to TEI and the guidelines for using it, and then we practiced encoding with pages from The Picture of Dorian Gray manuscript.

Since TEI has its roots in XML, many XML rules also apply when encoding in TEI, such as proper nesting: you must always close the most recently opened tag before closing any tag you opened earlier. Ex: <sentence> <emphasis> </emphasis> </sentence>. Structurally, a TEI document typically begins with an XML declaration, which tells the computer which version of XML and which character encoding the file uses. (This is different from a DTD, or Document Type Definition, which defines the elements a document is allowed to contain.) Everything after the declaration sits inside the root <TEI> element, which has two main parts, the Head and the Body. The Head, <teiHeader>, describes the source text’s metadata and includes the following elements: <fileDesc>, <titleStmt>, <title>, <publicationStmt>, and <sourceDesc>. The Body is the main section of the document and is what you will see when the TEI document is transformed. In a manuscript transcription, the Body section begins with <sourceDoc> and features such elements as <add>, <del>, and <line>.
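To see how these pieces fit together, here is a minimal sketch in Python, using only the standard library's xml.etree.ElementTree. The sample document is a hypothetical skeleton whose title and descriptions are placeholders, not anything from the workshop. The check only confirms that the file is well-formed XML (properly nested and closed); it does not validate the file against the TEI schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # the TEI namespace

# Hypothetical minimal TEI document; all text content is placeholder wording.
MINIMAL_TEI = """<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample transcription</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished workshop exercise.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Transcribed from a manuscript page image.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <sourceDoc>
    <surface>
      <zone>
        <line>A line with <del>a deletion</del> <add>an addition</add>.</line>
      </zone>
    </surface>
  </sourceDoc>
</TEI>
"""


def is_well_formed(xml_text):
    """Return True if the text parses as XML (nesting and closing are correct)."""
    try:
        ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError:
        return False
    return True


def header_element_names(xml_text):
    """List the element names found inside <teiHeader>, in document order."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    header = root.find(f"{{{TEI_NS}}}teiHeader")
    if header is None:
        return []
    # ElementTree stores tags as "{namespace}name"; keep only the name part.
    return [el.tag.split("}", 1)[-1] for el in header.iter()]


if __name__ == "__main__":
    print(is_well_formed(MINIMAL_TEI))
    # Tags closed in the wrong order break the nesting rule.
    print(is_well_formed("<sentence><emphasis></sentence></emphasis>"))
    print(header_element_names(MINIMAL_TEI))
```

Running a check like this before sharing a file catches nesting mistakes early, which is where I expect most beginner errors to come from.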

Although it does sound like a lot of work, the end product is definitely worth it. Filipa shared pages from Mary Shelley’s Frankenstein manuscript from the Shelley-Godwin Archive. Using TEI, the project team was able to decipher Mary Shelley’s draft, while including Percy Shelley’s revisions in red.

Before starting an encoding project, Filipa advises thinking about your goals for the project and considering not just the document, but also your audience, your research goals, and how you wish to represent textual data. She offered some guiding questions from the Women Writers Project (WWP) for preliminary document and project analysis:

As far as you can tell, how is the document structured?
Are there any kinds of regularization or editorial amendment you will perform as you transcribe the text?
How much information about the appearance of the document do you need to capture?

Keeping these questions in mind, we then evaluated pages from The Picture of Dorian Gray manuscript and tried to encode pages 20 and 21 in a text editor. We uploaded our files to a Google Drive and compared our results. I really enjoyed this process, but found myself getting stuck on the crossed-out sections, trying to decipher the writing underneath. Filipa recommended that I use the <gap reason="illegible"/> element when encoding these passages. Filipa then showed the section she has been working on so we could check our work. There were a number of revisions on the page, and Filipa was able to get most of them. Some writing is still illegible and may need to undergo a process similar to the one used to decipher palimpsests.
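As a small illustration of how the gap element sits alongside <del> and <add>, here is a sketch in Python using the standard library's xml.etree.ElementTree. The fragment is invented for illustration and is not a transcription of the manuscript. The helper function, a hypothetical name of my own, counts how many passages were marked illegible, which could be one quick way to compare two people's encodings of the same page.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the wording is invented, not taken from the manuscript.
FRAGMENT = """
<line>A sample <del>word <gap reason="illegible"/></del>
<add>phrase</add> with an <gap reason="illegible"/> ending.</line>
"""


def count_gaps(xml_text, reason="illegible"):
    """Count <gap> elements whose reason attribute matches the given value."""
    root = ET.fromstring(xml_text)
    return sum(1 for gap in root.iter("gap") if gap.get("reason") == reason)


if __name__ == "__main__":
    print(count_gaps(FRAGMENT))
```

Note that the fragment has no TEI namespace, so the plain tag name "gap" matches; in a full TEI file the tag would need the namespace prefix.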

If anyone is interested in TEI, Filipa shared a number of links, like the DARC wiki on Text Editing, and The Letters of Vincent van Gogh project.

NYCDH Week 2020

One of my favorite workshops from this year’s NYCDH Week was “The Helen Keller Archive: A Fully Accessible Digital Archive,” held at Pace University on Tuesday afternoon. Four of the women who have been working on the project were present to discuss their work on the site, and they were proud to state that the Helen Keller Archive was once in only one place, but now the archive is everywhere!

The road map to digitizing the archive focused on accessibility, then digitization and curriculum creation. Before explaining the accessibility aspect of the site, one of the speakers made an important point: 1 in 5 American adults has a disability. Accessibility therefore matters if you want every visitor to be able to view and read all the content your site has to offer. Section 508 was also brought up, since ADA compliance applies to the internet, but this is not necessarily a bad thing, because accessible content also helps with discoverability.

An important question that came up was how you know what to write when describing the images on the site. One audience member noticed that some of the photograph descriptions left out features visible in the images. The speaker explained that she has a background in fashion, so she tended to focus on the dress of the people in the photographs more than anything else. Another aspect of the site that was brought up was the transcription of the videos. Each video has a text transcription beneath it as well as text over the dialogue in the video itself, and when something important happens on-screen that is not in the dialogue, a voice-over describes what is happening.
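Image descriptions usually live in each image's alt attribute, so a first pass at auditing a page can be automated. Below is a minimal sketch in Python, using only the standard library's html.parser. It is my own illustration, not the archive team's tooling, and the class name, function name, and sample file names are hypothetical. It flags images with a missing or blank alt attribute. It cannot judge whether an existing description is any good, and since an empty alt can be legitimate for purely decorative images, flagged items still need a human look.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> that has no usable alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        alt = attr_map.get("alt")
        # Flag images whose alt attribute is absent or only whitespace.
        if alt is None or not alt.strip():
            self.missing.append(attr_map.get("src", "(no src)"))


def images_missing_alt(html_text):
    """Return the src values of images that need a description."""
    finder = MissingAltFinder()
    finder.feed(html_text)
    finder.close()
    return finder.missing


if __name__ == "__main__":
    # Invented sample markup, for illustration only.
    sample = (
        '<img src="a.jpg" alt="A woman reading at a desk">'
        '<img src="b.jpg">'
        '<img src="c.jpg" alt="  " />'
    )
    print(images_missing_alt(sample))
```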

The TAB key can easily be used to move through the site. Menus were placed in the same position throughout the site so that people using screen readers can navigate easily. At one point, however, the developers tried something new with the display on one page, and when someone with a screen reader ran into it, they questioned the change. The developers immediately changed the page to match the rest of the website. So, a major point that was made was consistency. If most of your pages share a similar design, continue to use that design, because someone who can’t see the page and is used to a certain setup can become disoriented if things shift from page to page. It was also brought up that popups are evil! So, avoid them whenever possible.

DHUM 70002 Digital Humanities: Methods and Practices (Spring 2020)