Kelly’s Reflection: Week of 3/9 – 3/16

Ode to Tableau, Part II

Last week, I extolled the virtues of Tableau, but I forgot one of my favorites: the open nature of Tableau Public. Thanks to the learn-with-us ethos of the site, the brilliant vizzes shared on the galleries are available to download and deconstruct. So, this week, I downloaded the public notebook of the site our team found most inspiring and germane to our work. I got to see how the data visualizer added her own text around the visualizations—something I had never done. I also got to explore the structure of her tooltips, which will allow me to include (eventually) some of the data worth exploring beyond our first glance.

This user’s work reminded me how simple and powerful waffle charts can be in conveying part-of-whole information to users. So, I used hers as models for how to share our data on author and protagonist gender and race/ethnicity. Then, in the playful nature of children’s literature and our audience of teachers, parents, and librarians, I created some icons in Illustrator to heighten engagement. (I did an online tutorial in Illustrator last summer, and was surprised what came back to me. Still, I’m very new.) And, I borrowed the color scheme from Emily’s great choice of graphics on the WordPress site she created.

Here’s what I have at this point:

Newbery Award Waffle Charts

The waffle charts weren’t easy for me, even though I had created one once before. I had to watch the first few minutes of a tutorial to refresh my memory. Even then, I had typos, which made for some holey waffles. I also miscalculated mathematically, thinking that a 10 x 10 waffle wouldn’t show less than 10% well, so I made a 20 x 20 grid—more work than I needed to do (and that I will most likely undo this week).
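A quick arithmetic check, sketched here in Python rather than Tableau, suggests why the 10 x 10 grid was already enough: each of its 100 cells stands for 1%, so a share well under 10% still fills whole cells. The function name and the sample share of 4% are illustrative assumptions, not values from our data.

```python
def waffle_cells(share_percent, rows, cols):
    """Return (filled_cells, percent_per_cell) for a rows x cols waffle grid."""
    total_cells = rows * cols
    percent_per_cell = 100 / total_cells
    # Round to the nearest whole cell, since a waffle cannot show partial cells.
    filled = round(share_percent / percent_per_cell)
    return filled, percent_per_cell

# A hypothetical 4% share on each grid size.
print(waffle_cells(4, 10, 10))
print(waffle_cells(4, 20, 20))
```

On the 10 x 10 grid the 4% share fills four cells; the 20 x 20 grid only adds quarter-percent resolution at the cost of four times as many cells to build.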

Further, because I knew the data from the bar charts, I was able to detect some flaws in the code that creates the calculated fields behind waffle charts. For example, the nonwhite protagonist percent is way too high. Right now, the code I created that decides if a protagonist is nonwhite reads:

IF [Protagonist Race/Ethnicity] != "white"
THEN "1"
END

Our intent with that chart is to show the number of human protagonists that aren’t white, but the code includes a nonhuman protagonist or a book with no protagonist in the count, inflating the percentage. The inverse is true with the female protagonist percentage which is currently too low, as the code counts only books with single protagonists that are female, not those with multiple protagonists that include a female. So, I’ll be tweaking this week, fixing these errors, incorporating feedback from my team, and exploring additional visualizations.
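The two fixes described above can be sketched outside Tableau. The Python below is only an illustration under assumptions: the labels "n/a" and "none" for nonhuman or absent protagonists, and the "male and female" style for multiple protagonists, are hypothetical stand-ins for whatever the cleaned spreadsheet ends up using, and taking the percentage among human protagonists only is one possible choice of denominator.

```python
# Hypothetical labels for nonhuman protagonists or books with no protagonist.
NOT_HUMAN = {"n/a", "none"}

def is_nonwhite_human(race):
    """True only for human protagonists whose recorded race is not white."""
    race = race.strip().lower()
    return race not in NOT_HUMAN and race != "white"

def includes_female(gender):
    """True for a single female protagonist or a group that lists one.

    Splitting on " and " avoids a substring check, which would wrongly
    match "male" inside "female".
    """
    return "female" in gender.strip().lower().split(" and ")

def nonwhite_percent(races):
    """Percent of human protagonists who are not white (assumed denominator)."""
    humans = [r for r in races if r.strip().lower() not in NOT_HUMAN]
    if not humans:
        return 0.0
    return 100 * sum(is_nonwhite_human(r) for r in humans) / len(humans)

# Made-up sample rows, not our real data.
sample = ["white", "black", "n/a", "none", "white", "asian"]
print(nonwhite_percent(sample))
```

With the made-up sample, the original rule would count four of six rows as nonwhite, while this version counts two of the four human protagonists.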

Kelly’s Reflection: Week of 3/2 – 3/9

This week’s post is an ode to Tableau, or at least to data visualization. If you haven’t gotten to play with process yet, let me share why I’ve come to love automated data viz.

The Magical
It doesn’t matter how well you know your data: the viz can surprise you. The data we scraped with Python (title, author, year), we didn’t know very well. But, the gender and race/ethnicity info, for both authors and protagonists, we had gathered painstakingly by hand—researching, categorizing, rethinking. We knew that data like we knew our own selves, and we had already drawn two basic conclusions: first, that the slight majority of Newbery authors were female, and second, that a greater majority of authors were white. But, once popped into the simplest of bar graphs in Tableau, the data stunned me. The awards are suffocatingly white—authors and protagonists alike. There’s no chance a kid of color could find herself in many of the books—in the pages or on the spine. Worse, there are more protagonists of color than authors of color, and we’ve already started to rely on Tableau’s tool tips to call attention to surprisingly recent white authors speaking to a nonwhite experience.

Newbery authors and protagonists by race/ethnicity

Screenshot of authors and protagonists by Census Bureau race/ethnicity. Note that yellow merely indicates Newbery Medal status, while gray indicates Honor status. White authors occupy the top row of the upper graph; white protagonists occupy the top row of the lower graph.

The early vizzes also pointed out that while there are more female authors, there are more male protagonists than female. Boys can more easily see themselves at the center of a story, even though women are more often the ones spinning those stories, something we hadn’t thought about.

The early visualizations also spoke, sadly, to the truth that it didn’t matter which of the racial/ethnic lenses we chose (though we went with the Census Bureau’s, so that we could make a point about its limitation). No lens would change the fact that white is the reality of the Newbery authors and characters.

The Mundane
Even if your first vizzes don’t yield these startling insights, automating visualizations in programs like Tableau can help identify errors in your data. And boy, did we have errors. Some were because we are human and make typos. Because we recorded our data in separate Google sheets, I made different decisions than Georgette and Emily when it came to issues that arose beyond the data’s architecture we had planned. While I thought I had caught and adjusted for them all as I merged their medal winners with my honors winners, Tableau said otherwise. Its agnostic eye registered white, wite, and White as three distinct races. We had other variances, such as how we registered our own doubt (?, ??, ???, and not sure), how we managed nonhuman protagonists’ genders and races/ethnicities (n/a, NA, and none), and how we categorized books with multiple or no protagonists (multiple, many, family, or male and female).
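One way to catch these variants before the data reaches Tableau is a small normalization pass. The Python below is a minimal sketch, not the cleanup we actually ran: the mapping is a hypothetical one built from the variants listed above, and whether to collapse values such as "none" into "n/a" or "family" into "multiple" is a project decision, not a given. It leaves "male and female" untouched, since that value carries information the others do not.

```python
from collections import Counter

# Hypothetical mapping from observed variants to one canonical label each.
CANONICAL = {
    "wite": "white",
    "?": "not sure",
    "??": "not sure",
    "???": "not sure",
    "na": "n/a",
    "none": "n/a",
    "many": "multiple",
    "family": "multiple",
}

def normalize(value):
    """Trim whitespace, lowercase, then map known variants to a canonical label."""
    cleaned = value.strip().lower()
    return CANONICAL.get(cleaned, cleaned)

# Made-up sample values showing the kinds of variants Tableau exposed.
raw = ["white", "wite", "White", "NA", "n/a", "??", "many"]
counts = Counter(normalize(v) for v in raw)
print(counts)
```

Counting the normalized values, as in the last lines, is a cheap way to spot any remaining strays: a label that appears only once or twice is often another typo.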

Yes, each exposed mistake meant more work. But often, the solution required returning to the very purpose of our project. What do we want to communicate, for example, about books with multiple protagonists? Is it more important to highlight the identities of center-stagers or ensemble casts? Did slicing the data of books with more than one protagonist into types of multiplicity clarify or fracture our findings? These are all great questions that we can start to answer with our user in mind.

The Message
This week, while I address the last of the data errors, I’ll begin the next fun (and scary) part: moving from simple visualizations to those that can really grip our audience, inviting them to explore for themselves. I browsed Tableau’s galleries for inspiration and found two in particular that I love: a woman’s personal reading history and a breakdown of gender and political affiliation in the House of Representatives. The former appeals to me because it is both interactive and powerful as a set of static images. Our group hopes to share with our users a printable poster to hang on a library or faculty room wall or to take to their principals, so this model looked pretty good. The government visualization was intriguing for its display of the same data in different ways, each geared to raise a different set of questions.

As we prepare to provide content for the website Emily is building, we know that the decisions we’ve made with the data and the early vizzes in Tableau will help shape our message and our direction. Already, in our weekly meeting, we’re discussing what other data we might gather (perhaps about other awards) that can help our users make good decisions, whether that’s the American Library Association, which bestows the annual awards, or librarians, educators, or parents as they make purchasing decisions.



The key to our social media strategy was identifying our primary audience. We are building a database to hold VR and digital reconstructions of sites and structures in peril. We knew that our topic targeted a narrow but very important issue, and that the primary audience was limited to people who already work in VR and digital reconstruction, as well as people who are actively concerned about environmental threats to archeological and natural structures. So the initial step of our outreach efforts was simply to find the online community where these groups of people engaged with one another. The most widely used platform for this type of public academic engagement was Twitter, so we created an email address to register for a Twitter account. The email address will also serve inquiries and subscriptions from our website. This allowed us to discover a lot about the communities that will be the primary audience for our project and to think ahead about which institutional need our project fulfills in those spaces.

Social media strategy 

Our social media strategy is to engage with the 3D and virtual reconstruction community and the community of environmental conservation scholars already on Twitter by interacting with their work through likes, retweets, and comments on a regular weekly basis. We want to make sure we are on top of news, discussions, and breakthroughs where a lot of conversation is being generated. We also generate conversation through new posts and add brief comments on new articles that come into our inbox from the Google Alerts we set up. We regularly update our audience on our project. We also make strategic use of hashtags to bring people searching those tags to our page. We will also create a short marketing blurb about our project to post in relevant Facebook groups, which we will identify as the project nears completion.


Email strategy

Our email strategy is to compile a sizable list of email addresses of public digital scholars and people working in the field of our project. We will gather these email addresses from academic communities that we have access to as well as public LinkedIn and Twitter profiles. By the end of this week, we will have a draft email introducing our project that we will share with these members. We will have an automatic response email that links inquirers to our social media page, where we update our project’s milestones.

Communication and Website

People will be able to reach out to us directly on Twitter, where our email address is listed, and through our website, where there will be a contact page that allows the public to ask questions and submit other requests. Our website will also contain an about page that introduces our team members and gives specific details about the goal of our project. In addition, our Twitter feed will appear on the side of our site to allow people to follow us on Twitter after they visit our site. The public will also be able to subscribe to our site to receive email updates about the project.

We will also create a promotional flyer with relevant information about our project to attach to emails or give out in person.

Search Engine Optimization

A technical aspect of our outreach is to optimize pages of our website for Google’s ranking algorithm. I’ve previously worked with SEO when I did marketing for a jewelry company and helped place close to 7000 pieces of jewelry online. Although I used proprietary software to optimize those pages, I think it is worth trying to play with the algorithm in hopes of reaching a secondary audience. It certainly can’t hurt to try. Backlinks are proven to be one of the most effective ways of increasing Google ranking, other than keyword optimization. Part of our strategy must be finding scholars who may be willing to link to our website from their blogs and other online public accounts (Facebook/Twitter).

Newbery Outreach Plan

  • We will create a website for our project in addition to an Instagram and Twitter account. The website will include some “traditional” academic writing to contextualize the findings and point toward further readings. It will also feature sections “for librarians,” “for prize committees,” and “for educators,” among others.
  • We will reach out to members of our target audience (Librarians, Educators, DH community), Advisors to the Project (Diverse Book Finder, ALA and their sub-organizations like the ALSC), Publications like School Library Journal, and LIS Programs.
  • Twitter and Instagram are going to be the best way for us to share our work.
    • Twitter: this is the best platform for having conversations with established figures in the field and for the website to get shared the most. We would get the most interaction and followers there.
    • Instagram: we can do infographics that could be widely shared using the story mode, especially if they are graphically compelling.
    • Facebook is just so overrun these days… and also sharing websites on there is pretty seamless… like we can submit to pages that would have interest in our content rather than having our own page.
  • We can immediately begin sharing the critical lenses we are using while developing the project and the “importance” of diversity and prizes (I think I put quotes there because I mean relative importance… rather, that we will be evaluating these claims).
  • Creating a schedule for posting is important, at least once a week in the beginning and then increasing posts once the site is active and we are closer to launching. Outreach main person to post, but team can suggest content.
  • Worth thinking about retweeting vs. posting original content. Maybe an equal amount of both in the beginning, building relationships through retweets and commenting on posts. Then increasing original content.
  • Should set up a gmail account for the project, and include it on the website. Also a contact form directing to the gmail. Important for people who are not on social media, have many questions, or organizations/institutions that would like to discuss collaborations.
  • Who will be in charge of handling questions and comments from email address and contact form? One person will filter the messages and forward to the appropriate person.
  • Outreach goals: 
    • Bring awareness to the research topic to our audience
    • Increase traffic to the website and our research
    • Establish connections with those in our field to help support/endorse our project and for potential collaboration
  • Resources to help bring awareness
    • DH Now: an experimental, edited publication that highlights and distributes informally published digital humanities scholarship and resources from the open web.
    • dh+lib: aims to provide a communal space where librarians, archivists, LIS graduate students, and information specialists of all stripes can contribute to a conversation about digital humanities and libraries.

Kelly’s Reflections: Week of 2/24 – 3/2

This week my primary task was to finish collecting the identity data on Newbery Honor authors and protagonists while Emily tackled the Medal winners—tasks that can’t (or at least shouldn’t) be automated given the sensitive and interpretive nature of determining gender, race, and religion. Unlike the web-scraped details of dates and titles and authors, determining race and gender often included consulting illustrations or original cover art, author websites or obituaries, or reading the first chapter or two of a book.

As I gathered details, I also read some articles, both academic and popular, about the categorization of race and ethnicity. The reading highlighted for me that our project is really about whether the Newbery awards, as influential as they are in personal and institutional purchasing decisions and readership, have offered a variety of authors and protagonists in which kids could see themselves in literature. While, in some ways, we entered this project assuming that they didn’t, I found that, as I recorded hundreds of white protagonists and authors, the question of race and ethnicity was more about white versus nonwhite than distinctions of nonwhiteness. What was startling to me (and won’t be to anyone pursuing critical race studies) is that being white in these books or in author bios is often a matter simply of not being something else. Jewish authors were identified as such. Black protagonists, miserably, were often revealed through nonstandard English, African settings, or drug-dealing American neighborhoods. Black writers’ bios touted “the first African American to…” unlike their white counterparts who boasted interests in knitting or meditation or gardening. For the majority of the history of the awards, being white and non-Jewish is, in essence, the default character or author; being anything else is the variation.

This experience led us to decide that we will play with the data in two ways. Option one is to categorize the authors and protagonists by identifying whiteness and nonwhiteness, with layers of nonwhiteness available to our audience through further exploration with our interactive model. The second is to present the data through the flawed lens of the US Census Bureau, a model that allows us to talk about the slippery concept and powerful, lived realities of the social construction of race.

Some of our research pushed against our expectations. In some cases, Newbery Honorship did not, as we had assumed, guarantee immortality. Some early titles are out of print, and some authors or titles are not yet granted popular regeneration through Wikipedia. By contrast, the system sometimes seems to replicate itself, as the American Library Association, which created the awards, has created new awards in honor of some of its honorees. For example, the ALA granted Laura Ingalls Wilder, who received five Newbery Honors, an award of her own, which many Newbery authors, such as E.B. White and Beverly Cleary, have since received. (Her honor was, in 2018, renamed the Children’s Literature Legacy Award amidst controversy about her depictions of indigenous and black Americans, but until then, it propelled her—a Newbery author—as yet another standard of children’s literature. The renaming might have signaled to the ALA the flaw in their replicated machine.)

One deeply disturbing trend seems clear: Newbery status for diverse books seems sacrosanct for some. Take the case of Jamake Highwater, an honoree boasting Native American heritage who wrote Anpao: An American Indian Odyssey. In the 1980s, Highwater was exposed as a fraud—not indigenous at all—and he lost all federal funding as a result. Yet his book still sells, for $7.98, on Amazon, the cover brandishing its Newbery Honor, despite the wealth of incredibly fine work for kids by indigenous authors such as Joseph Bruchac and Tim Tingle.

While we are still a few days away from our first Tableau experiments with the data, one thing is sure: we’ll want to find ways (perhaps through Tableau’s storyboards or tool tips or perhaps through other sections on the website) to highlight the many unusual stories. Some winning authors are married to or are children of other winners. Some winning stories are about children from countries authors have only visited. Some winning authors share prestigious illustrators, such as Maurice Sendak.

Regardless of our direction in the next month, I realized this week that I wish that our course started with a literature review. I found, midway through this research, a book cataloging Newbery Medalists and Honorees (replete with author and plot info) that I’ll borrow from the CUNY library to double-check our research. Had Emily and I known the book existed, we might have saved ourselves some precious time. A literature review might also have saved us time in terms of deciding racial and ethnic categories in our data ahead of time, which we’ll now have to take time cleaning again.

One issue on my mind for the coming weeks is that of our collaborators’ agreement. The four of us are kind, flexible people, and so far, we’ve been trying to help each other a little more than owning a task and sticking to it. We’ve battled illnesses and tech issues, full-time jobs and trips, and while we are making good progress, I’m not sure we’re functioning as leanly as we had anticipated when we first divvied out roles and expectations. We’re torn between wanting to learn all aspects of this project (as we hope to grow as DHers from this project) and wanting to meet weekly project goals to produce something powerful and fine. We’re already on our way to communicating more clearly: we’ve adjusted our weekly meeting time to better suit the group’s schedules. That’s a big step for aw-shucks warm-and-fuzzy people like us.

Margael’s bio

Margael St Juste is a digital humanities student at the CUNY Graduate Center. She earned her undergraduate degree in History with a minor in Economics, with honors. She completed her independent research thesis on the Hitler-Stalin Non-Aggression Pact of 1939, detailing the reciprocity of influence between the two regime leaders in the interwar period. She also completed relevant projects on black identity, black scholarship, and Middle East studies. In her current academic pursuit, Margael is focusing on 20th and 21st century neocolonial systems and practices through the political paradigms of self-determination and sovereignty. Margael hopes to use digital tools and platforms to enrich scholarly conversation on these topics and to build technologies of access and visibility for other digital humanists facing neocolonial barriers.

Margael is the outreach coordinator for the Heritage Reconstructed project, which is dedicated to digital and virtual reconstruction of sites in peril. She is responsible for all external communication with academics and digital scholars, including those whose work is the foundation of our project. In addition, she promotes public conversation about the project on social media platforms as well as within digital humanities spaces. She is also responsible for optimizing pages for online search engines.

Newbery Data Management Plan

Data Management Plan Draft

Our data will be collected and stored in .csv files through web scraping programs we create in Python. In addition, we will manually collect diversity data using Wikipedia and author pages. Our data is replicable, should it become lost or unusable. Our dataset is temporally restricted, from 1922 – 2020, with incremental changes made only once a year. So, the data we gather this spring will be up to date until January of 2021. We will store the data on our laptops, publishing them to Asana, and expanding them via Google Sheets. We will analyze the data in Tableau, storing locally and on Tableau Public.

We will document our data collection procedures by having a document of data issues available to our group through Google Docs and our project management software, Asana. We will also share our Python code, used for scraping data, so it is available to the public. We will state where we find our additional data not initially scraped. Specifically for information related to Wikipedia, we will include our research collection period. If any new sites are used, they will be added to this list of sources. We will ensure good project and data documentation by having a data document available for the group to reference. Kelly and Emily will be responsible for implementing our data management plan. We will use common sense when naming our files, and we will conduct a heading review before bringing our data into Tableau. We will use community standards when defining race and ethnicity in our data. We will use the standard of entering data in lowercase characters to help keep the data readable and uniform.
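The heading review could be partly automated. The Python below is a minimal sketch under assumptions: the expected column names are hypothetical placeholders rather than our final schema, and the sample text stands in for a real .csv file.

```python
import csv
import io

# Hypothetical list of headings the merged sheet is expected to have.
EXPECTED = ["title", "author", "year", "author gender"]

def review_headings(csv_text, expected):
    """Compare a CSV's first row against the expected headings.

    Headings are trimmed and lowercased first, matching the plan's
    lowercase convention. Returns (missing, unexpected).
    """
    reader = csv.reader(io.StringIO(csv_text))
    headings = [h.strip().lower() for h in next(reader)]
    missing = [h for h in expected if h not in headings]
    unexpected = [h for h in headings if h not in expected]
    return missing, unexpected

# Made-up sample standing in for one collaborator's sheet.
sample = "Title,Author,Year,Protagonist Gender\nExample Book,Example Author,1950,female\n"
print(review_headings(sample, EXPECTED))
```

Running the check on each collaborator's sheet before merging would flag a renamed or missing column early, before it shows up as a confusing gap in Tableau.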

Our data, taken from already public sources, do not require any steps to ensure privacy or confidentiality. While we are required to share this data by virtue of our course, we also feel bound ethically to share our work with our audience, which we imagine will be librarians, educators, the American Library Association (who grants the Newbery Award), parents, and researchers like us interested in the diversity of the powerful honor. As a result, we will include a page on our site that openly shares our data in two formats: .csv to promote longevity and open-source access, and .xls to aid perhaps less tech-savvy constituents such as parents.

Our data will be permanently retained in an academic repository, the CUNY Academic Works. The data will be available in .csv. This format will be sustainably accessible because it is an open source format. We will also have the data in a .xls format for those unfamiliar with .csv. We understand that this format is proprietary and for that reason we have the .csv format available. The CUNY Academic Works will maintain our data long term. Our data is appropriate for the repository mentioned above. 

Meg Williams Bio

Meg Williams is a student at the CUNY Graduate Center Program for Digital Humanities and holds an MFA in Poetry from Hunter College, where she worked as an adjunct lecturer and a substitute administrative coordinator. With this background, Meg explores the intersection of art and technology through digital poetics. She is currently working as a project coordinator for the New York Public Interest Group at Queens College, where she works on issues such as higher education affordability and campus sustainability.

Her main role in the Newbery project is outreach. She is particularly interested in issues around the importance of minority representation, the concept of whiteness, the economy of prizes and their ancient origins. Meg hopes her explorations of these topics will contextualize her group’s findings in a critical analysis of the Newbery Prize.

Christofer’s Bio

Christofer Gass is a full-time student in the Digital Humanities Master’s Program at CUNY, The Graduate Center. His capstone project is a text-based game of the 1964-1965 World’s Fair. While an undergrad at Columbus State University he received his B.A. in Art History with a minor in Geography. While attending CSU, Christofer was deeply involved with the Arts in Columbus. As an intern, he worked with the Bo Bartlett Center, the Columbus Museum and CSU’s Illges Gallery. As an employee, he worked for Bo Bartlett Studios and Alan Rothschild’s Do Good Fund, a collection of Contemporary Photography of the American South. As a volunteer, Christofer directed Bartlett’s Home Is Where The Art Is: an art program for the homeless community and assisted Bartlett with Art in Jails: an art program for inmates in Muscogee County Jail. He was also on the board of the Historic District Preservation Society.

Christofer’s role within VRD: Virtual Reconstruction Database is developer and UX designer. In addition to these roles, Christofer will also assist with research and wherever else needed within the group.

Georgette’s Bio

Georgette Keane is the Library and Archives Curator at the American Irish Historical Society. She received her Masters of Library Science and Certificate in Archives and Preservation of Cultural Materials from CUNY Queens College in 2015, and is currently enrolled in the M.A. Digital Humanities Program at The Graduate Center, CUNY. Her research background is in Digital Palimpsests and promoting access to archival collections at historical societies and museums. She has experience with WordPress, Omeka, ArchivesSpace, and CONTENTdm.

Georgette serves as the Project Manager, which includes managing the Newbery project and creating a detailed project schedule to ensure completion by the established deadline. She is also responsible for providing support to the other team members, including assisting with data collection and establishing connections with children’s literature professionals and librarians.