
Mapping NYC subway traffic: an interactive

Ever wondered if you could count how many people go through the subway every day?

Okay, probably not. But bear with me here. No code this time.

For our first project in the Metis Data Science Bootcamp, we were given a hypothetical data science project by a company. Our team was asked to use data to help a nonprofit. Our assignment came in an email from an organization created to advocate for women in tech. A quote:

Where we’d like to solicit your engagement is to use MTA subway data, which as I’m sure you know is available freely from the city, to help us optimize the placement of our street teams, such that we can gather the most signatures, ideally from those who will attend the gala and contribute to our cause.

Basically, calculate where people are. But it wasn’t so simple. Our team, made up of Ingrid, Ben, Ken and me, thought through it like this:

1. The busiest turnstiles aren’t necessarily the best. We’re looking for a demographic here – young, progressive and interested in tech. Thousands of pissed off people at Penn Station won’t be any good. We crunched census, income and community data to identify the best neighborhoods.

2. Sometimes the data you’re given isn’t enough. We had to look for lots of extra resources beyond simple MTA turnstile data. Some of this helped us make the map below.

3. When you’re doing data science, make something useful. It’s easy to get lost in “We could do this…” and “But what if…” But what your client actually cares about is something they can use, not all the stuff you discovered. Never forget your end goal.

And so, what we presented to the company was the map below. We selected five places for their street teams to hang out. The heat flashes show the busiest subway stops over the dates you can see in the bottom corner. Notice how they change throughout the week?

The next step is to plot hourly movements over a day.

Mizzou and bold new frontiers

Interactive data visualizations are all the rage these days, with major news organizations like the WSJ and the New York Times setting up interactive desks that churn out engrossing, compelling visualizations.

Mike Jenner, the Houston Harte Chair in Journalism at Mizzou and data visualization extraordinaire, set up a workshop back in October where dataviz journalists Chris Canipe (WSJ), Andrew Garcia-Phillips (Chartball) and Leah Becerra (Omaha World-Herald) came and taught us all D3, a JavaScript library for data visualization, in one speedy weekend.

Check out that last link for some awesome data visualizations that capture the power of D3.

In one hot-and-heavy 16-hour sprint, we got the basics of data viz. Nobody left a data expert, but the class exposed lots of students to the future of digital journalism.

This was huge for one reason: while Chris, Andrew and Leah were all self-taught in data viz, they brought what they learned to an academic environment.  

After the class, Madi Alexander and I organized the Mizzou Data Viz Club, where we met and tried to hang on to the skills we learned. (Madi recently got an internship at the NYT as a digital reporting intern. Hooray!)

In a conversation with Mike, I discovered he wanted to make a longer-term class; he just needed pledges that students would take it.

Our Data Viz Club had the students and he had the resources. It was like the planets aligned.

Mike moved swiftly, organizing an 8-week course in D3 with Chris for this spring semester. I helped him design the posters to advertise it and recruited a bunch of students to join the class. Chris drives in occasionally from his home in Saint Louis, where he works remotely for the WSJ.

Our skill levels are all over the board, from accomplished programmers to brand new students. The class is open and modular, with each student working at their own pace. It’s patched together and sometimes challenging, but I want to outline why this data visualization class is wildly important to the future of Mizzou journalism academics.

1. Data visualization skills are in high demand. The success of Mizzou’s CAR and Data Reporting classes is testament to this. We teach the students how to find the data and how to pull stories from it, but now we’re on the cutting edge of visualizing it.

2. Most people who know this stuff were self-taught, and our class is the foundation for rigorous academic improvement of the subject. By turning this into an academic affair, we make it easier for students to learn the basics quickly. Once people are learning it, they can move beyond the basics and improve on them, developing new techniques and taking those to industry publications.

3. It’s confusing, challenging and uneven – but it’s happening and we’re moving forward, setting standards for future dataviz classes. After this is over, we’ll know what kind of classes should be required as prerequisites. We’ll understand gaps in the digital knowledge of the journalists we’re training. We’ll know what kind of classes we need to establish a powerful sequence of data journalism courses. This is us surging into a new frontier for science, know what I mean?

As we move forward, I’ll inevitably have more to say about this venture, so stay tuned.