Great Political Thinking 2020

Great Political Thinking 2020 is a podcast whose script is generated by a computer but performed and produced by humans as if it were material for a regular political podcast.

The result is a podcast that sounds unmistakably like a typical political podcast, with hosts and guests discussing candidates' performance, bringing in archived recordings of their speeches, and arguing about polling methodology. At the same time, however, something doesn't sound quite right - sentences double back on themselves unexpectedly, the antecedent of a pronoun becomes hopelessly lost, and suddenly the listener loses their place completely, adrift in a sea of "nonsensical gibberish, occasionally broken up by the names of candidates, polling data, or the word 'narrative'."

You can listen to it here, as well as on Spotify, iTunes, Google Play, etc.

Technical Detail

The source material for the machine learning system is a batch of podcast transcripts that I scraped from NPR Politics, FiveThirtyEight, Slate Political Gabfest, and The Daily from the New York Times. The Beautiful Soup Python library did much of the work here, making it far easier to navigate web pages programmatically. A handful of Python and shell scripts were all that was needed to amass a decently sized (14 MB) corpus of political podcast transcripts, each marked with special start and end tokens to ease the processing in the next step.

You can find the code for the scraping here.
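
The snippet below is not the linked code, just a minimal sketch of the idea: pull the paragraph text out of a transcript page and wrap each episode in start and end tokens. To stay dependency-free it uses Python's built-in html.parser as a stand-in for Beautiful Soup. The token strings, the function names, and the assumption that transcript text lives in <p> tags are all illustrative guesses, not details taken from the project.

```python
from html.parser import HTMLParser

# Assumed token strings; the write-up only says "special start and end tokens".
START_TOKEN = "<|startoftext|>"
END_TOKEN = "<|endoftext|>"


class ParagraphText(HTMLParser):
    """Collects the text of every <p> element on a page."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # Collapse runs of whitespace left over from the HTML layout.
            text = " ".join("".join(self._parts).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._parts.append(data)


def transcript_from_html(html):
    """Return the page's paragraph text, one paragraph per line."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)


def wrap_episode(transcript):
    """Mark one episode with start and end tokens."""
    return f"{START_TOKEN}\n{transcript.strip()}\n{END_TOKEN}\n"


def build_corpus(html_pages):
    """Concatenate all tagged episodes into a single training text."""
    return "".join(wrap_episode(transcript_from_html(p)) for p in html_pages)
```

Writing the result of build_corpus to a single text file gives the kind of corpus the next step trains on, with the tokens telling the model where one episode stops and the next begins.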

The heaviest lifter in this process is of course the language model that generates the scripts. I fine-tuned the publicly released 355M-parameter version of OpenAI's GPT-2 text model using Google Colab and the gpt-2-simple package. The first iteration of scripts was generated after 10,000 fine-tuning steps on the 14 MB of existing podcast transcripts.
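
For readers unfamiliar with gpt-2-simple, a fine-tuning run along these lines might look roughly like the sketch below. The model size and step count come from the description above; the corpus filename, run name, token strings, and sampling settings are assumptions made for illustration, and the actual notebook may well have differed.

```python
# Settings taken from the write-up (model size, step count) plus
# hypothetical values (run_name, token strings) chosen for this sketch.
FINETUNE_SETTINGS = {
    "model_name": "355M",
    "steps": 10000,
    "run_name": "political_podcasts",  # hypothetical
    "start_token": "<|startoftext|>",  # assumed token string
    "end_token": "<|endoftext|>",      # assumed token string
}


def finetune_and_generate(corpus_path, settings=FINETUNE_SETTINGS):
    """Fine-tune GPT-2 on the corpus and return one generated script."""
    # Imported here so the settings can be inspected without
    # TensorFlow and gpt-2-simple installed.
    import gpt_2_simple as gpt2

    # Fetch the released 355M checkpoint into ./models.
    gpt2.download_gpt2(model_name=settings["model_name"])

    sess = gpt2.start_tf_sess()
    gpt2.finetune(
        sess,
        dataset=corpus_path,
        model_name=settings["model_name"],
        steps=settings["steps"],
        run_name=settings["run_name"],
    )

    # Start at an episode boundary and cut the sample at the end token,
    # so that one call yields roughly one script.
    samples = gpt2.generate(
        sess,
        run_name=settings["run_name"],
        prefix=settings["start_token"],
        truncate=settings["end_token"],
        include_prefix=False,
        return_as_list=True,
    )
    return samples[0]
```

Calling finetune_and_generate("corpus.txt") in a GPU-backed Colab session would run the whole loop; corpus.txt here is a hypothetical name for the scraped transcript file.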

Once the scripts were created, I took some editorial liberties by assigning actors to lines and cutting out some words or sections. For the most part, the final product stays fairly close to the generated scripts. The original transcripts are linked for each episode on the podcast website.

Acknowledgements

Saturday Morning Breakfast Cereal comic.

Emily Zhao, who provided excellent directing feedback; Christina Dacanay, for her voice and logo design contributions; my many classmates who volunteered their time to read scripts that didn't make any sense (Patrick Warren, Sam Krystal, Daniel Fries, Julian Mathews, Nikhil Kumar, and Schuyler DeVos); and NYU ITP, for their recording equipment and space.