Noctes Mathematicae - Generating Questionable Pandemic Advice with GPT-2

I have created a site that shares highly questionable advice for life during a pandemic. The advice was generated using OpenAI's GPT-2, which you may remember as the AI that was initially hyped as being too dangerous to release.

Some of the advice is reasonable:

Remain calm.
Avoid spreading germs.
Protect areas of the body from contamination.
Consider taking a long leave of absence from work.
When entering or exiting a venue, keep your hands off people.

On the other hand, some of the advice is definitely not a good idea:

Wrap your mouth with tape.
Drink water with table salt.
Avoid using any sort of soap.

And some advice is very odd:

Stay away from close contact with death-like objects.
Remove fingerprints from your hands if possible.
Report any unexpected sound to the appropriate authorities.

What's Going On Here?

A message from the future (October 2020): A while after writing this article, GPT-3 was revealed. It's like GPT-2, only more so. This time around the press focused on the few-shot learning thing a lot more. Keep that in mind for context as you read this outdated article.

GPT-2 is a language model, meaning that when given a chunk of English text it is trained to predict what comes next. By repeatedly predicting the next word¹, the model can be used to generate text.
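To make "repeatedly predicting the next word" a bit more concrete, here is a toy sketch of that loop. It is emphatically not GPT-2: the "model" is just a table of word-pair counts built from a made-up corpus, and the function names are invented for illustration. The shape of the loop is the same, though: estimate a probability distribution over the next word, sample from it, append, repeat.

```python
import random
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def next_word_distribution(model, word):
    """Return {candidate: probability} for the word that follows `word`."""
    followers = model.get(word)
    if not followers:
        return {}  # the model has never seen anything follow this word
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()}


def generate(model, start, max_words=10, seed=0):
    """Generate text by sampling one next word at a time."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(max_words - 1):
        dist = next_word_distribution(model, output[-1])
        if not dist:
            break
        candidates = list(dist)
        weights = [dist[w] for w in candidates]
        output.append(rng.choices(candidates, weights=weights)[0])
    return " ".join(output)


# Made-up training text, standing in for GPT-2's enormous dataset.
corpus = "wash your hands . wash your face . avoid touching your face ."
model = train_bigram_model(corpus)

print(next_word_distribution(model, "your"))
print(generate(model, "wash"))
```

GPT-2 replaces the count table with a large neural network that looks at the whole preceding text rather than just the last word, but the generation loop around it has this same structure.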

One of my goals with this blog is to only write about things that haven't already been covered by every other computer-science-y blogger on the internet. I won't write about GPT-2 in detail, because it's been done repeatedly, but I do want to mention the aspect that I find most interesting.

The pre-trained model released by OpenAI was trained on a massive dataset (consisting of the text of web pages linked from Reddit posts with at least 3 karma), and as a result it understands many different domains of English text. This means that I didn't need to do any "fine-tuning" (training the model on domain-specific data) to generate my pandemic advice. I simply showed the model a list of 3 examples of pandemic advice, prefaced by the text "In the event of a global pandemic, remember:", and then asked it to predict what comes next.
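A prompt like the one just described might be assembled along these lines. This is only a sketch under a few assumptions: the bullet formatting, the helper names `build_prompt` and `extract_items`, and the three example items are all hypothetical placeholders, not the ones used for the site. The call to the model itself is left out; `continuation` stands in for whatever text the model produces after the prompt.

```python
HEADER = "In the event of a global pandemic, remember:"


def build_prompt(examples, header=HEADER):
    """Build a list-style prompt that ends with an open bullet for the model to fill in."""
    lines = [header] + [f"- {example}" for example in examples]
    return "\n".join(lines) + "\n- "


def extract_items(continuation):
    """Split the model's continuation into list items, stopping where the list ends."""
    items = []
    for i, line in enumerate(continuation.split("\n")):
        line = line.strip()
        if i > 0:
            # After the first line, each item must start with its own bullet.
            if not line.startswith("- "):
                break
            line = line[2:].strip()
        if line:
            items.append(line)
    return items


# Placeholder examples (assumed for illustration only).
examples = [
    "Wash your hands frequently.",
    "Stay home if you feel sick.",
    "Keep your distance from other people.",
]
prompt = build_prompt(examples)
print(prompt)

# Hypothetical model output, hard-coded here so the sketch runs without a model.
continuation = "Remain calm.\n- Avoid spreading germs.\n\nIn other news"
print(extract_items(continuation))
```

Ending the prompt with an open bullet nudges the model to continue the list rather than wander off into a new paragraph, and the parser discards whatever comes after the list stops.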

This ability to perform a specific task (generating pandemic advice) without fine-tuning is very convenient, since training a model this large requires a lot of computing power, not to mention an appropriate dataset.

In fact, attempting to accomplish specific tasks without additional task-specific training is the focus of the paper introducing GPT-2 by Radford et al., which is titled "Language models are unsupervised multitask learners." They show that a general language model can perform ok-ishly on tasks including reading comprehension, text summarization, and even French-to-English translation, despite the fact that the model was only trained on English texts².

This is known as "few-shot" learning³, and I think it's neat that language models can do this⁴. I'm kinda surprised that most of the media coverage of GPT-2 didn't mention this at all, mainly focusing on how plausible the examples of generated text looked⁵ and the whole "too dangerous to release" angle. This is especially strange considering that they could have spun it as a big step towards general AI and "the Singularity"; then again, I guess there's a limit to how much sensationalist spin you can put on one story.

Try It Yourself

You can experiment with GPT-2 at talktotransformer.com, and if you're more technically inclined⁶ you can take a look at my script for generating lists of short sentences/phrases (which in turn uses Hugging Face's transformers library).

¹ GPT-2 actually operates on sub-word units. For instance, "-ing" or "-able" might be predicted. Specifically, Byte Pair Encoding is used. Also, the model doesn't "predict" a single next token, instead estimating the probability distribution for the next token.
² Of course, some English texts contain occasional French phrases, allowing the model to learn a bit of French.
³ Or at least in the same ballpark as few-shot learning.
⁴ Although a language model that can solve arbitrary tasks well would probably need to be unimaginably massive. Note from the future: GPT-3 is much better at this than GPT-2, but it does accomplish this mainly by being truly massive.
⁵ In particular, most articles heavily featured a silly text about unicorns that only appears in the appendix of the paper.
⁶ and have gone to the trouble of getting CUDA working properly on your system