Engineering, Uber AI

Controlling Text Generation with Plug and Play Language Models

5 December 2019 / Global
Figure 1. How to steer a mammoth. Many attribute models used in PPLM are 100,000 times smaller than the language model (LM), roughly the weight ratio of a field mouse to a wooly mammoth. The PPLM method is plug and play: it can combine any generative neural language model (mammoth) and any differentiable attribute model or models (mouse) representing the desired steering objective(s). It is also resource efficient: the LM is used as-is without training or updating any of its weights (mammoths are hard to train, after all).
Figure 2. The PPLM approach to controlled text generation can be decomposed into three steps, shown above and described in the text of this article.
Figure 3. This simplified cartoon diagram shows why steps must be taken to maximize both p(a|x) and p(x) en route to generating PPLM samples. The sentence under consideration is shown as a black dot, which is first pushed in the direction of maximizing p(a|x) and then in the direction of maximizing p(x). In practice, rather than two separate steps, gradients of both terms are combined to compute the single step corresponding to the net displacement.
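The combined update in Figure 3 can be sketched numerically. Below is a toy illustration (our own, not the released PPLM code) in which a two-dimensional "latent" h is nudged by the summed gradients of two made-up objectives standing in for log p(a|x) and log p(x); the single net step is the sum of both gradients, and the iterate settles at a compromise between the two objectives.

```python
import numpy as np

# Toy stand-ins for the two objectives in Figure 3 (hypothetical targets,
# not real models): the attribute term prefers h near t_a, the fluency
# term prefers h near the LM's prior mean t_lm.
def attribute_logprob(h):
    t_a = np.array([1.0, 0.0])
    return -np.sum((h - t_a) ** 2)

def lm_logprob(h):
    t_lm = np.array([0.0, 1.0])
    return -np.sum((h - t_lm) ** 2)

def numerical_grad(f, h, eps=1e-5):
    # central finite differences, to keep the sketch dependency-free
    g = np.zeros_like(h)
    for i in range(len(h)):
        d = np.zeros_like(h)
        d[i] = eps
        g[i] = (f(h + d) - f(h - d)) / (2 * eps)
    return g

def pplm_step(h, step_size=0.1):
    # single net displacement: gradients of both terms are summed,
    # as described for Figure 3
    g = numerical_grad(attribute_logprob, h) + numerical_grad(lm_logprob, h)
    return h + step_size * g

h = np.array([0.0, 0.0])
for _ in range(50):
    h = pplm_step(h)
print(h)  # converges toward [0.5, 0.5], the compromise between both targets
```

The fixed point sits exactly between the two targets because the toy objectives are symmetric quadratics; in PPLM the same idea applies to the transformer's key-value history rather than a 2-D vector.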
Table 1. PPLM-BoW samples, under the [Space] topic, generated with different prefixes. Words contained in the bag are highlighted in a bright color, and relevant words (as judged by humans) in a softer color. The last two examples condition on the same topic and prefix, and the different samples show the diversity of text produced by PPLM sampling.

We can switch to other topics, such as [Military] and [Science], shown below in Table 2, all with the same prefix. Here we see that increasing the probability of generating words in the bag also increases the probability of generating related topical words not in the BoW (e.g., in the [Science] sample shown below, note that “question” and “philosophers” are sampled before the first BoW word, “laws“). This is because shifting the latents coherently shifts the topic in the direction desired; as we’ll see later, this works better than directly promoting the set of desired keywords.
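The BoW attribute model behind these samples is simple: it scores a next-token distribution by the total probability mass it assigns to words in the bag. A hedged toy version (our own numbers and vocabulary, not the actual implementation) might look like this:

```python
import math

def bow_log_likelihood(next_token_probs, bag):
    """Score log p(a|x) as the log of the probability mass the LM's
    next-token distribution places on words in the topic bag.

    next_token_probs: dict mapping word -> probability
    bag: set of topic words
    """
    mass = sum(p for w, p in next_token_probs.items() if w in bag)
    return math.log(mass) if mass > 0 else float("-inf")

# toy next-token distribution over a tiny vocabulary (made-up numbers)
probs = {"space": 0.05, "rocket": 0.03, "the": 0.60, "cat": 0.32}
space_bag = {"space", "rocket", "orbit"}

score = bow_log_likelihood(probs, space_bag)
print(round(score, 3))  # log(0.05 + 0.03) = log(0.08) ≈ -2.526
```

Because this score is differentiable with respect to the LM's output probabilities, its gradient can flow back into the latents, which is why shifting the latents promotes on-topic words beyond the bag itself rather than merely forcing the keywords.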
Table 2: Samples from the baseline LM, GPT-2 (top row) and PPLM-BoW (other rows) corresponding to different topics (e.g. Science), all conditioned on a single prefix: The issue focused.
Table 3. PPLM-BoW samples guided toward the [Politics] topic starting from incongruous prefixes. In these examples, to have an effect despite the odd prefix, we used three times the regular step size that was used to generate the other samples. Some samples (such as the last) start to degenerate by repeating keywords.
Figure 4: Ablation study of the effect PPLM has on controlled language generation for topics based on a BoW. In the figure above, “B” corresponds to generation with a pre-trained GPT-2 LM (345M parameters); “BR” corresponds to generating multiple samples with a pre-trained GPT-2 LM and then choosing the one that maximizes p(a|x); “BC” corresponds to generation with a pre-trained GPT-2 LM with updated latents; and “BCR” refers to generating multiple BC samples with modified latents and then choosing the one that maximizes p(a|x).
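The “R” (rank) step shared by the BR and BCR variants is just sample-and-choose: draw several candidate continuations and keep the one the attribute model scores highest. A minimal sketch, with a keyword-counting stand-in for p(a|x):

```python
def rerank(candidates, attribute_score):
    """Return the candidate that maximizes the attribute model's score
    (the 'R' step in the BR / BCR variants)."""
    return max(candidates, key=attribute_score)

def toy_score(text, bag=frozenset({"space", "orbit", "rocket"})):
    # stand-in for p(a|x): count topic keywords in the sample
    return sum(1 for w in text.lower().split() if w in bag)

samples = [
    "The issue focused on the economy .",
    "The issue focused on the rocket launch into orbit .",
    "The issue focused on the election .",
]
best = rerank(samples, toy_score)
print(best)  # the sample containing the most topic words
```

In the real pipeline the scorer is the BoW or discriminator attribute model itself, and the candidates come from either the unmodified LM (BR) or the latent-updated LM (BCR).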
Table 4. Sentence samples in triplets, generated by baseline GPT-2, PPLM-Discrim steering positive, and PPLM-Discrim steering negative, all conditioned on prefix “The chicken”. Words related to the sentiment (as judged by humans) are highlighted.
Figure 5. Ablation study of the effect PPLM-Discrim has on steering language generation for desired sentiment and style, with ‘B’, ‘BR’, ‘BC’, and ‘BCR’ variants following those described in Figure 4.
Table 5. Controlled text generation with multiple attribute models. Each attribute model is noted with a specific color and the relevant words highlighted with the corresponding color. While multiple topics are expressed, the coherence of the passage does seem to suffer somewhat as the model struggles to merge topics.
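Because each attribute model contributes its own score, combining them amounts to summing their log-likelihoods (treating the attributes as conditionally independent given the text). A toy sketch of this plug-and-play composition, with made-up keyword scorers standing in for real attribute models:

```python
def combined_log_likelihood(text, attribute_models):
    """Sum the scores of several plugged-in attribute models,
    i.e. log p(a1, a2, ... | x) under an independence assumption."""
    return sum(model(text) for model in attribute_models)

# hypothetical attribute models (keyword checks standing in for
# differentiable BoW or discriminator scores)
def likes_space(text):
    return 1.0 if "space" in text else -1.0

def likes_politics(text):
    return 1.0 if "senate" in text else -1.0

score = combined_log_likelihood(
    "the senate debated the space program",
    [likes_space, likes_politics],
)
print(score)  # 2.0: both attributes are satisfied
```

Steering toward the summed score is what produces the multi-topic passages in Table 5, and also explains the observed tension: the gradients of different attribute models can pull the latents in conflicting directions, so coherence suffers as topics are merged.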
Rosanne Liu

Rosanne is a senior research scientist and a founding member of Uber AI. She obtained her PhD in Computer Science at Northwestern University, where she used neural networks to help discover novel materials. She is currently working on the many fronts where machine learning and neural networks remain mysterious. She attempts to write in her spare time.

Sumanth Dathathri

Sumanth Dathathri is currently a graduate student at Caltech, and is interested in problems at the intersection of control theory, formal methods and machine learning. He was a summer 2019 intern at Uber AI exploring language processing.

Andrea Madotto

Andrea Madotto is a third-year PhD student at The Hong Kong University of Science and Technology studying Electronics and Computer Engineering, and he was a summer 2019 intern with Uber AI. He works on natural language understanding and conversational AI.

Piero Molino

Piero is a Staff Research Scientist in the Hazy research group at Stanford University. He is a former founding member of Uber AI, where he created Ludwig, worked on applied projects (COTA, Graph Learning for Uber Eats, Uber’s Dialogue System), and published research on NLP, dialogue, visualization, graph learning, reinforcement learning, and computer vision.

Jason Yosinski

Jason Yosinski is a former founding member of Uber AI Labs and formerly led the Deep Collective research group.

Posted by Rosanne Liu, Sumanth Dathathri, Andrea Madotto, Piero Molino, Jason Yosinski