On Language Generation
26 Mar 2019
Language might be the holy grail when it comes to understanding intelligence. The ability to produce and understand language allows humans to have complex thoughts and communicate those thoughts to one another. The importance of this cannot be overstated; through these functions, language has given us the means to collectively design and build things, to shape cultures with explicit norms and values, and to record history and learn from our ancestors. Without language, the society we experience today could not have originated.
In this first KIMO blog post, we dive into the topic of Natural Language Processing (NLP). We’ll discuss how computers understand language, look at the latest language models we have today, and consider the opportunities and risks involved. The reason for this blog post is simple: this is one of those moments in history where the speed of innovation outpaces the speed with which policies can be written. A proper discussion of the power, proper use, and risks of language models needs to happen today.
Computers vs. human language: finding the signal in the noise
Computers have always struggled to understand human language. The classic apparatus we associate with computer use (e.g. mouse, keyboard) exists mostly to accommodate this weakness in technology. This paradigm is changing quickly due to new techniques in the field, as shown in recent translation and voice applications like Google Translate, Siri, and Alexa, smart chatbots like Mitsuku and CakeChat, as well as the latest development: language generation models (e.g. OpenAI’s GPT-2 model).
Although language may feel natural to use, human language is deeply complex. Where millions of years of evolution have hardwired us to automatically process the nuances that come with day-to-day use of language, every computer in 2019 has to start afresh. The reasons for the complexity of our language are many: human language is often ambiguous (the meaning of words depends on the context in which they appear), not all communication is verbal (e.g. non-verbal communication, intonation), the many languages out there don’t all share the same structure, and some words seem to contain more meaning than others (e.g. stopwords are often dropped when computers process language). In short, when it comes to language, the task for the computer is to filter the signal out of the noise, and this is by no means an easy task. In practice, we’ve found that the hardest part for a computer is understanding the context of the conversation, which can have many dimensions (emotions, location, history, etc.). Funnily enough, that comes completely naturally to (most) humans.
Modeling language: pre-processing, feature extraction, prediction
Given the complex nature of language, it should come as no surprise that most high-performing language models are challenging to build. As a computer understands 1s and 0s better than our language, the story usually starts with what is generally called ‘pre-processing’. This step consists of feeding the raw language obtained (e.g. movie scripts) to a ‘pipeline’ of sequential steps that makes the raw text easier to process. Techniques used in such pipelines are stemming, lemmatization, tokenization, one-hot encoding, and sometimes dimensionality reduction (e.g. LDA). The raw text is fed through, and the pipeline spits out an updated dataset that better fits the model. The model, often built with recurrent networks (see here for a good introduction), then extracts the relevant features from the text. Finally, when well trained, the model can be used to make predictions (e.g. fill in the word: ‘I usually drink my coffee ___ in the morning’). Interesting side note: as these models technically model sequences of information, they can also be used for other sequences like weather patterns or stock prices.
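To make the pre-processing step concrete, here is a minimal sketch of such a pipeline in plain Python. The helper names (`tokenize`, `remove_stopwords`, `one_hot`) and the tiny stopword list are illustrative assumptions, not a real library; production pipelines would use tools like NLTK or spaCy and much larger stopword lists.

```python
import re

# Assumption: a tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "is", "in", "my", "i"}

def tokenize(text):
    """Lowercase the raw text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stopwords(tokens):
    """Drop very common words that carry little signal."""
    return [t for t in tokens if t not in STOPWORDS]

def build_vocab(tokens):
    """Map each unique token to an integer index."""
    return {tok: i for i, tok in enumerate(sorted(set(tokens)))}

def one_hot(token, vocab):
    """Encode a token as a one-hot vector over the vocabulary."""
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec

raw = "I usually drink my coffee black in the morning"
tokens = remove_stopwords(tokenize(raw))
vocab = build_vocab(tokens)
print(tokens)                    # ['usually', 'drink', 'coffee', 'black', 'morning']
print(one_hot("coffee", vocab))  # [0, 1, 0, 0, 0]
```

The output of such a pipeline (integer indices or one-hot vectors) is what actually gets fed to the model in the next step.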
The latest and greatest of these sequential (language) models are built with LSTM cells. LSTMs are a type of recurrent neural network that excel at keeping a sense of the context around the word being processed (e.g. tracking the words that came before it). In addition, they can be extended with a module called ‘attention’, which allows them to learn which words are relevant to focus on and which less so. These attention-equipped LSTMs are mostly responsible for the increased accuracy we see today in language models: they are the fundamental building block of most successful chatbots, translation engines, and text-to-speech systems, and the main reason the accuracy of those systems has increased so rapidly over the last few years. Although complex to build from scratch, packages like PyTorch and TensorFlow allow for relatively quick steps into the world of recurrent networks.
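The core idea behind attention can be sketched in a few lines: score how well each past word matches the current one, normalize the scores into weights with a softmax, and take a weighted sum. This is a simplified dot-product attention in plain Python with toy 2-d vectors standing in for learned word representations; real models use large learned matrices and frameworks like PyTorch or TensorFlow.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Dot-product attention: weight each value by how well its key matches the query."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    # Context vector: the weighted sum of the value vectors.
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return context, weights

# Toy vectors: the first key points in the same direction as the query.
query  = [1.0, 0.0]
keys   = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
context, weights = attend(query, keys, values)
print(weights)  # the best-matching key receives the largest weight
```

The weights show which positions the model ‘focuses’ on; here the first key matches the query best, so its value dominates the context vector.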
Language generation: dystopia or utopia ahead
Now that you have a solid background in natural language and computer science, we’re ready to jump to 2019. While the era of understanding natural language is still ongoing, the next big thing is natural language generation. Language generation models generate new text based on a prompt that the user feeds in. For example, when a user feeds in the sentence ‘The weather in Amsterdam is amazing this time of the year’, the model will write a completely new (and note: completely made-up!) paragraph based on it. You can try the latest model below, thanks to OpenAI. The model, known as GPT-2, is able to generate a paragraph of related text based on a simple sentence. It will take ~30 seconds to run on a 2018 MacBook Pro (much longer if you’re on mobile). Note that, in contrast to our video chat with KIMO which runs in the cloud, this model runs locally on your computer.
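To build intuition for what ‘generating text from a prompt’ means, here is a deliberately tiny sketch: a bigram (Markov-chain) generator that samples each next word from the words observed to follow the current one. This is far simpler than GPT-2, which uses a large neural network rather than a lookup table, but the generate-one-token-at-a-time loop is the same idea. The corpus and function names are invented for illustration.

```python
import random
from collections import defaultdict

def train_bigrams(text):
    """Record which words follow which in the training text."""
    words = text.split()
    successors = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        successors[prev].append(nxt)
    return successors

def generate(successors, seed, length=8, rng=None):
    """Walk the bigram table, sampling each next word from observed successors."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    out = [seed]
    for _ in range(length):
        options = successors.get(out[-1])
        if not options:
            break
        out.append(rng.choice(options))
    return " ".join(out)

corpus = ("the weather in amsterdam is amazing this time of the year "
          "the weather in spring is mild and the year goes by fast")
model = train_bigrams(corpus)
print(generate(model, "the"))
```

Every consecutive word pair in the output was seen in the training text, which is why such toy models sound locally plausible but drift globally; neural models like GPT-2 keep far more context at each step.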
The potential upside of such models is obvious. Writing blog posts (see Articoolo) can be left to your MacBook, students could automate their essay-writing duties (see the Sentence Fairy), TV script writers could hire a new kind of intelligent creativity, singer-songwriters could have an AI compose their music (listen here to a first attempt, modeled after The Beatles), and authors might find that publishing a book is no longer the 2-year journey it once was.
However, there is also a potential downside to all this. Fake news becomes easier to generate as no human creativity is needed, hate speech can be generated automatically based on a couple of evil statements (just try it for yourself, you’ll be surprised), and stock price manipulation through online blog/Twitter posts might enter a whole new era. For these reasons, the GPT-2 model wasn’t fully released by OpenAI (contrary to its mission), and the model shown in this blog post is only a partial version of the full model. That said, language generation is on its way to perfection. Policy makers, it’s time to pay attention.