Writing Stories Using AI

Edward Wang
5 min read · Aug 1, 2021

Recently I started reading a lot of stories. Fantastical quests, mysterious murders, and adventures far away: each new world was more entertaining than the last. However, each story also had its flaws: cliches, tropes, and plot imperfections. Naturally I began to think, what if an AI could write stories?

So I decided to build a text generator using GPT-2. Lemme explain.

GPT-2 stands for Generative Pre-trained Transformer 2. Now that may seem like a whole lotta words I just threw at you, but let me explain. Let’s start with generative.

Generative

Generative means that the model uses unsupervised learning to predict the next token. In our scenario, think of a token as roughly the next word. And think of unsupervised learning as a way to find patterns in data (in our case, patterns in linguistic structure), all to find the best answer.

The “patterns” here are the clusters that get identified: on the left, by the algorithm; on the right, the raw input.

I say “best answer” and not “right answer” because there is no right answer in unsupervised learning. For something like a “right answer” to exist, you would need to train the model on every single possibility. Remember, it's AI, not Doctor Strange.
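
To make that concrete, here’s a minimal sketch of what asking GPT-2 for its “best answer” looks like, using the Hugging Face transformers library (illustrative tooling and prompt, not necessarily the setup used later in this post). The model returns a probability for every possible next token, and we simply take the most likely one.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "I was walking down the road"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids   # text -> tokens

with torch.no_grad():
    logits = model(input_ids).logits              # a score for every token in the vocabulary
probs = torch.softmax(logits[0, -1], dim=-1)      # probability distribution over the *next* token
best_id = int(torch.argmax(probs))                # the "best answer", not a guaranteed "right answer"
print(tokenizer.decode([best_id]), float(probs[best_id]))
```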

That being said, in order to find the “best answer” the model needs enough parameters, trained on enough data, before it can start producing coherent sentences. So how many parameters does GPT-2 need? Well, to answer that, let’s get into pre-training.

Pre-trained

It turns out GPT-2 is trained with more than a couple of parameters. In fact, GPT-2 is trained on 8 million web pages, and the full model has 1.5 billion parameters.
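
If you want to check a number like that yourself, here’s a quick sketch using the Hugging Face transformers library (an assumption about tooling, not part of the original project): gpt2-medium is the roughly-345-million-parameter checkpoint I come back to later, and gpt2-xl is the full 1.5-billion-parameter model.

```python
from transformers import GPT2LMHeadModel

# Count the weights in a GPT-2 checkpoint (a sanity check, not part of training).
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")   # OpenAI's "345M" checkpoint
print(sum(p.numel() for p in model.parameters()))        # roughly 355 million parameters
# Loading "gpt2-xl" instead gives the full ~1.5-billion-parameter model.
```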

The philosophy behind pre-training is called transfer learning: extract knowledge from a source setting and apply it to a target setting in order to get the best answer. We use dataset 1 to accomplish task 1, and along the way the model accumulates knowledge that can be transferred to task 2. This process stops us from relying solely on dataset 2.

In our scenario, dataset 1 is those 8 million web pages: pre-training the 1.5-billion-parameter model on them results in accumulated knowledge. This knowledge can then be transferred to our task of predicting what the next word is going to be, and the next, and the next, and so on.
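
As a rough illustration of that dataset-1-to-dataset-2 hand-off, here’s a minimal fine-tuning sketch with the Hugging Face transformers library and PyTorch; the file my_stories.txt and the hyperparameters are placeholders, not my actual training setup.

```python
import torch
from torch.optim import AdamW
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")      # "dataset 1" knowledge lives in these weights

text = open("my_stories.txt").read()                 # "dataset 2": our own target-setting data
enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

optimizer = AdamW(model.parameters(), lr=5e-5)
model.train()
for step in range(50):                               # a few fine-tuning steps on our own stories
    loss = model(**enc, labels=enc["input_ids"]).loss   # next-word prediction loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```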

Transformer

Lastly, and perhaps most importantly, we have the Transformer architecture.

Transformers are traditionally great at sequence-to-sequence tasks, which makes them well suited for language translation. This is because of their encoder-decoder method.

In which, given a sentence, the…

Encoder: accepts an element of the input sequence at each time step, processes it, collects information, and propagates it forward.

while the…

Decoder: given the entire sentence, predicts an output at each time step.

If we look back at the decoder's function, it's actually doing exactly what we want it to do. We feed it a sentence and it predicts an output at each time step. This is why GPT-2 takes out the encoder part of the transformer and only uses the decoder in order to predict the next output.

What makes GPT-2 able to generate entire paragraphs is its most important ingredient: its auto-regressive nature, in which the most recent output word becomes part of the new input. For example, if I inputted “Never gonna give” and the next word predicted was “you”, the next input would be “Never gonna give you”. And by constantly iterating this process, I’m able to generate complete paragraphs.
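
Here’s what that loop might look like in code, as a hand-rolled sketch with the Hugging Face transformers library. In practice model.generate() handles this for you; the explicit loop below is only to show the output-becomes-input idea.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("Never gonna give", return_tensors="pt").input_ids
for _ in range(20):                                       # add 20 more tokens
    with torch.no_grad():
        logits = model(input_ids).logits
    next_id = torch.argmax(logits[0, -1]).reshape(1, 1)   # most likely next token
    input_ids = torch.cat([input_ids, next_id], dim=1)    # the output becomes part of the new input
print(tokenizer.decode(input_ids[0]))
```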

End result

After inputting the sentence “I was walking down the road when”, the following story took place.

I was walking down the road when I came across a tall, green wall of trees.

"What is this?" asked I.

"I'm standing right here, and it seems to be like a bunch of white trash," he said.

I didn't notice the other villagers, or of the others who sat behind me, but I do think they got their first idea of what I look like.

The trees that I stood next to were white, white with green leaves, but when I saw those bushes it was obvious that it was tall. This was probably due to some sort of wind or some sort of rainfall. If I took these bushes out, I would have realized that the top of the trees in the middle were quite some time ago.

It looked like something had happened after everything the villagers had given me. After all this time, I still didn't have the faintest hope that I would find out about anything. Since I was sitting next to nothing, I couldn't help but begin to think of something.

In this world, there is no other life besides death.

"Let's get this under control," I said.

He was probably talking to someone who had been killed by villagers or people who had died from a bad deal or something like that; it would seem they had just started to feel the effects of this world.

I turned my head, but I could never get close enough.

Although seemingly nonsensical, the overall structure of dialogue and narration makes sense. The AI even offers some seemingly profound statements such as “In this world, there is no other life besides death.”

Though far from perfect, we can still see GPT-2’s mastery. In fact, this isn’t even the full capacity of GPT-2. The model I trained used only 345 million parameters, compared to the complete 1.5 billion parameters normally used. And with advancements such as GPT-3 and its 175 billion parameters, the future of AI storytelling seems bright.
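
If you want to try something similar, here’s a hedged sketch of one common way to fine-tune the 345M checkpoint and prompt it with the same opening line, using the gpt-2-simple library. The library choice, the stories.txt dataset, and the step count are placeholders rather than my exact setup.

```python
import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="345M")                     # the 345-million-parameter checkpoint
sess = gpt2.start_tf_sess()
gpt2.finetune(sess, "stories.txt", model_name="345M", steps=500)   # "stories.txt" is a placeholder dataset
gpt2.generate(sess, prefix="I was walking down the road when",
              length=300, temperature=0.8)
```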

🔑 Takeaways:

  • GPT-2 stands for Generative Pre-trained Transformer 2
  • Generative means that the model uses unsupervised learning to predict/generate the next token
  • Pre-trained means that the model has already been trained: GPT-2’s 1.5 billion parameters were learned from 8 million web pages
  • Transformers are adjusted to only use decoders, which, given a sequence, predict an output
  • GPT-2’s autoregressive nature means new outputs become new inputs
  • With GPT-3, the future of story generation becomes even brighter.
