Background

As I try to piece together how these deep learning models work, I'm becoming more aware of the tension between what we expect from them as creative tools and their actual mechanics. That is to say, I don't think they are necessarily anti-creative, but for a model to generalize what it has learnt from training data, it needs to reduce the influence of outliers and aim for the most probable output. And isn't creativity in the outliers?

So in my creative practice with deep learning, I’d like to play around with probability as a creative tool. This is a first attempt in that direction.

Reference

The reference I'm thinking of is the Infinite Improbability Drive from The Hitchhiker's Guide to the Galaxy. In short, it's a device that takes the characters through "every conceivable and inconceivable point in every conceivable and inconceivable galaxy". In the book, improbability turns nuclear weapons into aquatic animals and bowls of flowers, and protagonists into sofas. Could it also turn language models into creativity-inducing, surprising tools?

This code is based on this guide and chapter 11 of Deep Learning with Python. I used Copilot to debug here and there, and GPT as a stand-in for Stack Overflow, but didn't use any "give me this function" prompts.

So What Is It?

I've trained a toy language model to do next-word prediction. The catch? You can also pass in a probability, so instead of automatically giving you the most probable word, it will give you the word that best matches the probability you asked for. Does it work? Define "work".
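Roughly, the idea looks something like this: instead of taking the argmax of the model's softmax output, pick the word whose predicted probability is closest to the one you requested. This is a simplified sketch of that selection rule, not the exact code; the helper name is just for illustration.

import numpy as np

def pick_word_at_probability(probs, target_prob, index_to_word):
    # probs: 1-D softmax output over the vocabulary
    # target_prob: 1.0 asks for the most probable word, tiny values ask for long shots
    # Pick the token whose predicted probability is closest to the requested one
    idx = int(np.argmin(np.abs(probs - target_prob)))
    return index_to_word[idx]

# e.g. probs = model.predict(input_sequence)[0]
#      pick_word_at_probability(probs, 0.0001, index_to_word)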

Data Processing

I used Google's pre-trained word2vec embeddings, which were trained on millions of news articles. For the training and test set I used the IMDB dataset built into Keras, which has 50,000 movie reviews. The model I had time to train uses the 15,000 most common words.
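For reference, loading both looks roughly like this. The gensim downloader is one convenient way to get the Google News vectors; the variable names here are just for the sketch.

from tensorflow.keras.datasets import imdb
import gensim.downloader as api

VOCABULARY_SIZE = 15_000  # keep only the most common words

# IMDB reviews, already integer-encoded, restricted to the top 15,000 tokens
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=VOCABULARY_SIZE)

# Google News word2vec vectors (300 dimensions), later used to fill the embedding matrix
word2vec = api.load("word2vec-google-news-300")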

Processing Steps

  1. Decode the data and remove labels
  2. Encode a token dictionary
  3. Turn tokenized data into sequences
  4. Create n-grams for training
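A condensed sketch of those four steps; the window size and some details are simplified compared to the actual code.

import numpy as np
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.text import Tokenizer

# 1. Decode the integer-encoded reviews and drop the sentiment labels
word_index = imdb.get_word_index()
index_to_word = {i + 3: w for w, i in word_index.items()}  # Keras reserves indices 0-2
(x_train, _), _ = imdb.load_data(num_words=15_000)
texts = [" ".join(index_to_word.get(i, "?") for i in review) for review in x_train]

# 2. + 3. Encode a token dictionary and turn the texts into integer sequences
tokenizer = Tokenizer(num_words=15_000)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)

# 4. Slide a fixed-length window over each sequence: the tokens in the window are
#    the input, the token right after the window is the target to predict
SEQUENCE_LENGTH = 10
inputs, targets = [], []
for seq in sequences:
    for i in range(SEQUENCE_LENGTH, len(seq)):
        inputs.append(seq[i - SEQUENCE_LENGTH:i])
        targets.append(seq[i])
inputs, targets = np.array(inputs), np.array(targets)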

Model Architecture

I tried to keep this simple and use the most straightforward type of modern model used for NLP: a simple Recurrent Neural Network (RNN). As you'll see in the section where I describe the training, I learnt why the newer variants are so valuable. I ended up using a GRU, which is an RNN with some extra bells and whistles.

I also used an embedding layer. At first I tried to train my own, but learnt the hard way why everyone is talking about "compute". I ended up using Google's pre-trained word2vec embeddings instead. To avoid overfitting, which was killing me, I added dropout and kernel regularizers.
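Plugging the word2vec vectors into Keras means building an embedding matrix that maps each token index to its pre-trained vector. A rough sketch, assuming the word2vec model and tokenizer from the sketches above; words missing from word2vec keep zero vectors.

import numpy as np

embedding_dim = 300  # dimensionality of the Google News word2vec vectors
embedding_matrix = np.zeros((VOCABULARY_SIZE, embedding_dim))
for word, i in tokenizer.word_index.items():
    if i < VOCABULARY_SIZE and word in word2vec:
        embedding_matrix[i] = word2vec[word]  # unknown words stay all-zero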

Here is the model built in Keras:

# Imports assumed at the top of the module
from tensorflow import keras
from tensorflow.keras import layers, regularizers

self.model = keras.Sequential([
    # Frozen pre-trained word2vec embeddings (not updated during training)
    layers.Embedding(
        input_dim=self.vocabulary_size,
        output_dim=embedding_dim,
        weights=[embedding_matrix],
        input_length=sequence_length,
        trainable=False,
    ),
    layers.Dropout(0.2),
    # GRU with dropout and L2 kernel regularization to curb overfitting
    layers.GRU(
        100,
        dropout=0.2,
        recurrent_dropout=0.2,
        kernel_regularizer=regularizers.l2(0.001),
    ),
    layers.Dropout(0.2),
    # Softmax over the vocabulary: a probability for every candidate next word
    layers.Dense(self.vocabulary_size, activation='softmax'),
])