Turning my computer into Van Gogh

Creating art with Neural Style Transfer

Eesha Ulhaq
Mar 25, 2020

Guess which artist painted this?

None, my computer did!

Here’s how I taught it with the help of a deep learning algorithm called Neural Style Transfer (NST).

NST allows us to combine the content of one image with the style of another image to generate a baby image.

Content + Style = Baby image

At a high level, we have two input images (a content image and a style image) plus our generated image. We measure the style differences between the style image and the generated image, and the content differences between the content image and the generated image.

We then tweak our generated image until its differences from both the style image and the content image are as small as possible.

Neural Style Transfer was first outlined in this paper by Gatys et al. Udacity’s Intro to Deep Learning course helped me implement this algorithm.

Neural networks are loosely inspired by neural connections in the brain; each node, or neuron, passes its output as input to the neurons in the next layer. For NST we’re going to be using a special type of neural network:

Convolutional Neural Network (CNN), built for images

CNNs specialize in visual information; they’re great at finding patterns in images. CNNs have layers that extract different features from an image.

We’re going to be using a pre-trained CNN called VGG-19, which was trained on ImageNet to recognize objects from 1,000 classes. Whatever objects we’re looking for in our content image will likely fit into one of these classes.

Each unit in a layer acts like an image filter looking for a specific feature. A bird recognizer, for example, would look for wings, a beak, legs, etc.

If we zoom into layer one and pick a single unit, we can find the nine image patches that maximize its activation. In this first layer, those patches show simple features like lines, edges or colours.

The lower layers detect blobs or lines, while the deeper layers detect more complex features like shapes and objects.

Note that in style transfer we’re not training the neural network. Instead, we’re minimizing a loss function by changing the individual pixels of our generated image.

When we train a neural network, we tweak its weights and biases. In NST, we keep the weights and biases fixed and tweak our image instead.
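Here’s a tiny sketch of that idea in plain Python. A single frozen linear layer `W` stands in for VGG-19 (a hypothetical toy, not the real network). Gradient descent changes only the input “pixels” `x` until the layer’s output matches a target, and `W` is never touched.

```python
# A frozen "network": a hypothetical 2x3 linear layer standing in for VGG-19.
W = [[0.5, -0.2, 0.1],
     [0.3, 0.8, -0.5]]

def forward(x):
    """Apply the frozen layer to the input vector x (our toy 'pixels')."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def loss_and_grad(x, target):
    """Loss 0.5 * ||W x - target||^2 and its gradient with respect to x, i.e. W^T r."""
    residual = [o - t for o, t in zip(forward(x), target)]
    loss = 0.5 * sum(r * r for r in residual)
    grad = [sum(W[k][i] * residual[k] for k in range(len(W))) for i in range(len(x))]
    return loss, grad

target = [1.0, -0.5]   # activations we want the input to produce
x = [0.0, 0.0, 0.0]    # starting "image"
learning_rate = 0.5

for step in range(200):
    loss, grad = loss_and_grad(x, target)
    x = [xi - learning_rate * g for xi, g in zip(x, grad)]  # update the pixels only

print(forward(x))      # very close to target; W itself is unchanged
```

Swapping the toy layer for a real pretrained CNN keeps the same pattern: freeze the network and treat the image as the thing being optimized.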

We can break down the algorithm into 4 parts:

  1. Calculate the content loss
  2. Calculate the style loss
  3. Combine the weighted content and style losses into an overall loss
  4. Use backpropagation and gradient descent to reduce the overall loss

We start with a messy canvas by generating a random image. Let’s call it G.

Random Noise(G)
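Generating that random canvas just means filling an image-shaped array with noise. A minimal sketch, assuming the image is stored as nested lists of height × width × 3 RGB values between 0 and 1; the helper name `random_image` and the 4×4 size are made up for illustration.

```python
import random

def random_image(height, width, channels=3, seed=None):
    """Return a height x width x channels image of uniform noise in [0, 1)."""
    rng = random.Random(seed)  # a seeded generator makes runs reproducible
    return [[[rng.random() for _ in range(channels)]
             for _ in range(width)]
            for _ in range(height)]

G = random_image(4, 4, seed=0)
print(len(G), len(G[0]), len(G[0][0]))  # 4 4 3
```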

Content: what we paint

Content has to do with what we’re painting rather than how it’s painted. Different artists can paint a cat in different styles, but the cat appears in every one of their paintings.

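We run our content image through VGG-19 and collect its activations at a layer that’s neither too shallow nor too deep in the network. If the layer is too shallow, its activations stay close to the raw pixel values of our image; if it’s too deep, it only captures very abstract information about which objects are present.

Let’s use a layer in the middle, like conv4_2 (the second convolutional layer of the fourth block).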
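Names like conv4_2 don’t always appear directly in code. For example, if you load VGG-19 from torchvision, `vgg19().features` is one flat sequence of convolution, ReLU and max-pool layers, so you need the position of each named layer. This sketch works those positions out from the VGG-19 layer configuration. It assumes torchvision’s layout (every convolution followed by one ReLU, no batch norm), and the helper `conv_layer_indices` is hypothetical. It also looks up the five layers Gatys et al. use for style.

```python
# VGG-19 layer configuration: numbers are conv output channels, "M" is a max-pool.
VGG19_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
             512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

def conv_layer_indices(cfg):
    """Map names like 'conv4_2' to their position in a flat conv/ReLU/pool sequence.

    Assumes every conv is followed by exactly one ReLU and there is no batch norm,
    which matches torchvision's plain vgg19().features.
    """
    names, index, block, conv_in_block = {}, 0, 1, 1
    for item in cfg:
        if item == "M":
            index += 1                       # the max-pool takes one slot
            block, conv_in_block = block + 1, 1
        else:
            names[f"conv{block}_{conv_in_block}"] = index
            index += 2                       # the conv and its ReLU take two slots
            conv_in_block += 1
    return names

layers = conv_layer_indices(VGG19_CFG)
CONTENT_LAYER = "conv4_2"
STYLE_LAYERS = ["conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv5_1"]

print(CONTENT_LAYER, layers[CONTENT_LAYER])  # conv4_2 21
# -> {'conv1_1': 0, 'conv2_1': 5, 'conv3_1': 10, 'conv4_1': 19, 'conv5_1': 28}
print({name: layers[name] for name in STYLE_LAYERS})
```

With the real model, you would step through the `features` layers in order and keep the outputs at these positions.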


We then do a forward pass of our noise image G through the network and collect its activations at the same layer.

We want to minimize the difference between the content of our content image and the content of our generated image. We can measure this difference with a loss function such as mean squared error. We’ll call it $J_{content}(C, G)$.

Mean squared error loss function:

$$J_{content}(C, G) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2$$

Here l is a layer of the network, F is the generated image, and P is the content image. F^l and P^l are the feature representations of the images at layer l, so $F^l_{ij}$ is the activation of the i-th filter at position j.

If C and G produce similar activations at this layer, they contain similar content.
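The content loss itself is only a couple of lines. This sketch assumes each image’s activations at the chosen layer are stored as a list of channels, each a flattened list of positions; the numbers are made-up toy values. Some implementations use the mean of the squared differences instead of half their sum, which only changes the loss by a constant factor.

```python
def content_loss(generated_features, content_features):
    """J_content = 1/2 * sum of squared differences between two layers' activations.

    Each argument is a list of channels (filters), and each channel is a flat list
    of activations over positions, taken from the same layer (e.g. conv4_2).
    """
    assert len(generated_features) == len(content_features), "channel counts differ"
    return 0.5 * sum((f - p) ** 2
                     for f_channel, p_channel in zip(generated_features, content_features)
                     for f, p in zip(f_channel, p_channel))

F = [[1.0, 2.0], [0.0, 1.0]]  # generated image G: 2 channels x 2 positions (toy values)
P = [[1.0, 1.0], [0.0, 3.0]]  # content image C at the same layer
print(content_loss(F, P))     # 0.5 * (0 + 1 + 0 + 4) = 2.5
```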

Style: how we paint

Van Gogh can paint hundreds of different objects but all in a similar style.

To extract style, we’ll use several layers spread through the network. Gatys et al. use conv1_1, conv2_1, conv3_1, conv4_1 and conv5_1.

How does our network know what style is?

Our CNN captures style as consistent relationships between textures, colours and lines. Each layer produces a set of feature maps, one per channel, and style shows up in how those feature maps relate to each other.

Feature maps at layer l

We look for correlations between the activations of different feature maps. For example, say the red feature map responds to a colour and the yellow one responds to a texture. If they’re strongly correlated, then wherever that texture appears in the image, we’ll also find that colour (e.g., wherever there are straight lines in the image, the colour is blue).

A single layer only captures style at one scale. To make sure the generated image picks up the general style of the whole painting, we check for these correlations at multiple layers.

To find correlations between feature maps, we flatten each feature map into a vector and take the dot product of every pair of them. Collecting all of these dot products for a layer gives us a gram matrix.

finding dot products of the activations to get the gram matrix

If an entry of the gram matrix is large, the two corresponding feature maps are strongly correlated: they tend to activate in the same places.

The gram matrix measures correlations between feature maps regardless of where in the image the features appear, because it sums over every position. This is important because style runs throughout the image, not just in one specific area.
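Here’s a minimal gram-matrix sketch, assuming one layer’s activations are stored as a list of channels, each flattened over every spatial position. The toy feature values are invented. The last lines check the “regardless of where” property: shuffling the positions the same way in every channel leaves the gram matrix unchanged.

```python
def gram_matrix(feature_maps):
    """Gram matrix for one layer.

    feature_maps is a list of C channels, each flattened to a list of H*W activations.
    Entry [k][k2] is the dot product of channel k with channel k2, summed over all
    positions, so it records which features fire together but not where.
    """
    return [[sum(a * b for a, b in zip(fk, fk2)) for fk2 in feature_maps]
            for fk in feature_maps]

# Toy 2x2 feature maps, flattened: channel 1 fires wherever channel 0 does.
features = [[1.0, 0.0, 1.0, 0.0],   # channel 0, e.g. "straight lines"
            [2.0, 0.0, 2.0, 0.0]]   # channel 1, e.g. "blue"
print(gram_matrix(features))        # [[2.0, 4.0], [4.0, 8.0]]

# Reorder the positions identically in every channel: the gram matrix is unchanged.
shuffled = [[channel[i] for i in (3, 1, 0, 2)] for channel in features]
print(gram_matrix(shuffled) == gram_matrix(features))  # True
```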

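We’ll call our gram matrices $G_{kk'}$. At layer $l$, with $a^{[l]}_{ijk}$ the activation at position $(i, j)$ in channel $k$, entry $(k, k')$ is:

$$G^{[l](S)}_{kk'} = \sum_{i} \sum_{j} a^{[l](S)}_{ijk} \, a^{[l](S)}_{ijk'}$$

gram matrix for the Style image (S)

$$G^{[l](G)}_{kk'} = \sum_{i} \sum_{j} a^{[l](G)}_{ijk} \, a^{[l](G)}_{ijk'}$$

gram matrix for the Generated image (G)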

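We find the difference in style between our images by taking the squared error between the gram matrices of the style image and the generated image, normalized by the size of the layer ($n_H \times n_W \times n_C$ activations):

$$J^{[l]}_{style}(S, G) = \frac{1}{\left(2 \, n^{[l]}_H n^{[l]}_W n^{[l]}_C\right)^2} \sum_{k} \sum_{k'} \left( G^{[l](S)}_{kk'} - G^{[l](G)}_{kk'} \right)^2$$

Adding this up over our style layers, each with its own weight $\lambda^{[l]}$, gives the loss function for style:

$$J_{style}(S, G) = \sum_{l} \lambda^{[l]} \, J^{[l]}_{style}(S, G)$$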
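Putting the gram matrix and the normalization together, a sketch of the weighted style loss might look like this. Features are stored per layer as lists of flattened channels. The per-layer weights and the tiny feature maps are illustrative assumptions, not values from the original paper.

```python
def gram_matrix(feature_maps):
    """C x C dot products between every pair of flattened channels."""
    return [[sum(a * b for a, b in zip(fk, fk2)) for fk2 in feature_maps]
            for fk in feature_maps]

def layer_style_loss(style_features, generated_features):
    """Squared difference of gram matrices, normalized by (2 * n_H * n_W * n_C)^2."""
    n_c = len(style_features)        # number of channels
    n_hw = len(style_features[0])    # number of spatial positions, n_H * n_W
    gs, gg = gram_matrix(style_features), gram_matrix(generated_features)
    squared = sum((gs[k][k2] - gg[k][k2]) ** 2
                  for k in range(n_c) for k2 in range(n_c))
    return squared / (2 * n_hw * n_c) ** 2

def style_loss(style_by_layer, generated_by_layer, layer_weights):
    """Weighted sum of per-layer style losses (the lambda^[l] weights)."""
    return sum(weight * layer_style_loss(style_by_layer[name], generated_by_layer[name])
               for name, weight in layer_weights.items())

# Illustrative weights and toy features (assumptions, not tuned values).
layer_weights = {"conv1_1": 1.0, "conv2_1": 0.5}
style = {"conv1_1": [[1.0, 0.0], [0.0, 1.0]], "conv2_1": [[1.0, 1.0]]}
generated = {"conv1_1": [[1.0, 0.0], [0.0, 1.0]], "conv2_1": [[0.0, 1.0]]}
print(style_loss(style, generated, layer_weights))  # 0.03125
```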

Overall loss = Content loss + Style loss

The goal of the algorithm is to minimize the loss functions of both style and content.

We combine the loss functions for style and content to get an overall loss function, which tells us how good our baby image is.

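Overall loss function, where $J_{content}$ is the content loss and $J_{style}$ is the style loss:

$$J(G) = \alpha \, J_{content}(C, G) + \beta \, J_{style}(S, G)$$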

But notice that we calculated the two losses differently, so they end up on very different scales. To balance this out, we scale each one with a weight (α for content and β for style). Typically we’d make the style weight much heavier than the content weight.

We can tweak these weights to favour content or style. If we make the content weight significantly heavier than the style weight, the resulting image will have much less style and focus more on content.
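To see how α and β trade off, here’s a sketch that compares two candidate images with made-up content and style losses. With a heavy style weight, the more painterly candidate has the lower overall loss; shrink β and the candidate closer to the photo wins instead. The candidate names, loss values and weights are arbitrary illustrations.

```python
def total_loss(content_loss, style_loss, alpha, beta):
    """Overall loss J(G) = alpha * J_content(C, G) + beta * J_style(S, G)."""
    return alpha * content_loss + beta * style_loss

# Hypothetical loss values for two candidate generated images.
candidates = {
    "faithful_to_photo": {"content": 1.0, "style": 1e-5},
    "more_painterly": {"content": 5.0, "style": 1e-6},
}

def preferred(alpha, beta):
    """Name of the candidate with the lower overall loss for these weights."""
    return min(candidates, key=lambda name: total_loss(
        candidates[name]["content"], candidates[name]["style"], alpha, beta))

print(preferred(alpha=1.0, beta=1e6))  # more_painterly: style dominates
print(preferred(alpha=1.0, beta=1e4))  # faithful_to_photo: content dominates
```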

To minimize this loss, we’ll use gradient descent. We use backpropagation to find the gradient of the loss with respect to the pixels of our generated image, then nudge those pixels in the direction that lowers the loss, optimizing our image step by step.

We keep on repeating this process until we can’t get our loss function to be any smaller; this is when we have a good baby image.
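Here’s the whole optimization loop on a toy problem. To keep it self-contained, the “network” is just the identity (the features are the pixels themselves, 2 channels × 4 positions), which is a big simplification: real NST reads features from VGG-19 layers and lets a library such as PyTorch compute the gradients with backpropagation. Here the gradients are written out by hand from the loss formulas, and the starting image, targets and weights are invented.

```python
# Toy NST loop with an identity "network": features ARE the pixels (2 channels x 4 positions).
def gram(X):
    """C x C matrix of dot products between channels."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in X] for r1 in X]

def loss_and_grad(X, P, A, alpha, beta):
    """Return alpha*J_content + beta*J_style and its gradient with respect to the pixels X.

    J_content = 1/2 * sum((X - P)^2)       -> gradient X - P
    J_style   = sum((G - A)^2) / (2*M*C)^2  -> gradient 4 (G - A) X / (2*M*C)^2
    where G = gram(X) and A is the style image's gram matrix.
    """
    C, M = len(X), len(X[0])
    norm = (2 * M * C) ** 2
    G = gram(X)
    D = [[G[k][j] - A[k][j] for j in range(C)] for k in range(C)]
    content = 0.5 * sum((x - p) ** 2 for xr, pr in zip(X, P) for x, p in zip(xr, pr))
    style = sum(d * d for row in D for d in row) / norm
    grad = [[alpha * (X[k][i] - P[k][i])
             + beta * 4 * sum(D[k][j] * X[j][i] for j in range(C)) / norm
             for i in range(M)] for k in range(C)]
    return alpha * content + beta * style, grad

content_feats = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]  # content image C
A = gram([[1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5]])       # style statistics of S
X = [[0.3, 0.7, 0.2, 0.9], [0.6, 0.1, 0.8, 0.4]]             # "random" starting image G

learning_rate = 0.1
history = []
for step in range(500):
    loss, grad = loss_and_grad(X, content_feats, A, alpha=1.0, beta=10.0)
    history.append(loss)
    # Gradient descent step on the pixels; nothing else changes.
    X = [[x - learning_rate * g for x, g in zip(xr, gr)] for xr, gr in zip(X, grad)]

print(round(history[0], 4), round(history[-1], 4))  # starting vs final overall loss
```

The loss shrinks step after step and then levels off, which is the “can’t get any smaller” point described above.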

Executing Neural Style Transfer

  1. Pass our content image through VGG-19 and record its activations at the conv4_2 layer
  2. Pass our style image through VGG-19 and compute gram matrices from its activations at multiple layers
  3. Generate a random image
  4. Run the random image through VGG-19, repeat steps 1 and 2 for this generated image, and keep running backpropagation to update its pixels until the loss is minimized

Now, AI can recreate the artwork of some of the greatest artists in minutes!

These images might not sell for thousands of dollars, and they don’t have the human touch and the hours of effort that make a painting worthwhile. It shows that art has more to do with the artist than the art.

Machines can compute much faster than humans, and everything breaks down into simple calculations, even art. As computing power increases, who knows what we’ll be able to create next!

I’ll continue diving into deep learning, so keep an eye out for my next article on YOLO.

I hope you learned something new and as always, feel free to reach out on LinkedIn or subscribe to my newsletter! 😊



Eesha Ulhaq

an archive of blogs from when i was 17 - was very often wrong