Pixel Pal (Part 5): Data augmentation

Posted on Sat 25 April 2020 in Linux, HiDPI, PixelPal

Pixel Pal is a project I have started whose purpose is to bring new life to old icon themes. Many old icon themes are in raster format and are not usable on HiDPI screens (their resolution is too low). Currently the only way to use these themes on HiDPI screens is to resample the icons by the desired factor, but the result looks pixelated. The goal of this project is to use deep learning to upsample small icons into icons suitable for HiDPI screens.

This is the fifth article about this project. Here are the other four:

  1. Part #1: Explaining the project
  2. Part #2: Building the dataset
  3. Part #3: Measuring model quality
  4. Part #4: Training models

One of the big challenges in deep learning is getting enough data. Neural networks can easily have millions of parameters, and it takes large quantities of data to fine-tune these parameters. Getting enough data is a challenge in itself, but what makes it worse is that data alone is not sufficient: you need labelled data, and if possible high-quality labelled data.

When you don't have enough data, there are several ways to tackle the issue:

  1. Reducing the number of parameters to learn. One way to do this is to reuse the weights (parameters) from an already learned task and then only retrain parts of the model. This is a form of transfer learning. For instance, VGG-16 is trained for general-purpose image classification. You can take this model and retrain only parts of it, assuming the other parts are good enough to generalise to your task (a minimal sketch follows this list).
  2. Training parts of the model on the input data alone using autoencoders. This is fairly technical. The idea is that you build a model which compresses the input data and then decompresses it back with as little loss of quality as possible. Such a model is easier to train because you usually have lots of input data (but not necessarily output data). You then split the trained autoencoder in two: the compression network (encoder) and the decompression network (decoder). You can then work with the compressed features coming out of the encoder as a replacement for the original input data. The benefit is that the compressed features are smaller, so the network mapping the compressed features to the output data is smaller and thus easier to train (see the second sketch below).
  3. Using data augmentation. This method consists of building new data from the existing data. It is especially good at producing neural networks that generalise well: through data augmentation you can tell the network what it should not rely on. For instance, one way to augment image data is to flip each image horizontally and vertically. This way you can easily quadruple the amount of data available, and you tell the network it should not rely on the image being oriented a certain way, like having shadows at the bottom.

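To make the first option concrete, here is a minimal sketch of transfer learning with Keras. The input size, the 256-unit layer, and the 10-class head are illustrative assumptions, not something this project uses:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Load VGG-16 pretrained on ImageNet, without its classification head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Freeze the pretrained layers: only the new head below gets trained,
# which drastically cuts the number of parameters to learn.
base.trainable = False

# A small task-specific head on top of the frozen features.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(10, activation="softmax"),  # hypothetical 10-class task
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```
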
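The second option can be sketched the same way. The layer sizes and the 32×32 icon shape are assumptions chosen to keep the example short; the key point is that the model is trained to reproduce its own input, after which the encoder half is reused on its own:

```python
from tensorflow.keras import layers, models

# Encoder: compress 32x32 RGB icons into a small 8x8x8 feature map.
encoder = models.Sequential([
    layers.Conv2D(16, 3, strides=2, padding="same", activation="relu",
                  input_shape=(32, 32, 3)),
    layers.Conv2D(8, 3, strides=2, padding="same", activation="relu"),
])

# Decoder: reconstruct the original icon from the compressed features.
decoder = models.Sequential([
    layers.Conv2DTranspose(16, 3, strides=2, padding="same",
                           activation="relu", input_shape=(8, 8, 8)),
    layers.Conv2DTranspose(3, 3, strides=2, padding="same",
                           activation="sigmoid"),
])

# Trained with the input as its own target: no labels needed.
autoencoder = models.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=20)

# Afterwards, encoder.predict(x) yields compressed features that a smaller,
# easier-to-train network can map to the actual output data.
```
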
For this project I will rely on data augmentation. My training data set is around 9,000 images, which is fairly good for networks with tens of thousands of parameters. But through data augmentation my neural network can do better: by simply flipping horizontally and vertically I can easily get around 36,000 images, allowing me to investigate deeper neural networks.
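Here is a rough sketch of what this looks like in NumPy (the array names and the (N, H, W, C) layout are assumptions). Since this is a super-resolution task, the same flip has to be applied to the small input icon and the large target icon so the pairs stay aligned:

```python
import numpy as np

def augment_flips(lr, hr):
    """Quadruple a paired dataset of low-res inputs and high-res targets
    (both shaped (N, H, W, C)) by flipping along each spatial axis."""
    def flips(x):
        return np.concatenate([
            x,                    # original
            x[:, :, ::-1, :],     # horizontal flip (left-right)
            x[:, ::-1, :, :],     # vertical flip (top-bottom)
            x[:, ::-1, ::-1, :],  # both flips
        ], axis=0)
    return flips(lr), flips(hr)

# Around 9,000 training pairs become around 36,000:
# lr_aug, hr_aug = augment_flips(lr_train, hr_train)
```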

| modification     | no horizontal flip | horizontal flip              |
|------------------|--------------------|------------------------------|
| no vertical flip | original image     | horizontal flip              |
| vertical flip    | vertical flip      | horizontal and vertical flip |