Most appropriate method for training convolutional neural networks with grayscale images?

WMU 97 :

Using the Keras API to train a convolutional neural network, I normally use 2D convolution layers when training with color png images (of input size (height, width, 4)). However, I now wish to train a network using grayscale png images, and I am wondering what the best way to do this is.

I have come up with three possible methods:

  1. Input the image as if it were a color image.
  2. Input only the first channel of the image so that the input size is (height, width, 1).
  3. Use a 1D convolution of the first channel of the image so that the input size is (height, width).
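The three layouts above differ only in how the channel axis is handled. As a rough sketch (NumPy array shapes only, not the actual Keras model code; the array names and sizes are made up for illustration), the three inputs can all be derived from one grayscale-as-RGBA array:

```python
import numpy as np

# Hypothetical grayscale-as-RGBA image: the R, G and B channels all carry
# the same values, and alpha is fully opaque.
h, w = 28, 28
gray = np.random.randint(0, 256, size=(h, w), dtype=np.uint8)
rgba = np.stack([gray, gray, gray, np.full((h, w), 255, dtype=np.uint8)], axis=-1)

x1 = rgba           # method 1: full (h, w, 4) input for a Conv2D network
x2 = rgba[..., :1]  # method 2: first channel only, shape (h, w, 1)
x3 = rgba[..., 0]   # method 3: channel axis dropped, shape (h, w)

print(x1.shape, x2.shape, x3.shape)  # → (28, 28, 4) (28, 28, 1) (28, 28)
```

Note that `x2` and `x3` contain exactly the same numbers; only the presence of the trailing channel axis differs, which is what determines whether Conv2D or Conv1D layers accept the input.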

Since my grayscale images in png format have the same values in all the RGBA channels, I thought I would be able to decrease computation time whilst achieving the same test accuracy by training and testing on only the first color channel, as the data would essentially be the same. However, this was not the case.

Using method one, I was able to achieve a testing accuracy of 91.95% with training taking ~3s per epoch.

With method two, an accuracy of 89.66% with ~2s per epoch.

With method three, an accuracy of 86.21% with <1s per epoch.

All networks were trained with the same architecture, kernel sizes and pool sizes, so I'm wondering what could be causing the discrepancies in accuracy, and which result I should trust the most.

desertnaut :

Your method #3 is clearly not equivalent to the other two, and not the way to tackle the problem (partially evident from the lower accuracy, too): a 1D convolution over an (height, width) input slides along only one spatial axis and treats the other as channels, so the 2D spatial structure of the image is lost.

Now, in theory, your methods #1 and #2 should yield roughly similar results, which is not far from the case, according to the accuracy values you present.

Neither of the two methods is invalid. A possible explanation of the somewhat higher accuracy of #1 is that, although here you actually just repeat the information contained in a single channel three times, this practically serves as a kind of ensembling (amplified by the fact that the respective convolutional filters will start from different random initializations). This leads to more "sub-models" contributing to the output, and hence to better performance, in line with the general expectations for model ensembles. Of course, you should verify that this is the case by running multiple experiments and taking the mean accuracy for each method (single-experiment results can always differ simply due to different random initializations).
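A minimal sketch of that verification protocol, using a hypothetical `run_experiment` stand-in for the full build/fit/evaluate cycle (in practice it would train the Keras model with the given seed and return the test accuracy):

```python
import random
import statistics

def run_experiment(seed):
    # Placeholder for one complete train/evaluate run; here it just returns
    # a simulated accuracy near 0.90 so the harness is self-contained.
    random.seed(seed)
    return 0.90 + random.uniform(-0.02, 0.02)

# Repeat each method over several seeds and compare mean +/- std, not
# single runs, so that random-initialization noise is averaged out.
accs = [run_experiment(s) for s in range(5)]
print(f"mean={statistics.mean(accs):.4f} std={statistics.stdev(accs):.4f}")
```

The same loop would be run once per input method, and the methods compared on their mean accuracies.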

Truth is, CNN performance with single-channel images is a rather underexplored subject AFAIK. Do proceed with further experimentation yourself!

The per-epoch training times you report are consistent with the different approaches: 1D convolutions are faster than 2D ones, and processing a single-channel image is also faster than processing a 3-channel one.
