Convolutions
These are notes from lesson 8 of Fast AI Practical Deep Learning for Coders.
Try recreating the convolutions in a spreadsheet; that exercise underpins the discussion below.
1. The Intuition Behind CNNs
1.1. Convolutional Layers
Convolutions slide a window of numbers, say 3x3, across our original image.
Depending on the values in the filter, it will be able to pick out different features like horizontal or vertical edges. Subsequent layers can combine these into more sophisticated features, like corners. These can eventually be combined to detect complex features of the image.
An interactive example showing the intuition behind the sliding window is here.
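As a minimal sketch of the sliding window (assuming PyTorch is installed), the example below builds a tiny image with a vertical edge and slides a hand-picked 3x3 edge filter over it; the filter choice and image are illustrative assumptions, not from the lesson:

```python
import torch
import torch.nn.functional as F

# A hypothetical 6x6 "image": left half dark (0), right half bright (1),
# so there is a vertical edge down the middle.
img = torch.zeros(1, 1, 6, 6)
img[:, :, :, 3:] = 1.0

# A 3x3 vertical-edge filter (Sobel-like): it responds where brightness
# changes from left to right and is zero in flat regions.
kernel = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]]).reshape(1, 1, 3, 3)

out = F.conv2d(img, kernel)  # slide the 3x3 window across the image
print(out.shape)  # a 6x6 input with a 3x3 filter gives a 4x4 output
print(out)        # nonzero only where the window overlaps the edge
```

Each output row comes out as `[0, 4, 4, 0]`: the filter fires only at the two window positions that straddle the edge.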
1.2. Max Pooling
This is a technique to reduce the size of the input tensor.
A 2x2 max pool layer slides a 2x2 filter over the input and replaces each value with the max of the 4 values in the image.
Nowadays, strided convolutions are generally preferred over max pool layers.
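A small worked example of 2x2 max pooling (assuming PyTorch; the input values are made up for illustration):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1., 2., 5., 6.],
                  [3., 4., 7., 8.],
                  [9., 1., 2., 3.],
                  [5., 6., 4., 0.]]).reshape(1, 1, 4, 4)

# 2x2 max pool: each non-overlapping 2x2 block is replaced by its maximum,
# halving the height and width of the input.
pooled = F.max_pool2d(x, kernel_size=2)
print(pooled)  # [[4, 8], [9, 4]]
```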
1.3. Stride Length
An alternative technique to reduce the size of the input is to skip pixels when we slide our filter over the image.
For example, a stride=2 convolution applies the filter to every second pixel, halving the image size along each axis, the same downsampling effect as a 2x2 max pool.
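A quick sketch of the size halving (assuming PyTorch; the image size and channel counts are arbitrary choices):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)  # a hypothetical 28x28 grayscale image

# A 3x3 convolution with stride=2 and padding=1 skips every other pixel
# position, so the spatial output is half the input size.
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3,
                 stride=2, padding=1)
print(conv(x).shape)  # torch.Size([1, 8, 14, 14])
```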
1.4. Dropout Layers
As a regularisation technique to make sure the model is not overly reliant on any single pixel or region, we can add a dropout layer.
Conceptually, this is the same as initialising a random tensor the same size as the input and masking the input based on whether the random value is above a threshold.
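The masking idea can be sketched directly (assuming PyTorch; the scaling by 1/(1-p) is the standard "inverted dropout" convention, not something discussed above):

```python
import torch

torch.manual_seed(0)
x = torch.ones(2, 5)
p = 0.5  # dropout probability

# Conceptual dropout: draw a random tensor the same shape as the input
# and zero out entries where the random value falls below the threshold.
# Survivors are scaled by 1/(1-p) so the expected activation is unchanged.
mask = (torch.rand_like(x) > p).float()
dropped = x * mask / (1 - p)
print(dropped)  # each entry is either 0.0 or 2.0
```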
1.5. Final Dense Layer
We eventually want to reduce our input image size to output a tensor with one value per class.
One approach is to apply a dense layer once the image has been reduced “enough”, taking the dot product between the flattened image tensor and the dense layer’s weights. This is again deprecated in favour of the next approach…
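A minimal sketch of this dense-layer approach (assuming PyTorch; the feature map size and class count are hypothetical):

```python
import torch
import torch.nn as nn

n_classes = 10  # hypothetical number of output classes
x = torch.randn(1, 16, 4, 4)  # a feature map that has been reduced "enough"

# Flatten the feature map and take the dot product with the dense layer's
# weight matrix, producing one value per class.
dense = nn.Linear(16 * 4 * 4, n_classes)
logits = dense(x.flatten(start_dim=1))
print(logits.shape)  # torch.Size([1, 10])
```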
1.6. Average Pooling
Nowadays, we use stride=2 convolutions until we get a small (7x7) tensor, then apply a single average pool layer to it.
Each cell of this 7x7 tensor (for a bear detector, say) effectively gives a value quantifying “is there a bear in this part of the image?”. The average pool then takes the mean of all of these to determine if there is a bear in the overall photo.
This works fine if the bear occupies most of the photo, but less well if the bear occupies a small region in the corner of the photo. So the details of the model depend on the use case. If we want to be able to detect small bears in the corner, max pooling would work better here.
Concat pool is a hybrid approach which does the max pool AND the average pool and concatenates the two results together.
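The three pooling variants above can be sketched as follows (assuming PyTorch; the 512-channel 7x7 activation map is a typical but assumed shape):

```python
import torch
import torch.nn.functional as F

# A hypothetical final 7x7 activation map: for one channel of a bear
# detector, each of the 49 cells scores "is there a bear in this part
# of the image?"
acts = torch.randn(1, 512, 7, 7)

avg = F.adaptive_avg_pool2d(acts, 1).flatten(1)  # mean over all 49 cells
mx = F.adaptive_max_pool2d(acts, 1).flatten(1)   # strongest single cell

# Concat pool: keep both signals, side by side, doubling the features.
concat = torch.cat([mx, avg], dim=1)
print(avg.shape, mx.shape, concat.shape)  # (1, 512) (1, 512) (1, 1024)
```

Average pooling asks “is there a bear on average across the image?”, max pooling asks “is there a bear anywhere, even in one corner?”, and concat pool hands both answers to the final layer.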
2. Convolutions From Different Viewpoints
Convolutions can be reframed in different ways, as matrix multiplications or as systems of linear equations.
This article is a helpful exploration of the topic.
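The matrix-multiplication viewpoint can be verified numerically (assuming PyTorch; `unfold` extracts each sliding patch as a column, which is one standard way to express a convolution as a matmul):

```python
import torch
import torch.nn.functional as F

# A 1-channel 3x3 image and a 2x2 kernel, no padding, stride 1.
img = torch.arange(9.).reshape(1, 1, 3, 3)
kernel = torch.tensor([[1., 2.], [3., 4.]]).reshape(1, 1, 2, 2)

direct = F.conv2d(img, kernel).flatten()

# The same convolution as a matrix multiplication: unfold lays out each
# 2x2 patch as a column, and the flattened kernel acts as a row vector.
patches = F.unfold(img, kernel_size=2)       # shape (1, 4, 4): 4 patches
as_matmul = kernel.flatten() @ patches[0]    # dot each patch with kernel

print(torch.allclose(direct, as_matmul))  # True
```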
3. Assorted Thoughts From Jeremy
A summary of questions to end the course.
- Read Meta Learning.
- Don’t try to know everything. Pick a bit that you’re interested in and dig in. You’ll start to recognise the same ideas cropping up with slight tweaks.
- Does success in deep learning boil down to more compute power? No, we can be smarter about our approach. Also pick your problems to be ones that you can actually manage with smaller compute resources.
- Dragonbox Algebra 5+ can teach little kids algebra.
- Turning a model into a startup. The key to a legitimate business venture is to solve a legitimate problem, one that people will pay you to solve. Start with the problem, not the prototype. The Lean Startup by Eric Ries: create the MVP as quickly as possible (and fake the solution), then gradually make it “less fake” as more people use it (and pay you).
- Make the things you want to do easier, then you’ll want to do them more.