I'm fairly new to working with neural networks (I use Keras with the TensorFlow backend). My math background goes just deep enough to understand the concept behind gradient descent optimization, but I'm not confident working with the numbers and symbolic math.
I was recently reading about Particle Swarm Optimization (PSO), another optimization technique. I've been building a CNN to classify lung disease types. So far, I've understood the following:
Gradient Descent:
- Minimizes the cost function (finds a minimum of the cost function)
- Starts at some randomly initialized position and repeatedly steps in the direction of steepest descent (the negative gradient)
- Cost function must be differentiable (the gradient is the slope of the cost with respect to each weight)
- Usually settles down in one minimum which could be a local or global minimum
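To make sure I'm describing the same thing everyone else means, here is a minimal sketch of gradient descent as I understand it. The toy cost function, its hand-written gradient, the learning rate, and the step count are all my own assumptions for illustration; none of this is what Keras does internally.

```python
import numpy as np

def cost(w):
    # Toy bowl-shaped cost with its minimum at (1, -2); stands in for a real loss.
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2

def grad(w):
    # Analytic gradient of the toy cost above (this is why it must be differentiable).
    return np.array([2.0 * (w[0] - 1.0), 2.0 * (w[1] + 2.0)])

def gradient_descent(start, learning_rate=0.1, steps=200):
    w = np.asarray(start, dtype=float)
    for _ in range(steps):
        w = w - learning_rate * grad(w)  # step against the gradient
    return w

rng = np.random.default_rng(0)
w_final = gradient_descent(rng.uniform(-5, 5, size=2))  # random starting position
print(w_final, cost(w_final))
```

A single point moves downhill and ends up in whichever minimum its starting position leads to.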
I understand gradient descent well, but I'm confused about why PSO is a simpler approach. Here is what I know about PSO:
Particle Swarm Optimization:
- Minimizes cost function
- Multiple particles start at different locations on this cost function
- Particles look for minima, but each particle is also influenced by the rest of the swarm
- This means particles don't have to settle into a single local minimum and can move out of a minimum based on swarm behavior
- Improves the chance of finding a global minimum
- Cost function DOES NOT have to be differentiable?
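From what I've read, the PSO update looks roughly like the sketch below. It is a bare-bones global-best variant written with NumPy; the inertia `w`, the coefficients `c1` and `c2`, the swarm size, and the toy cost are assumptions I picked for illustration. What stands out to me is that it only ever evaluates the cost at each particle's position and never asks for a gradient.

```python
import numpy as np

def cost(p):
    # Toy cost with a kink at its minimum (1, -2), so it is not differentiable there.
    return np.abs(p[..., 0] - 1.0) + np.abs(p[..., 1] + 2.0)

def pso(cost_fn, n_particles=30, dim=2, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5, 5, size=(n_particles, dim))  # particles start at different places
    vel = np.zeros((n_particles, dim))
    pbest_pos = pos.copy()           # best position each particle has seen
    pbest_cost = cost_fn(pos)
    g = np.argmin(pbest_cost)
    gbest_pos, gbest_cost = pbest_pos[g].copy(), pbest_cost[g]  # best seen by the swarm
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity: keep some momentum, pull toward own best, pull toward swarm best.
        vel = w * vel + c1 * r1 * (pbest_pos - pos) + c2 * r2 * (gbest_pos - pos)
        pos = pos + vel
        c = cost_fn(pos)             # only cost values are used, no gradient
        improved = c < pbest_cost
        pbest_pos[improved] = pos[improved]
        pbest_cost[improved] = c[improved]
        g = np.argmin(pbest_cost)
        if pbest_cost[g] < gbest_cost:
            gbest_pos, gbest_cost = pbest_pos[g].copy(), pbest_cost[g]
    return gbest_pos, gbest_cost

best_pos, best_cost = pso(cost)
print(best_pos, best_cost)
```

Each particle is pulled toward the best point it has seen and the best point the swarm has seen, which is where the "affected by the swarm" part comes from.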
Why does this make sense? My understanding of a particle is that it is an instance of a model with randomly initialized weights, etc., which means each particle sits at a different position on the cost function. This essentially creates more model instances to train, whereas gradient descent trains just one. Correct my understanding of a particle if what I just said is utter nonsense...
Why does the cost function not have to be differentiable? The particles are looking for a minimum, so it seems to me they would still need to move in the direction of the steepest downward gradient.
How can one implement PSO for a CNN? I was looking at a library called PySwarms, which left me further frustrated, since it doesn't seem to be usable as an optimizer for CNNs.
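The closest I've gotten to picturing this for a network is treating one particle as a single flat vector holding all of the model's weights. Below is a sketch of that idea with two hypothetical helpers, `flatten_weights` and `unflatten_weights` (my own names, not part of Keras or PySwarms). The list of arrays is a stand-in for what Keras' `model.get_weights()` returns, and the restored list is the kind of thing `model.set_weights()` accepts. I have not confirmed that this is how PSO is normally applied to a CNN.

```python
import numpy as np

def flatten_weights(weight_list):
    # Concatenate every weight array into one flat vector: one particle position.
    return np.concatenate([w.ravel() for w in weight_list])

def unflatten_weights(flat, shapes):
    # Cut the flat vector back into arrays with the original shapes.
    arrays, start = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        arrays.append(flat[start:start + size].reshape(shape))
        start += size
    return arrays

# Stand-in for model.get_weights(): a conv kernel, its bias, a dense layer, its bias.
weights = [np.ones((3, 3, 1, 4)), np.zeros(4), np.ones((4, 2)), np.zeros(2)]
shapes = [w.shape for w in weights]

particle = flatten_weights(weights)             # position of one particle
restored = unflatten_weights(particle, shapes)  # what would go to model.set_weights()
print(particle.shape)  # (50,)
```

If that picture is right, the cost of a particle would be the model's loss after loading its weights, and the swarm would need one such vector per particle, which is the "many model instances" part I'm unsure about.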
(P.S. I am visualizing the cost function as a 3-variable function.)