You can distinguish slow convergence from divergence by the graph of the loss function:
if the loss changes more and more slowly but shows no significant improvement even after 100–200 iterations, training has reached a plateau. In that case, either stop training or change the learning rate; the algorithm may still be converging, just very slowly;
if the loss values begin to grow sharply and tend to infinity, the algorithm is diverging and will never converge.
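The two situations above can be told apart programmatically from the recorded loss history. Below is a minimal illustrative sketch; the function name `diagnose`, the window of 100 iterations, and the thresholds are assumptions chosen for demonstration, not a standard API.

```python
import math

def diagnose(losses, window=100, tol=1e-4):
    """Classify a training run from its loss history (illustrative heuristic).

    Returns one of:
      'diverging'  - the loss has blown up or is growing sharply;
      'plateau'    - the loss barely changed over the last `window` iterations;
      'converging' - the loss is still decreasing noticeably.
    """
    last = losses[-1]
    # Divergence: loss is NaN/infinite or far above its starting value.
    if math.isnan(last) or math.isinf(last) or last > 10 * losses[0]:
        return "diverging"
    # Plateau: no significant change over the observation window.
    if len(losses) > window and abs(losses[-window] - last) < tol:
        return "plateau"
    return "converging"
```

In practice you would call such a check every few epochs and stop training or adjust the learning rate when it reports a plateau.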
How to increase the speed of convergence
The higher the convergence rate, the faster the model learns and starts producing accurate results, which makes this rate one of the most important characteristics of training.
To speed up convergence, you can use:
data normalization - reduces the number of gradient steps needed to reach the minimum of the loss function;
algorithm modifications - for example, replacing ordinary (full-batch) gradient descent with stochastic gradient descent, which can speed up convergence by tens of times;
hyperparameter tuning - which hyperparameters to change depends on the algorithm and the data; common candidates are the learning rate and the batch size (the number of samples processed per step).
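The effect of the first technique, data normalization, can be seen on a toy problem. The sketch below (the dataset, learning rates, and tolerance are made-up assumptions for illustration) runs full-batch gradient descent on a linear regression with features on very different scales, then on the same data after standardization:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: two features on very different scales.
X = np.c_[rng.normal(0, 1, 200), rng.normal(0, 100, 200)]
y = X @ np.array([2.0, 0.03]) + rng.normal(0, 0.1, 200)

def gd_steps_to_converge(X, y, lr, tol=1e-6, max_iter=10000):
    """Full-batch gradient descent on mean squared error; returns the
    number of iterations until the gradient norm drops below `tol`
    (or `max_iter` if it never does)."""
    w = np.zeros(X.shape[1])
    for i in range(max_iter):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        if np.linalg.norm(grad) < tol:
            return i
        w -= lr * grad
    return max_iter

# Standardization: zero mean, unit variance per feature.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

# On raw data a tiny learning rate is required for stability, so
# convergence is very slow; on standardized data a much larger
# learning rate works and far fewer steps are needed.
raw = gd_steps_to_converge(X, y, lr=1e-5)
normed = gd_steps_to_converge(X_norm, y, lr=0.1)
print(raw, normed)
```

The design point is that mismatched feature scales stretch the loss surface into a narrow valley, forcing a small step size; normalization makes the surface closer to round, so the same gradient descent covers it in far fewer steps.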
The simpler the algorithm, the faster it trains and reaches convergence, but accuracy may suffer. Complex algorithms with many parameters train much more slowly, but often produce better results.