Gradient descent is an algorithm for finding values of the parameters $w$ and $b$ that minimize the cost function $J(w,b)$.

$$\begin{align*}
\text{repeat until convergence:} \; \lbrace \newline
\; w &= w - \alpha \frac{\partial J(w,b)}{\partial w} \; \newline
b &= b - \alpha \frac{\partial J(w,b)}{\partial b} \newline
\rbrace
\end{align*}$$

where the parameters $w$ and $b$ are updated simultaneously and $\alpha$ is the learning rate.

- Gradient:

$$\begin{align}
\frac{\partial J(w,b)}{\partial w} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)} \tag{4}\\
\frac{\partial J(w,b)}{\partial b} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)}) \tag{5}
\end{align}$$
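
A minimal sketch of how equations (4) and (5) and the update rule can be put together for linear regression, assuming NumPy and hypothetical helper names `compute_gradient` and `gradient_descent` (not from the original notes):

```python
import numpy as np

def compute_gradient(x, y, w, b):
    """dJ/dw and dJ/db for the linear model f_wb(x) = w*x + b (equations 4 and 5)."""
    err = (w * x + b) - y          # f_wb(x^(i)) - y^(i) for every example
    dj_dw = (err * x).mean()       # (1/m) * sum(err * x)
    dj_db = err.mean()             # (1/m) * sum(err)
    return dj_dw, dj_db

def gradient_descent(x, y, w, b, alpha, num_iters):
    """Batch gradient descent with simultaneous updates of w and b."""
    J_history = []
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradient(x, y, w, b)   # gradients use the old (w, b)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        J_history.append(((w * x + b - y) ** 2).mean() / 2)  # cost J(w,b)
    return w, b, J_history

# Example usage on a tiny dataset
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
w, b, J_history = gradient_descent(x, y, w=0.0, b=0.0, alpha=0.01, num_iters=1000)
```

Plotting `J_history` against the iteration number gives the convergence check described in the Learning Rate section below.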

- Learning rate ($\alpha$): controls the size of each update step.
- ![image.png](https://img.ynchen.me/2023/10/bc35ef00e5d71d24c189bf68de869b1a.webp)
- Convex function: a bowl-shaped cost function with a single global minimum, so gradient descent cannot get stuck in a local minimum.

## Learning Rate

Plot the cost function $J$ against the number of iterations to see

- if it converges (**done**)
- if it goes up and down (**learning rate is too large**)

## Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is an optimization algorithm commonly used in machine learning and deep learning to minimize the loss function. It is a variation of standard (batch) Gradient Descent that uses only a single example or a small random subset of the data (a mini-batch) to compute the gradient at each step, as sketched below.
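
A minimal mini-batch SGD sketch for the same linear model, assuming NumPy; the helper name `sgd`, the batch size, and the per-epoch reshuffling are illustrative choices, not from the original notes:

```python
import numpy as np

def sgd(x, y, w, b, alpha, num_epochs, batch_size=32, seed=0):
    """Mini-batch SGD: each step estimates the gradient from a random subset of the data."""
    rng = np.random.default_rng(seed)
    m = x.shape[0]
    for _ in range(num_epochs):
        order = rng.permutation(m)                 # reshuffle the data every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = x[idx], y[idx]
            err = (w * xb + b) - yb                # gradient computed on the mini-batch only
            dj_dw = (err * xb).mean()
            dj_db = err.mean()
            w -= alpha * dj_dw
            b -= alpha * dj_db
    return w, b
```

Because each step sees only a mini-batch, the cost curve is noisier than with batch gradient descent, but each iteration is much cheaper on large datasets.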