Commit c9f21c4: add. page. math - list and derivative.
1 parent dbe1db2

File tree

75 files changed: +8695 −143 lines


blog/config.yaml

Lines changed: 13 additions & 0 deletions
@@ -14,6 +14,18 @@ markup:
   goldmark:
     renderer:
       unsafe: true
+    extensions:
+      passthrough:
+        enable: true
+        delimiters:
+          block:
+            - - '\['
+              - '\]'
+            - - '$$'
+              - '$$'
+          inline:
+            - - '\('
+              - '\)'
 
 languages:
   en:

@@ -29,6 +41,7 @@ languages:
 
 params:
   yandexMetrika: 107089384
+  math: true
 
 sidebar:
   # emoji: 🍥

blog/content/page/playlists/index.en.md

Lines changed: 11 additions & 0 deletions
@@ -25,3 +25,14 @@ menu:
   - 🏷️ Terms: scaling, reward hacking, world model, guardrail, AGI, anthropomorphism
   - 📊 Difficulty: intermediate
   - 📋 Prerequisites: AI basics
+
+## 📐 Essential Mathematics
+
+- **[Essential Mathematics — Overview](/en/p/math-essentials-overview/)**
+  - 📋 Overview: which math areas are needed for AI/ML and why
+  - 🏷️ Linear algebra, calculus, probability and statistics
+  - 📊 Difficulty: basic
+- **[Math Analysis — Derivatives](/en/p/math-derivatives/)**
+  - 📋 Derivative, gradient, chain rule — backbone of neural network training
+  - 🏷️ Gradient descent, backpropagation
+  - 📊 Difficulty: basic

blog/content/page/playlists/index.md

Lines changed: 11 additions & 0 deletions
@@ -25,3 +25,14 @@ menu:
   - 🏷️ Terms: scaling, reward hacking, world model, guardrail, AGI, anthropomorphism
   - 📊 Difficulty: intermediate
   - 📋 Prerequisites: AI basics
+
+## 📐 Essential Mathematics
+
+- **[Essential Mathematics — Overview](/p/math-essentials-overview/)**
+  - 📋 Overview: which areas of mathematics are needed for AI/ML and why
+  - 🏷️ Linear algebra, calculus, probability and statistics
+  - 📊 Difficulty: basic
+- **[Math Analysis — Derivatives](/p/math-derivatives/)**
+  - 📋 Derivative, gradient, chain rule — the backbone of neural network training
+  - 🏷️ Gradient descent, backpropagation
+  - 📊 Difficulty: basic
Lines changed: 164 additions & 0 deletions

@@ -0,0 +1,164 @@
---
title: "Math Analysis — Derivatives"
description: "Derivative, gradient, and chain rule — the backbone of neural network training"
date: "2025-03-12"
slug: "math-derivatives"
tags:
  - Machine Learning
  - Mathematics
---

The second article in the «Essential Mathematics» series: derivatives, the gradient, and the chain rule. Without these you can't understand how neural networks learn.

## What is a derivative

### Simple explanation

**A speedometer** is essentially a derivative: how fast the distance traveled is changing. When you accelerate, the speed goes up; when you brake, it drops. The derivative answers the question: "how much does one quantity change when you change another one a little?"

**A hill.** The steepness of a slope is "how far down you go when you take a step forward." A steep slope means a large derivative, a gentle one a small derivative, a flat road zero.

**A function graph.** If you have a graph \( y = f(x) \), the derivative at a point is the **slope** of the graph at that point. For a straight line \( y = kx + b \) the slope is the familiar coefficient \( k \). For a curve, each point has its own slope — that is the derivative.

- Graph rising steeply → positive derivative
- Falling → negative derivative
- Flat (at a top or bottom) → derivative equals zero

### Formal definition

The derivative of a function at a point is its **rate of change**: how fast the function grows (or decreases) for a small shift in the argument.

For a single-variable function \( f(x) \), the derivative \( f'(x) \) is the limit of the ratio of the change in \( f \) to the change in \( x \), as the change in \( x \) goes to zero:

\[
f'(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}
\]

Here \( \Delta x \) is the increment of the argument (a small step). In the examples below we use \( h \) for the same quantity.

Geometrically, this is the slope of the tangent line at \( x \):

- \( f'(x) > 0 \) — the function is increasing
- \( f'(x) < 0 \) — the function is decreasing
- \( f'(x) = 0 \) — a possible minimum or maximum

### Numerical example: linear function \( g(x) = 3x - 1 \)

For a straight line the slope is the same everywhere. At any point, e.g. \( x = 5 \), where \( g(5) = 3 \cdot 5 - 1 = 14 \):

\[
\frac{g(5 + h) - g(5)}{h} = \frac{(3(5+h) - 1) - 14}{h} = \frac{14 + 3h - 14}{h} = \frac{3h}{h} = 3
\]

For any \( h \neq 0 \) we get **3**: the derivative is constant and equals the coefficient of \( x \) in the line's equation.

| \( x \) | \( g(x) \) | \( g(x+0.1) - g(x) \) | \( \frac{g(x+0.1)-g(x)}{0.1} \) |
|---------|------------|-----------------------|---------------------------------|
| 5       | 14         | 0.3                   | 3 |
| 10      | 29         | 0.3                   | 3 |
| -2      | -7         | 0.3                   | 3 |

<div style="margin: 1em 0; aspect-ratio: 320/160; min-height: 160px; max-height: 320px; overflow: hidden; border-radius: 8px; background: #1a1a1a;">
<iframe src="/games/derivative-graph-linear.html" style="width: 100%; height: 100%; border: none; display: block; background: #1a1a1a;" title="Graph of y = 3x − 1" scrolling="no"></iframe>
</div>

### Numerical example: quadratic function \( f(x) = x^2 \)

Approximate the derivative at \( x = 2 \) as "change in \( f \) over change in \( x \)" with a small step \( h \):

\[
\frac{f(2 + h) - f(2)}{h} = \frac{(2+h)^2 - 4}{h}
\]

**At \( x = 2 \)** (\( f(2) = 4 \)):

| \( h \) | \( f(2+h) \) | \( f(2+h) - f(2) \) | \( \frac{f(2+h)-f(2)}{h} \) |
|--------|--------------|---------------------|-----------------------------|
| 1      | 9            | 5                   | 5 |
| 0.1    | 4.41         | 0.41                | 4.1 |
| 0.01   | 4.0401       | 0.0401              | 4.01 |
| 0.001  | 4.004001     | 0.004001            | 4.001 |

The ratio tends to **4**: \( f'(2) = 4 \).

**At \( x = 4 \)** (\( f(4) = 16 \)):

\[
\frac{f(4 + h) - f(4)}{h} = \frac{(4+h)^2 - 16}{h} = \frac{8h + h^2}{h} = 8 + h \to 8
\]

| \( h \) | \( f(4+h) \) | \( f(4+h) - f(4) \) | \( \frac{f(4+h)-f(4)}{h} \) |
|--------|--------------|---------------------|-----------------------------|
| 1      | 25           | 9                   | 9 |
| 0.1    | 16.81        | 0.81                | 8.1 |
| 0.01   | 16.0801      | 0.0801              | 8.01 |

The ratio tends to **8**: \( f'(4) = 8 \). This matches \( (x^2)' = 2x \): at \( x = 2 \) we get 4, at \( x = 4 \) we get 8. For a curve, the derivative is different at each point.

<div style="margin: 1em 0; aspect-ratio: 320/240; min-height: 240px; max-height: 480px; overflow: hidden; border-radius: 8px; background: #1a1a1a;">
<iframe src="/games/derivative-graph-parabola.html" style="width: 100%; height: 100%; border: none; display: block; background: #1a1a1a;" title="Graph of y = x² and tangent" scrolling="no"></iframe>
</div>
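
The tables above can be reproduced in a few lines, a minimal sketch of the same difference quotient (the helper name `diff_quotient` is ours, not from the article):

```python
def diff_quotient(f, x, h):
    """Forward-difference approximation of f'(x): (f(x+h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

g = lambda x: 3 * x - 1  # straight line: slope is 3 everywhere
f = lambda x: x ** 2     # parabola: slope is 2x, different at each point

# Linear function: the quotient equals 3 for any step size
print(round(diff_quotient(g, 5, 0.1), 6))   # 3.0

# Parabola at x = 2: the quotient approaches f'(2) = 4 as h shrinks
for h in (1, 0.1, 0.01, 0.001):
    print(h, round(diff_quotient(f, 2, h), 6))  # 5.0, 4.1, 4.01, 4.001
```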

**Why it matters in ML:** training is minimization of the loss function. We need to know in which direction to change the weights so that the loss decreases. The derivative points in the direction of increase, so we step the opposite way and the loss drops.

## Partial derivative

A neural network has many weights, so the loss depends on thousands or millions of variables. We need to know how the loss changes when we change **each** weight separately.

A **partial derivative** \( \frac{\partial f}{\partial x} \) is the derivative with respect to one variable, treating the others as constants.

Example: \( f(x, y) = x^2 + xy \)

- \( \frac{\partial f}{\partial x} = 2x + y \)
- \( \frac{\partial f}{\partial y} = x \)
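
These formulas are easy to sanity-check numerically with a central difference; the helper below and the evaluation point are illustrative, not from the article:

```python
def partial_diff(f, point, i, h=1e-6):
    """Central-difference estimate of the partial derivative of f
    with respect to the i-th variable at `point`."""
    plus, minus = list(point), list(point)
    plus[i] += h
    minus[i] -= h
    return (f(*plus) - f(*minus)) / (2 * h)

f = lambda x, y: x ** 2 + x * y

x, y = 3.0, 2.0
print(round(partial_diff(f, (x, y), 0), 4))  # ∂f/∂x = 2x + y = 8.0
print(round(partial_diff(f, (x, y), 1), 4))  # ∂f/∂y = x = 3.0
```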

## Gradient

The **gradient** \( \nabla f \) is the vector of all partial derivatives:

\[
\nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right)
\]

The gradient points in the direction of **steepest increase** of the function, so \( -\nabla f \) is the direction of steepest **decrease**.

**Gradient descent** is the iterative weight update

\[
w_{new} = w_{old} - \alpha \cdot \frac{\partial L}{\partial w}
\]

where \( \alpha \) is the learning rate (step size) and \( L \) is the loss. We move against the gradient, so the loss decreases.
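
A minimal sketch of this update rule on a toy one-dimensional loss \( L(w) = (w - 3)^2 \), whose minimum is at \( w = 3 \) (the loss, starting weight, and learning rate here are all illustrative):

```python
# dL/dw for L(w) = (w - 3)^2 is 2 * (w - 3)
grad = lambda w: 2 * (w - 3)

w = 0.0       # starting weight
alpha = 0.1   # learning rate: too small is slow, too large diverges
for step in range(100):
    w -= alpha * grad(w)  # move against the gradient

print(round(w, 4))  # 3.0, converged to the minimizer
```

Each step multiplies the distance to the minimum by \( 1 - 2\alpha \), so with \( \alpha = 0.1 \) the error shrinks geometrically.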

## Chain rule

A neural network is a chain of layers: each layer's output is fed into the next. The loss depends on the outputs, and those depend on the weights. To get \( \frac{\partial L}{\partial w} \), we need to «propagate» the derivative backward through this chain.

The **chain rule** for composed functions: if \( z = f(y) \) and \( y = g(x) \), then

\[
\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}
\]

With many variables it works the same way: the derivative with respect to an earlier layer's weight is the product of derivatives along the path from that weight to the loss.

**Backpropagation** is the algorithm that does this efficiently: one backward pass through the network computes all the needed gradients at once.
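
A small numeric check of the chain rule on a two-function chain (the functions \( g \) and \( f \) here are our example, not from the article):

```python
# Chain: y = g(x) = 3x + 1, then z = f(y) = y^2
g = lambda x: 3 * x + 1
f = lambda y: y ** 2

x = 2.0
y = g(x)               # forward pass: y = 7
dz_dy = 2 * y          # local derivative of f: 14
dy_dx = 3.0            # local derivative of g
dz_dx = dz_dy * dy_dx  # chain rule: 14 * 3 = 42

# Compare with a direct numerical estimate of d/dx of f(g(x))
h = 1e-6
numeric = (f(g(x + h)) - f(g(x))) / h
print(dz_dx, round(numeric, 3))  # 42.0 42.0
```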

## Example: linear regression

The loss is the squared error for a single example, \( L = \frac{1}{2}(y - \hat{y})^2 \), where \( \hat{y} = wx + b \) (the \( \frac{1}{2} \) cancels the 2 that appears when differentiating). By the chain rule:

\[
\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w} = -(y - \hat{y}) \cdot x
\]

The larger the error \( (y - \hat{y}) \) and the larger \( x \), the more we adjust \( w \). This makes sense: a big error and an «important» input call for a larger update.
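
One gradient-descent step for this model, a sketch with made-up numbers (the data point, initial weights, and learning rate are all illustrative):

```python
# Single training example: input x, target y; model y_hat = w * x + b
x, y = 2.0, 10.0
w, b = 1.0, 0.0
alpha = 0.1  # learning rate

y_hat = w * x + b         # prediction: 2.0, far from the target 10.0
dL_dw = -(y - y_hat) * x  # -(10 - 2) * 2 = -16.0
w -= alpha * dL_dw        # 1.0 - 0.1 * (-16.0) = 2.6

print(round(abs(y - (w * x + b)), 4))  # 4.8: the error shrank from 8.0
```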

## Summary

- **Derivative** — rate of change, direction of increase
- **Partial derivative** — with respect to one variable, others held fixed
- **Gradient** — vector of partial derivatives, direction of steepest increase
- **Chain rule** — how to compute derivatives along a chain of layers
- **Gradient descent** — update weights in the \( -\nabla L \) direction

PyTorch, TensorFlow, and other frameworks compute gradients automatically (autograd), but understanding what happens under the hood helps when you hit «exploding gradients» or a model that won't learn.
