
My Messy Journey Building Tensors from Scratch

My background

It started randomly, like most detours in my life. Back in uni, I was knee-deep in a degree that sounded good on paper but left me staring at screens, hating every equation. Math? I'd forgotten about it after high school teachers turned it into a nightmare: stressful pop quizzes, zero explanations. First came MathAcademy: heavy math, plenty of exercises, but no programming exercises. I stopped learning theory (for a few weeks) and dove into programming instead. Self-taught, trial-by-fire. Deep down, those neural net tutorials kept nagging: “What’s really happening under the hood with backprop?”

I could hack together models with PyTorch, but I felt like a tourist. “Autograd just works,” I’d say. But I wanted to own it. To feel the click of understanding, not just copy-paste. One late night scrolling X, watching folks share their from-scratch frameworks, envy hit. I wanted that win too. A good example is chibigrad by @sumitdotml.

Now

Fast-forward to the beginning of September. Enough. I cracked open a notebook and sketched a Tensor class on a whim. No grand plan, just “make addition backprop.” With a little help from others’ work and a chat with an LLM, a simple chain worked. Gradients flowed. I urged the LLM not to output code for me. I wanted to learn. Since I was so focused on learning, I also turned off Cursor Tab because it was freaking annoying.
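
Roughly what that first version boiled down to, as a sketch (names and structure are approximate, not my exact notebook code): a `data` array, a `grad` buffer, and a tiny `_backward` closure per op.

```python
import numpy as np

class Tensor:
    """Bare-bones tensor: just enough to backprop through addition."""

    def __init__(self, data, parents=()):
        self.data = np.asarray(data, dtype=np.float64)
        self.grad = np.zeros_like(self.data)
        self._parents = parents
        self._backward = lambda: None  # set by the op that created this tensor

    def __add__(self, other):
        out = Tensor(self.data + other.data, parents=(self, other))

        def _backward():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1, so the gradient passes straight through
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Seed the output gradient, then walk the graph children-first (visited set avoids re-runs)
        self.grad = np.ones_like(self.data)
        topo, visited = [], set()

        def build(t):
            if id(t) not in visited:
                visited.add(id(t))
                for p in t._parents:
                    build(p)
                topo.append(t)

        build(self)
        for t in reversed(topo):
            t._backward()

# The "simple chain" moment: gradients of 1.0 flow back to x and y
x, y = Tensor(2.0), Tensor(3.0)
z = x + y
z.backward()
print(x.grad, y.grad)  # 1.0 1.0
```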

I’ve been posting snippets on X—mostly for accountability. Seeing replies, “Tried this, failed that,” keeps me going. No more fading interest. After ~20 hours (and one fried brain), here’s my raw take. Learning math is worth it. Learning backprop was worth it. Kinda obvious, but I need to repeat that to myself from time to time.

Review

Building autograd from scratch? Brain-bender at first, but the gradient flow “aha” moments make it gold. Worth every hour if you crave owning the math behind DL.

Bullet-pointing the chaos, the clarity, and the math nuggets that stuck:

  • Chain rule revival: Seeing the output in a console. That recursive $\frac{\partial L}{\partial \theta} = \frac{\partial L}{\partial z_n} \cdot \frac{\partial z_n}{\partial z_{n-1}} \cdot \ldots \cdot \frac{\partial z_1}{\partial \theta}$ in action? I had NO derivatives at school. Test: $z = (x^2 + y)^2 \rightarrow \frac{\partial z}{\partial x} = 4x(x^2 + y)$. Nailed 56 for $x=2, y=3$. Felt like reclaiming lost territory. (Numeric check after this list.)
  • Op-by-op wins: No overwhelm: add first ($\frac{\partial z}{\partial x}=1$, $\frac{\partial z}{\partial y}=1$ for $z = x + y$), then ReLU ($\frac{\partial \text{ReLU}}{\partial x} = 1$ if $x>0$ else $0$). Clear staircase: forward graphs it, backward multiplies Jacobians. Pow tripped me ($\frac{\partial(x^y)}{\partial x} = y x^{y-1}$), but deriving it fresh? Therapeutic.
  • X crowd boost: Posted my first backward fail; replies poured in—“Visited set, dummy!” Community’s zero-judgment, all “what’s your next op?” vibes. Motivates like seeing MathAcademy streaks.
  • Hooks you deep: Set a “one deriv per day” goal. Gamified with tests: XOR MLP hit 98% accuracy. Missed a day debugging matmul shapes? Back stronger. Plus, rediscovered broadcasting, NumPy’s silent hero for the vectorized weight gradient $\frac{\partial L}{\partial W} = \text{grad} \otimes x$ in $y = Wx$.
  • Shape shame: Matmul broadcasting? $\frac{\partial(AB)}{\partial A}$ needs the gradient summed back over broadcast axes, which I kept forgetting. Hours lost to “dimension mismatch.” And time? If you’re not solo-coding marathons, it drags; feels like those old math quizzes, but self-inflicted. (Shape sketch after this list.)
  • TDD helped: I’m a big fan of this approach. I kept the same rules: test first, understand the math on paper, try to connect the dots. No code yet. Once I felt I was slowly getting it, I started adding code: first asking the LLM for pseudocode and explanations, then looking at other autograds. Within a few days I was confident that each new operation was within my reach. (Example test below.)
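
The chain-rule test from the first bullet, checked numerically against a finite difference (a throwaway script, nothing framework-specific):

```python
def z(x, y):
    return (x**2 + y) ** 2

# Hand-derived via the chain rule: dz/dx = 2*(x**2 + y) * 2*x = 4*x*(x**2 + y)
def grad_x(x, y):
    return 4 * x * (x**2 + y)

x, y, eps = 2.0, 3.0, 1e-6
numeric = (z(x + eps, y) - z(x - eps, y)) / (2 * eps)  # central difference
print(grad_x(x, y), round(numeric, 3))  # 56.0 56.0
```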
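
And the shape shame, condensed: roughly what the matmul backward boils down to in plain NumPy once you get the outer-product order right and sum over broadcast axes (shapes and names here are illustrative, not my actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Case 1: y = W @ x. The outer-product order is the thing I kept flipping.
W, x = rng.normal(size=(3, 4)), rng.normal(size=4)
grad_y = np.ones(3)                      # pretend upstream gradient dL/dy
grad_W = np.outer(grad_y, x)             # (3, 4) -- matches W
grad_x = W.T @ grad_y                    # (4,)   -- matches x

# Case 2: C = A @ B with A broadcast over a batch axis. The broadcast axis
# has to be summed away so grad_A comes back in A's original shape.
A = rng.normal(size=(4, 5))              # shared across the batch
B = rng.normal(size=(8, 5, 6))           # batch of 8
C = A @ B                                # (8, 4, 6)
grad_C = np.ones_like(C)
grad_A = (grad_C @ B.transpose(0, 2, 1)).sum(axis=0)  # (4, 5)
grad_B = A.T @ grad_C                                  # (8, 5, 6)

print(grad_W.shape, grad_x.shape, grad_A.shape, grad_B.shape)
```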
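
The TDD loop looked something like this: pin the paper derivation down in a test, watch it go red, then implement the op. The `tensor` module and `relu` here are hypothetical stand-ins for wherever the sketch above lives, not real code of mine:

```python
import numpy as np

from tensor import Tensor, relu  # hypothetical module: the Tensor sketch above plus a relu() yet to be written

def test_add_backward():
    # Paper derivation first: d(x+y)/dx = 1 and d(x+y)/dy = 1
    x, y = Tensor(2.0), Tensor(3.0)
    (x + y).backward()
    assert np.allclose(x.grad, 1.0) and np.allclose(y.grad, 1.0)

def test_relu_backward():
    # Written before relu() existed; it stays red until the op is implemented
    x = Tensor(np.array([-1.0, 0.5, 2.0]))
    relu(x).backward()
    # dReLU/dx = 1 where x > 0 else 0, so the upstream ones pass through selectively
    assert np.allclose(x.grad, [0.0, 1.0, 1.0])
```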

From chain rule chains to ReLUs, math is less “subject I hate” now, more “thing I build with.”

Useful Resources

Closing words

Random urges pull you back sometimes. That old love for tinkering, buried under “too late” excuses. But nah—too late for what? Code a tensor today; tomorrow, it powers your net.