from bits to intelligence

how many lines of code do you need to train gpt 2?

let’s consider the default stack, from loss.backward() down to the hardware.

this totals up to about 50 million loc, give or take a couple million³.
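for a rough idea of how a tally like this can be reproduced, here is a minimal sketch. it is not the loc tool used for the numbers in this post, just a naive stand-in that counts non-blank lines. the names count_loc, count_stack and SOURCE_EXTS, and the extension list itself, are assumptions made up for illustration.

```python
import os

# hypothetical set of source extensions; adjust for the repos being counted
SOURCE_EXTS = {".py", ".c", ".h", ".cpp", ".cu"}


def count_loc(root, exts=SOURCE_EXTS):
    """count non-blank lines in files under root whose extension is in exts."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1] not in exts:
                continue
            path = os.path.join(dirpath, name)
            try:
                # ignore decoding errors so stray binary bytes don't abort the count
                with open(path, "r", encoding="utf-8", errors="ignore") as f:
                    total += sum(1 for line in f if line.strip())
            except OSError:
                # skip unreadable files and broken symlinks
                continue
    return total


def count_stack(repos):
    """map each repo name to its line count and add a grand total."""
    counts = {name: count_loc(path) for name, path in repos.items()}
    counts["total"] = sum(counts.values())
    return counts
```

calling count_stack with a dict of local checkouts, for example {"pytorch": "path/to/pytorch", "linux": "path/to/linux"} (placeholder paths), returns per-repo counts plus a total. a dedicated tool also separates comments from code and knows far more languages, so expect its numbers to differ from this sketch.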

while this gets you performance and reliability, it’s not exactly educational. understanding all of it would be impossible for any one person.

the 100,000 line machine learning stack

i propose an alternate stack, one aimed not at raw performance but at interpretability. consider this a “from the transistor”, but for ml⁴.

in total, this would be 64950 lines of code. but let’s round that up to 100000.

that fits in a single repo. a single person could probably write all of this.

if you’re interested, start here.

yes i know gpt 2 was originally written in tensorflow ↩
all lines of code were counted with loc from the repos pytorch, python, gcc, linux ↩
drivers, apis and the hdl for the gpus are closed source, so they’ve been omitted. but pulling a couple million lines out of a hat might not be too far off ↩
and made by someone who suffers from severe skill issues ↩
tiny-gpu ↩
you’ve heard of co-recursion, but have you heard of co-self-hosting? ↩
micropython ↩
tinygrad ↩