Heykuki News

Top | New | Best | Ask | Show | Jobs
1.
The maths you need to start understanding LLMs (gilesthomas.com)
616 points
gpjt
9 months ago
120 comments
2.
LLM from scratch, part 28 – training a base model from scratch on an RTX 3090 (gilesthomas.com)
540 points
gpjt
6 months ago
121 comments
3.
Writing an LLM from scratch, part 8 – trainable self-attention (gilesthomas.com)
380 points
gpjt
a year ago
31 comments
4.
Writing an LLM from scratch, part 13 – attention heads are dumb (gilesthomas.com)
351 points
gpjt
a year ago
67 comments
5.
It’s still worth blogging in the age of AI (gilesthomas.com)
333 points
gpjt
a year ago
222 comments
6.
The benefits of learning in public (gilesthomas.com)
311 points
gpjt
a year ago
97 comments
7.
Writing an LLM from scratch, part 22 – training our LLM (gilesthomas.com)
254 points
gpjt
8 months ago
10 comments
8.
10Gb/s Ethernet: what I did to get it working in my home (gilesthomas.com)
232 points
gpjt
a month ago
177 comments
9.
Writing an LLM from scratch, part 10 – dropout (gilesthomas.com)
90 points
gpjt
a year ago
8 comments
10.
Writing an LLM from scratch, part 20 – starting training, and cross entropy loss (gilesthomas.com)
41 points
gpjt
8 months ago
3 comments
11.
Using DistributedDataParallel to train a base model from scratch in the cloud (gilesthomas.com)
10 points
ibobev
5 months ago
discuss
12.
Writing an LLM from scratch, part 17 – the feed-forward network (gilesthomas.com)
8 points
gpjt
10 months ago
discuss
13.
IT headhunters considered harmful (gilesthomas.com)
7 points
j_baker
16 years ago
1 comment
14.
Writing an LLM from scratch, part 32h – Interventions: full fat float32 (gilesthomas.com)
7 points
gpjt
2 months ago
discuss
15.
Writing an LLM from scratch, part 15 – from context vectors to logits (gilesthomas.com)
7 points
gpjt
a year ago
discuss
16.
Writing an LLM from scratch, part 32f – Interventions: weight decay (gilesthomas.com)
6 points
gpjt
2 months ago
discuss
17.
Writing an LLM from scratch, part 32d – Interventions: adding attention bias (gilesthomas.com)
6 points
gpjt
4 months ago
discuss
18.
LLM from scratch, part 33 – what I learned from the appendices (gilesthomas.com)
5 points
gpjt
a month ago
discuss
19.
Pam-unshare: a PAM module that switches into a PID namespace (gilesthomas.com)
5 points
gpjt
10 years ago
discuss
20.
Writing an LLM from scratch, part 26 – evaluating the fine-tuned model (gilesthomas.com)
4 points
gpjt
7 months ago
discuss
21.
Writing an LLM from scratch, part 9 – causal attention (gilesthomas.com)
4 points
gpjt
a year ago
discuss
22.
Does #EUVAT make charging Bitcoin impossible for EU digital services businesses? (gilesthomas.com)
3 points
gpjt
11 years ago
discuss
23.
10Gb/s Ethernet: using mini-heatsinks with a 10GBASE-T SFP+ module (gilesthomas.com)
3 points
gpjt
18 days ago
discuss
24.
How an LLM becomes more coherent as we train it (gilesthomas.com)
3 points
gpjt
2 months ago
discuss
25.
Writing an LLM from scratch, part 32e – Interventions: the learning rate (gilesthomas.com)
3 points
ibobev
3 months ago
discuss
26.
Writing an LLM from scratch, part 32e – Interventions: the learning rate (gilesthomas.com)
3 points
gpjt
3 months ago
discuss
27.
Writing an LLM from scratch, part 32a – Interventions: training a baseline model (gilesthomas.com)
3 points
ibobev
4 months ago
discuss
28.
Retro Language Models: Rebuilding Karpathy's RNN in PyTorch (gilesthomas.com)
3 points
gpjt
7 months ago
discuss
29.
Leaving PythonAnywhere (gilesthomas.com)
3 points
gpjt
a year ago
discuss
30.
Writing an LLM from scratch, part 12 – multi-head attention (gilesthomas.com)
3 points
gpjt
a year ago
discuss