Heykuki News
Top
New
Best
Ask
Show
Jobs
1.
The maths you need to start understanding LLMs
(gilesthomas.com)
616 points
gpjt
9 months ago
120 comments
2.
LLM from scratch, part 28 – training a base model from scratch on an RTX 3090
(gilesthomas.com)
540 points
gpjt
6 months ago
121 comments
3.
Writing an LLM from scratch, part 8 – trainable self-attention
(gilesthomas.com)
380 points
gpjt
a year ago
31 comments
4.
Writing an LLM from scratch, part 13 – attention heads are dumb
(gilesthomas.com)
351 points
gpjt
a year ago
67 comments
5.
It’s still worth blogging in the age of AI
(gilesthomas.com)
333 points
gpjt
a year ago
222 comments
6.
The benefits of learning in public
(gilesthomas.com)
311 points
gpjt
a year ago
97 comments
7.
Writing an LLM from scratch, part 22 – training our LLM
(gilesthomas.com)
254 points
gpjt
8 months ago
10 comments
8.
10Gb/s Ethernet: what I did to get it working in my home
(gilesthomas.com)
232 points
gpjt
a month ago
177 comments
9.
Writing an LLM from scratch, part 10 – dropout
(gilesthomas.com)
90 points
gpjt
a year ago
8 comments
10.
Writing an LLM from scratch, part 20 – starting training, and cross entropy loss
(gilesthomas.com)
41 points
gpjt
8 months ago
3 comments
11.
Using DistributedDataParallel to train a base model from scratch in the cloud
(gilesthomas.com)
10 points
ibobev
5 months ago
discuss
12.
Writing an LLM from scratch, part 17 – the feed-forward network
(gilesthomas.com)
8 points
gpjt
10 months ago
discuss
13.
IT headhunters considered harmful
(gilesthomas.com)
7 points
j_baker
16 years ago
1 comment
14.
Writing an LLM from scratch, part 32h – Interventions: full fat float32
(gilesthomas.com)
7 points
gpjt
2 months ago
discuss
15.
Writing an LLM from scratch, part 15 – from context vectors to logits
(gilesthomas.com)
7 points
gpjt
a year ago
discuss
16.
Writing an LLM from scratch, part 32f – Interventions: weight decay
(gilesthomas.com)
6 points
gpjt
2 months ago
discuss
17.
Writing an LLM from scratch, part 32d – Interventions: adding attention bias
(gilesthomas.com)
6 points
gpjt
4 months ago
discuss
18.
LLM from scratch, part 33 – what I learned from the appendices
(gilesthomas.com)
5 points
gpjt
a month ago
discuss
19.
Pam-unshare: a PAM module that switches into a PID namespace
(gilesthomas.com)
5 points
gpjt
10 years ago
discuss
20.
Writing an LLM from scratch, part 26 – evaluating the fine-tuned model
(gilesthomas.com)
4 points
gpjt
7 months ago
discuss
21.
Writing an LLM from scratch, part 9 – causal attention
(gilesthomas.com)
4 points
gpjt
a year ago
discuss
22.
Does #EUVAT make charging Bitcoin impossible for EU digital services businesses?
(gilesthomas.com)
3 points
gpjt
11 years ago
discuss
23.
10Gb/s Ethernet: using mini-heatsinks with a 10GBASE-T SFP+ module
(gilesthomas.com)
3 points
gpjt
18 days ago
discuss
24.
How an LLM becomes more coherent as we train it
(gilesthomas.com)
3 points
gpjt
2 months ago
discuss
25.
Writing an LLM from scratch, part 32e – Interventions: the learning rate
(gilesthomas.com)
3 points
ibobev
3 months ago
discuss
26.
Writing an LLM from scratch, part 32e – Interventions: the learning rate
(gilesthomas.com)
3 points
gpjt
3 months ago
discuss
27.
Writing an LLM from scratch, part 32a – Interventions: training a baseline model
(gilesthomas.com)
3 points
ibobev
4 months ago
discuss
28.
Retro Language Models: Rebuilding Karpathy's RNN in PyTorch
(gilesthomas.com)
3 points
gpjt
7 months ago
discuss
29.
Leaving PythonAnywhere
(gilesthomas.com)
3 points
gpjt
a year ago
discuss
30.
Writing an LLM from scratch, part 12 – multi-head attention
(gilesthomas.com)
3 points
gpjt
a year ago
discuss