Heykuki News
Top
New
Best
Ask
Show
Jobs
61.
▲
Using Safetensors with Flax
(gilesthomas.com)
1 point
ibobev
2 days ago
discuss
62.
▲
10Gb/s Ethernet: using mini-heatsinks with a 10GBASE-T SFP+ module
(gilesthomas.com)
1 point
ibobev
19 days ago
discuss
63.
▲
LLM from scratch (32l) – Interventions: updated instruction fine-tuning results
(gilesthomas.com)
1 point
gpjt
2 months ago
discuss
64.
▲
An LLM becomes more coherent as we train it
(gilesthomas.com)
1 point
ibobev
2 months ago
discuss
65.
▲
Interventions: Trying to train a better model in the cloud
(gilesthomas.com)
1 point
ibobev
2 months ago
discuss
66.
▲
Writing an LLM from scratch, part 32i – Interventions: what is in the noise?
(gilesthomas.com)
1 point
gpjt
2 months ago
discuss
67.
▲
Writing an LLM from scratch, part 32b – Interventions: gradient clipping
(gilesthomas.com)
1 point
ibobev
4 months ago
discuss
68.
▲
Writing an LLM from scratch, part 32c – Interventions: removing dropout
(gilesthomas.com)
1 point
ibobev
4 months ago
discuss
69.
▲
Writing an LLM from scratch, part 32d – Interventions: adding attention bias
(gilesthomas.com)
1 point
ibobev
4 months ago
discuss
70.
▲
Writing an LLM from scratch, part 32c – Interventions: removing dropout
(gilesthomas.com)
1 point
gpjt
4 months ago
discuss
71.
▲
Writing an LLM from scratch, part 32a – Interventions: training a baseline model
(gilesthomas.com)
1 point
gpjt
4 months ago
discuss
72.
▲
Getting a Custom PyTorch LLM onto the Hugging Face Hub
(gilesthomas.com)
1 point
ibobev
4 months ago
discuss
73.
▲
Getting a Custom PyTorch LLM onto the Hugging Face Hub
(gilesthomas.com)
1 point
gpjt
4 months ago
discuss
74.
▲
Writing an LLM from scratch, part 31 – the models are now on Hugging Face
(gilesthomas.com)
1 point
ibobev
5 months ago
discuss
75.
▲
Digging into the LLM-as-a-Judge Results
(gilesthomas.com)
1 point
ibobev
5 months ago
discuss
76.
▲
Digging into the LLM-as-a-Judge Results
(gilesthomas.com)
1 point
ibobev
5 months ago
discuss
77.
▲
Writing an LLM from scratch, part 30 – digging into the LLM-as-a-judge results
(gilesthomas.com)
1 point
gpjt
5 months ago
discuss
78.
▲
Writing an LLM from scratch, part 27 – what's left, and what's next?
(gilesthomas.com)
1 point
gpjt
7 months ago
discuss
79.
▲
Writing an LLM from scratch, part 24 – the transcript hack
(gilesthomas.com)
1 point
gpjt
7 months ago
discuss
80.
▲
Retro Language Models: Rebuilding Karpathy's RNN in PyTorch
(gilesthomas.com)
1 point
ibobev
7 months ago
discuss
81.
▲
Writing an LLM from scratch, part 23 – fine-tuning for classification
(gilesthomas.com)
1 point
ibobev
7 months ago
discuss
82.
▲
Writing an LLM from scratch, part 23 – fine-tuning for classification
(gilesthomas.com)
1 point
gpjt
7 months ago
discuss
83.
▲
Revisiting Karpathy's 'The Unreasonable Effectiveness of RNNs'
(gilesthomas.com)
1 point
ibobev
8 months ago
discuss
84.
▲
Writing an LLM from scratch, part 21 – perplexed by perplexity
(gilesthomas.com)
1 point
ibobev
8 months ago
discuss
85.
▲
Writing an LLM from scratch, part 21 – perplexed by perplexity
(gilesthomas.com)
1 point
gpjt
8 months ago
discuss
86.
▲
How Do LLMs Work?
(gilesthomas.com)
1 point
ibobev
9 months ago
discuss
87.
▲
The fixed length bottleneck and the feed forward network
(gilesthomas.com)
1 point
gpjt
10 months ago
discuss
88.
▲
Writing an LLM from scratch, part 16 – layer normalisation
(gilesthomas.com)
1 point
gpjt
a year ago
discuss
89.
▲
Writing an LLM from scratch, part 14 – the complexity of self-attention at scale
(gilesthomas.com)
1 point
gpjt
a year ago
discuss
90.
▲
Adding /llms.txt
(gilesthomas.com)
1 point
gpjt
a year ago
discuss