Heykuki News

Top | New | Best | Ask | Show | Jobs
61.
Using Safetensors with Flax (gilesthomas.com)
1 point
ibobev
2 days ago
discuss
62.
10Gb/s Ethernet: using mini-heatsinks with a 10GBASE-T SFP+ module (gilesthomas.com)
1 point
ibobev
19 days ago
discuss
63.
LLM from scratch (32l) – Interventions: updated instruction fine-tuning results (gilesthomas.com)
1 point
gpjt
2 months ago
discuss
64.
An LLM becomes more coherent as we train it (gilesthomas.com)
1 point
ibobev
2 months ago
discuss
65.
Interventions: Trying to train a better model in the cloud (gilesthomas.com)
1 point
ibobev
2 months ago
discuss
66.
Writing an LLM from scratch, part 32i – Interventions: what is in the noise? (gilesthomas.com)
1 point
gpjt
2 months ago
discuss
67.
Writing an LLM from scratch, part 32b – Interventions: gradient clipping (gilesthomas.com)
1 point
ibobev
4 months ago
discuss
68.
Writing an LLM from scratch, part 32c – Interventions: removing dropout (gilesthomas.com)
1 point
ibobev
4 months ago
discuss
69.
Writing an LLM from scratch, part 32d – Interventions: adding attention bias (gilesthomas.com)
1 point
ibobev
4 months ago
discuss
70.
Writing an LLM from scratch, part 32c – Interventions: removing dropout (gilesthomas.com)
1 point
gpjt
4 months ago
discuss
71.
Writing an LLM from scratch, part 32a – Interventions: training a baseline model (gilesthomas.com)
1 point
gpjt
4 months ago
discuss
72.
Getting a Custom PyTorch LLM onto the Hugging Face Hub (gilesthomas.com)
1 point
ibobev
4 months ago
discuss
73.
Getting a Custom PyTorch LLM onto the Hugging Face Hub (gilesthomas.com)
1 point
gpjt
4 months ago
discuss
74.
Writing an LLM from scratch, part 31 – the models are now on Hugging Face (gilesthomas.com)
1 point
ibobev
5 months ago
discuss
75.
Digging into the LLM-as-a-Judge Results (gilesthomas.com)
1 point
ibobev
5 months ago
discuss
76.
Digging into the LLM-as-a-Judge Results (gilesthomas.com)
1 point
ibobev
5 months ago
discuss
77.
Writing an LLM from scratch, part 30 – digging into the LLM-as-a-judge results (gilesthomas.com)
1 point
gpjt
5 months ago
discuss
78.
Writing an LLM from scratch, part 27 – what's left, and what's next? (gilesthomas.com)
1 point
gpjt
7 months ago
discuss
79.
Writing an LLM from scratch, part 24 – the transcript hack (gilesthomas.com)
1 point
gpjt
7 months ago
discuss
80.
Retro Language Models: Rebuilding Karpathy's RNN in PyTorch (gilesthomas.com)
1 point
ibobev
7 months ago
discuss
81.
Writing an LLM from scratch, part 23 – fine-tuning for classification (gilesthomas.com)
1 point
ibobev
7 months ago
discuss
82.
Writing an LLM from scratch, part 23 – fine-tuning for classification (gilesthomas.com)
1 point
gpjt
7 months ago
discuss
83.
Revisiting Karpathy's 'The Unreasonable Effectiveness of RNNs' (gilesthomas.com)
1 point
ibobev
8 months ago
discuss
84.
Writing an LLM from scratch, part 21 – perplexed by perplexity (gilesthomas.com)
1 point
ibobev
8 months ago
discuss
85.
Writing an LLM from scratch, part 21 – perplexed by perplexity (gilesthomas.com)
1 point
gpjt
8 months ago
discuss
86.
How Do LLMs Work? (gilesthomas.com)
1 point
ibobev
9 months ago
discuss
87.
The fixed length bottleneck and the feed forward network (gilesthomas.com)
1 point
gpjt
10 months ago
discuss
88.
Writing an LLM from scratch, part 16 – layer normalisation (gilesthomas.com)
1 point
gpjt
a year ago
discuss
89.
Writing an LLM from scratch, part 14 – the complexity of self-attention at scale (gilesthomas.com)
1 point
gpjt
a year ago
discuss
90.
Adding /llms.txt (gilesthomas.com)
1 point
gpjt
a year ago
discuss