Heykuki News

1.
Show HN: TraceRoot – Open-source agentic debugging for distributed services (github.com/traceroot-ai)
40 points
xinweihe
10 months ago
22 comments
2.
TraceRoot: Find the Root Cause in Your Code's Trace (github.com/traceroot-ai)
2 points
djhu9
10 months ago
discuss
3.
Show HN: "htop" for PyTorch training, see stalls, memory and step time live
3 points
traceopt
5 months ago
discuss
4.
Ask HN: Why does single-node DDP sometimes get slower with more GPUs?
2 points
traceopt-ai
4 months ago
discuss
5.
Ask HN: Should training bottleneck detection be a product or just a feature?
1 point
traceopt-ai
3 months ago
discuss
6.
Show HN: Distributed Training Observability for PyTorch (TraceML) (github.com/traceopt-ai)
3 points
traceml-ai
4 months ago
discuss
7.
Show HN: Finding stragglers in multi-GPU PyTorch (DDP) training (github.com/traceopt-ai)
1 point
traceopt-ai
4 months ago
1 comment
8.
Show HN: TraceML, a tool to trace live memory usage in PyTorch training (github.com/traceopt-ai)
1 point
traceopt-ai
8 months ago
1 comment