Heykuki News

451.
Dataset for 22 years of arXiv citation links (github.com/paperscape)
1 point
robjk
12 years ago
452.
Rdfdiff -- Scalable Tool To Detect Changes in Billion Triple Data Sets (github.com/paulhoule)
1 point
PaulHoule
13 years ago
453.
A full-stack Last.fm 1k dataset insights page using Go/ClickHouse/React (github.com/el10savio)
1 point
ugabuga
4 days ago
454.
Show HN: Cohort Visualizer - A handy tool for browsing cohort datasets (bslatkin.github.com)
1 point
bslatkin
14 years ago
455.
Swedish Construction FAQ: 503 bilingual Q&A dataset, CC BY 4.0 (github.com/zaragoza-ab)
1 point
DecDEPO
2 months ago
456.
Show HN: Fastdedup – Rust dataset deduplication (2:55 vs. 7:55, 688MB vs. 22GB) (wapplewhite4.github.io)
1 point
wapplewhite4
3 months ago
457.
GABRIEL – turn messy qualitative corpora into analysis-ready datasets (github.com/openai)
1 point
michaelsbradley
4 months ago
458.
Show HN: Vietnam Elections (open, source-linked datasets and site) (bamboo-filing-cabinet.github.io)
1 point
vietthan
4 months ago
459.
The Guardian Headline Entailment Training Dataset (github.com/daoudclarke)
1 point
daoudc
14 years ago
460.
Fasttfidf: High-performance TF-IDF vectorization for large-scale text datasets (github.com/purijs)
1 point
jspuri
5 months ago
461.
Show HN: AI tool that walks citation graph and extracts data to create datasets (github.com/eamag)
1 point
eamag
5 months ago
462.
Training YOLO vision models on Kaggle datasets (github.com/mfranzon)
1 point
walterbell
7 months ago
463.
Show HN: Gaggle – A DuckDB extension for working with Kaggle datasets
1 point
habedi0
7 months ago
464.
Show HN: I built a tool to sort a Northern Lights dataset for a CV model (picsort.coolapso.sh)
1 point
coolapso
7 months ago
465.
Show HN: Django PostgreSQL Anonymizer – prod → safe dev datasets (beta) (github.com/CuriousLearner)
1 point
sanyam-khurana
8 months ago
466.
A toolkit for improving the quality of your LeRobot datasets (github.com/RoboticsData)
1 point
machinelearning
8 months ago
467.
A new RAG algorithm to self-heal damaged datasets and query them on a graph (github.com/iblameandrew)
1 point
scraper02
8 months ago
468.
Show HN: Tensorpack, a CLI tool for semantic discovery across datasets
1 point
AyodeleFikayomi
8 months ago
469.
Procedural Reasoning Datasets (github.com/open-thought)
1 point
t55
10 months ago
470.
Reasoning Gym – Procedural RL reasoning datasets (github.com/open-thought)
1 point
t55
10 months ago
471.
Mochi Programming Language v0.6.0 – LINQ syntax for querying datasets (github.com/mochilang)
1 point
scapbi
a year ago
472.
Reasoning Gym: Procedural Dataset Generation for Reinforcement Learning (github.com/open-thought)
1 point
starzmustdie
a year ago
473.
Datasets Are All You Need (LLM Learns to Prompt from Data) (github.com/intellectronica)
1 point
intellectronica
a year ago
474.
A multi-view video behavior monitoring dataset of wild mammals in the Swiss Alps (github.com/eceo-epfl)
1 point
moatmoat
a year ago
475.
RKaggle: Bring Kaggle Datasets Straight into the R console (github.com/benyamindsmith)
1 point
SuperMint
a year ago
476.
Logic R1: Reproduce DeepSeek R1 Zero on 2K Logic Puzzle Dataset (github.com/Unakar)
1 point
limoce
a year ago
477.
Drawdata: Draw Datasets from Within Jupyter (github.com/koaning)
1 point
yamrzou
a year ago
478.
Facebook Uncommon Objects in 3D Dataset (github.com/facebookresearch)
1 point
taikon
a year ago
479.
LENS: A LEO Satellite Network Measurement Dataset (github.com/clarkzjw)
1 point
teleforce
2 years ago
480.
Transform and optimize datasets for fast AI model training (github.com/Lightning-AI)
1 point
shcheklein
2 years ago