Heykuki News
Top
New
Best
Ask
Show
Jobs
451.
▲
Dataset for 22 years of arXiv citation links
(github.com/paperscape)
1 point
robjk
12 years ago
discuss
452.
▲
Rdfdiff -- Scalable Tool To Detect Changes in Billion Triple Data Sets
(github.com/paulhoule)
1 point
PaulHoule
13 years ago
discuss
453.
▲
A full-stack Last.fm 1k dataset insights page using Go/ClickHouse/React
(github.com/el10savio)
1 point
ugabuga
4 days ago
discuss
454.
▲
Show HN: Cohort Visualizer - A handy tool for browsing cohort datasets
(bslatkin.github.com)
1 point
bslatkin
14 years ago
discuss
455.
▲
Swedish Construction FAQ: 503 bilingual Q&A dataset, CC BY 4.0
(github.com/zaragoza-ab)
1 point
DecDEPO
2 months ago
discuss
456.
▲
Show HN: Fastdedup – Rust dataset deduplication (2:55 vs. 7:55, 688MB vs. 22GB)
(wapplewhite4.github.io)
1 point
wapplewhite4
3 months ago
discuss
457.
▲
GABRIEL – turn messy qualitative corpora into analysis-ready datasets
(github.com/openai)
1 point
michaelsbradley
4 months ago
discuss
458.
▲
Show HN: Vietnam Elections (open, source-linked datasets and site)
(bamboo-filing-cabinet.github.io)
1 point
vietthan
4 months ago
discuss
459.
▲
The Guardian Headline Entailment Training Dataset
(github.com/daoudclarke)
1 point
daoudc
14 years ago
discuss
460.
▲
Fasttfidf: High-performance TF-IDF vectorization for large-scale text datasets
(github.com/purijs)
1 point
jspuri
5 months ago
discuss
461.
▲
Show HN: AI tool that walks citation graph and extracts data to create datasets
(github.com/eamag)
1 point
eamag
5 months ago
discuss
462.
▲
Training YOLO vision models on Kaggle datasets
(github.com/mfranzon)
1 point
walterbell
7 months ago
discuss
463.
▲
Show HN: Gaggle – A DuckDB extension for working with Kaggle datasets
1 point
habedi0
7 months ago
discuss
464.
▲
Show HN: I built a tool to sort a Northern Lights dataset for a CV model
(picsort.coolapso.sh)
1 point
coolapso
7 months ago
discuss
465.
▲
Show HN: Django PostgreSQL Anonymizer – prod → safe dev datasets (beta)
(github.com/CuriousLearner)
1 point
sanyam-khurana
8 months ago
discuss
466.
▲
A toolkit for improving the quality of your LeRobot datasets
(github.com/RoboticsData)
1 point
machinelearning
8 months ago
discuss
467.
▲
A new RAG algorithm to self-heal damaged datasets and query them on a graph
(github.com/iblameandrew)
1 point
scraper02
8 months ago
discuss
468.
▲
Show HN: Tensorpack, a CLI tool for semantic discovery across datasets
1 point
AyodeleFikayomi
8 months ago
discuss
469.
▲
Procedural Reasoning Datasets
(github.com/open-thought)
1 point
t55
10 months ago
discuss
470.
▲
Reasoning Gym – Procedural RL reasoning datasets
(github.com/open-thought)
1 point
t55
10 months ago
discuss
471.
▲
Mochi Programming Language v0.6.0 – LINQ syntax for querying datasets
(github.com/mochilang)
1 point
scapbi
a year ago
discuss
472.
▲
Reasoning Gym: Procedural Dataset Generation for Reinforcement Learning
(github.com/open-thought)
1 point
starzmustdie
a year ago
discuss
473.
▲
Datasets Are All You Need (LLM Learns to Prompt from Data)
(github.com/intellectronica)
1 point
intellectronica
a year ago
discuss
474.
▲
A multi-view video behavior monitoring dataset of wild mammals in the Swiss Alps
(github.com/eceo-epfl)
1 point
moatmoat
a year ago
discuss
475.
▲
RKaggle: Bring Kaggle Datasets Straight into the R console
(github.com/benyamindsmith)
1 point
SuperMint
a year ago
discuss
476.
▲
Logic R1: Reproduce DeepSeek R1 Zero on 2K Logic Puzzle Dataset
(github.com/Unakar)
1 point
limoce
a year ago
discuss
477.
▲
Drawdata: Draw Datasets from Within Jupyter
(github.com/koaning)
1 point
yamrzou
a year ago
discuss
478.
▲
Facebook Uncommon Objects in 3D Dataset
(github.com/facebookresearch)
1 point
taikon
a year ago
discuss
479.
▲
LENS: A LEO Satellite Network Measurement Dataset
(github.com/clarkzjw)
1 point
teleforce
2 years ago
discuss
480.
▲
Transform and optimize datasets for fast AI model training
(github.com/Lightning-AI)
1 point
shcheklein
2 years ago
discuss