Heykuki News
Top
New
Best
Ask
Show
Jobs
151.
Dataset of Linus Torvalds' rants ranked by hate
(github.com/corollari)
42 points
fctorial
5 years ago
17 comments
152.
ClickHouse Obfuscator – A tool for dataset anonymization
(github.com/ClickHouse)
39 points
rrampage
3 years ago
3 comments
153.
DeepMind's machine-reading question/answer dataset
(github.com/deepmind)
37 points
andrewtbham
11 years ago
3 comments
154.
Madlad-400: A Multilingual and Document-Level Large Audited Dataset
(github.com/google-research)
37 points
the_bookmaker
3 years ago
1 comment
155.
A dataset of crimes committed in Buenos Aires
(github.com/ramadis)
34 points
ramadis
8 years ago
4 comments
156.
Show HN: I used streaming to skip downloading my 45GB dataset
(github.com/DagsHub)
31 points
npRandom
4 years ago
discuss
157.
Toxicity Dataset
(github.com/surge-ai)
25 points
CarrieLab
4 years ago
32 comments
158.
Structured Etymology Dataset
(github.com/droher)
24 points
downboots
a year ago
3 comments
159.
Washington Post publishes dataset of 52,000 criminal homicides
(github.com/washingtonpost)
24 points
danso
8 years ago
2 comments
160.
I have trained StyleGAN2 from scratch with a dataset of female portraits
(github.com/l4rz)
20 points
EvgeniyZh
5 years ago
20 comments
161.
VoxelCNN: Order-Aware Generative Modeling Using the 3D-Craft Dataset
(github.com/facebookresearch)
20 points
ingve
6 years ago
discuss
162.
Show HN: I made this tool for navigating pandas datasets
(github.com/man-group)
20 points
leehcksource
6 years ago
discuss
163.
Show HN: SemHash – Fast Semantic Text Deduplication for Cleaner Datasets
(github.com/MinishLab)
19 points
Pringled
a year ago
6 comments
164.
Show HN: Version code, models, & datasets together in GitHub
19 points
skadamat
3 years ago
6 comments
165.
NLP: A new datasets and metrics library from Hugging Face
(github.com/huggingface)
19 points
julien_c
6 years ago
discuss
166.
Show HN: Dataset of Linus Torvalds' rants sorted by hate
(github.com/corollari)
17 points
corollari
7 years ago
4 comments
167.
GitHub: Awesome-reasoning, a curated list of datasets for reasoning AIs
(github.com/neurallambda)
17 points
neurallambda
2 years ago
discuss
168.
ICLR 2026 – Institutional Affiliations Dataset and Analysis
(github.com/DmytroLopushanskyy)
15 points
stared
21 days ago
2 comments
169.
Datasetq: jq for Datasets; Polars-powered Parquet/JSON/CSV query lang/cli
(github.com/datasetq)
15 points
djb-at-durable
6 months ago
2 comments
170.
Easy way to load, create, version, query and visualize computer vision datasets
13 points
morpheusme
4 years ago
discuss
171.
Show HN: Dataset of 125k Medium Blog Post Titles and Subtitles (With Categories)
(github.com/turbo)
13 points
minxomat
7 years ago
discuss
172.
Show HN: Create datasets more simply and improve AI model with unstructured data
(github.com/adansons)
12 points
KenichiHiguchi
4 years ago
3 comments
173.
Fast and scalable dataset preparation and curation tool from Nvidia
(github.com/NVIDIA)
12 points
shcheklein
2 years ago
discuss
174.
Show HN: Dataset of Sarcastic HN Comments
(github.com/traghav)
11 points
raghavtoshniwal
5 years ago
6 comments
175.
Dimensionality reduction in large data sets using Siamese Networks
(github.com/beringresearch)
11 points
pickleMeTimbers
7 years ago
discuss
176.
Show HN: Download HuggingFace Models/Datasets easily and super fast
(github.com/bodaay)
10 points
qqqbodaayqqq
3 years ago
2 comments
177.
Show HN: Training synthetic models on highly complex datasets
(github.com/gretelai)
10 points
repeat_or
4 years ago
2 comments
178.
Show HN: React-like Declarative DSL for building synthetic LLM datasets
(github.com/qforge-dev)
10 points
arturwala
7 months ago
discuss
179.
Texthero – Python module to analyze any text dataset in seconds
(github.com/jbesomi)
9 points
BertAndErnie
6 years ago
6 comments
180.
Kangas: Explore Multimedia Datasets at Scale
(github.com/comet-ml)
9 points
dmoura
4 years ago
2 comments