Heykuki News

151.
Dataset of Linus Torvalds' rants ranked by hate (github.com/corollari)
42 points
fctorial
5 years ago
17 comments
152.
ClickHouse Obfuscator – A tool for dataset anonymization (github.com/ClickHouse)
39 points
rrampage
3 years ago
3 comments
153.
DeepMind's machine-reading question/answer dataset (github.com/deepmind)
37 points
andrewtbham
11 years ago
3 comments
154.
Madlad-400: A Multilingual and Document-Level Large Audited Dataset (github.com/google-research)
37 points
the_bookmaker
3 years ago
1 comment
155.
A dataset of crimes committed in Buenos Aires (github.com/ramadis)
34 points
ramadis
8 years ago
4 comments
156.
Show HN: I used streaming to skip downloading my 45GB dataset (github.com/DagsHub)
31 points
npRandom
4 years ago
discuss
157.
Toxicity Dataset (github.com/surge-ai)
25 points
CarrieLab
4 years ago
32 comments
158.
Structured Etymology Dataset (github.com/droher)
24 points
downboots
a year ago
3 comments
159.
Washington Post publishes dataset of 52,000 criminal homicides (github.com/washingtonpost)
24 points
danso
8 years ago
2 comments
160.
I have trained StyleGAN2 from scratch with a dataset of female portraits (github.com/l4rz)
20 points
EvgeniyZh
5 years ago
20 comments
161.
VoxelCNN: Order-Aware Generative Modeling Using the 3D-Craft Dataset (github.com/facebookresearch)
20 points
ingve
6 years ago
discuss
162.
Show HN: I made this tool for navigating pandas datasets (github.com/man-group)
20 points
leehcksource
6 years ago
discuss
163.
Show HN: SemHash – Fast Semantic Text Deduplication for Cleaner Datasets (github.com/MinishLab)
19 points
Pringled
a year ago
6 comments
164.
Show HN: Version code, models, & datasets together in GitHub
19 points
skadamat
3 years ago
6 comments
165.
NLP: A new datasets and metrics library from Hugging Face (github.com/huggingface)
19 points
julien_c
6 years ago
discuss
166.
Show HN: Dataset of Linus Torvalds' rants sorted by hate (github.com/corollari)
17 points
corollari
7 years ago
4 comments
167.
GitHub: Awesome-reasoning, a curated list of datasets for reasoning AIs (github.com/neurallambda)
17 points
neurallambda
2 years ago
discuss
168.
ICLR 2026 – Institutional Affiliations Dataset and Analysis (github.com/DmytroLopushanskyy)
15 points
stared
21 days ago
2 comments
169.
Datasetq: jq for Datasets; Polars-powered Parquet/JSON/CSV query lang/cli (github.com/datasetq)
15 points
djb-at-durable
6 months ago
2 comments
170.
Easy way to load, create, version, query and visualize computer vision datasets
13 points
morpheusme
4 years ago
discuss
171.
Show HN: Dataset of 125k Medium Blog Post Titles and Subtitles (With Categories) (github.com/turbo)
13 points
minxomat
7 years ago
discuss
172.
Show HN: Create datasets more simply and improve AI model with unstructured data (github.com/adansons)
12 points
KenichiHiguchi
4 years ago
3 comments
173.
Fast and scalable dataset preparation and curation tool from Nvidia (github.com/NVIDIA)
12 points
shcheklein
2 years ago
discuss
174.
Show HN: Dataset of Sarcastic HN Comments (github.com/traghav)
11 points
raghavtoshniwal
5 years ago
6 comments
175.
Dimensionality reduction in large data sets using Siamese Networks (github.com/beringresearch)
11 points
pickleMeTimbers
7 years ago
discuss
176.
Show HN: Download HuggingFace Models/Datasets easily and super fast (github.com/bodaay)
10 points
qqqbodaayqqq
3 years ago
2 comments
177.
Show HN: Training synthetic models on highly complex datasets (github.com/gretelai)
10 points
repeat_or
4 years ago
2 comments
178.
Show HN: React-like Declarative DSL for building synthetic LLM datasets (github.com/qforge-dev)
10 points
arturwala
7 months ago
discuss
179.
Texthero – Python module to analyze any text dataset in seconds (github.com/jbesomi)
9 points
BertAndErnie
6 years ago
6 comments
180.
Kangas: Explore Multimedia Datasets at Scale (github.com/comet-ml)
9 points
dmoura
4 years ago
2 comments