Heykuki News

Top New Best Ask Show Jobs
91.
Show HN: I made this tool for navigating pandas datasets (github.com/man-group)
20 points
leehcksource
6 years ago
discuss
92.
Show HN: SemHash – Fast Semantic Text Deduplication for Cleaner Datasets (github.com/MinishLab)
19 points
Pringled
a year ago
6 comments
93.
Show HN: Version code, models, & datasets together in GitHub
19 points
skadamat
3 years ago
6 comments
94.
NLP: A new datasets and metrics library from Hugging Face (github.com/huggingface)
19 points
julien_c
6 years ago
discuss
95.
GitHub: Awesome-reasoning, a curated list of datasets for reasoning AIs (github.com/neurallambda)
17 points
neurallambda
2 years ago
discuss
96.
Datasetq: jq for Datasets; Polars-powered Parquet/JSON/CSV query lang/cli (github.com/datasetq)
15 points
djb-at-durable
6 months ago
2 comments
97.
Easy way to load, create, version, query and visualize computer vision datasets
13 points
morpheusme
4 years ago
discuss
98.
Show HN: Create datasets more simply and improve AI model with unstructured data (github.com/adansons)
12 points
KenichiHiguchi
4 years ago
3 comments
99.
Show HN: Download HuggingFace Models/Datasets easily and super fast (github.com/bodaay)
10 points
qqqbodaayqqq
3 years ago
2 comments
100.
Show HN: Training synthetic models on highly complex datasets (github.com/gretelai)
10 points
repeat_or
4 years ago
2 comments
101.
Show HN: React-like Declarative DSL for building synthetic LLM datasets (github.com/qforge-dev)
10 points
arturwala
7 months ago
discuss
102.
Kangas: Explore Multimedia Datasets at Scale (github.com/comet-ml)
9 points
dmoura
4 years ago
2 comments
103.
Open Thoughts: Curating the best reasoning datasets (github.com/open-thoughts)
8 points
madiator
a year ago
discuss
104.
Show HN: Automate Variable Selection for Research on Big Datasets (Open-Source) (github.com/MalikHarrisAhm)
8 points
mha23
2 years ago
discuss
105.
Our classifier outperforms CatBoost, XGBoost, LightGBM on 5 benchmark datasets (github.com/LinearBoost)
6 points
hamid9
2 years ago
5 comments
106.
DatasetGPT – an open-source command line tool for generating datasets with LLMs (github.com/radi-cho)
6 points
radicho123
3 years ago
1 comment
107.
Show HN: FiftyOne – Explore, Analyze and Curate Visual Datasets (github.com/voxel51)
6 points
benjaminpkane
6 years ago
1 comment
108.
Show HN: Xray: N-D labeled arrays and datasets in Python (github.com/xray)
6 points
shoyer
12 years ago
discuss
109.
Show HN: SemHash – Fast Semantic Text Deduplication for Cleaner Datasets (github.com/MinishLab)
6 points
stephantul
a year ago
discuss
110.
Show HN: Interactively explore unstructured datasets from your dataframe (github.com/Renumics)
6 points
sps44
3 years ago
discuss
111.
Kangas: Pandas for Multimedia Datasets (github.com/comet-ml)
6 points
synergy20
3 years ago
discuss
112.
The fastest command-line tools for querying large JSON datasets (github.com/dcmoura)
6 points
zX41ZdbW
4 years ago
discuss
113.
Resampling Unbalanced Datasets (github.com/fmfn)
5 points
hrb1979
12 years ago
discuss
114.
Curated list of language modeling researches for code, plus related datasets (github.com/codefuse-ai)
5 points
Bluestein
a year ago
discuss
115.
Show HN: Byte-Pair Encoding tokenizer for training LLMs on large datasets (github.com/jmaczan)
5 points
yu3zhou4
2 years ago
discuss
116.
DataDM – Search and analyze datasets with LLMs (github.com/approximatelabs)
5 points
cle
3 years ago
discuss
117.
Show HN: Create APIs for static datasets without writing a single line of code (github.com/roapi)
5 points
houqp
5 years ago
discuss
118.
Show HN: Transform Unstructured Data into Usable Datasets (github.com/wizenheimer)
4 points
wizenheimer
2 years ago
1 comment
119.
Show HN: pqry – A fast, lightweight CLI tool to diagnose Parquet datasets (github.com/symblic)
4 points
setzeno
4 months ago
discuss
120.
Show HN: Lance – Open lakehouse format for multimodal AI datasets (github.com/lance-format)
4 points
criexe
5 months ago
discuss