Heykuki News
181.
Show HN: Create datasets more simply and improve AI model with unstructured data
(github.com/adansons)
12 points
KenichiHiguchi
4 years ago
3 comments
182.
Fast and scalable dataset preparation and curation tool from Nvidia
(github.com/NVIDIA)
12 points
shcheklein
2 years ago
discuss
183.
Show HN: Dataset of Sarcastic HN Comments
(github.com/traghav)
11 points
raghavtoshniwal
5 years ago
6 comments
184.
Show HN: Download HuggingFace Models/Datasets easily and super fast
(github.com/bodaay)
10 points
qqqbodaayqqq
3 years ago
2 comments
185.
Show HN: Training synthetic models on highly complex datasets
(github.com/gretelai)
10 points
repeat_or
4 years ago
2 comments
186.
Show HN: React-like Declarative DSL for building synthetic LLM datasets
(github.com/qforge-dev)
10 points
arturwala
7 months ago
discuss
187.
Texthero – Python module to analyze any text dataset in seconds
(github.com/jbesomi)
9 points
BertAndErnie
6 years ago
6 comments
188.
Kangas: Explore Multimedia Datasets at Scale
(github.com/comet-ml)
9 points
dmoura
4 years ago
2 comments
189.
Nvidia open sources the synthetic data framework used to build Nemotron datasets
8 points
alexwatson405
6 months ago
1 comment
190.
Show HN: Using DSPy to enrich a dataset of the Nobel laureate network
(blog.kuzudb.com)
8 points
laminarflow027
10 months ago
discuss
191.
Open Thoughts: Curating the best reasoning datasets
(github.com/open-thoughts)
8 points
madiator
a year ago
discuss
192.
Show HN: Automate Variable Selection for Research on Big Datasets (Open-Source)
(github.com/MalikHarrisAhm)
8 points
mha23
2 years ago
discuss
193.
Show HN: GitHub Typo Corpus: Largest Dataset of Misspellings and Grammar Errors
(github.com/mhagiwara)
8 points
mhagiwara
7 years ago
discuss
194.
Show HN: Open-source LLM and dataset for sports forecasting (Pro Golf)
(huggingface.co)
7 points
bturtel
3 months ago
discuss
195.
Show HN: Bridge-ds – Dataset handling for any modality a la Pandas
(github.com/guybuk)
7 points
guyuz
2 years ago
discuss
196.
Show HN: Interactively explore your Hugging Face dataset with one line of code
(huggingface.co)
7 points
sps44
3 years ago
discuss
197.
Our classifier outperforms CatBoost, XGBoost, LightGBM on 5 benchmark datasets
(github.com/LinearBoost)
6 points
hamid9
2 years ago
5 comments
198.
Show HN: Terminal-Wrench, a dataset of 331 realistic hackable environments
(github.com/few-sh)
6 points
neversupervised
2 months ago
2 comments
199.
Show HN: FiftyOne – Explore, Analyze and Curate Visual Datasets
(github.com/voxel51)
6 points
benjaminpkane
6 years ago
1 comment
200.
Show HN: Open Covid-19 Dataset
(github.com/open-covid-19)
6 points
omtinez
6 years ago
1 comment
201.
Show HN: Xray: N-D labeled arrays and datasets in Python
(github.com/xray)
6 points
shoyer
12 years ago
discuss
202.
Show HN: Generate Fine-tuning dataset using deep research in terminal
(github.com/Datalore-ai)
6 points
FineTuner42
10 months ago
discuss
203.
Show HN: SemHash – Fast Semantic Text Deduplication for Cleaner Datasets
(github.com/MinishLab)
6 points
stephantul
a year ago
discuss
204.
Show HN: Interactively explore unstructured datasets from your dataframe
(github.com/Renumics)
6 points
sps44
3 years ago
discuss
205.
Kangas: Pandas for Multimedia Datasets
(github.com/comet-ml)
6 points
synergy20
3 years ago
discuss
206.
The fastest command-line tools for querying large JSON datasets
(github.com/dcmoura)
6 points
zX41ZdbW
4 years ago
discuss
207.
Video Classification Starter Code for Working with the YouTube-8M Dataset
(github.com/google)
6 points
tylerwhipple
9 years ago
discuss
208.
Resampling Unbalanced Datasets
(github.com/fmfn)
5 points
hrb1979
12 years ago
discuss
209.
Curated list of language modeling researches for code, plus related datasets
(github.com/codefuse-ai)
5 points
Bluestein
a year ago
discuss
210.
Show HN: Byte-Pair Encoding tokenizer for training LLMs on large datasets
(github.com/jmaczan)
5 points
yu3zhou4
2 years ago
discuss