Heykuki News

181.
Show HN: Create datasets more simply and improve AI model with unstructured data (github.com/adansons)
12 points
KenichiHiguchi
4 years ago
3 comments
182.
Fast and scalable dataset preparation and curation tool from Nvidia (github.com/NVIDIA)
12 points
shcheklein
2 years ago
discuss
183.
Show HN: Dataset of Sarcastic HN Comments (github.com/traghav)
11 points
raghavtoshniwal
5 years ago
6 comments
184.
Show HN: Download HuggingFace Models/Datasets easily and super fast (github.com/bodaay)
10 points
qqqbodaayqqq
3 years ago
2 comments
185.
Show HN: Training synthetic models on highly complex datasets (github.com/gretelai)
10 points
repeat_or
4 years ago
2 comments
186.
Show HN: React-like Declarative DSL for building synthetic LLM datasets (github.com/qforge-dev)
10 points
arturwala
7 months ago
discuss
187.
Texthero – Python module to analyze any text dataset in seconds (github.com/jbesomi)
9 points
BertAndErnie
6 years ago
6 comments
188.
Kangas: Explore Multimedia Datasets at Scale (github.com/comet-ml)
9 points
dmoura
4 years ago
2 comments
189.
Nvidia open sources the synthetic data framework used to build Nemotron datasets
8 points
alexwatson405
6 months ago
1 comment
190.
Show HN: Using DSPy to enrich a dataset of the Nobel laureate network (blog.kuzudb.com)
8 points
laminarflow027
10 months ago
discuss
191.
Open Thoughts: Curating the best reasoning datasets (github.com/open-thoughts)
8 points
madiator
a year ago
discuss
192.
Show HN: Automate Variable Selection for Research on Big Datasets (Open-Source) (github.com/MalikHarrisAhm)
8 points
mha23
2 years ago
discuss
193.
Show HN: GitHub Typo Corpus: Largest Dataset of Misspellings and Grammar Errors (github.com/mhagiwara)
8 points
mhagiwara
7 years ago
discuss
194.
Show HN: Open-source LLM and dataset for sports forecasting (Pro Golf) (huggingface.co)
7 points
bturtel
3 months ago
discuss
195.
Show HN: Bridge-ds – Dataset handling for any modality a la Pandas (github.com/guybuk)
7 points
guyuz
2 years ago
discuss
196.
Show HN: Interactively explore your Hugging Face dataset with one line of code (huggingface.co)
7 points
sps44
3 years ago
discuss
197.
Our classifier outperforms CatBoost, XGBoost, LightGBM on 5 benchmark datasets (github.com/LinearBoost)
6 points
hamid9
2 years ago
5 comments
198.
Show HN: Terminal-Wrench, a dataset of 331 realistic hackable environments (github.com/few-sh)
6 points
neversupervised
2 months ago
2 comments
199.
Show HN: FiftyOne – Explore, Analyze and Curate Visual Datasets (github.com/voxel51)
6 points
benjaminpkane
6 years ago
1 comment
200.
Show HN: Open Covid-19 Dataset (github.com/open-covid-19)
6 points
omtinez
6 years ago
1 comment
201.
Show HN: Xray: N-D labeled arrays and datasets in Python (github.com/xray)
6 points
shoyer
12 years ago
discuss
202.
Show HN: Generate Fine-tuning dataset using deep research in terminal (github.com/Datalore-ai)
6 points
FineTuner42
10 months ago
discuss
203.
Show HN: SemHash – Fast Semantic Text Deduplication for Cleaner Datasets (github.com/MinishLab)
6 points
stephantul
a year ago
discuss
204.
Show HN: Interactively explore unstructured datasets from your dataframe (github.com/Renumics)
6 points
sps44
3 years ago
discuss
205.
Kangas: Pandas for Multimedia Datasets (github.com/comet-ml)
6 points
synergy20
3 years ago
discuss
206.
The fastest command-line tools for querying large JSON datasets (github.com/dcmoura)
6 points
zX41ZdbW
4 years ago
discuss
207.
Video Classification Starter Code for Working with the YouTube-8M Dataset (github.com/google)
6 points
tylerwhipple
9 years ago
discuss
208.
Resampling Unbalanced Datasets (github.com/fmfn)
5 points
hrb1979
12 years ago
discuss
209.
Curated list of language modeling researches for code, plus related datasets (github.com/codefuse-ai)
5 points
Bluestein
a year ago
discuss
210.
Show HN: Byte-Pair Encoding tokenizer for training LLMs on large datasets (github.com/jmaczan)
5 points
yu3zhou4
2 years ago
discuss