Search: github.com/datasetq | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

181.

Show HN: Create datasets more simply and improve AI model with unstructured data (github.com/adansons)

12 points

4 years ago

182.

Fast and scalable dataset preparation and curation tool from Nvidia (github.com/NVIDIA)

12 points

2 years ago

183.

Show HN: Dataset of Sarcastic HN Comments (github.com/traghav)

11 points

raghavtoshniwal

5 years ago

184.

Show HN: Download HuggingFace Models/Datasets easily and super fast (github.com/bodaay)

10 points

3 years ago

185.

Show HN: Training synthetic models on highly complex datasets (github.com/gretelai)

10 points

4 years ago

186.

Show HN: React-like Declarative DSL for building synthetic LLM datasets (github.com/qforge-dev)

10 points

7 months ago

187.

Texthero – Python module to analyze any text dataset in seconds (github.com/jbesomi)

9 points

6 years ago

188.

Kangas: Explore Multimedia Datasets at Scale (github.com/comet-ml)

9 points

4 years ago

189.

Nvidia open sources the synthetic data framework used to build Nemotron datasets

8 points

6 months ago

190.

Show HN: Using DSPy to enrich a dataset of the Nobel laureate network (blog.kuzudb.com)

8 points

10 months ago

191.

Open Thoughts: Curating the best reasoning datasets (github.com/open-thoughts)

8 points

a year ago

192.

Show HN: Automate Variable Selection for Research on Big Datasets (Open-Source) (github.com/MalikHarrisAhm)

8 points

2 years ago

193.

Show HN: GitHub Typo Corpus: Largest Dataset of Misspellings and Grammar Errors (github.com/mhagiwara)

8 points

7 years ago

194.

Show HN: Open-source LLM and dataset for sports forecasting (Pro Golf) (huggingface.co)

7 points

3 months ago

195.

Show HN: Bridge-ds – Dataset handling for any modality a la Pandas (github.com/guybuk)

7 points

2 years ago

196.

Show HN: Interactively explore your Hugging Face dataset with one line of code (huggingface.co)

7 points

3 years ago

197.

Our classifier outperforms CatBoost, XGBoost, LightGBM on 5 benchmark datasets (github.com/LinearBoost)

6 points

2 years ago

198.

Show HN: Terminal-Wrench, a dataset of 331 realistic hackable environments (github.com/few-sh)

6 points

neversupervised

2 months ago

199.

Show HN: FiftyOne – Explore, Analyze and Curate Visual Datasets (github.com/voxel51)

6 points

6 years ago

200.

Show HN: Open Covid-19 Dataset (github.com/open-covid-19)

6 points

6 years ago

201.

Show HN: Xray: N-D labeled arrays and datasets in Python (github.com/xray)

6 points

12 years ago

202.

Show HN: Generate Fine-tuning dataset using deep research in terminal (github.com/Datalore-ai)

6 points

10 months ago

203.

Show HN: SemHash – Fast Semantic Text Deduplication for Cleaner Datasets (github.com/MinishLab)

6 points

a year ago

204.

Show HN: Interactively explore unstructured datasets from your dataframe (github.com/Renumics)

6 points

3 years ago

205.

Kangas: Pandas for Multimedia Datasets (github.com/comet-ml)

6 points

3 years ago

206.

The fastest command-line tools for querying large JSON datasets (github.com/dcmoura)

6 points

4 years ago

207.

Video Classification Starter Code for Working with the YouTube-8M Dataset (github.com/google)

6 points

9 years ago

208.

Resampling Unbalanced Datasets (github.com/fmfn)

5 points

12 years ago

209.

Curated list of language modeling research for code, plus related datasets (github.com/codefuse-ai)

5 points

a year ago

210.

Show HN: Byte-Pair Encoding tokenizer for training LLMs on large datasets (github.com/jmaczan)

5 points

2 years ago