Heykuki News

181.
Nvidia open sources the synthetic data framework used to build Nemotron datasets
8 points
alexwatson405
6 months ago
1 comment
182.
Show HN: Using DSPy to enrich a dataset of the Nobel laureate network (blog.kuzudb.com)
8 points
laminarflow027
10 months ago
discuss
183.
Open Thoughts: Curating the best reasoning datasets (github.com/open-thoughts)
8 points
madiator
a year ago
discuss
184.
Show HN: Automate Variable Selection for Research on Big Datasets (Open-Source) (github.com/MalikHarrisAhm)
8 points
mha23
2 years ago
discuss
185.
Show HN: GitHub Typo Corpus: Largest Dataset of Misspellings and Grammar Errors (github.com/mhagiwara)
8 points
mhagiwara
7 years ago
discuss
186.
Show HN: Open-source LLM and dataset for sports forecasting (Pro Golf) (huggingface.co)
7 points
bturtel
3 months ago
discuss
187.
Show HN: Bridge-ds – Dataset handling for any modality a la Pandas (github.com/guybuk)
7 points
guyuz
2 years ago
discuss
188.
Show HN: Interactively explore your Hugging Face dataset with one line of code (huggingface.co)
7 points
sps44
3 years ago
discuss
189.
Our classifier outperforms CatBoost, XGBoost, LightGBM on 5 benchmark datasets (github.com/LinearBoost)
6 points
hamid9
2 years ago
5 comments
190.
Show HN: Terminal-Wrench, a dataset of 331 realistic hackable environments (github.com/few-sh)
6 points
neversupervised
2 months ago
2 comments
191.
DatasetGPT – an open-source command line tool for generating datasets with LLMs (github.com/radi-cho)
6 points
radicho123
3 years ago
1 comment
192.
Show HN: FiftyOne – Explore, Analyze and Curate Visual Datasets (github.com/voxel51)
6 points
benjaminpkane
6 years ago
1 comment
193.
Show HN: Open Covid-19 Dataset (github.com/open-covid-19)
6 points
omtinez
6 years ago
1 comment
194.
Show HN: Xray: N-D labeled arrays and datasets in Python (github.com/xray)
6 points
shoyer
12 years ago
discuss
195.
Show HN: Generate Fine-tuning dataset using deep research in terminal (github.com/Datalore-ai)
6 points
FineTuner42
10 months ago
discuss
196.
Show HN: SemHash – Fast Semantic Text Deduplication for Cleaner Datasets (github.com/MinishLab)
6 points
stephantul
a year ago
discuss
197.
Show HN: Interactively explore unstructured datasets from your dataframe (github.com/Renumics)
6 points
sps44
3 years ago
discuss
198.
Kangas: Pandas for Multimedia Datasets (github.com/comet-ml)
6 points
synergy20
3 years ago
discuss
199.
The fastest command-line tools for querying large JSON datasets (github.com/dcmoura)
6 points
zX41ZdbW
4 years ago
discuss
200.
Video Classification Starter Code for Working with the YouTube-8M Dataset (github.com/google)
6 points
tylerwhipple
9 years ago
discuss
201.
Select2: jQuery select boxes with search, remote data sets, infinite scrolling (ivaynberg.github.com)
5 points
soulclap
14 years ago
1 comment
202.
Resampling Unbalanced Datasets (github.com/fmfn)
5 points
hrb1979
12 years ago
discuss
203.
Curated list of language modeling researches for code, plus related datasets (github.com/codefuse-ai)
5 points
Bluestein
a year ago
discuss
204.
Show HN: Byte-Pair Encoding tokenizer for training LLMs on large datasets (github.com/jmaczan)
5 points
yu3zhou4
2 years ago
discuss
205.
DataDM – Search and analyze datasets with LLMs (github.com/approximatelabs)
5 points
cle
3 years ago
discuss
206.
DataDM: Open-source local-LLM code-interpreter with dataset search (github.com/approximatelabs)
5 points
bluecoconut
3 years ago
discuss
207.
Show HN: Multiobjective Large-Scale Fashion Dataset with Distributional Shifts (github.com/st-tech)
5 points
nanikano
5 years ago
discuss
208.
Show HN: H5records – simple large dataset for pytorch training (github.com/theblackcat102)
5 points
polymorph1sm
5 years ago
discuss
209.
Show HN: Create APIs for static datasets without writing a single line of code (github.com/roapi)
5 points
houqp
5 years ago
discuss
210.
Show HN: We made a dataset differ! (Free, Open source) (github.com/qri-io)
5 points
rgardaphe
7 years ago
discuss