Heykuki News
Top
New
Best
Ask
Show
Jobs
241.
Show HN: Create simulated datasets in Python with Simulacrum
(github.com/jbrambleDC)
4 points
jbrambleDC
10 years ago
discuss
242.
A Python tool that automatically cleans data sets and readies them for analysis
(github.com/rhiever)
4 points
felix_thursday
10 years ago
discuss
243.
Show HN: Kiln - Interactive LLM fine-tuning, dataset collab & synthetic data gen
(github.com/Kiln-AI)
3 points
scosman
a year ago
2 comments
244.
Large New Dataset 220k AI Art Text to Image Prompts
(github.com/lee101)
3 points
wrdsmsh321
2 years ago
2 comments
245.
hfsearch: a fast cli tool to discover models and datasets on HuggingFace
(github.com/HenokB)
3 points
henok_ademtew
6 months ago
1 comment
246.
Show HN: Torque – A declarative, typesafe DSL for LLM training datasets (MIT)
(github.com/qforge-dev)
3 points
michalwarda
7 months ago
1 comment
247.
Hugging Face AI Sheets, open-source tool to vibe test models on your datasets
(github.com/huggingface)
3 points
dvilasuero
10 months ago
1 comment
248.
Promptwright: Generate large synthetic datasets using a local LLM
(github.com/StacklokLabs)
3 points
trickleup
2 years ago
1 comment
249.
Easily convert YouTube, Torrent and Enterprise videos into LLM datasets
(github.com/qet-lab)
3 points
m_2018
2 years ago
1 comment
250.
CodeCapybara: Code Writing LLaMa Finetuned on Deepmind Dataset
(github.com/AI4Code-Research)
3 points
brucethemoose2
3 years ago
1 comment
251.
UpliftML: An uplift modeling library that handles web scale datasets
(github.com/bookingcom)
3 points
TaXxEr
5 years ago
1 comment
252.
A tool for creating deep learning datasets
(github.com/dicroce)
3 points
dicroce
5 years ago
1 comment
253.
Show HN: A dataset of 40k professionally-written summaries of news articles
(github.com/curationcorp)
3 points
CurationCorp
6 years ago
1 comment
254.
Crossfader: Autoencoders to find structure in arbitrary datasets
(github.com/bettermg)
3 points
vierja
11 years ago
discuss
255.
ExCon is an R/JavaScript tool for exploring topographic-like data sets
(github.com/bryanhanson)
3 points
sebg
12 years ago
discuss
256.
Machine Learning: Access Tiny Images Dataset with Python
(github.com/cioc)
3 points
cioc
13 years ago
discuss
257.
Open Data Hub Data Browser – Explore and Query Open Datasets
(github.com/noi-techpark)
3 points
KadambariSuresh
3 months ago
discuss
258.
JQuery dataset() Plugin
(github.com/realchaseadams)
3 points
nwienert
14 years ago
discuss
259.
WebZFS Modern Web Management for ZFS Pools/Datasets/Snapshots/Smart Monitoring
(github.com/webzfs)
3 points
vermaden
5 months ago
discuss
260.
Data-morph: Morph a dataset into select shapes, while preserving the statistics
(github.com/stefmolin)
3 points
ZeljkoS
9 months ago
discuss
261.
Show HN: Synthetic dataset generator for NLP and tabular data
(github.com/VoxDroid)
3 points
voxdroid
a year ago
discuss
262.
DataChain: Prepare and curate datasets for AI/ML
(github.com/iterative)
3 points
shcheklein
2 years ago
discuss
263.
Reladiff: High-performance diffing of large datasets across databases
(github.com/erezsh)
3 points
PaulHoule
2 years ago
discuss
264.
RNNoise 0.2 – now trained using only publicly available CC-licensed datasets
(github.com/xiph)
3 points
pabs3
2 years ago
discuss
265.
ClickHouse-Obfuscator – a tool for dataset anonymization
(github.com/ClickHouse)
3 points
aeontech
3 years ago
discuss
266.
CommaVQ: Dataset of 100k Driving Videos
(github.com/commaai)
3 points
kklisura
3 years ago
discuss
267.
Img2dataset: Turns large sets of image URLs to an image dataset
(github.com/rom1504)
3 points
wildpeaks
3 years ago
discuss
268.
Dataset with Vulgar and Offensive California Vanity License Plates
(github.com/veltman)
3 points
RamblingCTO
3 years ago
discuss
269.
Parse research papers into a structured dataset
(github.com/neuml)
3 points
txtai
3 years ago
discuss
270.
Legal NLP Dataset With Over 39,000 Examples
(github.com/TheAtticusProject)
3 points
optimalsolver
3 years ago
discuss