Heykuki News
Top
New
Best
Ask
Show
Jobs
91.
DatasetGPT – an open-source command line tool for generating datasets with LLMs
(github.com/radi-cho)
6 points
radicho123
3 years ago
1 comment
92.
Show HN: FiftyOne – Explore, Analyze and Curate Visual Datasets
(github.com/voxel51)
6 points
benjaminpkane
6 years ago
1 comment
93.
Show HN: Xray: N-D labeled arrays and datasets in Python
(github.com/xray)
6 points
shoyer
12 years ago
discuss
94.
Show HN: SemHash – Fast Semantic Text Deduplication for Cleaner Datasets
(github.com/MinishLab)
6 points
stephantul
a year ago
discuss
95.
Show HN: Interactively explore unstructured datasets from your dataframe
(github.com/Renumics)
6 points
sps44
3 years ago
discuss
96.
Kangas: Pandas for Multimedia Datasets
(github.com/comet-ml)
6 points
synergy20
3 years ago
discuss
97.
The fastest command-line tools for querying large JSON datasets
(github.com/dcmoura)
6 points
zX41ZdbW
4 years ago
discuss
98.
Resampling Unbalanced Datasets
(github.com/fmfn)
5 points
hrb1979
12 years ago
discuss
99.
Curated list of language modeling research for code, plus related datasets
(github.com/codefuse-ai)
5 points
Bluestein
a year ago
discuss
100.
Show HN: Byte-Pair Encoding tokenizer for training LLMs on large datasets
(github.com/jmaczan)
5 points
yu3zhou4
2 years ago
discuss
101.
DataDM – Search and analyze datasets with LLMs
(github.com/approximatelabs)
5 points
cle
3 years ago
discuss
102.
Show HN: Create APIs for static datasets without writing a single line of code
(github.com/roapi)
5 points
houqp
5 years ago
discuss
103.
Show HN: Transform Unstructured Data into Usable Datasets
(github.com/wizenheimer)
4 points
wizenheimer
2 years ago
1 comment
104.
Show HN: pqry – A fast, lightweight CLI tool to diagnose Parquet datasets
(github.com/symblic)
4 points
setzeno
4 months ago
discuss
105.
Show HN: Lance – Open lakehouse format for multimodal AI datasets
(github.com/lance-format)
4 points
criexe
5 months ago
discuss
106.
A curated list of global electrical grid maps, datasets and resources
(github.com/open-energy-transition)
4 points
protontypes
7 months ago
discuss
107.
The Well: A 15TB Collection of Physics Simulation Datasets
(github.com/PolymathicAI)
4 points
Anon84
9 months ago
discuss
108.
Show HN: Mount remote repositories and datasets managed by Git LFS locally
(github.com/git-lfs-fuse)
4 points
rueian
a year ago
discuss
109.
Awesome-Twitter-data: A list of Twitter datasets and related resources
(github.com/shaypal5)
4 points
shaypalachy
8 years ago
discuss
110.
Pypixgrid: generate vector tiles for the exploration of spatio-temporal datasets
(translate.googleusercontent.com)
4 points
based2
9 years ago
discuss
111.
Show HN: DataBrewer – A CLI-tool to search and discover datasets
(github.com/rolando)
4 points
darkrho
9 years ago
discuss
112.
Show HN: Create simulated datasets in Python with Simulacrum
(github.com/jbrambleDC)
4 points
jbrambleDC
10 years ago
discuss
113.
hfsearch: a fast cli tool to discover models and datasets on HuggingFace
(github.com/HenokB)
3 points
henok_ademtew
6 months ago
1 comment
114.
Show HN: Torque – A declarative, typesafe DSL for LLM training datasets (MIT)
(github.com/qforge-dev)
3 points
michalwarda
7 months ago
1 comment
115.
Hugging Face AI Sheets, open-source tool to vibe test models on your datasets
(github.com/huggingface)
3 points
dvilasuero
10 months ago
1 comment
116.
Promptwright: Generate large synthetic datasets using a local LLM
(github.com/StacklokLabs)
3 points
trickleup
2 years ago
1 comment
117.
Easily convert YouTube, Torrent and Enterprise videos into LLM datasets
(github.com/qet-lab)
3 points
m_2018
2 years ago
1 comment
118.
UpliftML: An uplift modeling library that handles web scale datasets
(github.com/bookingcom)
3 points
TaXxEr
5 years ago
1 comment
119.
A tool for creating deep learning datasets
(github.com/dicroce)
3 points
dicroce
5 years ago
1 comment
120.
Crossfader: Autoencoders to find structure in arbitrary datasets
(github.com/bettermg)
3 points
vierja
11 years ago
discuss