Heykuki News

Top | New | Best | Ask | Show | Jobs
91.
DatasetGPT – an open-source command line tool for generating datasets with LLMs (github.com/radi-cho)
6 points
radicho123
3 years ago
1 comment
92.
Show HN: FiftyOne – Explore, Analyze and Curate Visual Datasets (github.com/voxel51)
6 points
benjaminpkane
6 years ago
1 comment
93.
Show HN: Xray: N-D labeled arrays and datasets in Python (github.com/xray)
6 points
shoyer
12 years ago
discuss
94.
Show HN: SemHash – Fast Semantic Text Deduplication for Cleaner Datasets (github.com/MinishLab)
6 points
stephantul
a year ago
discuss
95.
Show HN: Interactively explore unstructured datasets from your dataframe (github.com/Renumics)
6 points
sps44
3 years ago
discuss
96.
Kangas: Pandas for Multimedia Datasets (github.com/comet-ml)
6 points
synergy20
3 years ago
discuss
97.
The fastest command-line tools for querying large JSON datasets (github.com/dcmoura)
6 points
zX41ZdbW
4 years ago
discuss
98.
Resampling Unbalanced Datasets (github.com/fmfn)
5 points
hrb1979
12 years ago
discuss
99.
Curated list of language modeling researches for code, plus related datasets (github.com/codefuse-ai)
5 points
Bluestein
a year ago
discuss
100.
Show HN: Byte-Pair Encoding tokenizer for training LLMs on large datasets (github.com/jmaczan)
5 points
yu3zhou4
2 years ago
discuss
101.
DataDM – Search and analyze datasets with LLMs (github.com/approximatelabs)
5 points
cle
3 years ago
discuss
102.
Show HN: Create APIs for static datasets without writing a single line of code (github.com/roapi)
5 points
houqp
5 years ago
discuss
103.
Show HN: Transform Unstructured Data into Usable Datasets (github.com/wizenheimer)
4 points
wizenheimer
2 years ago
1 comment
104.
Show HN: pqry – A fast, lightweight CLI tool to diagnose Parquet datasets (github.com/symblic)
4 points
setzeno
4 months ago
discuss
105.
Show HN: Lance – Open lakehouse format for multimodal AI datasets (github.com/lance-format)
4 points
criexe
5 months ago
discuss
106.
A curated list of global electrical grid maps, datasets and resources (github.com/open-energy-transition)
4 points
protontypes
7 months ago
discuss
107.
The Well: A 15TB Collection of Physics Simulation Datasets (github.com/PolymathicAI)
4 points
Anon84
9 months ago
discuss
108.
Show HN: Mount remote repositories and datasets managed by Git LFS locally (github.com/git-lfs-fuse)
4 points
rueian
a year ago
discuss
109.
Awesome-Twitter-data: A list of Twitter datasets and related resources (github.com/shaypal5)
4 points
shaypalachy
8 years ago
discuss
110.
Pypixgrid: generate vector tiles for the exploration of spatio-temporal datasets (translate.googleusercontent.com)
4 points
based2
9 years ago
discuss
111.
Show HN: DataBrewer – A CLI-tool to search and discover datasets (github.com/rolando)
4 points
darkrho
9 years ago
discuss
112.
Show HN: Create simulated datasets in Python with Simulacrum (github.com/jbrambleDC)
4 points
jbrambleDC
10 years ago
discuss
113.
hfsearch: a fast cli tool to discover models and datasets on HuggingFace (github.com/HenokB)
3 points
henok_ademtew
6 months ago
1 comment
114.
Show HN: Torque – A declarative, typesafe DSL for LLM training datasets (MIT) (github.com/qforge-dev)
3 points
michalwarda
7 months ago
1 comment
115.
Hugging Face AI Sheets, open-source tool to vibe test models on your datasets (github.com/huggingface)
3 points
dvilasuero
10 months ago
1 comment
116.
Promptwright: Generate large synthetic datasets using a local LLM (github.com/StacklokLabs)
3 points
trickleup
2 years ago
1 comment
117.
Easily convert YouTube, Torrent and Enterprise videos into LLM datasets (github.com/qet-lab)
3 points
m_2018
2 years ago
1 comment
118.
UpliftML: An uplift modeling library that handles web scale datasets (github.com/bookingcom)
3 points
TaXxEr
5 years ago
1 comment
119.
A tool for creating deep learning datasets (github.com/dicroce)
3 points
dicroce
5 years ago
1 comment
120.
Crossfader: Autoencoders to find structure in arbitrary datasets (github.com/bettermg)
3 points
vierja
11 years ago
discuss