This is my first time sharing a project here - I built ripoff, a fake data generator for PostgreSQL that turns templated YAML files into rows in your database.
In my personal and professional work, I found that fake data was usually either generated in the application layer, which is awkward and slower than plain SQL, or so focused on random generation that the database ends up full of data that’s unpleasant for humans to read.
ripoff is built for cases where you know the shape of your data, like local development, integration testing, and setting up demos. Unlike other fake data generators, ripoff isn’t aware of your schema or app, so it feels more like writing templated SQL than using a DSL.
The YAML format is a map of unique row identifiers to column values, where column values can be literal strings, references to other rows, or functions that generate random data.
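To make that shape concrete, here is a rough Python sketch of the idea, not ripoff's actual file format or implementation. The `Ref` and `Generated` classes, the `GENERATORS` table, the identifier style (`users:alice`), and the rule that a row's `id` is derived from its identifier are all assumptions made up for illustration.

```python
import random
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical marker: points at another row by its identifier."""
    row_id: str


@dataclass(frozen=True)
class Generated:
    """Hypothetical marker: a random value produced from an explicit seed."""
    kind: str
    seed: str


# Illustrative generators; each receives a random.Random built from the seed.
GENERATORS = {
    "email": lambda rng: f"user{rng.randrange(10_000)}@example.com",
}


def row_key(row_id: str) -> str:
    """Derive a stable UUID from a row identifier (assumed rule for this sketch)."""
    rng = random.Random(row_id)
    return str(uuid.UUID(int=rng.getrandbits(128), version=4))


def resolve(rows: dict) -> dict:
    """Turn identifier -> column templates into identifier -> concrete values."""
    resolved = {}
    for row_id, columns in rows.items():
        concrete = {"id": row_key(row_id)}
        for column, value in columns.items():
            if isinstance(value, Ref):
                # A reference becomes the key of the row it points at.
                if value.row_id not in rows:
                    raise KeyError(f"unknown reference: {value.row_id}")
                concrete[column] = row_key(value.row_id)
            elif isinstance(value, Generated):
                # Same seed, same output, on every run.
                concrete[column] = GENERATORS[value.kind](random.Random(value.seed))
            else:
                # Literal strings pass through unchanged.
                concrete[column] = value
        resolved[row_id] = concrete
    return resolved


rows = {
    "users:alice": {"email": Generated("email", "alice"), "name": "Alice"},
    "avatars:alice": {"user_id": Ref("users:alice"), "url": "https://example.com/a.png"},
}
resolved = resolve(rows)
```

In this sketch the avatar's `user_id` always matches the user's `id`, because both are derived from the same identifier rather than from insertion order.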
All random data generated by ripoff requires an explicit seed, which is neat because re-running ripoff will always generate the same content. That determinism enables ripoff to perform upserts when re-run on the same database, so you don't have to wipe your DB after editing fake data.
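Here is a minimal sketch of why seeded generation makes upserts possible. It is not how ripoff itself is written: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL (both accept `INSERT ... ON CONFLICT ... DO UPDATE` for this simple case), and the `users` table, the `seeded_uuid` helper, and the `load` function are hypothetical names chosen for the example.

```python
import random
import sqlite3
import uuid


def seeded_uuid(seed: str) -> str:
    """Same seed in, same UUID out, so re-runs target the same primary key."""
    rng = random.Random(seed)
    return str(uuid.UUID(int=rng.getrandbits(128), version=4))


def load(conn: sqlite3.Connection, rows: dict) -> None:
    """Insert each row, or update it in place if its seeded key already exists."""
    for seed, email in rows.items():
        conn.execute(
            "INSERT INTO users (id, email) VALUES (?, ?) "
            "ON CONFLICT (id) DO UPDATE SET email = excluded.email",
            (seeded_uuid(seed), email),
        )
    conn.commit()


# In-memory SQLite database standing in for a PostgreSQL instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT NOT NULL)")

# First run creates two rows.
load(conn, {"alice": "alice@example.com", "bob": "bob@example.com"})

# Second run, after editing the fake data: alice's row is updated, not duplicated.
load(conn, {"alice": "alice@new.example.com", "bob": "bob@example.com"})

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
alice_email = conn.execute(
    "SELECT email FROM users WHERE id = ?", (seeded_uuid("alice"),)
).fetchone()[0]
```

If the keys were generated without a seed, the second run would produce new IDs and insert duplicate rows, which is why you would otherwise have to wipe the database first.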
Here’s a complex but real world example of what a ripoff file might look like: https://github.com/mortenson/ripoff/blob/main/testdata/real_...
I just published it, so there are some rough edges, but you should be able to give it a try with any project that uses PostgreSQL! Information on installation and use can be found in the README.
Thanks and let me know what you think!