After several years of data projects, I came up with this library for defining data processing pipelines: SmartPipeline.
I consider it more of a paradigm than a library: really simple, but tailored to all those small and medium projects where one needs some structure and scalability beyond scripts that rot over time.
In the end, a pipeline is also well suited to production, with a clean approach to concurrency, stateful stages, error handling, and logging.
Yes, other frameworks exist for the same job, but this one is unique in many ways. You can check for yourself!
The paradigm is very flexible, and the library is small, with no dependencies and no unnecessary overhead for the simplest tasks.
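
To give a rough idea of what a pipeline of stateful stages with per-item error handling and logging can look like, here is a minimal sketch in plain Python. It does not use SmartPipeline's actual API: the names `Item`, `Stage`, `Pipeline`, and the three example stages are hypothetical and exist only for illustration. Concurrency is left out to keep the sketch short.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Iterable, Iterator, List

logger = logging.getLogger("pipeline_sketch")


@dataclass
class Item:
    """Hypothetical container: one unit of data plus the errors it collected."""
    payload: Any
    errors: List[str] = field(default_factory=list)


class Stage:
    """Hypothetical base class: a stage transforms one item at a time."""
    name = "stage"

    def process(self, item: Item) -> Item:
        raise NotImplementedError


class Strip(Stage):
    name = "strip"

    def process(self, item: Item) -> Item:
        item.payload = item.payload.strip()
        return item


class ToInt(Stage):
    name = "to_int"

    def process(self, item: Item) -> Item:
        item.payload = int(item.payload)  # raises ValueError on bad input
        return item


class RunningTotal(Stage):
    """Stateful stage: keeps a running sum across all items it sees."""
    name = "running_total"

    def __init__(self) -> None:
        self.total = 0

    def process(self, item: Item) -> Item:
        self.total += item.payload
        item.payload = {"value": item.payload, "total": self.total}
        return item


class Pipeline:
    """Runs each item through the stages in order."""

    def __init__(self, stages: Iterable[Stage]) -> None:
        self.stages = list(stages)

    def run(self, source: Iterable[Any]) -> Iterator[Item]:
        for payload in source:
            item = Item(payload)
            for stage in self.stages:
                try:
                    item = stage.process(item)
                except Exception as exc:
                    # Record the failure on the item and skip the remaining
                    # stages for it; the pipeline itself keeps running.
                    logger.warning("stage %s failed: %s", stage.name, exc)
                    item.errors.append(f"{stage.name}: {exc}")
                    break
            yield item


if __name__ == "__main__":
    pipeline = Pipeline([Strip(), ToInt(), RunningTotal()])
    for result in pipeline.run([" 1 ", "x", "3"]):
        print(result)
```

In this sketch a failing item carries its own error record instead of stopping the whole run, which is one common way to keep a pipeline usable in production. The real library may handle this differently.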