Show HN: Open-source Data Anonymization tool nxs-data-anonymizer

4 points

2 years ago

Hey HN! We’re excited to share nxs-data-anonymizer, an open-source tool for developers and project teams who work with databases across production, test/dev/stage environments, or dynamic namespaces and need to keep that data secure and prevent leaks.

It is no secret that developing even a small project is closely tied to infrastructure, as any program requires a certain environment. Often several environments are required: one for production, and the rest for other needs, such as testing. But where do we, as developers, actually get the data? Use empty databases? Then we won’t be able to check anything. Synthetic data? It still has to be invented and generated correctly, and even then the application will not always behave the way we need.

The simplest option is to take data directly from production, but that is neither safe nor secure. That’s where nxs-data-anonymizer comes in. The solution we came up with turned out to be quite flexible and easy to use, and its core is based on the following ideas:

Stream data processing. You don’t have to do any pre-processing or save a dump of the original database somewhere on disk. nxs-data-anonymizer changes the data as it is passed in on stdin and writes the result to stdout, i.e. you can place the tool directly in a command between two pipes.
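To make the streaming idea concrete, here is a minimal Python sketch of a line-by-line filter. It is not how nxs-data-anonymizer is implemented (the tool is configured with Go templates); the names `anonymize_stream` and `mask_emails` and the e-mail rule are assumptions made up for illustration.

```python
import io
import re
from typing import Callable, TextIO

# Hypothetical rule: replace anything that looks like an e-mail address.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_emails(line: str) -> str:
    return EMAIL.sub("user@example.com", line)


def anonymize_stream(src: TextIO, dst: TextIO, rewrite: Callable[[str], str]) -> None:
    """Rewrite src line by line and write each result to dst right away.

    Only one line is held in memory at a time, so no intermediate
    copy of the dump is ever stored.
    """
    for line in src:
        dst.write(rewrite(line))


# In-memory demo; in a real pipeline pass sys.stdin and sys.stdout instead.
dump = io.StringIO("INSERT INTO users VALUES (1,'alice@corp.io');\n")
out = io.StringIO()
anonymize_stream(dump, out, mask_emails)
print(out.getvalue(), end="")
```

Wired to standard input and output, a filter like this can sit between a dump command and a restore command without the original dump ever touching the disk.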

The values are described by Go templates. Everything you want to replace in the desired cells of a table is defined by templates, similar to Helm, which many people already know. Just like in Helm, you can use familiar functions, for example to generate random strings or numbers.

Conditions and data from other cells in the row. Filters are flexible and can make substitutions depending on the values of other cells in the same row (or of the cell itself).
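As a rough illustration of row-aware rules, the Python sketch below gives every rule access to the whole original row. The function `anonymize_row`, the `rules` mapping, and the column names are hypothetical; in the tool itself such conditions are written as Go templates in the config.

```python
from typing import Callable, Dict

Row = Dict[str, str]
Rule = Callable[[Row], str]


def anonymize_row(row: Row, rules: Dict[str, Rule]) -> Row:
    """Apply per-column rules; every rule sees the original, unmodified row."""
    out = dict(row)
    for column, rule in rules.items():
        if column in row:
            out[column] = rule(row)
    return out


# Hypothetical rules for a hypothetical `users` table.
rules: Dict[str, Rule] = {
    # Depends on other cells: keep service accounts, rewrite everyone else.
    "email": lambda r: r["email"] if r["role"] == "service"
    else f"user{r['id']}@example.com",
    # Depends on the cell itself: leave empty phones empty.
    "phone": lambda r: "" if r["phone"] == "" else "+10000000000",
}

print(anonymize_row(
    {"id": "7", "role": "customer", "email": "a@b.c", "phone": ""}, rules))
```

Passing the original row to each rule, rather than the partly rewritten one, keeps the outcome independent of the order in which the rules run.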

When a column's security policy is set to randomize cell values, the values are generated automatically based on the column's data type. We've categorized the data types (e.g., date and datetime for MySQL columns) so that the randomized data always matches the type of the column it goes into.
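One way to picture type-based randomization is a small dispatcher keyed on the column type, sketched below in Python. The type groups, value ranges, and the name `random_value` are assumptions for illustration and do not reflect the tool's actual categories.

```python
import random
from datetime import date, datetime, timedelta

EPOCH_DATE = date(1970, 1, 1)
EPOCH_DATETIME = datetime(1970, 1, 1)


def random_value(column_type: str, rng: random.Random) -> str:
    """Return a random literal whose format fits the given column type."""
    base, _, rest = column_type.lower().partition("(")
    base = base.strip()
    if base in ("tinyint", "smallint", "int", "bigint"):
        # Small range so the value fits every integer type listed above.
        return str(rng.randint(0, 127))
    if base == "date":
        return (EPOCH_DATE + timedelta(days=rng.randint(0, 20000))).isoformat()
    if base in ("datetime", "timestamp"):
        moment = EPOCH_DATETIME + timedelta(seconds=rng.randint(0, 1_700_000_000))
        return moment.strftime("%Y-%m-%d %H:%M:%S")
    if base in ("char", "varchar", "text"):
        # Respect a declared length such as varchar(4), capped at 8 characters.
        length = min(int(rest.rstrip(") ")), 8) if rest else 8
        return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(length))
    raise ValueError(f"no generator for column type {column_type!r}")


rng = random.Random()
for column_type in ("int", "date", "datetime", "varchar(4)"):
    print(column_type, "->", random_value(column_type, rng))
```

Raising an error for unknown types is a deliberate choice in this sketch: a value that silently fails to match its column would only surface later, when the dump is loaded.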

Data consistency. The link block stores links between columns across all the tables you described in the configuration, i.e. cells in the specified columns that had the same value before anonymization will have equal values after it.
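The consistency guarantee can be sketched as a shared lookup table: the first time an original value is seen it gets a replacement, and every later occurrence in a linked column reuses it. The `Link` class below is a hypothetical Python illustration, not the tool's link block, and the table and column names are made up.

```python
import secrets
from typing import Dict, Iterable, Set, Tuple


class Link:
    """Keeps one replacement per original value across linked columns."""

    def __init__(self, columns: Iterable[Tuple[str, str]]) -> None:
        self.columns: Set[Tuple[str, str]] = set(columns)
        self._seen: Dict[str, str] = {}

    def replace(self, table: str, column: str, value: str) -> str:
        if (table, column) not in self.columns:
            return value  # column is not part of this link, leave it alone
        if value not in self._seen:
            # First occurrence: invent a replacement and remember it.
            self._seen[value] = secrets.token_hex(8)
        return self._seen[value]


link = Link([("users", "id"), ("orders", "user_id")])
user_id = link.replace("users", "id", "42")
order_user_id = link.replace("orders", "user_id", "42")
print(user_id == order_user_id)  # True: the relation between the tables survives
```

Because the mapping lives in memory for the duration of one run, the same original value maps to the same replacement in that run only.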

There is now also a way to reuse data that was generated once across the whole anonymization run. A newly developed module generates such values once and makes them available in filters.

If you have a fast-moving project with a frequently changing database structure, our recent feature means you won’t have to adjust the anonymizer config every time. Depending on the entity types set in the security settings, the tool anonymizes the columns of the tables that have rules described in the filters section. nxs-data-anonymizer also lets you exclude undescribed data from the resulting dump. There were lots of requests for this feature, and it’s actually a cool one, so we built it.

We’re looking forward to improving our tool, so we’d love to get your feedback, contributions, and reports of any issues you encounter! Please join us on Telegram: https://t.me/nxs_data_anonymizer where you can discuss nxs-data-anonymizer, and ask any questions in the chat: https://t.me/nxs_data_anonymizer_chat