Data Masking for testing and tuning on realistic data

rade · May 3, 2017, 7:37pm

That would be OK for a start. My idea is to define the desired mode of masking, deterministic or random. We can put ‘det’ or ‘rand’ or a full formula in that column.
det should be used when consistency of values is needed: the same value always masks into the same value.
An explicit formula can be used for text columns when some human-readable value is desired, as my default det and rand functions produce just gibberish, within the length of the original data.
rand should be used when this kind of consistency is not needed, as it provides better obfuscation.
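
To make the difference between det and rand concrete, here is a minimal Python sketch. It is not my actual script: the names `mask_det`, `mask_rand` and the `secret` parameter are made up for illustration, and hashing with SHA-256 is just one assumed way to get a deterministic result. Both functions return gibberish of the same length as the input.

```python
import hashlib
import secrets
import string

# Characters used for the masked output (an arbitrary choice for this sketch).
ALPHABET = string.ascii_letters + string.digits


def mask_det(value: str, secret: str = "change-me") -> str:
    """Deterministic mask: the same input always yields the same output.

    The output has the same length as the input. `secret` is a hypothetical
    salt so that the mapping differs between installations.
    """
    out = []
    counter = 0
    # Hash repeatedly with a counter until enough characters are available.
    while len(out) < len(value):
        data = f"{secret}:{counter}:{value}".encode("utf-8")
        digest = hashlib.sha256(data).digest()
        out.extend(ALPHABET[b % len(ALPHABET)] for b in digest)
        counter += 1
    return "".join(out[:len(value)])


def mask_rand(value: str) -> str:
    """Random mask: fresh gibberish on every call, same length as the input."""
    return "".join(secrets.choice(ALPHABET) for _ in range(len(value)))
```

With det, a value that appears in several tables masks to the same string everywhere, so joins and groupings on it still behave as before. With rand there is no such link between the original and the masked value.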
Since all the primary keys in metasfresh are synthetic, and so are the foreign keys, they probably do not need to be masked, so there will be no issues with integrity constraints.
I can change my script to read directly from this additional column in ad_columns. Currently it reads the database column description, so no modification of ad_columns is needed, and it works for any PostgreSQL database. It looks for “msk"det” or “msk"rand” or “msk"some–SQL-expression” in the column description.
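
As a rough illustration of the lookup, the Python sketch below pulls the masking mode out of a column description string. It assumes the marker is `msk"` followed by det, rand or an SQL expression that runs to the end of the description. That shape, and the name `parse_mask_mode`, are assumptions for this example and not the exact rules of my script. In PostgreSQL the description itself can be read with the built-in `col_description` function.

```python
import re

# Assumed marker shape: msk" followed by the mode or an SQL expression,
# running to the end of the column description.
MARKER = re.compile(r'msk"(.*)$', re.DOTALL)


def parse_mask_mode(description):
    """Return (mode, expression) for a column description, or None.

    mode is "det", "rand" or "formula"; expression is only set for "formula".
    """
    if not description:
        return None
    match = MARKER.search(description)
    if match is None:
        return None
    spec = match.group(1).strip()
    if not spec:
        return None
    if spec in ("det", "rand"):
        return (spec, None)
    # Anything else is treated as an explicit SQL expression.
    return ("formula", spec)
```

Columns without a marker return None and would simply be left unmasked.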