cgi-fr / pimo

Private Input Masked Output - PIMO is a tool for data masking (anonymization, pseudonymization, ...).

Home Page: https://cgi-fr.github.io/lino-doc/

License: GNU General Public License v3.0

Dockerfile 0.51% Shell 0.45% Makefile 0.52% Go 85.85% JavaScript 9.27% Elm 3.40%
jsonlines rgpd anonymization testdata random-generation format-preserving-encryption fpe gdpr

pimo's People

Contributors

adrienaury, capkicklee, chao-ma5566, dependabot[bot], giraud10, p0labrd, romandguillaume, youen

pimo's Issues

[REFACTORING] MaskFactoryConfiguration

To inject an additional parameter into a MaskFactory (for example caches or seed), we currently have to modify every MaskFactory function ...

I propose a refactoring that changes the MaskFactory interface so that it takes a new MaskFactoryConfiguration structure as its parameter. That way, injecting new external parameters becomes simpler.

Originally posted by @youen in #139 (comment)
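
As an illustration of the refactoring, here is a minimal Go sketch. Every type and field name below (Masking, MaskEngine, Seed, Caches, constantFactory) is hypothetical and only mirrors the idea described in the issue; it is not PIMO's actual code.

```go
package main

import "fmt"

// Masking stands in for the YAML description of one mask (hypothetical, simplified).
type Masking struct {
	Mask string
}

// MaskEngine is a placeholder for whatever a factory builds.
type MaskEngine interface {
	Mask(value any) (any, error)
}

// MaskFactoryConfiguration groups every input a factory may need.
// Adding a field here leaves the MaskFactory signature unchanged.
type MaskFactoryConfiguration struct {
	Masking Masking
	Seed    int64
	Caches  map[string]map[string]any
}

// MaskFactory receives the whole configuration and reports whether it
// recognised the requested mask.
type MaskFactory func(conf MaskFactoryConfiguration) (MaskEngine, bool, error)

// constantEngine always returns the same value.
type constantEngine struct{ value any }

func (e constantEngine) Mask(any) (any, error) { return e.value, nil }

// constantFactory only needs conf.Masking and ignores the other fields.
func constantFactory(conf MaskFactoryConfiguration) (MaskEngine, bool, error) {
	if conf.Masking.Mask != "constant" {
		return nil, false, nil
	}
	return constantEngine{value: "masked"}, true, nil
}

func main() {
	var factory MaskFactory = constantFactory
	engine, ok, err := factory(MaskFactoryConfiguration{Masking: Masking{Mask: "constant"}, Seed: 42})
	if err != nil || !ok {
		panic("factory did not match")
	}
	out, _ := engine.Mask("secret")
	fmt.Println(out)
}
```

With this shape, a new external parameter becomes one more struct field, and existing factories keep compiling unchanged.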

bug: jsonpath to array component not working

data.jsonl

{"elements":[{"persons": [{"phonenumber": "027"}]}]}

masking.yml

version: "1"
seed: 42
masking:
  - selector:
      jsonpath: "elements.persons.phonenumber"
    mask:
      regex: "0[1-7]( ([0-9]){2}){4}"

Result

$ pimo <data.jsonl >dataout.jsonl

panic: interface conversion: model.Entry is []model.Entry, not map[string]model.Entry

goroutine 1 [running]:
makeit.imfr.cgi.com/makeit2/scm/lino/pimo/pkg/model.ComplexePathSelector.Write(0xc00002c8a9, 0x7, 0x92c978, 0xc00001f730, 0xc0001ba300, 0x80e3e0, 0xc00001f7d0, 0xc00011dc20)
        /workspace/pkg/model/model.go:308 +0x6c5
makeit.imfr.cgi.com/makeit2/scm/lino/pimo/pkg/model.ComplexePathSelector.Write(0xc00002c8a0, 0x8, 0x92c938, 0xc00006e840, 0xc0001ba360, 0x807240, 0xc00000c870, 0xc00000c870)
        /workspace/pkg/model/model.go:302 +0x2de
makeit.imfr.cgi.com/makeit2/scm/lino/pimo/pkg/model.(*MaskEngineProcess).ProcessDictionary(0xc00006e880, 0xc0001ba360, 0x924260, 0xc00006e8a0, 0x0, 0x0)
        /workspace/pkg/model/model.go:460 +0x2dc
makeit.imfr.cgi.com/makeit2/scm/lino/pimo/pkg/model.(*ProcessPipeline).Next(0xc0000367c0, 0x0)
        /workspace/pkg/model/model.go:606 +0x96
makeit.imfr.cgi.com/makeit2/scm/lino/pimo/pkg/model.SimpleSinkedPipeline.Run(0x92bc50, 0xc0000367c0, 0x927e40, 0xc00001f740, 0x927e40, 0xc00001f740)
        /workspace/pkg/model/model.go:648 +0x89
main.run()
        /workspace/cmd/pimo/main.go:127 +0x65b
main.main.func1(0xc000128840, 0xba2298, 0x0, 0x0)
        /workspace/cmd/pimo/main.go:76 +0x25
github.com/spf13/cobra.(*Command).execute(0xc000128840, 0xc00001e210, 0x0, 0x0, 0xc000128840, 0xc00001e210)
        /go/pkg/mod/github.com/spf13/[email protected]/command.go:846 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0xc000128840, 0xb71910, 0x89c08d, 0xa)
        /go/pkg/mod/github.com/spf13/[email protected]/command.go:950 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
        /go/pkg/mod/github.com/spf13/[email protected]/command.go:887
main.main()
        /workspace/cmd/pimo/main.go:86 +0x433

[PROPOSAL] Reverse cache masking

Problem

PIMO can map original values to masked values using the cache feature.

version: "1"
seed: 42
masking:
  - selector:
      jsonpath: "category"
    mask:
      incremental:
        start: 1
        increment: 1
    # Optional cache (coherence preservation)
    cache: "cacheCategory"

caches:
  cacheCategory:
    # Optional bijective cache (enable re-identification if the cache is dumped on disk)
    unique: true

$ pimo --dump-cache cacheCategory=categoryCache.jsonl <<EOF
{ "category": "Animal" }
{ "category": "Food" }
{ "category": "IT" }
EOF
{ "category": 1 }
{ "category": 2 }
{ "category": 3 }

PIMO creates a cache file categoryCache.jsonl:

{ "Animal": 1 }
{ "Food": 2 }
{ "IT": 3 }

No documentation explains how to restore the original data from the masked data and the cache file.

{ "category": 1 }
{ "category": 2 }
{ "category": 3 }
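
To make the missing step concrete, the following Go sketch inverts a dumped cache and uses it to restore a masked field. It assumes the cache layout shown in this issue (one {"original": masked} object per line) and a bijective (unique) cache. The function names loadReverse and restore are made up for the example and are not part of PIMO.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// loadReverse reads cache lines shaped like {"Animal": 1} and returns a
// lookup table from the masked value (raw JSON text) to the original value.
func loadReverse(cache string) (map[string]string, error) {
	reverse := map[string]string{}
	scanner := bufio.NewScanner(strings.NewReader(cache))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		entry := map[string]json.RawMessage{}
		if err := json.Unmarshal([]byte(line), &entry); err != nil {
			return nil, err
		}
		for original, masked := range entry {
			reverse[string(masked)] = original
		}
	}
	return reverse, scanner.Err()
}

// restore replaces the masked value of one field with its original value.
// Values that are not in the reverse table are left untouched.
func restore(record, field string, reverse map[string]string) (string, error) {
	doc := map[string]json.RawMessage{}
	if err := json.Unmarshal([]byte(record), &doc); err != nil {
		return "", err
	}
	if original, ok := reverse[string(doc[field])]; ok {
		encoded, err := json.Marshal(original)
		if err != nil {
			return "", err
		}
		doc[field] = encoded
	}
	out, err := json.Marshal(doc)
	return string(out), err
}

func main() {
	cache := "{ \"Animal\": 1 }\n{ \"Food\": 2 }\n{ \"IT\": 3 }\n"
	reverse, err := loadReverse(cache)
	if err != nil {
		panic(err)
	}
	for _, record := range []string{`{ "category": 1 }`, `{ "category": 2 }`, `{ "category": 3 }`} {
		restored, err := restore(record, "category", reverse)
		if err != nil {
			panic(err)
		}
		fmt.Println(restored)
	}
}
```

The inversion is only well defined when each masked value maps back to a single original, which is what the unique cache option is meant to guarantee.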

[PROPOSAL] HTTP API service

HTTP Service

This is a proposal to expose PIMO's pipeline as an HTTP service.

Stateless vs Stateful

PIMO's pipeline can be stateful (if it uses a sequence mask or a cache) or stateless. In this proposal the PIMO HTTP API is stateless, and each HTTP request simulates one PIMO run. A stateful implementation will be proposed in another issue.

HTTP server

The command pimo --http starts the HTTP server on port 8000. The port is configurable with the --port option.

HTTP API

The root path is /api/v1/

GET

The HTTP GET method simulates the --empty-input option.

POST

The HTTP POST method sends the input to the server as the request body.

[PROPOSAL] Luhn mask

Problem

The Luhn algorithm is a simple checksum formula, used by the French national bureau of statistics (INSEE).

When a SIREN or SIRET code is anonymized, the last digit must be recalculated with the Luhn algorithm.

Solution

- selector:
    jsonpath: "siren"
  mask:
    # perform a luhn mod10 with the specified mapping on input string (char '0' = 0, char '1' = 1, ...)
    luhn:
      mod: 10
      map: "0123456789"

If len(map) != mod the mask will report a configuration error.

The previous example contains the default values for mod and map so it could be written as :

- selector:
    jsonpath: "siren"
  mask:
    # perform a luhn mod10 on a numeric string
    luhn: {}

$ echo '{"siren": "12345678"}' | pimo
{"siren": "123456782"}
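
For reference, the check character can be computed with the general Luhn mod N algorithm, where N is the length of the mapping string. The Go sketch below is an independent illustration and not PIMO's implementation; the function name luhnCheckChar is made up, and it assumes a single-byte (ASCII) mapping.

```go
package main

import (
	"fmt"
	"strings"
)

// luhnCheckChar returns the Luhn mod N check character for input, where N is
// len(alphabet) and each character's value is its position in alphabet.
func luhnCheckChar(input, alphabet string) (byte, error) {
	n := len(alphabet)
	if n == 0 {
		return 0, fmt.Errorf("empty alphabet")
	}
	factor, sum := 2, 0
	// Walk from the rightmost character, doubling every other value.
	for i := len(input) - 1; i >= 0; i-- {
		code := strings.IndexByte(alphabet, input[i])
		if code < 0 {
			return 0, fmt.Errorf("character %q is not in the alphabet", input[i])
		}
		addend := factor * code
		// Sum the base-N digits of the (possibly doubled) value.
		sum += addend/n + addend%n
		factor = 3 - factor // alternate between 2 and 1
	}
	return alphabet[(n-sum%n)%n], nil
}

func main() {
	check, err := luhnCheckChar("12345678", "0123456789")
	if err != nil {
		panic(err)
	}
	fmt.Println("12345678" + string(check)) // prints 123456782
}
```

With the default mapping "0123456789" this reduces to the usual Luhn mod 10 and reproduces the example above.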

[REFACTOR] global variables

Some global vars in package model should be encapsulated in a root object

var maskContextFactories []MaskContextFactory
var maskFactories []MaskFactory

[PROPOSAL] option Preserve for fromCacheProcess

Current situation

FromCacheProcess is presented in README.md.

If no match is found in the cache, fromCache blocks the current line, and the following lines are processed until matching content enters the cache.

Sometimes we might want to preserve the current value when no match is found (for example when manipulating data structures).

masking.yml

version: 1
seed: 42
masking:
  - selector:
      jsonpath: id
    mask:
      fromCache: mycache
  - selector:
      jsonpath: name
    mask:
      randomChoiceInUri: "pimo://nameFR"
caches:
  mycache: {}

data.jsonl

{"id":1,"name":"Pierre"}
{"id":2,"name":"Paul"}
{"id":3,"name":"Jacques"}

cache.jsonl

{"key":1,"value":11}
{"key":3,"value":13}

cat data.jsonl | pimo --load-cache mycache=cache.jsonl
{"id":11,"name":"Rolande"}
{"id":13,"name":"Matéo"}

Proposal

Using the preserve option (proposed in #56), we can notify the FromCacheProcess and skip the masking when the value is not in the cache.

masking.yml

version: 1
seed: 42
masking:
  - selector:
      jsonpath: id
    mask:
      fromCache: mycache
    preserve: notInCache
  - selector:
      jsonpath: name
    mask:
      randomChoiceInUri: "pimo://nameFR"
caches:
  mycache: {}

cat data.jsonl | pimo --load-cache mycache=cache.jsonl
{"id":11,"name":"Rolande"}
{"id":2, "name":"Aaron"}
{"id":13,"name":"Matéo"}

[PROPOSAL] New mask : transcode

Transcode mask

A new mask that replaces every character with a random character from the output class that matches its input class.

By default, the following transcoding character classes are used:

  • lowercase letters -> lowercase letters
  • UPPERCASE LETTERS -> UPPERCASE LETTERS
  • Digits -> Digits

Example

- selector:
    jsonpath: "id"
  mask:
    transcode: {}

$ echo '{"id": "12345-ABCD-6789"}' | pimo
{"id": "30274-RPDM-2883"}

Example 2 : Define custom classes

Convert hexadecimal values to random hexadecimal values using only lowercase letters

- selector:
    jsonpath: "id"
  mask:
    transcode:
      classes:
      - input: "0123456789abcdefABCDEF"
        output: "0123456789abcdef"

$ echo '{"id": "1ef619-90F"}' | pimo
{"id": "d8e203-a92"}

Mask some characters with a star

- selector:
    jsonpath: "pseudo"
  mask:
    transcode:
      classes:
      - input: "abcdefghijklmnopqrstuvwxyz"
        output: "*"

$ echo '{"pseudo": "mark_23"}' | pimo
{"pseudo": "****_23"}

[PROPOSAL] Define masking in a one-liner

One line masking definition

Sometimes, when iterating quickly in a testing phase, it can be useful to run a pipeline without writing a masking.yml file.

Example

$ echo '{"value": ""}' | pimo --repeat 5 --mask "value=[{fluxUri: 'pimo://nameFR'}]"
{"value": "Aaron"}
{"value": "Abel"}
{"value": "Abel-François"}
{"value": "Abélard"}
{"value": "Abelin"}

This is equivalent to the use of the following masking.yml :

version: "1"
masking:
  - selector:
      jsonpath: "value"
    mask:
      fluxUri: "pimo://nameFR"

The syntax to define masks in one line can be a minified version of YAML, e.g. : https://onlineyamltools.com/minify-yaml

[BUG] Mask Replacement does not work with nested selectors

Problem

input.jsonl

{"fk":{"name1":"Pierre","name2":"Paul"}}

masking.yml

version: "v1"
masking:
  - selector:
      jsonpath: fk.name1
    mask:
      replacement: fk.name2

expected_output.jsonl

{"fk":{"name1":"Paul","name2":"Paul"}}

actual_output.jsonl

{"fk":{"name1":null,"name2":"Paul"}}

Solution

Change this:

type MaskEngine struct {
	Field string
}

Into this:

type MaskEngine struct {
	Field model.Selector
}

[PROPOSAL] Repeat until

Repeat until a condition is met

$ pimo --repeat-until '{{.value == 0}}'

Or for an infinite stream

$ pimo --repeat-until '{{false}}'

Combined with #17 this command can output all the names in the internal referential nameFR :

$ echo '{"value": ""}' | pimo --repeat-until '{{.value==""}}' --mask "value=[{fluxUri: 'pimo://nameFR'}]"
{"value": "Aaron"}
{"value": "Abel"}
{"value": "Abel-François"}
{"value": "Abélard"}
{"value": "Abelin"}
...

[PROPOSAL] Mask http

New mask HTTP

- selector:
    # the . (dot) selector should select the whole dictionary, so the http response will replace the input
    jsonpath: "."
  mask:
    http:
      method: get
      url: https://www.data.gouv.fr/api/1/users/{{.userid}}/
      # auth:
      # headers:
- selector:
    jsonpath: "first_name"
  mask:
    randomChoiceInUri: pimo://nameFR

[PROPOSAL] From JSON mask

A new mask that parses the JSON string held in another field and stores the parsed value in the selected field.

- selector:
    jsonpath: "targetfield"
  mask:
    fromjson: "sourcefield"

Examples

$ echo '{"sourcefield": "null", "targetfield": ""}' | pimo
{"sourcefield": "null", "targetfield": null}
$ echo '{"sourcefield": "1", "targetfield": ""}' | pimo
{"sourcefield": "1", "targetfield": 1}
$ echo '{"sourcefield": "1.2", "targetfield": ""}' | pimo
{"sourcefield": "1.2", "targetfield": 1.2}
$ echo '{"sourcefield": "{\"property\": \"hello\"}", "targetfield": ""}' | pimo
{"sourcefield": "{\"property\": \"hello\"}", "targetfield": {"property": "hello"}}
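
The conversion itself is a plain JSON parse of the source string. A minimal Go sketch of the idea, with a hypothetical fromJSON helper working on a generic map, is shown here; it says nothing about how PIMO stores records internally.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fromJSON parses the JSON text stored in record[source] and stores the
// resulting value (null, number, string, object, ...) in record[target].
func fromJSON(record map[string]any, source, target string) error {
	raw, ok := record[source].(string)
	if !ok {
		return fmt.Errorf("field %q is not a string", source)
	}
	var parsed any
	if err := json.Unmarshal([]byte(raw), &parsed); err != nil {
		return err
	}
	record[target] = parsed
	return nil
}

func main() {
	for _, source := range []string{"null", "1", "1.2", `{"property": "hello"}`} {
		record := map[string]any{"sourcefield": source, "targetfield": ""}
		if err := fromJSON(record, "sourcefield", "targetfield"); err != nil {
			panic(err)
		}
		out, _ := json.Marshal(record)
		fmt.Println(string(out))
	}
}
```

Note that with encoding/json every JSON number decodes to a float64, so "1" and "1.2" both become numbers, while a quoted value such as "\"1.2\"" stays a string.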

[PROPOSAL] Play : add crafted examples

Proposal to add embedded examples in the PIMO Play website. Examples will be organized by category: Generation, Anonymization, Pseudonymization and Technical.

Generation examples

Generate first name, last name and email from an existing referential

TODO (using internal referentials nameFR and surnameFR)

Generate fake name, last name and email

TODO (using Markov mask)

Generate a fake phone number

TODO (using Regex mask)

Generate a valid NIR (french individual identification number)

TODO (using RandomDate for the birth date and Template mask for the key)

Generate a valid SIRET (french business identification number)

TODO (using Luhn mask)

Anonymization examples

We will reuse previous generation examples, but with focus on anonymization specifics.

Remove a value

TODO (using Remove mask)

Replace by a constant

TODO (using Constant mask)

Anonymize a value but preserve null, empty or blank values

TODO

Anonymize a technical ID (like a plate number)

TODO (with Transcode mask)

Pseudonymization examples

Add noise to existing data

TODO (using Random Duration mask for dates, Range mask and Template for other types)

Preserve coherence with a hash

TODO (using Hash, HashInURI or seed parameter)

Preserve coherence and enable reversibility with a cache

TODO (add a comment for reversibility)

Preserve coherence and enable reversibility with encryption

TODO (with FF1 mask)

Technical examples

Multiple masks with a single selector

TODO

Multiple selectors for a single mask

TODO

Preserve parameter

TODO

Seed parameter

TODO

Caches

TODO (unique, reverse, FromCache mask)

Change date formats

TODO (using DateParser mask)

Arrays

TODO (using TemplateEach mask)

Complex structure

TODO (using Pipe mask)

Parse raw JSON

TODO (using FromJson mask)

Temporary fields

TODO (using AddTransient mask)

Generate sequences

TODO (using Increment and FluxURI masks)

[PROPOSAL] Apply template on array

Input data

{"array": ["value1", "value2", "value3"]}

Expected output

{"array": ["Value1", "Value2", "Value3"]}

Problem

Existing masks don't work (pipe, template, ...)

version: "1"
masking:
  - selector:
      jsonpath: "array"
    mask:
      pipe:
        masking:
          - selector:
              jsonpath: "."
            mask:
              template: "{{toUpper .}}"

The result is:

panic: interface conversion: interface {} is string, not model.Dictionary
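
One way to picture the requested behaviour is to execute the template once per array element, with the element itself as the template context. The Go sketch below does that with text/template; templateEach and upperFirst are hypothetical helpers written for this example (upperFirst is used because the expected output capitalizes only the first letter).

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
	"unicode"
	"unicode/utf8"
)

// upperFirst upper-cases the first rune of s.
func upperFirst(s string) string {
	if s == "" {
		return s
	}
	r, size := utf8.DecodeRuneInString(s)
	return string(unicode.ToUpper(r)) + s[size:]
}

// templateEach executes the template once per element; inside the template,
// "." is the current element rather than the whole record.
func templateEach(items []any, text string) ([]any, error) {
	funcs := template.FuncMap{"upperFirst": upperFirst}
	tmpl, err := template.New("each").Funcs(funcs).Parse(text)
	if err != nil {
		return nil, err
	}
	out := make([]any, len(items))
	for i, item := range items {
		var buf bytes.Buffer
		if err := tmpl.Execute(&buf, item); err != nil {
			return nil, err
		}
		out[i] = buf.String()
	}
	return out, nil
}

func main() {
	result, err := templateEach([]any{"value1", "value2", "value3"}, "{{upperFirst .}}")
	if err != nil {
		panic(err)
	}
	fmt.Println(result) // prints [Value1 Value2 Value3]
}
```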

[BUG] use of fromCache after a mask which causes a change in the type of the value

When a field holds a computed value, the fromCache mask no longer works.

Input file :

{"sexe": 2.25}

Below are examples of masking.yml files where fromCache does not work :

version: "1"
seed: 42
masking:
  - selector:
      jsonpath: "sexe"
    masks:
      - constant: 2
      - fromCache: "cacheSex"

caches:
  cacheSex:
    unique: true
    reverse: true

version: "1"
seed: 42
masking:
  - selector:
      jsonpath: "sexe"
    masks:
      - template : "{{ round ( toString .sexe) 0  }}"
      - fromjson: "sexe"
      - fromCache: "cacheSex"
     
caches:
  cacheSex :
    unique: true
    reverse:  true

Cache file cacheSex.jsonl:

{ "key": "M", "value" : 2}
{ "key": "F", "value" : 1}

pimo --load-cache cacheSex=cacheSex.json -c masking.yml < input.json

Output file :

{"sexe": "M"}

[BUG] template over slice of map causes an error

pimo version : 1.12.1

This venom test fails:

  - name: template with range over slice of map
    steps:
      - script: rm -f masking.yml
      - script: |-
          cat > masking.yml <<EOF
          version: "1"
          masking:
            - selector:
                jsonpath: "CLE_V2_DOCUMENT.LIEN"
              mask :
                template: '[[if eq (int .CLE_V2_DOCUMENT.ID_LOC) (int "1") ]]OB=2016001-0/0 PDF[[else if has (int .CLE_V2_DOCUMENT.ID_LOC) (list 10 12 13 14 15 16 17 18 19 20 21 22) ]]29Z3lvv1r3a90ULQmGfiwddPPWq5W4fd[[else]][[.CLE_V2_DOCUMENT.LIEN]][[end]]'
          EOF
      - script: sed -i  "s/\[\[/\{\{/g"  masking.yml
      - script: sed -i  "s/\]\]/\}\}/g"  masking.yml
      - script: |-
          pimo <<EOF
          {"CODE_FAMILLE":"DOCDE","ID_DN":"1320000522255","NOM_CLE":"idRCI","TYPE_DOC":"AEMP","VALEUR_CLE":"1003468703","CLE_V2_DOCUMENT":[{"CODE_ERREUR_INTEGRATION":"","DATE_CREATION":"2020-03-11T19:42:12+01:00","DATE_MODIFICATION":null,"ID_DN":"1320000522255","ID_LOC":1,"LIEN":"OB=2020025-328/2272 PDF","METAS":"{\"idDoc\":1320000522255,\"cDoc\":\"AEMP\",\"idGED\":\"22003111942005580020370050331894\",\"cReg\":\"025\",\"taill\":204,\"tFlux\":\"UGUD\",\"idDE\":\"3997529\",\"dtArc\":\"20200310\",\"dtArr\":\"20200310\",\"dtDif\":\"20200310\",\"icon\":\"3997529\",\"cle\":\"R\",\"nbPag\":3,\"sTypo\":\"407\",\"idRCI\":1003468703,\"cCont\":1,\"cStaW\":3,\"caRec\":\"R1\",\"cFac\":\"W\",\"dses\":\"20200310\",\"dStaW\":\"20200311\",\"hDiff\":\"12:04:22\",\"typo\":\"40\",\"cAgen\":\"02012\",\"dtTrt\":\"20200311\",\"iGedO\":\"22003102003101204234489627952860\",\"corb\":\"TORECORD\",\"cSite\":\"25906\"}","STATUT_DN":2,"SUPPRIME":0,"TYPE_DOC":"AEMP","TYPE_MIME":"application/pdf"}]}
          EOF
        assertions:
          - result.code ShouldEqual 0
          - result.systemerr ShouldBeEmpty
          - result.systemoutjson.CLE_V2_DOCUMENT.0.LIEN ShouldEqual OB=2016001-0/0 PDF

 • template-with-range-over-slice-of-map FAILURE
Testcase "template with range over slice of map", step #4: Assertion "result.code ShouldEqual 0" failed. expected: 0  got: 4 (test/workspace/masking_template.yml:75)
Testcase "template with range over slice of map", step #4: Assertion "result.systemerr ShouldBeEmpty" failed. expected '9:09PM ERR Cannot execute pipeline error="Pipeline didn't complete run: template: template:1:29: executing \"template\" at <.CLE_V2_DOCUMENT.ID_LOC>: can't evaluate field ID_LOC in type model.Entry" config=masking.yml duration="550.823µs" input-line=1 output-line=1' to be empty but it wasn't (test/workspace/masking_template.yml:76)
Testcase "template with range over slice of map", step #4: Assertion "result.systemoutjson.CLE_V2_DOCUMENT.0.LIEN ShouldEqual OB=2016001-0/0 PDF" failed. expected: OB=2016001-0/0 PDF  got: <nil> (test/workspace/masking_template.yml:77)
ERROR running target 'test-int': in step 4: executing command: exit status 2

bug [randomChoice] use different seeds for different fields

Using this configuration:

version: "1"
seed: 3
masking:
  - selector:
      jsonpath: "name"
    mask:
      randomChoiceInUri: "file://../names.txt"
  - selector:
      jsonpath: "name2"
    mask:
      randomChoiceInUri: "file://../names.txt"

name and name2 are always equal.

[PROPOSAL] add unixEpoch format in date parser

Problem

PIMO can't transform a unixEpoch timestamp (1647512434) into a date format (Thu Mar 17 2022 10:20:34 GMT+0000).

Solution

Add a "unixEpoch" format value to the date parser.

  - selector:
      jsonpath: "date"
    mask:
      dateParser:
        inputFormat: "unixEpoch"
        outputFormat: "02/01/06"

This transforms the input

{
  "date": 1647512434
}

into the output

{
  "date": "17/03/22"
}

unixEpoch can also be used as the outputFormat argument:

  - selector:
      jsonpath: "date"
    mask:
      dateParser:
        inputFormat: "01/02/06"
        outputFormat:  "unixEpoch"

[PROPOSAL] Implicit add

Implicit add if the field is missing

Frequently, a field needs to be created and then given a value.

version: "1"
masking:
  - selector :
      jsonpath: "gender"
    mask:
      add: ""
  - selector :
      jsonpath: "gender"
    mask:
      randomChoice:
        - "M"
        - "F"

This can be simplified with an auto-add feature

version: "1"
masking:
  # directly use the mask that sets the value
  - selector :
      jsonpath: "gender"
    mask:
      randomChoice:
        - "M"
        - "F"

Auto-add could also be disabled by default and enabled on demand

version: "1"
masking:
  - selector :
      jsonpath: "gender"
    autoadd: true
    mask:
      randomChoice:
        - "M"
        - "F"

[Proposal] new markov mask

Motivation

The regex mask can generate short random strings, but it cannot produce pseudo-natural language.

Solution

Use a Markov chain [1] to produce pseudo text based on an example. Add a new markov mask with the transitions as parameters:

mask:
  markov:
    # protection against infinite loops
    max-size: 20
    parameters:
      - from: "I am"
        to: "a"
        weight: 0.5
      - from: "I am"
        to: "not"
        weight: 0.5
      - from: "am a"
        to: "free"
        weight: 1
      - from: "free"
        to: "man"
        weight: 1

Parameters are extremely verbose and should not be computed by hand. They should be externalized in a JSON file:

mask:
  markov:
    # protection against infinite loops
    max-size: 20
    parameters: free-man.json

Or, better, computed from a sample text:

mask:
  markov:
    # protection against infinite loops
    max-size: 20
    sample: free-man.txt

[1] https://en.wikipedia.org/wiki/Markov_chain#Markov_text_generators

[PROPOSAL] Templated uri for mask randomInUri

Problem

Conditional random choice is not easy: every possibility has to be drawn, and a template then switches to the valid one.

For example, to choose a name by gender:

- selector:
    jsonpath: "name_F"
  mask:
    add: ""
- selector:
    jsonpath: "name_F"
  mask:
    randomChoiceInUri: "file://names_F.txt"
- selector:
    jsonpath: "name_M"
  mask:
    add: ""
- selector:
    jsonpath: "name_M"
  mask:
    randomChoiceInUri: "file://names_M.txt"
- selector:
    jsonpath: "name"
  mask:
    template: |-
      {{if eq .gender "F"}}{{.name_F}}{{else}}{{.name_M}}{{end}}
# Remove temporary fields
- selector:
    jsonpath: "name_F"
  mask:
    remove: true
- selector:
    jsonpath: "name_M"
  mask:
    remove: true

This is painful for a choice between two categories and unusable for a hundred categories.

Solution

This issue proposes to allow templates in the URI path.

For example :

- selector:
    jsonpath: "name"
  mask:
    randomChoiceInUri: "file://names_{{.gender}}.txt"
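
Resolving such a URI amounts to rendering the template against the current record before opening the resource. A short Go sketch with text/template follows; renderURI is a hypothetical helper, and failing on a missing field is a design choice of this example rather than something stated in the issue.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderURI evaluates the template embedded in a URI against the current
// record. A field that is missing from the record is reported as an error.
func renderURI(uriTemplate string, record map[string]any) (string, error) {
	tmpl, err := template.New("uri").Option("missingkey=error").Parse(uriTemplate)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, record); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	uri, err := renderURI("file://names_{{.gender}}.txt", map[string]any{"gender": "F"})
	if err != nil {
		panic(err)
	}
	fmt.Println(uri) // prints file://names_F.txt
}
```

Because the rendered URI can differ from one record to the next, an implementation would probably want to cache the loaded lists per resolved URI.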

[PROPOSAL] Enrich template functions by YAML definition

Problem

Complex multi-step templates can be difficult to read. We need a way to share data-processing formulas in a single configuration element.

Proposal

  • Add a new root section named functions in masking.yml
  • Load these functions in the template engine (as is already done for Sprig functions, or NoAccent)

version: "1"
functions:
  rangLettre:
    params:
      lettre: string
    code: |-
      return lettre - 'A' + 1;
masking:
  - selector:
      jsonpath: "rang_lettre_J"
    mask:
      template: "{{rangLettre 'J'}}"

OR (simpler) :

version: "1"
functions: |-
  func rangLettre(lettre) {
    return lettre - 'A' + 1;
  }
masking:
  - selector:
      jsonpath: "rang_lettre_J"
    mask:
      template: "{{rangLettre 'J'}}"

Extend the use of functions to masks

version: "1"
functions: |-
  func anonRIB(rib) {
    ...
    return anonymizedRIB;
  }
masking:
  - selector:
      jsonpath: "RIB"
    mask:
      call: "anonRIB"

With params :

version: "1"
functions: |-
  func anonRIB(rib) {
    ...
    return anonymizedRIB;
  }
masking:
  - selector:
      jsonpath: "RIB"
    mask:
      call: 
        name: "anonRIB"
        paramsFromContext:
           - "RIB"

[bug] [fluxUri] Cannot use cache with mask fluxUri

With the following pipeline, pimo does not save ids in the cache cacheId:

masking.yml

version: "1"
seed: 42
masking:
  - selector:
      jsonpath: "ID"
    mask:
      fluxUri: "file://test-id.csv"
    cache: "cacheId"
caches:
  cacheId : {}

test-id.csv

1001
1002
1003

When pimo is executed with this config, the cacheId.jsonl file is empty.

$ pimo --dump-cache cacheId=cacheId.jsonl << EOF
> {"ID":1}
> {"ID":2}
> {"ID":3}
> EOF
{"ID":1001}
{"ID":1002}
{"ID":1003}

In comparison, using the randomChoiceInUri mask saves the ids just fine.

fromJson with integer or float value

Following the purpose of this mask, I tried the following:

  - name: entry float value
    steps:
      - script: rm -rf masking.yml
      - script: |-
          cat > masking.yml <<EOF
          version: "1"
          masking:
            - selector:
                jsonpath: "targetfield"
              mask:
                fromjson: "sourcefield"
          EOF
          echo '{"sourcefield": "{\"property\":\"1.2\"}", "targetfield": ""}' | pimo
        assertions:
          - result.code ShouldEqual 0
          - result.systemout ShouldEqual {"sourcefield":"{\"property\":\"1.2\"}","targetfield":{"property":1.2}}
          - result.systemerr ShouldBeEmpty

As I understand the mask, it should work this way, but the targetfield property is returned as a string, not a float (the same happens with an integer).

[PROPOSAL] condition the execution of a mask for null or empty value

Problem

To preserve null or "" values in the output we have to use a template mask with conditional test.

  - selector:
      jsonpath: "comment"
    mask:
      template: |-
        {{if kindIs "string" .comment}}{{if eq "" .comment}}""{{else}}"Com_Fiche"{{end}}{{else}}null{{end}}

This kind of test is verbose and has to be repeated for each mask in the chain.

Proposal

This is a proposal to add a new preserve attribute to a pipeline step, set to one of the options "null", "empty", "blank" (null or empty) or none (default).

The equivalent mask using the preserve feature is

  - selector:
      jsonpath: "comment"
    preserve: "blank"
    mask:
      template: "Com_Fiche"
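
The decision made by the preserve attribute can be sketched as a small guard placed in front of the mask. The Go code below is a hypothetical illustration (shouldPreserve and applyMask are not PIMO functions); it treats "blank" as null or empty string, as defined in the proposal.

```go
package main

import "fmt"

// shouldPreserve reports whether value must be left untouched for the given
// preserve mode: "null", "empty", "blank" (null or empty) or anything else (none).
func shouldPreserve(mode string, value any) bool {
	isNull := value == nil
	isEmpty := value == ""
	switch mode {
	case "null":
		return isNull
	case "empty":
		return isEmpty
	case "blank":
		return isNull || isEmpty
	default:
		return false
	}
}

// applyMask runs mask on value unless the preserve mode says to keep it.
func applyMask(value any, mode string, mask func(any) any) any {
	if shouldPreserve(mode, value) {
		return value
	}
	return mask(value)
}

func main() {
	constant := func(any) any { return "Com_Fiche" }
	for _, value := range []any{nil, "", "some comment"} {
		fmt.Printf("%#v -> %#v\n", value, applyMask(value, "blank", constant))
	}
}
```

Since the guard sits outside the mask, it applies in the same way to every mask of a chain without repeating the conditional template.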

[PROPOSAL] Pass multiple masking config to the command line

Allow multiple YAML configurations to be passed; they are applied in the order given by the command-line arguments.

$ cat data.jsonl | pimo -c format.yml -c clean.yml -c masking.yml

Equivalent to

$ cat data.jsonl | pimo -c format.yml | pimo -c clean.yml | pimo -c masking.yml

[PROPOSAL] Markov sample separator

Problem

Markov Mask can be used on different samples:

  • lists of words
  • paragraphs

For lists of words, we want to read the file line by line (examples: nameFR, pokemons, etc.).
For entire paragraphs, the text can be spread over multiple lines.

Proposal

In addition to the separator parameter, which determines how the text is split (word by word, character by character, etc.), we want a parameter that helps the mask understand the structure of the text:

  • is it a list?
  • is it paragraphs?
  • is it something else?

In any case, the markov mask should have a default configuration so that it remains usable.

Originally posted by @baguettte in #81 (comment)

[PROPOSAL] Structured logging with -v flag

PIMO needs to log what is happening to stderr.

The log might be structured with https://github.com/sirupsen/logrus, or (better performance) : https://github.com/rs/zerolog, https://github.com/uber-go/zap

The level of verbosity is passed via the -v flag, the default value (0) does not log anything, the other possible values are :

  1. error : log only errors
  2. warn : same as level 1 + warnings that should be checked by user
  3. info : same as level 2 + information about what is processed
  4. debug : same as level 3 + debugging information, to analyse what can cause an unexpected behavior
  5. trace : same as level 4 + tracing of events in code (enter function, exit function)

Example :

$ echo "{}" | pimo -v3 > result.jsonl
INFO[0000] Reading file from disk                      definition=file://.masking.yml
INFO[0000] Begin processing of pipeline                definition=file://.masking.yml
WARN[0000] Ignoring mask because path is non-existent  definition=file://.masking.yml path=name mask=randomInt

Logs can be in JSON format with --log-json flag

$ echo "{}" | pimo -v3 --log-json > result.jsonl
{"definition":"file://.masking.yml","level":"info","msg":"Reading file from disk","time":"2014-03-10 19:57:38.562264131 -0400 EDT"}
{"definition":"file://.masking.yml","level":"info","msg":"Begin processing of pipeline","time":"2014-03-10 19:57:38.562264131 -0400 EDT"}
{"definition":"file://.masking.yml","level":"warn","msg":"Ignoring mask because path is non-existent","time":"2014-03-10 19:57:38.562264131 -0400 EDT","path":"name","mask":"randomInt"}

feat(template) : access to context in nested arrays

I want to be able to modify a value in nested arrays by referencing the current value with a template mask.

data.jsonl

{"elements":[{"persons":[{"name":"bob"},{"name":"john"}]}]}

Expected

$ pimo <data.jsonl

{"elements":[{"persons":[{"name":"BOB"},{"name":"JOHN"}]}]}

Solutions that do not work

Using the same path as the selector

This will refer to a field that does not exist {"elements":{"persons":{"name":"bob"}}}, and generate an error.

masking.yml

version: "1"
seed: 42
masking:
  - selector:
      jsonpath: "elements.persons.name"
    mask:
      # this Go template syntax refers to a field that is not in a nested array
      template: "{{upper .elements.persons.name}}"

Result

$ pimo <data.jsonl

template: template:1:17: executing "template" at <.elements.persons.name>: can't evaluate field persons in type model.Entry

Using go template syntax to access elements in array

This will always use the elements of index 0, and will only give the expected result for the first element bob.

masking.yml

version: "1"
seed: 42
masking:
  - selector:
      jsonpath: "elements.persons.name"
    mask:
      # this Go template syntax refers to the single value at index (0;0)
      template: "{{upper (index (index .elements 0).persons 0).name}}"

Result

$ pimo <data.jsonl

{"elements":[{"persons":[{"name":"BOB"},{"name":"BOB"}]}]}

[bug] pipe mask after a fromjson mask yield a panic error

With the following pipeline, pimo stops with a panic.

# PIMO configuration file version
version: "1"
# Seed for the pseudo-random generator (optional)
seed: 42
# Ordered list of masks to apply
masking:
  - selector:
      jsonpath: "numberOfPet"
    mask:
      add: ""

  - selector:
      jsonpath: "numberOfPet"
    mask:
      randomInt:
        min: 4
        max: 10

  - selector:
      jsonpath: "fk_pets_owner_id"
    mask:
      add: ""
  - selector:
      jsonpath: "fk_pets_owner_id"
    mask:
      template: |
        [
          {{- range  $index := until (int .numberOfPet) -}}
            {{- if $index }},{{end -}}
            {
              "id": 7
            }
          {{- end -}}
        ]
  - selector:
      jsonpath: "fk_pets_owner_id"
    mask:
      fromjson: "fk_pets_owner_id"


  - selector:
      jsonpath: "fk_pets_owner_id"
    mask:
      pipe:
        masking:
          - selector:
              jsonpath: id
            mask:
              incremental:
                start: 1
                increment: 1

pimo --empty-input > with-pipe-result.json
panic: interface conversion: interface {} is map[string]interface {}, not model.Dictionary

goroutine 1 [running]:
github.com/cgi-fr/pimo/pkg/model.CleanDictionary(...)
        /workspace/pkg/model/ordered_dict.go:95
github.com/cgi-fr/pimo/pkg/model.CleanDictionarySlice(0x873720, 0xc00000ceb8, 0xc000024468, 0x10, 0xc0001e9dd8)
        /workspace/pkg/model/ordered_dict.go:104 +0x50c
github.com/cgi-fr/pimo/pkg/pipe.MaskEngine.MaskContext(0x0, 0x0, 0x9b0f10, 0xc000039540, 0x0, 0x0, 0x0, 0x0, 0xc00000cf00, 0xc000024468, ...)
        /workspace/pkg/pipe/pipe.go:69 +0x145
github.com/cgi-fr/pimo/pkg/model.(*MaskContextEngineProcess).ProcessDictionary.func2(0xc00000cf00, 0xc00000cf00, 0xc000024468, 0x10, 0x873720, 0xc00000ceb8, 0xc0001e9d58, 0xc00017b838, 0xc0001e9d40)
        /workspace/pkg/model/process_maskcontext.go:45 +0xb5
github.com/cgi-fr/pimo/pkg/model.selector.applyContext(0xc000024468, 0x10, 0x0, 0x0, 0xc00000cf00, 0xc00000cf00, 0x873720, 0xc00000ceb8, 0xc00000e480, 0x1, ...)
        /workspace/pkg/model/selector.go:203 +0x9c
github.com/cgi-fr/pimo/pkg/model.selector.applySubContext(0xc000024468, 0x10, 0x0, 0x0, 0xc00000cf00, 0xc00000cf00, 0xc00000e480, 0x1, 0x1, 0xc00000cf30)
        /workspace/pkg/model/selector.go:196 +0x339
github.com/cgi-fr/pimo/pkg/model.selector.ApplyContext(...)
        /workspace/pkg/model/selector.go:170
github.com/cgi-fr/pimo/pkg/model.(*MaskContextEngineProcess).ProcessDictionary(0xc0001db360, 0xc00000ced0, 0x9aa800, 0xc0001db380, 0x0, 0x0)
        /workspace/pkg/model/process_maskcontext.go:44 +0x1de
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc0000395c0, 0x0)
        /workspace/pkg/model/model.go:397 +0x96
github.com/cgi-fr/pimo/pkg/model.SimpleSinkedPipeline.Run(0x9b2a28, 0xc0000395c0, 0x9aeda0, 0xc0001db3a0, 0x0, 0x0)
        /workspace/pkg/model/model.go:445 +0x15f
main.run()
        /workspace/cmd/pimo/main.go:196 +0xba9
main.main.func1(0xc0001be280, 0xc0000694a0, 0x0, 0x1)
        /workspace/cmd/pimo/main.go:94 +0x25
github.com/spf13/cobra.(*Command).execute(0xc0001be280, 0xc00001e050, 0x1, 0x1, 0xc0001be280, 0xc00001e050)
        /home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:856 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0xc0001be280, 0xc00017bf20, 0x1, 0x1)
        /home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:960 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
        /home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:897
main.main()
        /workspace/cmd/pimo/main.go:121 +0x713

[PROPOSAL] Randomize with custom seed

Problem

Sometimes we need consistency in generated data (X always gives Y).
With a randomization-based mask such as regex, this is difficult: it requires caches and can be costly in time and memory.

Solution

Add a parameter in the masking file to let the user set the seed used by the RNG in the masks.

Examples

  - selector:
      jsonpath: "phone"
    seed:
      template: "{{.phone}}"
    mask:
      regex: "0[1-7]( ([0-9]){2}){4}"
  - selector:
      jsonpath: "phone"
    seed:
      field: "phone"
    mask:
      regex: "0[1-7]( ([0-9]){2}){4}"
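
The idea behind both examples can be sketched in Go as follows. This is only an illustration under assumptions, not PIMO's implementation: `seedFromValue` and `maskPhone` are hypothetical helpers, FNV-1a is an arbitrary choice of hash, and the phone generator hard-codes the shape of the regex above instead of interpreting it.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// seedFromValue hashes a field value into a 64-bit seed.
// Hypothetical helper: any stable hash would do, FNV-1a is just convenient.
func seedFromValue(value string) int64 {
	h := fnv.New64a()
	h.Write([]byte(value)) // Write on a hash.Hash never returns an error
	return int64(h.Sum64())
}

// maskPhone builds a value shaped like "0[1-7]( ([0-9]){2}){4}" from an RNG
// seeded with the original value, so the same input gives the same output.
func maskPhone(original string) string {
	rng := rand.New(rand.NewSource(seedFromValue(original)))
	out := fmt.Sprintf("0%d", 1+rng.Intn(7))
	for i := 0; i < 4; i++ {
		out += fmt.Sprintf(" %02d", rng.Intn(100))
	}
	return out
}

func main() {
	a := maskPhone("01 23 45 67 89")
	b := maskPhone("01 23 45 67 89")
	fmt.Println(a == b) // same input, same seed, same masked value
}
```

Seeding per value removes the need for a cache to keep results consistent, at the cost of creating one RNG per masked value.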

[PROPOSAL] multiple jsonpath in selector for same mask(s)

Proposal

Allow a list of jsonpaths in the selector (selectorType) so that the same mask applies to multiple fields:

masking.yml

version: v1
masking:
  - selector:
      jsonpaths:
        - name1
        - name2
        - name3
    mask:
      randomChoiceInUri: "pimo://nameFR"

Equivalent to:

version: v1
masking:
  - selector:
      jsonpath: name1
    mask:
      randomChoiceInUri: "pimo://nameFR"
  - selector:
      jsonpath: name2
    mask:
      randomChoiceInUri: "pimo://nameFR"
  - selector:
      jsonpath: name3
    mask:
      randomChoiceInUri: "pimo://nameFR"
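
One possible way to get this equivalence is to expand the list when the configuration is loaded, so the rest of the engine only ever sees single-path entries. The sketch below assumes simplified, hypothetical types (`MultiMasking`, `Masking`, `Selector`); PIMO's real configuration structs may differ.

```go
package main

import "fmt"

// Selector and Masking are simplified stand-ins for a masking entry.
type Selector struct {
	Jsonpath string
}

type Masking struct {
	Selector Selector
	Mask     map[string]string
}

// MultiMasking is the proposed form: several jsonpaths, one mask.
type MultiMasking struct {
	Jsonpaths []string
	Mask      map[string]string
}

// expand turns one multi-path entry into one single-path entry per jsonpath.
func expand(m MultiMasking) []Masking {
	out := make([]Masking, 0, len(m.Jsonpaths))
	for _, p := range m.Jsonpaths {
		out = append(out, Masking{Selector: Selector{Jsonpath: p}, Mask: m.Mask})
	}
	return out
}

func main() {
	multi := MultiMasking{
		Jsonpaths: []string{"name1", "name2", "name3"},
		Mask:      map[string]string{"randomChoiceInUri": "pimo://nameFR"},
	}
	for _, m := range expand(multi) {
		fmt.Println(m.Selector.Jsonpath, m.Mask["randomChoiceInUri"])
	}
}
```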

[PROPOSAL] export a pimo play sandbox as venom test

This is a proposal to add a link on the PIMO Play page that exports the current state as a non-regression test in Venom format.

We could add a drop-down list of buttons in the upper right corner (similar to https://jqplay.org/).
Buttons:

Share : copy link
Export as Venom Test

The Venom test template is:

name: "test generated from pimoplay <current url>"
testcases:
- name: declaring cache
  steps:
  - script: rm -f masking.yml
  - script: |-
      cat > masking.yml <<EOF
      <content of the masking cell>
      EOF
  - script: |-
      cat > input.jsonl <<EOF
      <content of the input cell in jsonline format>
      EOF
  - script: |-
      cat > expected.jsonl <<EOF
      <content of the output cell in jsonline format>
      EOF
  - script: |-
      < input.jsonl pimo > result.jsonl
    assertions:
    - result.code ShouldEqual 0
  - script: |-
      diff expected.jsonl result.jsonl
    assertions:
    - result.code ShouldEqual 0
    - result.systemout ShouldBeEmpty
  • The Share button copies the current link to the clipboard.
  • The Venom Test button downloads a pimo-test.yaml file as a base64 data URL.

[BUG] Null values protection

PIMO is very sensitive to null values: most masks panic when they encounter one.

10:48AM INF Mask hash config=masking.yml context=stdin[1] input-line=1 output-line=1 path=prenom
panic: interface conversion: model.Entry is nil, not string

goroutine 1 [running]:
github.com/cgi-fr/pimo/pkg/hash.MaskEngine.Mask(0xc000107000, 0x375, 0x375, 0x0, 0x0, 0xc00026f600, 0x2, 0x2, 0x10, 0xc739c0, ...)
...

The default behavior when encountering a null value that can't be handled should be to ignore it. A null value is never sensitive data that needs anonymizing.
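
A minimal sketch of that default, assuming a simplified mask signature: `Entry`, `MaskFunc`, `skipNil` and `upper` are hypothetical names, not PIMO's actual types. The wrapper passes nil through untouched, and the mask itself uses a checked type assertion so that an unexpected type becomes an error instead of a panic.

```go
package main

import (
	"fmt"
	"strings"
)

// Entry stands in for any JSON value; MaskFunc is a simplified mask signature.
type Entry interface{}

type MaskFunc func(Entry) (Entry, error)

// skipNil wraps a mask so that null values are returned as-is.
func skipNil(mask MaskFunc) MaskFunc {
	return func(e Entry) (Entry, error) {
		if e == nil {
			return nil, nil
		}
		return mask(e)
	}
}

// upper is a toy mask that only handles strings. The two-value type
// assertion avoids the "interface conversion" panic shown above.
func upper(e Entry) (Entry, error) {
	s, ok := e.(string)
	if !ok {
		return nil, fmt.Errorf("expected string, got %T", e)
	}
	return strings.ToUpper(s), nil
}

func main() {
	safe := skipNil(upper)
	fmt.Println(safe(nil))
	fmt.Println(safe("prenom"))
}
```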

[PROPOSAL] Time Series generator

Time series generator

This issue is a proposal to generate and simulate time series from a set of sensors.

Time series generator configuration

Simple configuration to generate a time series with a period of 5 seconds from 2012-04-23T18:25:00.000Z to 2012-04-23T18:25:15.000Z.

masking:
  - selector:
      jsonpath: 'timestamp'
    mask:
      timeserie:
        period: "5s" # 1s by default
        from: "2012-04-23T18:25:00.000Z"  # current time by default
        to: "2012-04-23T18:25:15.000Z" # empty by default, meaning an endless time series generator

The following command generates the time series dataset:

$ echo '{"timestamp": "" }' |  pimo
{"timestamp": "2012-04-23T18:25:00.000Z" }
{"timestamp": "2012-04-23T18:25:05.000Z" }
{"timestamp": "2012-04-23T18:25:10.000Z" }

Multi-Sensors configuration

The timeserie mask is not streamable: it starts generating data only after the input stream is closed. Each input line is a sensor configuration, and the timeserie mask generates data for each sensor.

$ printf '%s\n' '{"id": 1, "timestamp": ""}' '{"id": 2, "timestamp": ""}' | pimo
{"id": 1, "timestamp": "2012-04-23T18:25:00.000Z" }
{"id": 2, "timestamp": "2012-04-23T18:25:00.000Z" }
{"id": 1, "timestamp": "2012-04-23T18:25:05.000Z" }
{"id": 2, "timestamp": "2012-04-23T18:25:05.000Z" }
{"id": 1, "timestamp": "2012-04-23T18:25:10.000Z" }
{"id": 2, "timestamp": "2012-04-23T18:25:10.000Z" }

Time series simulation

If the simulate option is enabled, timeserie waits for the period between each data generation.

masking:
  - selector:
      jsonpath: 'timestamp'
    mask:
      timeserie:
        period: "5s" # 1s by default
        from: "2012-04-23T18:25:00.000Z"  # current time by default
        to: "2012-04-23T18:25:15.000Z" # empty by default, meaning an endless time series generator
        simulate: true # false by default

[PERF] Mask pipe executed N times on array of length N

Data

data.json

{
    "organizations": [
        {
            "domain": "company.com",
            "persons": [
                {
                    "name": "leona",
                    "surname": "miller",
                    "email": ""
                },
                {
                    "name": "joe",
                    "surname": "davis",
                    "email": ""
                }
            ]
        },
        {
            "domain": "company.fr",
            "persons": [
                {
                    "name": "alain",
                    "surname": "mercier",
                    "email": ""
                },
                {
                    "name": "florian",
                    "surname": "legrand",
                    "email": ""
                }
            ]
        }
    ]
}

masking.yml

version: "1"
seed: 42
masking:
  - selector:
      jsonpath: "organizations.persons"
    mask:
      pipe:
        injectParent: "org"
        masking:
          - selector:
              jsonpath: "email"
            mask:
              template: "{{.name}}.{{.surname}}@{{.org.domain}}"

Execution

Note: data.json is passed twice to pimo to generate two lines

$ cat data.json data.json | jq -c "."  | pimo --log-json -v5 >/dev/null 2> >( jq "." | mlr --ijson --opprint --barred cat)

Actual result

+-------+-------------+-------------+-----------------------+---------------+
| level | config      | line-number | path                  | message       |
+-------+-------------+-------------+-----------------------+---------------+
| info  | masking.yml | 1           | organizations.persons | Mask pipe     |
| info  | -           | 1           | email                 | Mask template |
| info  | -           | 2           | email                 | Mask template |
| info  | masking.yml | 1           | organizations.persons | Mask pipe     |
| info  | -           | 1           | email                 | Mask template |
| info  | -           | 2           | email                 | Mask template |
| info  | masking.yml | 2           | organizations.persons | Mask pipe     |
| info  | -           | 1           | email                 | Mask template |
| info  | -           | 2           | email                 | Mask template |
| info  | masking.yml | 2           | organizations.persons | Mask pipe     |
| info  | -           | 1           | email                 | Mask template |
| info  | -           | 2           | email                 | Mask template |
+-------+-------------+-------------+-----------------------+---------------+

Expected result

+-------+-------------+-------------+-----------------------+---------------+
| level | config      | line-number | path                  | message       |
+-------+-------------+-------------+-----------------------+---------------+
| info  | masking.yml | 1           | organizations.persons | Mask pipe     |
| info  | -           | 1           | email                 | Mask template |
| info  | -           | 2           | email                 | Mask template |
| info  | masking.yml | 2           | organizations.persons | Mask pipe     |
| info  | -           | 1           | email                 | Mask template |
| info  | -           | 2           | email                 | Mask template |
+-------+-------------+-------------+-----------------------+---------------+

[BUG] Implement missing flags

The README documentation mentions these flags:

--skip-line-on-error This flag will totally skip a line if an error occurs masking a field.
--skip-field-on-error This flag will return output without a field if an error occurs masking this field.

But they are not currently implemented.

Either remove them from the documentation or implement them.

[BUG] PIMO panic when using both --repeat flag and pipe mask

Version : pimo v1.4.0

  1. Configure masking.yml with a pipe mask.
  2. Execute pimo with --repeat 2

Expected

First line of input is processed twice

Actual

pimo fails with panic message

panic: interface conversion: model.Entry is []model.Dictionary, not []model.Entry

goroutine 1 [running]:
github.com/cgi-fr/pimo/pkg/pipe.MaskEngine.MaskContext(0x0, 0x0, 0x99ae70, 0xc00022a8c0, 0x0, 0x0, 0xc00002f9e0, 0x2, 0xc000321b30, 0xc0000246c6, ...)
        /workspace/pkg/pipe/pipe.go:72 +0xccb
github.com/cgi-fr/pimo/pkg/model.(*MaskContextEngineProcess).ProcessDictionary.func2(0xc000321890, 0xc000321b30, 0xc0000246c6, 0xe, 0x861fc0, 0xc000191b60, 0x3, 0x6, 0xc000382420)
        /workspace/pkg/model/process_maskcontext.go:48 +0xb5
github.com/cgi-fr/pimo/pkg/model.selector.applyContext(0xc0000246c6, 0xe, 0x0, 0x0, 0xc000321890, 0xc000321b30, 0x861fc0, 0xc000191b60, 0xc0003d8540, 0x1, ...)
        /workspace/pkg/model/selector.go:177 +0x9c
github.com/cgi-fr/pimo/pkg/model.selector.applySubContext(0xc0000246c6, 0xe, 0x0, 0x0, 0xc000321890, 0xc000321b30, 0xc0003d8540, 0x1, 0x1, 0xc0000f0f58)
        /workspace/pkg/model/selector.go:170 +0x32f
github.com/cgi-fr/pimo/pkg/model.selector.applySubContext(0xc0000246c0, 0x5, 0x7fd60a5fbf58, 0xc000272280, 0xc000321890, 0xc000321890, 0xc0003d8540, 0x1, 0x1, 0xc00000c6c0)
        /workspace/pkg/model/selector.go:166 +0x298
github.com/cgi-fr/pimo/pkg/model.selector.ApplyContext(...)
        /workspace/pkg/model/selector.go:144
github.com/cgi-fr/pimo/pkg/model.(*MaskContextEngineProcess).ProcessDictionary(0xc0002722c0, 0xc000321860, 0x994860, 0xc0002722e0, 0x88c7c0, 0xc0000f1101)
        /workspace/pkg/model/process_maskcontext.go:47 +0x2fe
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00022a940, 0xc0003b58c0)
        /workspace/pkg/model/model.go:385 +0x96
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00022b0c0, 0xc0000f11c8)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00022b840, 0xc0000f1220)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc0002fe600, 0x8f12e0)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc0002fef00, 0x40f3db)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc0003952c0, 0x480b4f)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc000395700, 0xc0007e2d80)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc000395b40, 0x56d852)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc000395f80, 0x1)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc000395fc0, 0x9a1820)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00048c240, 0xc0002f0100)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00048c500, 0x98)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00048c7c0, 0xc000190100)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00048c800, 0x21)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00048cac0, 0xc000082000)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00048cd80, 0xc0000f1558)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00048d040, 0x22)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00048d080, 0x98)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00048d300, 0x57bced)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00048d340, 0x56d742)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00048d5c0, 0x7fd60a92cfff)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00048d600, 0x300)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00048d880, 0xc00)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00008ecc0, 0x40d7fb)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00008f940, 0x0)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00018c440, 0xc0000f17b8)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00018cbc0, 0xd)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00018d000, 0xc000162ebc)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00018d440, 0xd)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00018d880, 0x0)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00018dcc0, 0x4141dc)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00018dd00, 0x90e98c)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00018dd40, 0xc000162eb0)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00018dd80, 0xc000060020)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00018ddc0, 0xc00079d770)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00018de00, 0x901d9c)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.(*ProcessPipeline).Next(0xc00018dec0, 0xc00079d770)
        /workspace/pkg/model/model.go:384 +0x49
github.com/cgi-fr/pimo/pkg/model.SimpleSinkedPipeline.Run(0x99c928, 0xc00018dec0, 0x998d40, 0xc000273100, 0x998d40, 0xc000273100)
        /workspace/pkg/model/model.go:434 +0x89
main.run()
        /workspace/cmd/pimo/main.go:178 +0xba9
main.main.func1(0xc000095b80, 0xc00008e700, 0x0, 0x4)
        /workspace/cmd/pimo/main.go:88 +0x25
github.com/spf13/cobra.(*Command).execute(0xc000095b80, 0xc00001e0b0, 0x4, 0x4, 0xc000095b80, 0xc00001e0b0)
        /home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:856 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0xc000095b80, 0xc4b035, 0x9058a3, 0x13)
        /home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:960 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
        /home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:897
main.main()
        /workspace/cmd/pimo/main.go:103 +0x633

[PROPOSAL] temp property to auto-remove in post-processing

Auto remove

Instead of doing this pattern

version: "1"
masking:
  - selector :
      jsonpath: "temp"
    mask:
      add: "temp_value"
...
<use the field>
...
  - selector:
      jsonpath: "temp"
    mask:
      remove: true

Do this

version: "1"
masking:
  - selector :
      jsonpath: "temp"
    temp: true
    mask:
      add: "temp_value"
...
<use the field>
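
The post-processing step could be as small as the sketch below. It assumes a simplified view of the configuration: `MaskingEntry` and `removeTemp` are hypothetical names, and only top-level paths are handled.

```go
package main

import "fmt"

// MaskingEntry keeps only what this sketch needs from a masking item.
type MaskingEntry struct {
	Jsonpath string
	Temp     bool
}

// removeTemp deletes every field whose masking entry is flagged temp.
// It runs once per record, after all masks have been applied.
func removeTemp(rec map[string]interface{}, entries []MaskingEntry) {
	for _, e := range entries {
		if e.Temp {
			delete(rec, e.Jsonpath)
		}
	}
}

func main() {
	entries := []MaskingEntry{
		{Jsonpath: "temp", Temp: true},
		{Jsonpath: "name"},
	}
	rec := map[string]interface{}{"temp": "temp_value", "name": "joe"}
	removeTemp(rec, entries)
	fmt.Println(rec)
}
```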

[PROPOSAL] Chain multiple masks on the same jsonpath

Chain masks on the same jsonpath

A new dedicated mask

  - selector :
      jsonpath: "birthdate"
    mask:
      chain:
        - randDate:
            dateMin: "1960-01-01T00:00:00Z"
            dateMax: "2002-12-31T00:00:00Z"
        - dateParser:
            outputFormat: "2006-01-02"

This can also be used to split a large YAML file into chunks, or to reuse the same YAML definition on different paths:

  - selector :
      jsonpath: "birthdate"
    mask:
      chain:
        definition: mask-date.yaml
  - selector :
      jsonpath: "otherdate"
    mask:
      chain:
        definition: mask-date.yaml
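
Conceptually, a chain is function composition over masks: each mask receives the output of the previous one. The sketch below assumes a simplified mask signature; `Mask`, `chain`, `fixedDate` and `formatDate` are hypothetical names. A fixed date stands in for randDate so the result is reproducible.

```go
package main

import (
	"fmt"
	"time"
)

// Mask is a simplified mask signature used only for this sketch.
type Mask func(interface{}) (interface{}, error)

// chain runs the masks in order and stops at the first error.
func chain(masks ...Mask) Mask {
	return func(v interface{}) (interface{}, error) {
		var err error
		for _, m := range masks {
			if v, err = m(v); err != nil {
				return nil, err
			}
		}
		return v, nil
	}
}

// fixedDate stands in for randDate: it ignores its input and returns a date.
func fixedDate(interface{}) (interface{}, error) {
	return "1985-03-14T00:00:00Z", nil
}

// formatDate plays the role of dateParser with outputFormat "2006-01-02".
func formatDate(v interface{}) (interface{}, error) {
	s, ok := v.(string)
	if !ok {
		return nil, fmt.Errorf("expected string, got %T", v)
	}
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		return nil, err
	}
	return t.Format("2006-01-02"), nil
}

func main() {
	birthdate := chain(fixedDate, formatDate)
	fmt.Println(birthdate("2000-01-01T00:00:00Z"))
}
```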

bug: only first mask is processed with nested arrays

data.jsonl

{"elements":[{"persons": [{"phonenumber": "027","email": "[email protected]"}]}]}

masking.yml

version: "1"
seed: 42
masking:
  - selector:
      jsonpath: "elements.persons.phonenumber"
    mask:
      regex: "0[1-7]( ([0-9]){2}){4}"
  - selector:
      jsonpath: "elements.persons.email"
    mask:
      regex: '[a-z]{10}@company\.com'

Result

{
  "elements": [
    {
      "persons": [
        {
          "email": "[email protected]",
          "phonenumber": "04 87 48 09 96"
        }
      ]
    }
  ]
}

Expected

{
  "elements": [
    {
      "persons": [
        {
          "email": "[email protected]",
          "phonenumber": "04 87 48 09 96"
        }
      ]
    }
  ]
}

[BUG] Cache should apply on whole masking item

Example YAML

version: "1"
caches:
  mycache:
    unique: true
masking:
  - selector:
      jsonpath: "test"
    masks:
      - add: "1"
      - constant: "1"
    cache: mycache

Expected

$ pimo --empty-input
1

Actual

$ pimo --empty-input
5:21PM ERR Cannot execute pipeline error="Pipeline didn't complete run: Unique value not found" config=masking.yml duration="598.2µs" input-line=1 output-line=1
