cgi-fr / lino

Large Input Narrow Output - LINO is a simple ETL tool to manage test data.

Home Page: https://cgi-fr.github.io/lino-doc/

License: GNU General Public License v3.0

Dockerfile 0.60% Shell 0.44% Makefile 0.49% Go 98.01% SQLPL 0.46%
testdata sampling jsonlines relational-databases rdbms graph

lino's People

Contributors

adrienaury, capkicklee, chao-ma5566, dependabot[bot], giraud10, p0labrd, youen


lino's Issues

[PROPOSAL] Import file

table.yaml

  - name: attachments
    keys:
      - id
    columns:
      - name: content
        import: file

LINO will interpret the input value as a path to a file, and will use the content of that file as the data.
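A minimal downstream sketch of that behavior, assuming a JSON-lines stream and a hypothetical `content` column configured with `import: file` (binary content has no direct JSON representation, so base64 is assumed here):

```python
import base64
import json

def resolve_file_imports(line, file_columns):
    """Replace each configured column's value (a file path) with the
    base64-encoded content of that file."""
    record = json.loads(line)
    for column in file_columns:
        if column in record:
            with open(record[column], "rb") as f:
                record[column] = base64.b64encode(f.read()).decode("ascii")
    return json.dumps(record)
```

An input line such as {"id": 1, "content": "/tmp/a.pdf"} would then carry the file's bytes instead of its path.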

[PROPOSAL] Build database schema on push

lino push --create-schema insert target-db

Will create

  • tables
  • columns
  • primary keys
  • foreign keys

Based on the first data line encountered on each table.
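A rough sketch of what schema inference from the first data line could look like (the type mapping below is an assumption for illustration, not LINO's actual implementation):

```python
import json

# Assumed mapping from JSON value types to generic SQL column types.
SQL_TYPES = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE PRECISION", str: "TEXT"}

def create_table_from_first_line(table, first_line, keys):
    """Build a CREATE TABLE statement from the first JSON data line."""
    record = json.loads(first_line)
    columns = [f"{name} {SQL_TYPES.get(type(value), 'TEXT')}"
               for name, value in record.items()]
    columns.append(f"PRIMARY KEY ({', '.join(keys)})")
    return f"CREATE TABLE {table} ({', '.join(columns)})"
```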

Another possibility: a new push action

lino push schema target-db

[PROPOSAL] push delete with conditional clause

Hi,
We need to purge some data in a given table according to a condition (where MyData>9999).

Currently, we can do that with

$ lino pull --table MyTable --where 'MyData>9999' MyAliasDB | lino push delete --table MyTable  MyAliasDB 

Ideally we would have

$ lino push delete --table MyTable  --where 'MyData>9999'  MyAliasDB 

lino push delete with a conditional clause must ignore input data.
Thanks

[BUG] Push of RAW type to an Oracle database (ORA-01465)

Problem

During a LINO push, an Oracle error occurred: ORA-01465: invalid hex number.

According to this thread: https://stackoverflow.com/questions/33708959/ora-01465-invalid-hex-number-in-oracle-while-using-blob
a function such as utl_raw.cast_to_raw should be used to convert the character string into an Oracle-compatible type.

(Note: the same solution was proposed for issue #51, but that case also involves character string size problems, >4000.)

Type handling could perhaps be integrated into LINO's Oracle dialect.

[BUG?] Pull behavior changed from v1.10.0 on

Context

Until LINO v1.9.2 included, when pulling a table with a child relationship, data from the child were pulled as a JSON object:

{
    "myField1": {
        "childField1": "",
        "childField2": "2021-07-06T09:59:13+02:00",
        "childField3": "2021-07-13T12:04:19+02:00",
        ...
    },
    "myField2": "bla",
    "myField3": "9921196128284",
    ...
}

Starting from v1.10.0, the same data are pulled as a JSON array containing a single JSON object:

{
    "myField1": [
        {
            "childField1": "",
            "childField2": "2021-07-06T09:59:13+02:00",
            "childField3": "2021-07-13T12:04:19+02:00",
            ...
        }
    ],
    "myField2": "bla",
    "myField3": "9921196128284",
    ...
}

Asking

Is it a desired behavior? If so, is there any way to get the previous one without going back to v1.9.2?

Could this also be documented so that it becomes a known kind of breaking change?
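In the meantime, one possible downstream workaround (a sketch, assuming the consumer can post-process each pulled record) is to unwrap single-element child arrays back into objects:

```python
def unwrap_single_child(record, child_fields):
    """Convert a v1.10.0-style single-element child array back to the
    pre-v1.10.0 JSON object form. Arrays with several elements are kept."""
    for field in child_fields:
        value = record.get(field)
        if isinstance(value, list) and len(value) == 1:
            record[field] = value[0]
    return record
```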

Thanks

[BUG][Oracle] panic during push

Got this error while running a lino push command with lino-toolbox:1.4.0 on an oracle 19 database table.

The command:

lino pull ${SOURCE} --table ${TABLE} -l 0 | lino -v 3 push truncate --table ${TABLE} ${DESTINATION} -e error-ref-${TABLE}.json

The stacktrace:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xa01ab0]
goroutine 1 [running]:
github.com/cgi-fr/lino/internal/infra/push.OracleDialect.ConvertValue({}, {0xad6de0?, 0xc0006800c0?}, {{0xc00012a1f8, 0x8}, 0x0, {0x0, 0x0}})
	/workspace/internal/infra/push/datadestination_oracle.go:185 +0x70
github.com/cgi-fr/lino/internal/infra/push.(*SQLRowWriter).Write(0xc0004a2180, 0xc00068a1b0, 0xc0000b4140?)
	/workspace/internal/infra/push/datadestination_sql.go:345 +0x2ea
github.com/cgi-fr/lino/pkg/push.pushRow(0x42f7b0?, {0xcde350, 0xc0003c30e0}, {0xcdf300, 0xc0000b4140}, {0xcddb60, 0xc00005a4e0}, 0x0, {0xcd9eb8, 0xc000120980}, ...)
	/workspace/pkg/push/driver.go:224 +0x3df
github.com/cgi-fr/lino/pkg/push.Push({0xcde318, 0xc00005a510}, {0xcde350, 0xc0003c30e0}, {0xcddb60, 0xc00005a4e0}, 0x0?, 0x1f4, 0x0?, {0xcd7000, ...}, ...)
	/workspace/pkg/push/driver.go:70 +0x2e5
github.com/cgi-fr/lino/internal/app/push.NewCommand.func3(0xc0003cd500?, {0xc00013d280, 0x2, 0x8?})
	/workspace/internal/app/push/cli.go:155 +0x845
github.com/spf13/cobra.(*Command).execute(0xc0003cd500, {0xc00013d200, 0x8, 0x8})
	/home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:987 +0xa91
github.com/spf13/cobra.(*Command).ExecuteC(0x1da1640)
	/home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:1115 +0x425
github.com/spf13/cobra.(*Command).Execute(...)
	/home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:1039
main.main()
	/workspace/cmd/lino/main.go:138 +0x25

This was working with lino-toolbox:1.2.0 and 1.3.0.

[PROPOSAL] Feed primary key field with sequence.nextval and foreign key with sequence.currval

Context

We have to insert this JSON object into a film table:

{
  "film_id": 452,
  "title": "The Matrix",
  "year": 1999
}

Problem

During a dataset insert, primary keys have to be unique to avoid conflicts with existing values. Sequences also have to be advanced beyond the maximum primary key inserted, to avoid future conflicts.

Solution

Add sequence information in table.yaml and use it during lino push insert process.

version: v1
tables:
  - name: film
    keys:
      - film_id
    sequence:
      film_id: sequence_film_id

If film_id is omitted, LINO uses the value of sequence_film_id.nextval to feed the primary key.

Relation

If the primary key is part of a relation

  - name: film_film_category
    parent:
        name: film
        keys:
          - film_id
    child:
        name: film_category
        keys:
          - film_id

inserting the following JSON object

{
  "title": "The Matrix",
  "year": 1999,
  "film_film_category": [
    {
      "category_id": 151
    },
    {
      "category_id": 452
    }
  ]
}

will produce the following SQL statements:

insert into film (film_id, title, year)
values (sequence_film_id.nextval, 'The Matrix', 1999);
insert into film_category (film_id, category_id)
values (sequence_film_id.currval, 151 );
insert into film_category (film_id, category_id)
values (sequence_film_id.currval, 452 );

[PROPOSAL] Exported columns definition

Objective

A new columns section under the tables elements inside the tables.yaml file lists the columns exported in the JSON output.

Example :

version: v1
tables:
  - name: public.actor
    keys:
      - actor_id
    columns:
      - name: "first_name"
      - name: "last_name"

Default behavior

If the section is not present, LINO outputs all columns in alphabetical order; this is the current behavior, kept so as not to break compatibility with existing configurations.

Order

If at least one column is defined inside the columns section, then LINO will output only the defined columns, in the order they appear in the file.

Primary keys

Primary keys are necessary for the push action, so even if they are not present in the columns section, they will always be exported in the output.

Foreign keys

Foreign keys involved in active relations must be exported even if not present in the columns section.

Type

The name property is mandatory. The type property is optional and asks LINO to convert the column values to a given JSON type in the output.

Example :

version: v1
tables:
  - name: public.actor
    keys:
      - actor_id
    columns:
      - name: "first_name"
        type: string
      - name: "last_name"
        type: string

Available types are:

  • string : LINO outputs the value as a string in the JSON, e.g. "hello".
  • decimal : LINO outputs the value as a decimal number in the JSON, e.g. 25.2.
  • integer : LINO outputs the value as an integer in the JSON, e.g. 25.
  • boolean : LINO outputs the value as a boolean in the JSON, e.g. true.
  • base64 : LINO outputs the value base64-encoded in the JSON, e.g. aGVsbG8=.
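A sketch of how these conversions might behave, assuming raw column values arrive as bytes (the convert function and its coercion rules are illustrative, not LINO's actual code):

```python
import base64

def convert(value, json_type):
    """Convert a raw column value (bytes) to the declared JSON type."""
    text = value.decode("utf-8", errors="replace")
    if json_type == "string":
        return text
    if json_type == "decimal":
        return float(text)
    if json_type == "integer":
        return int(text)
    if json_type == "boolean":
        return text.lower() in ("t", "true", "1")
    if json_type == "base64":
        return base64.b64encode(value).decode("ascii")
    raise ValueError(f"unknown type: {json_type}")
```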

[Feature Request] Show graph : include 'dot' application in release

Initial condition:

Postgresql db with dvdrental image
lino dataconnector add dvdrental --read-only postgresql://postgres@localhost:5432/dvdrental?sslmode=disable --password
lino relation extract dvdrental
lino table extract dvdrental
lino id create "film"

Action

lino id show-graph

Problem

exec: "dot": executable file not found in %PATH%

Expected 1/3

Either the release should be extended to include the DOT application (best solution!), or the installation documentation should state that DOT has to be installed.

Expected 2/3

In the URL 'https://github.com/CGI-FR/LINO/issues' the example is
$ lino id customer show-graph
with 'customer', which looks like a table name.
But with the command 'lino id "film" show-graph', the 'lino id' help is displayed.
=> Is it possible to fix the example and remove the 'customer' word?

Expected 3/3

After calling the command 'lino id "film" show-graph', the 'lino id' help is displayed.
=> Is it possible to get an error message instead, since after 'lino id' the "film" string is not a valid [command]?
Extract from lino id --help:
Usage:
lino id [command]

[BUG] on truncate error, lino doesn't restore constraints

  • version : lino version 2.3.0-beta3 (commit=4feebf61de618059f3a860001d33ae2250785fe0 date=2023-02-15 by=goreleaser)
  • connector : oracle
  • expected : lino restores constraints on error, or on kill (Ctrl+C)
  • bug : lino exits without restoring constraints
❯ ./lino push truncate cible -v 5 -d
3:23PM INF Logger level set to trace
3:23PM INF Start LINO color=auto debug=false log-json=false verbosity=5
3:23PM INF Push mode catch-errors= commitSize=500 disable-constraints=true table=
3:23PM TRC building relation {FKDD01402DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01402_EXPERIENCE [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01402_EXPERIENCE [ID_RCI ID_EXPERIENCE] [] 0} action=push
3:23PM TRC building relation {FKDD01403DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01403_FORMATION [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01403_FORMATION [ID_RCI ID_FORMATION] [] 0} action=push
3:23PM TRC building relation {FKDD01404DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01404_CENTRE_INTERET [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01404_CENTRE_INTERET [ID_RCI ID_CENTRE_INTERET] [] 0} action=push
3:23PM TRC building relation {FKDD01405DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01405_COMPETENCE [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01405_COMPETENCE [ID_RCI ID_COMPETENCE] [] 0} action=push
3:23PM TRC building relation {FKDD01406DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01406_LANGUE [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01406_LANGUE [ID_RCI ID_LANGUE] [] 0} action=push
3:23PM TRC building relation {FKDD01407DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01407_PERMIS [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01407_PERMIS [ID_RCI ID_PERMIS] [] 0} action=push
3:23PM TRC building relation {FKDD01408DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01408_LOCOMOTION [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01408_LOCOMOTION [ID_RCI CODE_MOYEN_LOCOMOTION] [] 0} action=push
3:23PM TRC building relation {FKDD01409DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01409_PIECE_JOINTE [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01409_PIECE_JOINTE [ID_RCI ID_PIECE_JOINTE] [] 0} action=push
3:23PM TRC building relation {FKDD01410DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01410_COMPTEURS [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01410_COMPTEURS [ID_RCI] [] 0} action=push
3:23PM TRC building relation {FKDD01411DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01411_CONSENTEMENT [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01411_CONSENTEMENT [ID_RCI] [] 0} action=push
3:23PM TRC building relation {FKDD01412DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01412_HISTORISATION [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01412_HISTORISATION [ID_RCI DATE_ACTION] [] 0} action=push
3:23PM TRC building relation {FKDD01413DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01413_CARTE_VISITE [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01413_CARTE_VISITE [ID_CARTE_VISITE] [] 0} action=push
3:23PM TRC building relation {FKDD01414DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01414_SIGNALEMENT [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01414_SIGNALEMENT [ID_RCI] [] 0} action=push
3:23PM TRC building relation {FKDD01415DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01415_RELANCE [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01415_RELANCE [ID_RCI] [] 0} action=push
3:23PM TRC building relation {FKDD01416DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01416_HISTO_RELANCE [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01416_HISTO_RELANCE [ID_RCI DATE_RELANCE] [] 0} action=push
3:23PM TRC building relation {FKDD01417DD01409 {DD01409_PIECE_JOINTE [ID_PIECE_JOINTE]} {DD01417_BROUILLON [ID_PIECE_JOINTE]}} action=push
3:23PM TRC building table {DD01409_PIECE_JOINTE [ID_RCI ID_PIECE_JOINTE] [] 0} action=push
3:23PM TRC building table {DD01417_BROUILLON [ID_BROUILLON ID_PIECE_JOINTE] [] 0} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM DBG call Push with mode truncate action=push
3:23PM DBG open table with mode truncate action=push
3:23PM DBG BEGIN
        FOR c IN
                (SELECT c.owner, c.table_name, c.constraint_name
                FROM user_constraints c, user_tables t
                WHERE c.table_name = t.table_name
                AND c.owner = sys_context( 'userenv', 'current_schema' )
                AND c.table_name = 'DD01417_BROUILLON'
                AND c.status = 'ENABLED'
                AND c.constraint_type = 'R'
                ORDER BY c.constraint_type DESC)
        LOOP
                dbms_utility.exec_ddl_statement('alter table "' || c.owner || '"."' || c.table_name || '" disable constraint ' || c.constraint_name);
        END LOOP;
  END;
   action=push
3:23PM DBG TRUNCATE TABLE DD01417_BROUILLON action=push
3:23PM DBG open table with mode truncate action=push
3:23PM DBG BEGIN
        FOR c IN
                (SELECT c.owner, c.table_name, c.constraint_name
                FROM user_constraints c, user_tables t
                WHERE c.table_name = t.table_name
                AND c.owner = sys_context( 'userenv', 'current_schema' )
                AND c.table_name = 'DD01402_EXPERIENCE'
                AND c.status = 'ENABLED'
                AND c.constraint_type = 'R'
                ORDER BY c.constraint_type DESC)
        LOOP
                dbms_utility.exec_ddl_statement('alter table "' || c.owner || '"."' || c.table_name || '" disable constraint ' || c.constraint_name);
        END LOOP;
  END;
   action=push
3:23PM DBG TRUNCATE TABLE DD01402_EXPERIENCE action=push
3:23PM DBG open table with mode truncate action=push
3:23PM DBG BEGIN
        FOR c IN
                (SELECT c.owner, c.table_name, c.constraint_name
                FROM user_constraints c, user_tables t
                WHERE c.table_name = t.table_name
                AND c.owner = sys_context( 'userenv', 'current_schema' )
                AND c.table_name = 'DD01404_CENTRE_INTERET'
                AND c.status = 'ENABLED'
                AND c.constraint_type = 'R'
                ORDER BY c.constraint_type DESC)
        LOOP
                dbms_utility.exec_ddl_statement('alter table "' || c.owner || '"."' || c.table_name || '" disable constraint ' || c.constraint_name);
        END LOOP;
  END;
   action=push
3:23PM DBG TRUNCATE TABLE DD01404_CENTRE_INTERET action=push
3:23PM DBG open table with mode truncate action=push
3:23PM DBG BEGIN
        FOR c IN
                (SELECT c.owner, c.table_name, c.constraint_name
                FROM user_constraints c, user_tables t
                WHERE c.table_name = t.table_name
                AND c.owner = sys_context( 'userenv', 'current_schema' )
                AND c.table_name = 'DD01412_HISTORISATION'
                AND c.status = 'ENABLED'
                AND c.constraint_type = 'R'
                ORDER BY c.constraint_type DESC)
        LOOP
                dbms_utility.exec_ddl_statement('alter table "' || c.owner || '"."' || c.table_name || '" disable constraint ' || c.constraint_name);
        END LOOP;
  END;
   action=push
3:23PM DBG TRUNCATE TABLE DD01412_HISTORISATION action=push
3:23PM DBG open table with mode truncate action=push
3:23PM DBG BEGIN
        FOR c IN
                (SELECT c.owner, c.table_name, c.constraint_name
                FROM user_constraints c, user_tables t
                WHERE c.table_name = t.table_name
                AND c.owner = sys_context( 'userenv', 'current_schema' )
                AND c.table_name = 'DD01403_FORMATION'
                AND c.status = 'ENABLED'
                AND c.constraint_type = 'R'
                ORDER BY c.constraint_type DESC)
        LOOP
                dbms_utility.exec_ddl_statement('alter table "' || c.owner || '"."' || c.table_name || '" disable constraint ' || c.constraint_name);
        END LOOP;
  END;
   action=push
3:23PM DBG TRUNCATE TABLE DD01403_FORMATION action=push
3:23PM DBG open table with mode truncate action=push
3:23PM DBG BEGIN
        FOR c IN
                (SELECT c.owner, c.table_name, c.constraint_name
                FROM user_constraints c, user_tables t
                WHERE c.table_name = t.table_name
                AND c.owner = sys_context( 'userenv', 'current_schema' )
                AND c.table_name = 'DD01410_COMPTEURS'
                AND c.status = 'ENABLED'
                AND c.constraint_type = 'R'
                ORDER BY c.constraint_type DESC)
        LOOP
                dbms_utility.exec_ddl_statement('alter table "' || c.owner || '"."' || c.table_name || '" disable constraint ' || c.constraint_name);
        END LOOP;
  END;
   action=push
3:23PM DBG TRUNCATE TABLE DD01410_COMPTEURS action=push
3:23PM DBG open table with mode truncate action=push
3:23PM DBG BEGIN
        FOR c IN
                (SELECT c.owner, c.table_name, c.constraint_name
                FROM user_constraints c, user_tables t
                WHERE c.table_name = t.table_name
                AND c.owner = sys_context( 'userenv', 'current_schema' )
                AND c.table_name = 'DD01414_SIGNALEMENT'
                AND c.status = 'ENABLED'
                AND c.constraint_type = 'R'
                ORDER BY c.constraint_type DESC)
        LOOP
                dbms_utility.exec_ddl_statement('alter table "' || c.owner || '"."' || c.table_name || '" disable constraint ' || c.constraint_name);
        END LOOP;
  END;
   action=push
3:23PM DBG TRUNCATE TABLE DD01414_SIGNALEMENT action=push
3:23PM DBG open table with mode truncate action=push
3:23PM DBG BEGIN
        FOR c IN
                (SELECT c.owner, c.table_name, c.constraint_name
                FROM user_constraints c, user_tables t
                WHERE c.table_name = t.table_name
                AND c.owner = sys_context( 'userenv', 'current_schema' )
                AND c.table_name = 'DD01415_RELANCE'
                AND c.status = 'ENABLED'
                AND c.constraint_type = 'R'
                ORDER BY c.constraint_type DESC)
        LOOP
                dbms_utility.exec_ddl_statement('alter table "' || c.owner || '"."' || c.table_name || '" disable constraint ' || c.constraint_name);
        END LOOP;
  END;
   action=push
3:23PM DBG TRUNCATE TABLE DD01415_RELANCE action=push
3:23PM DBG open table with mode truncate action=push
3:23PM DBG BEGIN
        FOR c IN
                (SELECT c.owner, c.table_name, c.constraint_name
                FROM user_constraints c, user_tables t
                WHERE c.table_name = t.table_name
                AND c.owner = sys_context( 'userenv', 'current_schema' )
                AND c.table_name = 'DD01405_COMPETENCE'
                AND c.status = 'ENABLED'
                AND c.constraint_type = 'R'
                ORDER BY c.constraint_type DESC)
        LOOP
                dbms_utility.exec_ddl_statement('alter table "' || c.owner || '"."' || c.table_name || '" disable constraint ' || c.constraint_name);
        END LOOP;
  END;
   action=push
3:23PM DBG TRUNCATE TABLE DD01405_COMPETENCE action=push
3:23PM FTL Fatal error stop the push command error="ORA-02266: unique/primary keys in table referenced by enabled foreign keys" action=push

[BUG] lino push truncate with --table option doesn't truncate table

version : cgifr/lino:v2.0-oracle

Modus operandi

  1. Configure lino with a dataconnector formation to an Oracle database
  2. Have some rows in table HOPHJOUP
  3. Execute echo "" | lino -v 5 push truncate --table HOPHJOUP formation

Expected outcome

1:03PM INF Logger level set to trace
1:03PM INF Start LINO color=auto debug=false log-json=false verbosity=5
1:03PM INF Push mode catch-errors= commitSize=500 disable-constraints=false table=HOPHJOUP
1:03PM WRN missing table HOPHJOUP in tables.yaml action=push
1:03PM DBG call Push with mode truncate action=push
1:03PM DBG open table with mode truncate action=push
1:03PM DBG TRUNCATE TABLE HOPHJOUP action=push
1:03PM DBG transaction committed action=push
1:03PM FTL Fatal error stop the push command error="unexpected end of JSON input" action=push

The table HOPHJOUP is empty

Actual outcome

1:06PM INF Logger level set to trace
1:06PM INF Start LINO color=auto debug=false log-json=false verbosity=5
1:06PM INF Push mode catch-errors= commitSize=500 disable-constraints=false table=HOPHJOUP
1:06PM TRC building table {HOPHJOUP [MATRI DAT] []} action=push
1:06PM DBG call Push with mode truncate action=push
1:06PM DBG transaction committed action=push
unexpected end of JSON input

The table HOPHJOUP is unchanged

[Feature Request] due to the risk of displaying a SECRET in Docker files, a request with a connection string including a password should be rejected

Initial condition:

Windows server without a Docker image
Following the instructions in 'https://github.com/CGI-FR/LINO'

Action

lino dataconnector add source postgresql://postgres:@localhost:5432/postgres?sslmode=disable

Problem

The following warnings are displayed:
warn: password should not be included in URI, use --password-from-env or --password
warn: password will be stored unencrypted in ~/.lino/credentials.yaml, configure a credential helper to remove this warning. See https://github.com/docker/docker-credential-helpers

Expected

When the command is run outside of a Docker container, it is strange to receive a warning related to Docker. Is it possible to not display this warning when there is no Docker image?
More importantly, if there is a real risk of the secret being stored unencrypted somewhere with this syntax, the request should be rejected.

[PROPOSAL] Structured log with -v flag

Convert the current logs to structured logs.

* `--debug` This flag completes the logs with debug information (source file, line number).
* `--log-json` Set this flag to produce JSON-formatted logs.

[BUG] error SQL (no error capture configured)

Problem

During a LINO push into an Oracle database, the log output is the following:

10:18AM DBG INSERT INTO ................. action=push
10:18AM TRC ................. action=push
10:18AM DBG close statement insert action=push
10:18AM DBG close statement insert action=push
10:18AM DBG close statement insert action=push
10:18AM DBG close statement insert action=push
10:18AM DBG transaction committed action=push
sql: expected 0 arguments, got 4 (No error capture configured)

The error occurs for a relation when there are 3 children to push (no failure with a single child).

[BUG] Postgres : error 'no table named ListX' for command ' lino id create "ListX" '

Initial condition:

Windows Computer
Postgresql db with table ListX (table name include UPPER CASE character and no foreign keys)
lino dataconnector add DB_prod --read-only --password postgresql://postgres@localhost:5432/DB_prod?sslmode=disable
lino relation extract DB_prod
lino table extract DB_prod

Action

lino id create "ListX"

Problem

no table named ListX
whereas in the tables.yml file, the table ListX is correctly listed

Expected

Whatever the case of table name, the lino command should work.

NB: after a call with @adrienaury:
1/2: In case of no relation for the request lino id create "ListX", the feedback should propose to execute the command lino pull --table "listx" DB_prod.
2/2: For the request lino pull --table "listx" DB_prod | jq, the reply is "3:07PM FTL Fatal error stop the pull command error="pq: la relation « listx » n'existe pas" action=pull" (the pq error means: relation « listx » does not exist) => this error must be investigated.

Additional info

extract from tables.yml

  - name: ListX
    keys:
      - id


[PROPOSAL] Specify format on import

Problem

We need to pseudonymize a files table by generating new content (column data) based on the value of the name column.

We use this table.yaml file (note the data column is not selected, because we don't need it: it is replaced by a new value).

  - name: files
    keys:
      - id
    columns:
      - name: id
      - name: name

The column data is added and generated by PIMO.

On the push there is a problem, because the value of data is base64-encoded in the JSON stream, so we need to update the table.yaml file:

  - name: files
    keys:
      - id
    columns:
      - name: id
      - name: name
      - name: data
        export: binary

Now the push is OK, the data values are treated as base64. But the pull will now extract the data column from the source database. This is a performance concern, because this extraction is costly and useless.

Solution (edited 2022-08-02)

Currently the import property can only impact the in-memory data type (int, float, string, ...). But this property could (should?) also make it possible to control how the data is read from the JSON stream.

A way to do this without breaking compatibility is to continue supporting import: []byte, import: string, etc., and add support for import: binary, import: datetime, etc. (the same options as the export property). To set the data type and the format at the same time, this kind of value can be used for the import property: import: binary(int64) (i.e. import: format(type)).

So our example becomes:

  - name: files
    keys:
      - id
    columns:
      - name: id
      - name: name
      - name: data
        import: binary
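The proposed import: format(type) syntax could be parsed along these lines (a sketch; the function name is hypothetical):

```python
import re

def parse_import_property(value):
    """Split an import value such as 'binary(int64)' into (format, type).
    Plain values such as 'binary' or 'string' yield (value, None)."""
    match = re.fullmatch(r"(\w+)\((\w+)\)", value)
    if match:
        return match.group(1), match.group(2)
    return value, None
```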

[PROPOSAL] Add flag to enable PK update in-situ

Requirement

We need to replace a field that is part of a primary key on a table.

The original pulled object from lino pull command is

{
  "PKFIELD": "orig"
}

The modified version used in lino push update command is

{
  "PKFIELD": "newvalue"
}

We have access to a cache file in jsonl format on the filesystem

{"key": "orig", "value": "newvalue"}
{"key": "orig2", "value": "newvalue2"}
{"key": "orig3", "value": "newvalue3"}

Update command

lino push update should allow using the cache file to generate the following update query

UPDATE table_name
SET PKFIELD = "newvalue"
WHERE PKFIELD = "orig";    -- <= orig value recovered from cache file

Delete command

lino push delete should allow using the cache file to generate the following delete query

DELETE FROM table_name
WHERE PKFIELD = "orig";    -- <= orig value recovered from cache file

Solution

Add a pk-translation flag

lino push update bdd --pk-translation PKFIELD=cache.json < newvalues.jsonl
lino push update bdd --pk-translation PKFIELD=cache.json --pk-translation PKFIELD2=cache2.json < newvalues.jsonl
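A rough sketch of the translation step this flag would perform (names are illustrative; a real implementation would use bind parameters rather than string interpolation):

```python
import json

def build_pk_updates(table, pk, cache_lines, records):
    """Yield UPDATE statements that rewrite a primary key, recovering the
    original value from a jsonl cache of {"key": orig, "value": new}."""
    orig_by_new = {}
    for line in cache_lines:
        entry = json.loads(line)
        orig_by_new[entry["value"]] = entry["key"]
    for record in records:
        new = record[pk]
        # orig value recovered from the cache file
        yield f"UPDATE {table} SET {pk} = '{new}' WHERE {pk} = '{orig_by_new[new]}';"
```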

[Request] Is it possible to complete the Github documentation with a QUICKSTART ?

Is it possible to add, at the top of the LINO web page (and also PIMO's?), a link to a QUICKSTART? This quickstart page should include installation procedures/scripts to quickly install the LINO/PIMO tools without needing any licence. For Windows computers, a script (or a link to a script) should be available to help install the latest version; see the example in the attached PowerShell script. Warning: in the PowerShell script, the line (.\7Zip4Powershell\2.4.0\7Zip4Powershell.psd1") should be updated, as the version '2.4.0' may change.
CGI_Functions.ps1.txt
Install_Lino.ps1.txt
Install_Pimo.ps1.txt

In addition, is it possible to add other methods to this quickstart, such as the LINO-toolbox, including an example of how to use that Docker image?

Thanks.

[PROPOSAL] new raw SQL action

Motivation

Some simple SQL queries (such as DISTINCT) are better optimized than LINO's pull algorithm.

Proposal

This is a proposal to add a new sql action. The SQL query is read from standard input and the result is dumped as JSON lines on standard output.

Example

$ echo "select * from public.customer" | lino sql source
{"id" : 1,  "name":  "Robert"}
{"id" : 2,  "name":  "Nick"}
...

[BUG] Special character in password leads to error

Hi,

A user could not connect to an Oracle Database after using a lino dc add command, with a password containing a #.
Could it be possible that the yaml file resulting from the command is interpreting it as a comment?

Is there any fix or alternative way of using it?
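One workaround worth trying (assuming the '#' is being read as a URI fragment delimiter or a YAML comment) is to percent-encode the password before building the connection string:

```python
from urllib.parse import quote

password = "p#ssw0rd"  # hypothetical password containing '#'
encoded = quote(password, safe="")
# '#' becomes '%23', so it can no longer be read as a fragment or comment
uri = f"oracle://user:{encoded}@host:1521/service"  # illustrative URI shape
print(uri)
```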

Thanks,

[Feature Request] Postgres : the LINO PULL should retrieve the correct numeric values without manual actions in tables.yml

Initial condition:

Postgresql db with dvdrental image
table Film contains some columns with Numeric type: rating, rental_rate,replacement_cost
lino dataconnector add dvdrental --read-only postgresql://postgres@localhost:5432/dvdrental?sslmode=disable --password
lino relation extract dvdrental
lino table extract dvdrental
lino id create "film"

Action

lino pull dvdrental | jq

Problem

Numeric columns are not displayed as numeric values.
Example of invalid numeric value:
"rating": "TkMtMTc=",

Expected

The LINO tool should take into account the column data type in order to display the correct numeric values in the PULL.

Additional info

Current workaround (which is hard to implement for huge databases):
manually complete the 'tables.yaml' file with all columns, using export: numeric for numeric columns and the correct format for the other types:

  - name: payment
    keys:
      - payment_id
    columns:
      - name: amount
        export: numeric

Problem found using CC TECH repo

https://pawssource.ent.cgi.com/gitlab/wse/GRALPTEST/anonymisation/poc_lino_postgres_dvdrental

Extract from lino pull dvdrental | jq

"description": "A Fateful Reflection of a Moose And a Husband who must Overcome a Monkey in Nigeria",
"film_id": 133,
"fulltext": "J2NoYW1iZXInOjEgJ2ZhdGUnOjQgJ2h1c2JhbmQnOjExICdpdGFsaWFuJzoyICdtb25rZXknOjE2ICdtb29zJzo4ICdtdXN0JzoxMyAnbmlnZXJpYSc6MTggJ292ZXJjb20nOjE0ICdyZWZsZWN0Jzo1",
"language_id": 1,
"last_update": "2013-05-26T14:50:58.951Z",
"length": 117,
"rating": "TkMtMTc=",
"release_year": 2006,
"rental_duration": 7,
"rental_rate": "NC45OQ==",
"replacement_cost": "MTQuOTk=",

[PROPOSAL] Improve `lino table extract` to Automatically Initialize Column Information Based on Database Schema

Description:
Currently, users have reported anomalies when using the LINO tool, particularly regarding the automatic detection of column types during data extraction. When extracting data from databases such as PostgreSQL, numeric columns are not recognized correctly, leading to incorrect representations of numeric values in the output JSON format (#244).

Problem:
Numeric columns are not displaying correct numeric values in the output JSON format. For instance, a value like "rating": "TkMtMTc=" is incorrect.

Expected Outcome:
The LINO tool should consider the column data type during extraction to ensure that numeric values are displayed correctly in the output JSON format.

Additional Information:

  • A current workaround involves manually updating the tables.yaml file to include column information, specifying export: numeric for numeric columns. However, this approach is cumbersome, especially for large databases.
  • It is crucial to address this issue to enhance the usability and accuracy of the LINO tool, particularly for users dealing with numeric data.

Proposed Solution:
Enhance the lino table extract command to automatically initialize column information, including data types, based on the database schema. This improvement would streamline the data extraction process and ensure accurate representation of numeric values in the output JSON format.

Suggested Implementation Steps:

  1. Modify the lino table extract command to analyze the database schema and retrieve column information.
  2. Determine the data type of each column (e.g., numeric, string, date) based on the database schema.
  3. Automatically populate the tables.yaml file with column information, including the appropriate data type for each column.
  4. Ensure backward compatibility and error handling for databases with complex schemas or unusual column types.
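Step 2 could be sketched as a mapping from the schema's declared type to the export hint written into tables.yaml. The function below is illustrative only: the type names follow information_schema conventions and the export values ("numeric", "datetime", "string") are assumptions, not LINO's confirmed list:

```go
package main

import (
	"fmt"
	"strings"
)

// exportTypeFor maps an information_schema data_type to the `export`
// hint that would be written into tables.yaml. Illustrative mapping only.
func exportTypeFor(dataType string) string {
	switch strings.ToLower(dataType) {
	case "numeric", "decimal", "real", "double precision":
		return "numeric"
	case "date", "timestamp", "timestamp with time zone":
		return "datetime"
	default:
		return "string"
	}
}

func main() {
	fmt.Println(exportTypeFor("numeric"))   // numeric
	fmt.Println(exportTypeFor("character")) // string
}
```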

Benefits:

  • Simplifies the data extraction process for users by eliminating the need for manual configuration of column information.
  • Improves the accuracy of data representation in the output JSON format, particularly for numeric values.
  • Enhances the overall usability and efficiency of the LINO tool, making it more accessible to a wider range of users.

Related Issue:
This enhancement request is related to issue #244 , which addresses similar challenges in data extraction and representation.

[PROPOSAL] lino query

Add lino query <database alias> <sql query> action, to execute any SQL query on a database with lino.

This feature could address other issues:

MariaDB support

Hello,

We tried your tools, but we were quickly stopped by this problem:
when executing "lino relation extract source" we got "no extractor found for database type".
While reading the doc, we found that LINO does not support MariaDB and/or MySQL.
How can this support be added?

How to reproduce
run docker run -v /home/vagrant/LINO/LINO:/home/lino lino_lino dataconnector add source mysql://<ip>:<port>?sslmode=disable -U USER -P PASS -s myschema

run docker run --env PASS=password --env USER=user -v /home/vagrant/LINO/LINO:/home/lino lino_lino table extract source

[PROPOSAL] Web Socket Protocol

Message

CommandMessage:

  • id : string unique identifier of the message
  • action : string possible values: "ping", "extract_tables", "extract_relations", "pull_open", "push_open", "push_data", "push_commit", "push_close"
  • payload : Payload payload differs depending on the action type

ResultMessage:

  • id : string identifier of the original message
  • error : Error|null
  • next : bool true if more messages follow
  • payload : Payload

Pull

PullOpenPayload

  • table : string : name of the table
  • columns : []string : list of columns
  • filter : Filter

Filter

  • limit number
  • values map[string]JSONValue
  • where string
  • distinct bool

PingPayload

[PROPOSAL] Tables configuration commands

Add new subcommands to the lino table command, to easily modify the tables.yaml configuration.

version: v1
tables:
  - name: public.actor
    keys:
      - actor_id

$ lino table add-column public.actor first_name
successfully added column first_name to public.actor table
$ lino table add-column public.actor last_name
successfully added column last_name to public.actor table

version: v1
tables:
  - name: public.actor
    keys:
      - actor_id
    columns:
      - name: "first_name"
      - name: "last_name"

$ lino table remove-column public.actor first_name
successfully removed column first_name from public.actor table
$ lino table set-column public.actor last_name string
successfully set type string on column last_name in public.actor table

version: v1
tables:
  - name: public.actor
    keys:
      - actor_id
    columns:
      - name: "last_name"
        type: string

[PROPOSAL] Cursor Scan filter

When extracting a large portion of a table's rows, it is faster to fetch all rows and apply the filter afterwards.

This is a proposal to add a --scan flag to the lino pull action.

--scan can be activated only for a single-table pull (mainly with the --table flag) and executes the filters (-f and -F) in memory.
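The in-memory filtering step could be sketched like this, assuming rows are decoded into maps (simplified, not the actual pull implementation):

```go
package main

import "fmt"

// matches reports whether a row satisfies every key/value pair of the
// -f filter, applied in memory after a full scan of the table.
func matches(row, filter map[string]interface{}) bool {
	for k, want := range filter {
		if row[k] != want {
			return false
		}
	}
	return true
}

func main() {
	rows := []map[string]interface{}{
		{"id": 1, "country": "FR"},
		{"id": 2, "country": "US"},
	}
	for _, r := range rows {
		if matches(r, map[string]interface{}{"country": "FR"}) {
			fmt.Println(r["id"]) // 1
		}
	}
}
```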

[Proposal] REPORT log at the end of pull or push command

Similar to what is done with PIMO (https://github.com/CGI-FR/PIMO/tree/main/demo/demo9#fields) I think it would be nice to have a list of fields that are always printed when pulling/pushing.

Examples

  • id (or ids) of the current line being processed
  • line number from the input (on push)
  • line number from the output (on pull)
  • ...

Another "nice to have" is a small result of the processing, for example with PIMO :

+-------+--------------------+--------------------+---------------------+--------+-------------+----------+------------+-------------+----------+
| level | stats:ignoredPaths | stats:skippedLines | stats:skippedFields | return | config      | duration | input-line | output-line | message  |
+-------+--------------------+--------------------+---------------------+--------+-------------+----------+------------+-------------+----------+
| info  | 0                  | 0                  | 0                   | 0      | masking.yml | 2.1671ms | 1          | 3           | End PIMO |
+-------+--------------------+--------------------+---------------------+--------+-------------+----------+------------+-------------+----------+

Originally posted by @adrienaury in #18 (comment)

Add Support for Configuring Stats Templates and URL using Environment Variables

As a user of the lino command line tool, I would like to request the addition of support for configuring the stats templates and URL using environment variables LINO_STATS_URL and LINO_STATS_TEMPLATE. Currently, the stats template and URL can be specified using the --statsTemplate and --stats flags. This feature would allow for more flexible configuration of the stats.

example usage:

export LINO_STATS_URL=http://localhost:8080/stats
export LINO_STATS_TEMPLATE='{"software":"LINO","stats":{{ .Stats }}}'
lino pull source --limit 10 > customers.jsonl

[Feature Request] update Password documentation for 'Connection string'

Initial condition:

follow instruction in 'https://github.com/CGI-FR/LINO'

Action

lino dataconnector add source postgresql://postgres:@localhost:5432/postgres?sslmode=disable

Problem

warning is displayed :
"warn: password should not be included in URI, use --password-from-env or --password"

Expected

'https://github.com/CGI-FR/LINO' should be updated to remove the password from the connection string and replace it with the '--password' parameter. With this parameter, the operator is prompted to enter the password in a secure way.

Devcontainer broken

Since integration tests were added as a GitHub Action, the devcontainer is broken with VSCode (it affects only VSCode, not GitHub Actions):

Logs

Step 20/21 : RUN useradd -ms /bin/bash vscode
[2021-05-07T20:53:19.006Z]  ---> Running in dd54d23b6959
[2021-05-07T20:53:19.321Z] useradd: user 'vscode' already exists
[2021-05-07T20:53:19.637Z] ERROR: Service 'vscode' failed to build : The command '/bin/sh -c useradd -ms /b
in/bash vscode' returned a non-zero code: 9

The line in error

RUN useradd -ms /bin/bash vscode

[PROPOSAL] lino constraint disable

Problem

As a user of the lino command, I want to disable foreign key constraints on a given table or ingress-descriptor, without re-enabling them immediately. I want to be able to re-enable the disabled constraints later.

Solution

The new command lino constraint will handle listing/enabling/disabling of foreign key constraints.

List constraints on foreign key pointing to <table_name>

$ lino constraints list --table <table_name>
{"table": "ACTION", "constraint_name": "FK_ACTION_01"}

List constraints on foreign key pointing to any table in the ingress-descriptor

$ lino constraints list -i <ingress-descriptor-name>
{"table": "ACTION", "constraint_name": "FK_ACTION_01"}
  • constraints are listed in the order they need to be enabled

List and disable constraints on foreign key pointing to <table_name>

$ lino constraints disable --table <table_name> > constraints.jsonl
1 constraint disabled

List and disable constraints on foreign key pointing to any table in the ingress-descriptor

$ lino constraints disable -i <ingress-descriptor-name> > constraints.jsonl
1 constraint disabled

Enable constraints previously disabled

$ lino constraints enable < constraints.jsonl
1 constraint enabled
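Under the hood, each jsonl entry could be turned into vendor-specific DDL. The sketch below assumes Oracle syntax (ALTER TABLE ... DISABLE/ENABLE CONSTRAINT); other vendors differ, e.g. PostgreSQL has no constraint disable and would need DROP/ADD instead:

```go
package main

import "fmt"

// disableSQL and enableSQL build Oracle-flavoured DDL for one
// {"table": ..., "constraint_name": ...} entry from constraints.jsonl.
func disableSQL(table, constraint string) string {
	return fmt.Sprintf("ALTER TABLE %s DISABLE CONSTRAINT %s", table, constraint)
}

func enableSQL(table, constraint string) string {
	return fmt.Sprintf("ALTER TABLE %s ENABLE CONSTRAINT %s", table, constraint)
}

func main() {
	fmt.Println(disableSQL("ACTION", "FK_ACTION_01"))
	fmt.Println(enableSQL("ACTION", "FK_ACTION_01"))
}
```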

[PROPOSAL] Export columns options

Not all use cases need to export primary keys or foreign keys; they are implicitly exported as nested JSON objects. Other use cases need to export every column of the table.

We can add an option to specify the pull behavior.

  • all : export all columns and respect listed columns
  • with-keys : export listed columns and pk + fk
  • only : export only listed columns

Example :

version: v1
tables:
  - name: public.actor
    keys:
      - actor_id
    columns:
      - name: "first_name"
        type: string
      - name: "last_name"
        type: string
    export: all

Originally posted by @youen in #33 (comment)
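The selection logic the three modes imply could be sketched as follows (simplified model; the mode names come from the proposal, everything else is illustrative):

```go
package main

import "fmt"

// selectColumns returns the columns to pull for a table, given its
// listed columns, its keys (pk + fk), and the export mode.
func selectColumns(mode string, listed, keys, all []string) []string {
	switch mode {
	case "only":
		return listed
	case "with-keys":
		return dedup(append(append([]string{}, listed...), keys...))
	default: // "all"
		return all
	}
}

// dedup removes duplicates while preserving first-seen order.
func dedup(in []string) []string {
	seen := map[string]bool{}
	out := []string{}
	for _, s := range in {
		if !seen[s] {
			seen[s] = true
			out = append(out, s)
		}
	}
	return out
}

func main() {
	fmt.Println(selectColumns("with-keys",
		[]string{"first_name"}, []string{"actor_id"},
		[]string{"actor_id", "first_name", "last_name"})) // [first_name actor_id]
}
```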

[BUG] Push truncate impact too many tables

Problem

Ingress Descriptor defines a cluster of two tables (country->city).
All other relations are disabled.

When using command lino push truncate this happens:

{"action":"truncate","table":"payment_p2007_01"}
{"action":"truncate","table":"payment_p2007_04"}
{"action":"truncate","table":"payment_p2007_06"}
{"action":"truncate","table":"rental"}
{"action":"truncate","table":"payment_p2007_03"}
{"action":"truncate","table":"payment_p2007_02"}
{"action":"truncate","table":"payment_p2007_05"}
{"action":"truncate","table":"payment"}
{"action":"truncate","table":"store"}
{"action":"truncate","table":"staff"}
{"action":"truncate","table":"customer"}
{"action":"truncate","table":"address"}
{"action":"truncate","table":"inventory"}
{"action":"truncate","table":"film_actor"}
{"action":"truncate","table":"city"}
{"action":"truncate","table":"film"}
{"action":"truncate","table":"film_category"}
{"action":"truncate","table":"category"}
{"action":"truncate","table":"language"}
{"action":"truncate","table":"actor"}
{"action":"truncate","table":"country"}

All of the tables mentioned in the ingress descriptor are impacted.

Solution

Change the following method so it returns only tables that are part of an active relation (lookup: true).

func (p plan) Tables() []Table {
	tableOrder := map[string]int{}
	name2Table := map[string]Table{}
	for _, r := range p.relations {
		tableOrder[r.Child().Name()] = tableOrder[r.Parent().Name()] + 1
		name2Table[r.Child().Name()] = r.Child()
		name2Table[r.Parent().Name()] = r.Parent()
	}
	// propagate priority to children
	for i := 0; i < len(tableOrder); i++ {
		for _, r := range p.relations {
			tableOrder[r.Child().Name()] = tableOrder[r.Parent().Name()] + 1
		}
	}
	type to struct {
		t Table
		o int
	}
	tables := []to{}
	for name, table := range name2Table {
		tables = append(tables, to{table, tableOrder[name]})
	}
	sort.Slice(tables, func(i, j int) bool {
		return tables[i].o > tables[j].o
	})
	result := []Table{}
	for _, v := range tables {
		result = append(result, v.t)
	}
	return result
}
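The proposed fix could be sketched over a simplified model like the one below; the relation type and its lookup flag are assumptions standing in for LINO's real domain types:

```go
package main

import "fmt"

type relation struct {
	parent, child string
	lookup        bool // true when the relation is active in the ingress descriptor
}

// tablesToTruncate keeps only tables that are part of an active relation,
// so disabled branches of the ingress descriptor are left untouched.
func tablesToTruncate(relations []relation) []string {
	seen := map[string]bool{}
	out := []string{}
	for _, r := range relations {
		if !r.lookup {
			continue
		}
		for _, t := range []string{r.parent, r.child} {
			if !seen[t] {
				seen[t] = true
				out = append(out, t)
			}
		}
	}
	return out
}

func main() {
	rels := []relation{
		{parent: "country", child: "city", lookup: true},
		{parent: "city", child: "address", lookup: false},
	}
	fmt.Println(tablesToTruncate(rels)) // [country city]
}
```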

[Feature Request] Add DuckDB Support to LINO

Proposal to Add DuckDB Support in LINO

Goal

The goal of this proposal is to enhance LINO by adding support for DuckDB as a database source. DuckDB is a lightweight, embeddable analytical database that is gaining popularity for its speed and efficiency. Integrating DuckDB into LINO will provide users with the flexibility to extract relations and tables from DuckDB databases, expanding the range of supported databases.

Implementation Details

  1. Extractor for DuckDB Relations (internal/infra/relation/extractor_duckdb.go):
    • Implement a function in extractor_duckdb.go that generates the SQL query to fetch all relations in the specified schema.
    • Use existing files like extractor_postgres.go as a reference.
// internal/infra/relation/extractor_duckdb.go

package relation

import "fmt"

// DuckDBDialect represents the DuckDB SQL dialect.
type DuckDBDialect struct{}

// SQL returns the SQL query to fetch all relations in the named schema.
// NOTE: this placeholder lists tables; a real implementation would query
// foreign-key metadata (e.g. DuckDB's duckdb_constraints() function) to
// build parent/child relations.
func (d DuckDBDialect) SQL(schema string) string {
	return fmt.Sprintf("SELECT table_name FROM information_schema.tables WHERE table_schema = '%s'", schema)
}
  2. Extractor for DuckDB Tables (internal/infra/table/extractor_duckdb.go):
    • Implement a function in extractor_duckdb.go that generates the SQL query to fetch all tables in the specified schema.
    • Use existing files like extractor_postgres.go as a reference.
// internal/infra/table/extractor_duckdb.go

package table

import "fmt"

// DuckDBDialect represents the DuckDB SQL dialect.
type DuckDBDialect struct{}

// SQL returns the SQL query to fetch all tables in the named schema.
func (d DuckDBDialect) SQL(schema string) string {
	return fmt.Sprintf("SELECT table_name FROM information_schema.tables WHERE table_schema = '%s'", schema)
}
  3. Data Destination for DuckDB (internal/infra/push/datadestination_duckdb.go):
    • Implement the SQLDialect interface in datadestination_duckdb.go to define specifics about the DuckDB SQL dialect.
    • Refer to existing files like datadestination_postgres.go for guidance.
// internal/infra/push/datadestination_duckdb.go

package push

// DuckDBDialect represents the DuckDB SQL dialect.
type DuckDBDialect struct{}

// Implement the SQLDialect interface here.
// Example: func (d DuckDBDialect) CreateUpdateStatement() string { ... }
  4. Data Source for DuckDB (internal/infra/pull/datasource_duckdb.go):
    • Implement the SQLDialect interface in datasource_duckdb.go to define specifics about the DuckDB SQL dialect.
    • Refer to existing files like datasource_postgres.go for guidance.
// internal/infra/pull/datasource_duckdb.go

package pull

// DuckDBDialect represents the DuckDB SQL dialect.
type DuckDBDialect struct{}

// Implement the SQLDialect interface here.
// Example: func (d DuckDBDialect) CreateUpdateStatement() string { ... }

Once the above files are implemented, users will be able to add DuckDB as a source to LINO by running commands similar to those provided in the example for MySQL:

docker run -v /path/to/lino:/home/lino lino_lino dataconnector add source duckdb:///path/to/duckdb/database -s duckschema
docker run -v /path/to/lino:/home/lino lino_lino table extract source

This proposal aims to extend LINO's capabilities and offer users more options for data extraction. Your feedback and contributions to this feature request are highly appreciated.

Please feel free to open discussions and ask questions for further clarification.

Youen
