cgi-fr / lino
Large Input Narrow Output - LINO is a simple ETL tool to manage test data.
Home Page: https://cgi-fr.github.io/lino-doc/
License: GNU General Public License v3.0
table.yaml
- name: attachments
  keys:
    - id
  columns:
    - name: content
      import: file
LINO will interpret the input value as a path to a file, and use the file's content as the data.
lino push --create-schema insert target-db
will create the schema, based on the first data line encountered for each table.
Another possibility: a new push action
lino push schema target-db
Hi,
We need to purge some data in a given table according to a condition (where MyData>9999).
Currently, we can do that with
$ lino pull --table MyTable --where 'MyData>9999' MyAliasDB | lino push delete --table MyTable MyAliasDB
Ideally we would have
$ lino push delete --table MyTable --where 'MyData>9999' MyAliasDB
lino push delete with a condition must ignore input data
Thanks
During a LINO push, an Oracle error occurred: ORA-01465: invalid hex number.
According to this thread: https://stackoverflow.com/questions/33708959/ora-01465-invalid-hex-number-in-oracle-while-using-blob
a function such as utl_raw.cast_to_raw should be used to convert the character string into an Oracle-compatible type.
(Note: the same solution was proposed for error #51, but that case also has string size problems, > 4000.)
Perhaps type handling should be integrated into LINO's Oracle dialect.
Until LINO v1.9.2 included, when pulling a table with a child relationship, data from the child table was pulled as a JSON object:
{
  "myField1": {
    "childField1": "",
    "childField2": "2021-07-06T09:59:13+02:00",
    "childField3": "2021-07-13T12:04:19+02:00",
    ...
  },
  "myField2": "bla",
  "myField3": "9921196128284",
  ...
}
Starting from v1.10.0, the same data is pulled as a JSON array containing a single JSON object:
{
  "myField1": [
    {
      "childField1": "",
      "childField2": "2021-07-06T09:59:13+02:00",
      "childField3": "2021-07-13T12:04:19+02:00",
      ...
    }
  ],
  "myField2": "bla",
  "myField3": "9921196128284",
  ...
}
Is this the desired behavior? If so, is there any way to get the previous one without going back to v1.9.2?
Could this also be documented so that it becomes a known breaking change?
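If reverting is not an option, one workaround (not an official LINO feature, just a post-processing sketch) is to unwrap child arrays that contain exactly one object, restoring the pre-v1.10.0 shape:

```python
def unwrap_single_children(record):
    """Replace [{...}] child values by {...}, leaving everything else as-is."""
    return {
        key: value[0]
        if isinstance(value, list) and len(value) == 1 and isinstance(value[0], dict)
        else value
        for key, value in record.items()
    }
```

This could be applied to each JSON line of the pull output before feeding it to the next tool in the pipeline.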
Thanks
Got this error while running a lino push command with lino-toolbox:1.4.0 on an Oracle 19 database table.
the command
lino pull ${SOURCE} --table ${TABLE} -l 0 | lino -v 3 push truncate --table ${TABLE} ${DESTINATION} -e error-ref-${TABLE}.json
the stacktrace
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xa01ab0]
goroutine 1 [running]:
github.com/cgi-fr/lino/internal/infra/push.OracleDialect.ConvertValue({}, {0xad6de0?, 0xc0006800c0?}, {{0xc00012a1f8, 0x8}, 0x0, {0x0, 0x0}})
/workspace/internal/infra/push/datadestination_oracle.go:185 +0x70
github.com/cgi-fr/lino/internal/infra/push.(*SQLRowWriter).Write(0xc0004a2180, 0xc00068a1b0, 0xc0000b4140?)
/workspace/internal/infra/push/datadestination_sql.go:345 +0x2ea
github.com/cgi-fr/lino/pkg/push.pushRow(0x42f7b0?, {0xcde350, 0xc0003c30e0}, {0xcdf300, 0xc0000b4140}, {0xcddb60, 0xc00005a4e0}, 0x0, {0xcd9eb8, 0xc000120980}, ...)
/workspace/pkg/push/driver.go:224 +0x3df
github.com/cgi-fr/lino/pkg/push.Push({0xcde318, 0xc00005a510}, {0xcde350, 0xc0003c30e0}, {0xcddb60, 0xc00005a4e0}, 0x0?, 0x1f4, 0x0?, {0xcd7000, ...}, ...)
/workspace/pkg/push/driver.go:70 +0x2e5
github.com/cgi-fr/lino/internal/app/push.NewCommand.func3(0xc0003cd500?, {0xc00013d280, 0x2, 0x8?})
/workspace/internal/app/push/cli.go:155 +0x845
github.com/spf13/cobra.(*Command).execute(0xc0003cd500, {0xc00013d200, 0x8, 0x8})
/home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:987 +0xa91
github.com/spf13/cobra.(*Command).ExecuteC(0x1da1640)
/home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:1115 +0x425
github.com/spf13/cobra.(*Command).Execute(...)
/home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:1039
main.main()
/workspace/cmd/lino/main.go:138 +0x25
This was working with lino-toolbox:1.2.0 and 1.3.0.
We have to insert the following JSON object into a table film
{
  "film_id": 452,
  "title": "The Matrix",
  "year": 1999
}
During a dataset insert, primary keys have to be unique to avoid conflicts with existing values. Sequences also have to be updated beyond the maximum primary key inserted to avoid future conflicts.
Add sequence information in table.yaml and use it during the lino push insert process.
version: v1
tables:
  - name: film
    keys:
      - film_id
    sequence:
      film_id: sequence_film_id
If the film_id is omitted, lino uses the value of sequence_film_id.nextval to feed the primary key.
If the primary key is part of a relation
- name: film_film_category
  parent:
    name: film
    keys:
      - film_id
  child:
    name: film_category
    keys:
      - film_id
inserting the following JSON object
{
  "title": "The Matrix",
  "year": 1999,
  "film_film_category": [
    {
      "category_id": 151
    },
    {
      "category_id": 452
    }
  ]
}
will produce the SQL statements
insert into film (film_id, title, year)
values (sequence_film_id.nextval, 'The Matrix', 1999);
insert into film_category (film_id, category_id)
values (sequence_film_id.currval, 151 );
insert into film_category (film_id, category_id)
values (sequence_film_id.currval, 452 );
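The substitution above can be sketched as follows (a hypothetical helper, not LINO's actual implementation): when a configured sequence column is missing from the input row, emit `<sequence>.nextval` instead of a literal value.

```python
def insert_values(row, sequences):
    """Map each column to its SQL value, using seq.nextval for missing keys."""
    values = {}
    for pk, seq in sequences.items():
        # sequence columns fall back to <sequence>.nextval when absent
        values[pk] = str(row[pk]) if pk in row else f"{seq}.nextval"
    for col, val in row.items():
        if col not in values:
            # naive literal quoting, for illustration only
            values[col] = f"'{val}'" if isinstance(val, str) else str(val)
    return values
```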
A new section columns under each table element inside the tables.yaml file lists the columns exported in the JSON output.
Example:
version: v1
tables:
  - name: public.actor
    keys:
      - actor_id
    columns:
      - name: "first_name"
      - name: "last_name"
If not present, LINO outputs all columns in alphabetical order; this is the current behavior, kept to avoid breaking compatibility with existing configurations.
If at least one column is defined inside the columns section, then LINO will output only the defined column(s), in the order they appear in the file.
Primary keys are necessary for the push action, so even if they are not present in the columns section, they will always be exported in the output.
Foreign keys involved in active relations must be exported even if not present in the columns section.
The name property is mandatory. The type property is optional and asks LINO to convert the column values to a certain JSON type in the output.
Example:
version: v1
tables:
  - name: public.actor
    keys:
      - actor_id
    columns:
      - name: "first_name"
        type: string
      - name: "last_name"
        type: string
Available types are:
- string (e.g. "hello")
- numeric (e.g. 25.2 or 25)
- boolean (e.g. true)
- base64-encoded binary (e.g. "aGVsbG8=")
These kinds of warning messages are annoying; they break lino|pimo pipelines.
LINO/internal/app/urlbuilder/urlbuilder.go, line 141 (commit 34eb10a)
It should be logged to stderr at warning level.
Postgresql db with dvdrental image
lino dataconnector add dvdrental --read-only postgresql://postgres@localhost:5432/dvdrental?sslmode=disable --password
lino relation extract dvdrental
lino table extract dvdrental
lino id create "film"
lino id show-graph
exec: "dot": executable file not found in %PATH%
Either the release is completed to include the DOT application (best solution!), or the installation instructions are completed to describe that DOT should be installed.
In the URL 'https://github.com/CGI-FR/LINO/issues' the example is
$ lino id customer show-graph
with 'customer', which looks like a table name,
but with the command 'lino id "film" show-graph' => the help for 'lino id' is displayed.
=> Is it possible to fix the example and remove the 'customer' word?
After calling the command 'lino id "film" show-graph' => the help for 'lino id' is displayed.
=> Is it possible to have an error message, because after 'lino id' the "film" string is not a valid [command]?
extract from lino id --help:
Usage:
lino id [command]
❯ ./lino push truncate cible -v 5 -d
3:23PM INF Logger level set to trace
3:23PM INF Start LINO color=auto debug=false log-json=false verbosity=5
3:23PM INF Push mode catch-errors= commitSize=500 disable-constraints=true table=
3:23PM TRC building relation {FKDD01402DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01402_EXPERIENCE [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01402_EXPERIENCE [ID_RCI ID_EXPERIENCE] [] 0} action=push
3:23PM TRC building relation {FKDD01403DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01403_FORMATION [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01403_FORMATION [ID_RCI ID_FORMATION] [] 0} action=push
3:23PM TRC building relation {FKDD01404DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01404_CENTRE_INTERET [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01404_CENTRE_INTERET [ID_RCI ID_CENTRE_INTERET] [] 0} action=push
3:23PM TRC building relation {FKDD01405DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01405_COMPETENCE [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01405_COMPETENCE [ID_RCI ID_COMPETENCE] [] 0} action=push
3:23PM TRC building relation {FKDD01406DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01406_LANGUE [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01406_LANGUE [ID_RCI ID_LANGUE] [] 0} action=push
3:23PM TRC building relation {FKDD01407DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01407_PERMIS [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01407_PERMIS [ID_RCI ID_PERMIS] [] 0} action=push
3:23PM TRC building relation {FKDD01408DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01408_LOCOMOTION [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01408_LOCOMOTION [ID_RCI CODE_MOYEN_LOCOMOTION] [] 0} action=push
3:23PM TRC building relation {FKDD01409DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01409_PIECE_JOINTE [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01409_PIECE_JOINTE [ID_RCI ID_PIECE_JOINTE] [] 0} action=push
3:23PM TRC building relation {FKDD01410DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01410_COMPTEURS [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01410_COMPTEURS [ID_RCI] [] 0} action=push
3:23PM TRC building relation {FKDD01411DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01411_CONSENTEMENT [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01411_CONSENTEMENT [ID_RCI] [] 0} action=push
3:23PM TRC building relation {FKDD01412DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01412_HISTORISATION [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01412_HISTORISATION [ID_RCI DATE_ACTION] [] 0} action=push
3:23PM TRC building relation {FKDD01413DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01413_CARTE_VISITE [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01413_CARTE_VISITE [ID_CARTE_VISITE] [] 0} action=push
3:23PM TRC building relation {FKDD01414DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01414_SIGNALEMENT [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01414_SIGNALEMENT [ID_RCI] [] 0} action=push
3:23PM TRC building relation {FKDD01415DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01415_RELANCE [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01415_RELANCE [ID_RCI] [] 0} action=push
3:23PM TRC building relation {FKDD01416DD01401 {DD01401_PORTEFEUILLE [ID_RCI]} {DD01416_HISTO_RELANCE [ID_RCI]}} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM TRC building table {DD01416_HISTO_RELANCE [ID_RCI DATE_RELANCE] [] 0} action=push
3:23PM TRC building relation {FKDD01417DD01409 {DD01409_PIECE_JOINTE [ID_PIECE_JOINTE]} {DD01417_BROUILLON [ID_PIECE_JOINTE]}} action=push
3:23PM TRC building table {DD01409_PIECE_JOINTE [ID_RCI ID_PIECE_JOINTE] [] 0} action=push
3:23PM TRC building table {DD01417_BROUILLON [ID_BROUILLON ID_PIECE_JOINTE] [] 0} action=push
3:23PM TRC building table {DD01401_PORTEFEUILLE [ID_RCI] [] 0} action=push
3:23PM DBG call Push with mode truncate action=push
3:23PM DBG open table with mode truncate action=push
3:23PM DBG BEGIN
FOR c IN
(SELECT c.owner, c.table_name, c.constraint_name
FROM user_constraints c, user_tables t
WHERE c.table_name = t.table_name
AND c.owner = sys_context( 'userenv', 'current_schema' )
AND c.table_name = 'DD01417_BROUILLON'
AND c.status = 'ENABLED'
AND c.constraint_type = 'R'
ORDER BY c.constraint_type DESC)
LOOP
dbms_utility.exec_ddl_statement('alter table "' || c.owner || '"."' || c.table_name || '" disable constraint ' || c.constraint_name);
END LOOP;
END;
action=push
3:23PM DBG TRUNCATE TABLE DD01417_BROUILLON action=push
3:23PM DBG open table with mode truncate action=push
3:23PM DBG BEGIN
FOR c IN
(SELECT c.owner, c.table_name, c.constraint_name
FROM user_constraints c, user_tables t
WHERE c.table_name = t.table_name
AND c.owner = sys_context( 'userenv', 'current_schema' )
AND c.table_name = 'DD01402_EXPERIENCE'
AND c.status = 'ENABLED'
AND c.constraint_type = 'R'
ORDER BY c.constraint_type DESC)
LOOP
dbms_utility.exec_ddl_statement('alter table "' || c.owner || '"."' || c.table_name || '" disable constraint ' || c.constraint_name);
END LOOP;
END;
action=push
3:23PM DBG TRUNCATE TABLE DD01402_EXPERIENCE action=push
3:23PM DBG open table with mode truncate action=push
3:23PM DBG BEGIN
FOR c IN
(SELECT c.owner, c.table_name, c.constraint_name
FROM user_constraints c, user_tables t
WHERE c.table_name = t.table_name
AND c.owner = sys_context( 'userenv', 'current_schema' )
AND c.table_name = 'DD01404_CENTRE_INTERET'
AND c.status = 'ENABLED'
AND c.constraint_type = 'R'
ORDER BY c.constraint_type DESC)
LOOP
dbms_utility.exec_ddl_statement('alter table "' || c.owner || '"."' || c.table_name || '" disable constraint ' || c.constraint_name);
END LOOP;
END;
action=push
3:23PM DBG TRUNCATE TABLE DD01404_CENTRE_INTERET action=push
3:23PM DBG open table with mode truncate action=push
3:23PM DBG BEGIN
FOR c IN
(SELECT c.owner, c.table_name, c.constraint_name
FROM user_constraints c, user_tables t
WHERE c.table_name = t.table_name
AND c.owner = sys_context( 'userenv', 'current_schema' )
AND c.table_name = 'DD01412_HISTORISATION'
AND c.status = 'ENABLED'
AND c.constraint_type = 'R'
ORDER BY c.constraint_type DESC)
LOOP
dbms_utility.exec_ddl_statement('alter table "' || c.owner || '"."' || c.table_name || '" disable constraint ' || c.constraint_name);
END LOOP;
END;
action=push
3:23PM DBG TRUNCATE TABLE DD01412_HISTORISATION action=push
3:23PM DBG open table with mode truncate action=push
3:23PM DBG BEGIN
FOR c IN
(SELECT c.owner, c.table_name, c.constraint_name
FROM user_constraints c, user_tables t
WHERE c.table_name = t.table_name
AND c.owner = sys_context( 'userenv', 'current_schema' )
AND c.table_name = 'DD01403_FORMATION'
AND c.status = 'ENABLED'
AND c.constraint_type = 'R'
ORDER BY c.constraint_type DESC)
LOOP
dbms_utility.exec_ddl_statement('alter table "' || c.owner || '"."' || c.table_name || '" disable constraint ' || c.constraint_name);
END LOOP;
END;
action=push
3:23PM DBG TRUNCATE TABLE DD01403_FORMATION action=push
3:23PM DBG open table with mode truncate action=push
3:23PM DBG BEGIN
FOR c IN
(SELECT c.owner, c.table_name, c.constraint_name
FROM user_constraints c, user_tables t
WHERE c.table_name = t.table_name
AND c.owner = sys_context( 'userenv', 'current_schema' )
AND c.table_name = 'DD01410_COMPTEURS'
AND c.status = 'ENABLED'
AND c.constraint_type = 'R'
ORDER BY c.constraint_type DESC)
LOOP
dbms_utility.exec_ddl_statement('alter table "' || c.owner || '"."' || c.table_name || '" disable constraint ' || c.constraint_name);
END LOOP;
END;
action=push
3:23PM DBG TRUNCATE TABLE DD01410_COMPTEURS action=push
3:23PM DBG open table with mode truncate action=push
3:23PM DBG BEGIN
FOR c IN
(SELECT c.owner, c.table_name, c.constraint_name
FROM user_constraints c, user_tables t
WHERE c.table_name = t.table_name
AND c.owner = sys_context( 'userenv', 'current_schema' )
AND c.table_name = 'DD01414_SIGNALEMENT'
AND c.status = 'ENABLED'
AND c.constraint_type = 'R'
ORDER BY c.constraint_type DESC)
LOOP
dbms_utility.exec_ddl_statement('alter table "' || c.owner || '"."' || c.table_name || '" disable constraint ' || c.constraint_name);
END LOOP;
END;
action=push
3:23PM DBG TRUNCATE TABLE DD01414_SIGNALEMENT action=push
3:23PM DBG open table with mode truncate action=push
3:23PM DBG BEGIN
FOR c IN
(SELECT c.owner, c.table_name, c.constraint_name
FROM user_constraints c, user_tables t
WHERE c.table_name = t.table_name
AND c.owner = sys_context( 'userenv', 'current_schema' )
AND c.table_name = 'DD01415_RELANCE'
AND c.status = 'ENABLED'
AND c.constraint_type = 'R'
ORDER BY c.constraint_type DESC)
LOOP
dbms_utility.exec_ddl_statement('alter table "' || c.owner || '"."' || c.table_name || '" disable constraint ' || c.constraint_name);
END LOOP;
END;
action=push
3:23PM DBG TRUNCATE TABLE DD01415_RELANCE action=push
3:23PM DBG open table with mode truncate action=push
3:23PM DBG BEGIN
FOR c IN
(SELECT c.owner, c.table_name, c.constraint_name
FROM user_constraints c, user_tables t
WHERE c.table_name = t.table_name
AND c.owner = sys_context( 'userenv', 'current_schema' )
AND c.table_name = 'DD01405_COMPETENCE'
AND c.status = 'ENABLED'
AND c.constraint_type = 'R'
ORDER BY c.constraint_type DESC)
LOOP
dbms_utility.exec_ddl_statement('alter table "' || c.owner || '"."' || c.table_name || '" disable constraint ' || c.constraint_name);
END LOOP;
END;
action=push
3:23PM DBG TRUNCATE TABLE DD01405_COMPETENCE action=push
3:23PM FTL Fatal error stop the push command error="ORA-02266: unique/primary keys in table referenced by enabled foreign keys" action=push
version: cgifr/lino:v2.0-oracle
Pushing to the Oracle database connector formation:
echo "" | lino -v 5 push truncate --table HOPHJOUP formation
1:03PM INF Logger level set to trace
1:03PM INF Start LINO color=auto debug=false log-json=false verbosity=5
1:03PM INF Push mode catch-errors= commitSize=500 disable-constraints=false table=HOPHJOUP
1:03PM WRN missing table HOPHJOUP in tables.yaml action=push
1:03PM DBG call Push with mode truncate action=push
1:03PM DBG open table with mode truncate action=push
1:03PM DBG TRUNCATE TABLE HOPHJOUP action=push
1:03PM DBG transaction committed action=push
1:03PM FTL Fatal error stop the push command error="unexpected end of JSON input" action=push
The table HOPHJOUP is empty.
1:06PM INF Logger level set to trace
1:06PM INF Start LINO color=auto debug=false log-json=false verbosity=5
1:06PM INF Push mode catch-errors= commitSize=500 disable-constraints=false table=HOPHJOUP
1:06PM TRC building table {HOPHJOUP [MATRI DAT] []} action=push
1:06PM DBG call Push with mode truncate action=push
1:06PM DBG transaction committed action=push
unexpected end of JSON input
The table HOPHJOUP is unchanged.
Windows server without DOCKER image
follow instruction in 'https://github.com/CGI-FR/LINO'
lino dataconnector add source postgresql://postgres:@localhost:5432/postgres?sslmode=disable
warnings are displayed:
"warn: password should not be included in URI, use --password-from-env or --password"
"warn: password will be stored unencrypted in ~/.lino/credentials.yaml, configure a credential helper to remove this warning. See https://github.com/docker/docker-credential-helpers"
In the case of a request sent outside of a Docker container, it is strange to receive a warning related to Docker. Is it possible to not display this warning when there is no Docker image?
Moreover, if there is a real risk of having a secret displayed unencrypted somewhere with this syntax, the request should be rejected.
Converting actual logs to structured logs
* `--debug` This flag completes the logs with debug information (source file, line number).
* `--log-json` Set this flag to produce JSON-formatted logs.
During a LINO push to an Oracle database, the log output is the following:
10:18AM DBG INSERT INTO ................. action=push
10:18AM TRC ................. action=push
10:18AM DBG close statement insert action=push
10:18AM DBG close statement insert action=push
10:18AM DBG close statement insert action=push
10:18AM DBG close statement insert action=push
10:18AM DBG transaction committed action=push
sql: expected 0 arguments, got 4 (No error capture configured)
The error occurs for a relation when there are 3 children to push (no failure with a single child).
Windows Computer
Postgresql db with table ListX (table name include UPPER CASE character and no foreign keys)
lino dataconnector add DB_prod --read-only --password postgresql://postgres@localhost:5432/DB_prod?sslmode=disable
lino relation extract DB_prod
lino table extract DB_prod
lino id create "ListX"
no table named ListX
whereas in the file tables.yml, the table ListX is correctly listed.
Whatever the case of the table name, the lino command should work.
NB: after a call with @adrienaury
1/2: In case of no relation for the request "lino id create "ListX"", the feedback should propose to execute the command "lino pull --table "listx" DB_prod".
2/2: For the request "lino pull --table "listx" DB_prod | jq" the reply is "3:07PM FTL Fatal error stop the pull command error="pq: la relation « listx » n'existe pas" action=pull" => investigation must be done on this error.
extract from tables.yml
We need to pseudonymize a files table, by generating a new content (column data) based on the value of the name column.
We use this table.yaml file (note the data column is not selected, because we don't need it as it is replaced by a new value).
- name: files
  keys:
    - id
  columns:
    - name: id
    - name: name
The column data is added and generated by PIMO.
On the push, there is a problem because the value of data is base64 in the JSON stream, so we need to update the table.yaml file:
- name: files
  keys:
    - id
  columns:
    - name: id
    - name: name
    - name: data
      export: binary
Now the push is OK; the data values are treated as base64. But the pull will now extract the data column from the source database. This is a performance concern, because this extraction is costly and useless.
Currently the import property can only impact the in-memory data type (int, float, string, ...). But this property could (should?) also control how the data is read from the JSON stream.
A way to do this without breaking compatibility is to keep supporting import: []byte, import: string, etc., and add support for import: binary, import: datetime, etc. (the same options as the export property). To set the data type and the format at the same time, this kind of value can be used for the import property: import: binary(int64) (i.e. import: format(type)).
So our example becomes:
- name: files
  keys:
    - id
  columns:
    - name: id
    - name: name
    - name: data
      import: binary
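Parsing the proposed `format(type)` syntax could look like this sketch (the default-type fallback is an assumption, not part of the proposal):

```python
import re


def parse_import(value, default_type=None):
    """Split 'binary(int64)' into ('binary', 'int64'); plain values keep the default type."""
    m = re.fullmatch(r"(\w+)\((\w+)\)", value)
    if m:
        return m.group(1), m.group(2)
    return value, default_type
```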
We need to replace a field that is part of a primary key on a table.
The original object pulled by the lino pull command is
{
"PKFIELD": "orig"
}
The modified version used in the lino push update command is
{
"PKFIELD": "newvalue"
}
We have access to a cache file in jsonl format on the filesystem
{"key": "orig", "value": "newvalue"}
{"key": "orig2", "value": "newvalue2"}
{"key": "orig3", "value": "newvalue3"}
lino push update should allow using the cache file to generate the following update query
UPDATE table_name
SET PKFIELD = 'newvalue'
WHERE PKFIELD = 'orig'; -- orig value recovered from cache file
lino push delete should allow using the cache file to generate the following delete query
DELETE FROM table_name
WHERE PKFIELD = 'orig'; -- orig value recovered from cache file
Add a pk-translation flag
lino push update bdd --pk-translation PKFIELD=cache.json < newvalues.jsonl
lino push update bdd --pk-translation PKFIELD=cache.json --pk-translation PKFIELD2=cache2.json < newvalues.jsonl
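The flag's intended behavior could be sketched as follows (helper names are hypothetical): load the JSONL cache, then look up the original value for each new key value when building the statement.

```python
import json


def load_cache(lines):
    """Build a new-value -> original-value map from {"key": orig, "value": new} lines."""
    cache = {}
    for line in lines:
        entry = json.loads(line)
        cache[entry["value"]] = entry["key"]
    return cache


def update_statement(table, pk, record, cache):
    """Emit an UPDATE using the cached original value in the WHERE clause."""
    new = record[pk]
    return f"UPDATE {table} SET {pk} = '{new}' WHERE {pk} = '{cache[new]}';"
```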
Is it possible to add, at the beginning of the web URL for LINO (also in PIMO?), a link to a QUICKSTART? This quickstart page should include installation procedures/scripts in order to quickly install the LINO/PIMO tools without needing any license. For Windows computers, a script (or a link to a script) should be available to help install the latest version. See the example in the attached PowerShell script. Warning: in the PowerShell script, the line (.\7Zip4Powershell\2.4.0\7Zip4Powershell.psd1") should be updated, as the version '2.4.0' may change.
CGI_Functions.ps1.txt
Install_Lino.ps1.txt
Install_Pimo.ps1.txt
In addition, is it possible to add to this quickstart other methods, like the LINO-toolbox, including an example of how to use this Docker image?
Thanks.
Some simple SQL queries have better optimization than LINO's pull algorithm (such as DISTINCT).
This is a proposal to add a new sql action. The SQL query is read from standard input and the result is dumped as JSON lines on standard output.
$ echo "select * from public.customer" | lino sql source
{"id" : 1, "name": "Robert"}
{"id" : 2, "name": "Nick"}
...
Hi,
A user could not connect to an Oracle database after using a lino dc add command, with a password containing a #.
Could it be possible that the yaml file resulting from the command is interpreting it as a comment?
Is there any fix or alternative way of using it?
Thanks,
Postgresql db with dvdrental image
table Film contains some columns with numeric type: rating, rental_rate, replacement_cost
lino dataconnector add dvdrental --read-only postgresql://postgres@localhost:5432/dvdrental?sslmode=disable --password
lino relation extract dvdrental
lino table extract dvdrental
lino id create "film"
lino pull dvdrental | jq
the numeric columns are not displaying numeric values.
Example of invalid numeric value:
"rating": "TkMtMTc=",
The LINO tool should take the column data type into account in order to display the correct numeric values in the pull.
Current workaround - which is hard to implement for huge databases:
manually complete the file 'tables.yaml' with all columns, with export: numeric for numeric columns and the correct format for the other types.
https://pawssource.ent.cgi.com/gitlab/wse/GRALPTEST/anonymisation/poc_lino_postgres_dvdrental
"description": "A Fateful Reflection of a Moose And a Husband who must Overcome a Monkey in Nigeria",
"film_id": 133,
"fulltext": "J2NoYW1iZXInOjEgJ2ZhdGUnOjQgJ2h1c2JhbmQnOjExICdpdGFsaWFuJzoyICdtb25rZXknOjE2ICdtb29zJzo4ICdtdXN0JzoxMyAnbmlnZXJpYSc6MTggJ292ZXJjb20nOjE0ICdyZWZsZWN0Jzo1",
"language_id": 1,
"last_update": "2013-05-26T14:50:58.951Z",
"length": 117,
"rating": "TkMtMTc=",
"release_year": 2006,
"rental_duration": 7,
"rental_rate": "NC45OQ==",
"replacement_cost": "MTQuOTk=",
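The "invalid" values above are in fact base64-encoded text ("NC45OQ==" decodes to "4.99"). A post-processing workaround sketch (the field list has to be supplied by the user, since LINO does not report which columns were encoded):

```python
import base64


def decode_b64_fields(record, fields):
    """Decode the given base64-encoded fields of a pulled record back to text."""
    out = dict(record)
    for field in fields:
        out[field] = base64.b64decode(out[field]).decode()
    return out
```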
Description:
Currently, users have reported anomalies when using the LINO tool, particularly regarding the automatic detection of column types during data extraction. When extracting data from databases such as PostgreSQL, numeric columns are not recognized correctly, leading to incorrect representations of numeric values in the output JSON format (#244).
Problem:
Numeric columns are not displaying correct numeric values in the output JSON format. For instance, a value like "rating": "TkMtMTc=" is incorrect.
Expected Outcome:
The LINO tool should consider the column data type during extraction to ensure that numeric values are displayed correctly in the output JSON format.
Additional Information:
A current workaround is to manually complete the tables.yaml file to include column information, specifying export: numeric for numeric columns. However, this approach is cumbersome, especially for large databases.
Proposed Solution:
Enhance the lino table extract command to automatically initialize column information, including data types, based on the database schema. This improvement would streamline the data extraction process and ensure accurate representation of numeric values in the output JSON format.
Suggested Implementation Steps:
- Extend the lino table extract command to analyze the database schema and retrieve column information.
- Populate the tables.yaml file with column information, including the appropriate data type for each column.
Benefits:
Related Issue:
This enhancement request is related to issue #244 , which addresses similar challenges in data extraction and representation.
Hello,
We tried your tools, but we were quickly stopped by this problem.
When executing "lino relation extract source" we got "no extractor found for database type".
While reading the doc, we found that LINO doesn't support MariaDB and/or MySQL.
How can we add this support?
How to Reproduce
run docker run -v /home/vagrant/LINO/LINO:/home/lino lino_lino dataconnector add source mysql://<ip>:<port>?sslmode=disable -U USER -P PASS -s myschema
run docker run --env PASS=password --env USER=user -v /home/vagrant/LINO/LINO:/home/lino lino_lino table extract source
The LIMIT keyword is not valid for the DB SQL dialect.
During a lino push to an Oracle database with a BLOB column, we get an Oracle error:
ORA-01461: https://stackoverflow.com/questions/36717990/updating-a-blob-column-ora-01461-can-bind-a-long-value-only-for-insert-into-a
Example of a file to push:
lino_export_duplicate.zip
Note: beforehand we had to increase the buffer size (scanner.Buffer() in NewJSONRowIterator()).
To be discussed together
- id (string): unique identifier of the message
- action (string): possible values: "ping", "extract_tables", "extract_relations", "pull_open", "push_open", "push_data", "push_commit", "push_close"
- payload (Payload): the payload differs depending on the action type
- id (string): identifier of the original message
- Error|null
- bool: returns true if there are more messages to follow
- payload (Payload)
- string: name of the table
- []string: list of columns
- Filter
- number
- map[string]JSONValue
- string
- bool
lino pull outputs duplicate values.
New --distinct parameter (-D for short) to activate the DISTINCT SQL keyword.
Add new subcommands to the lino table command, to easily modify the tables.yaml configuration.
version: v1
tables:
  - name: public.actor
    keys:
      - actor_id
$ lino table add-column public.actor first_name
successfully added column first_name to public.actor table
$ lino table add-column public.actor last_name
successfully added column last_name to public.actor table
version: v1
tables:
  - name: public.actor
    keys:
      - actor_id
    columns:
      - name: "first_name"
      - name: "last_name"
$ lino table remove-column public.actor first_name
successfully removed column first_name from public.actor table
$ lino table set-column public.actor last_name string
successfully set column last_name to string in public.actor table
version: v1
tables:
  - name: public.actor
    keys:
      - actor_id
    columns:
      - name: "last_name"
        type: string
To extract a large part of a table's lines, it's faster to fetch all lines and apply the filter afterwards.
This is a proposal to add a flag --scan to the lino pull action.
--scan is only usable for a single-table pull (mainly with the --table flag) and executes the filters (-f and -F) in memory.
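The proposed behavior can be sketched as follows (the exact-match dict representation of a filter is an assumption, for illustration):

```python
def scan_filter(rows, filters):
    """Keep only rows matching every filter key/value pair, in memory."""
    return [row for row in rows if all(row.get(k) == v for k, v in filters.items())]
```

A full-table fetch followed by this in-memory pass avoids pushing the predicate into the SQL WHERE clause, which is the point of the proposal.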
Currently lino silently ignores the insertion of objects already in the database.
This is due to the ON CONFLICT DO NOTHING clause used by the postgresql dataconnector.
Note: with ingress-descriptor this applies to the start table only
Similar to what is done with PIMO (https://github.com/CGI-FR/PIMO/tree/main/demo/demo9#fields), I think it would be nice to have a list of fields that are always printed when pulling/pushing.
Examples
Another "nice to have" is a small summary of the processing, for example with PIMO:
+-------+--------------------+--------------------+---------------------+--------+-------------+----------+------------+-------------+----------+
| level | stats:ignoredPaths | stats:skippedLines | stats:skippedFields | return | config | duration | input-line | output-line | message |
+-------+--------------------+--------------------+---------------------+--------+-------------+----------+------------+-------------+----------+
| info | 0 | 0 | 0 | 0 | masking.yml | 2.1671ms | 1 | 3 | End PIMO |
+-------+--------------------+--------------------+---------------------+--------+-------------+----------+------------+-------------+----------+
Originally posted by @adrienaury in #18 (comment)
As a user of the lino command line tool, I would like to request support for configuring the stats template and URL through the environment variables LINO_STATS_URL and LINO_STATS_TEMPLATE. Currently, the stats template and URL can only be specified with the --statsTemplate and --stats flags. This feature would allow for more flexible configuration of the stats.
Example usage:
export LINO_STATS_URL=http://localhost:8080/stats
export LINO_STATS_TEMPLATE='{"software":"LINO","stats":{{ .Stats }}}'
lino pull source --limit 10 > customers.jsonl
We could add a parameter (e.g. --stats or -S) which would output execution stats to a JSON dump file (instead of having them at the end of the log output).
We could also add a 'limit' keyword to cut big branches.
Originally posted by @youen in #145 (comment)
Follow the instructions in 'https://github.com/CGI-FR/LINO':
lino dataconnector add source postgresql://postgres:@localhost:5432/postgres?sslmode=disable
A warning is displayed:
"warn: password should not be included in URI, use --password-from-env or --password"
'https://github.com/CGI-FR/LINO' should be updated to remove the password from the connection string and replace it with the '--password' parameter. With this parameter, the operator is asked to enter the password in a secure way.
Since integration tests were added as a GitHub Action, the devcontainer is broken with VSCode (it affects only VSCode, not GitHub Actions):
Logs
Step 20/21 : RUN useradd -ms /bin/bash vscode
[2021-05-07T20:53:19.006Z] ---> Running in dd54d23b6959
[2021-05-07T20:53:19.321Z] useradd: user 'vscode' already exists
[2021-05-07T20:53:19.637Z] ERROR: Service 'vscode' failed to build : The command '/bin/sh -c useradd -ms /bin/bash vscode' returned a non-zero code: 9
The line in error:
RUN useradd -ms /bin/bash vscode
Truncate statements are executed in random order and fail when parent tables are truncated before child tables.
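A sketch of how the truncate order could be derived so that children are always truncated before their parents, assuming a child-to-parents dependency map built from the foreign keys (hypothetical helper names):

```go
package main

import "fmt"

// truncateOrder returns a sequence in which every child table appears
// before all of its parents. deps maps child -> parent tables.
func truncateOrder(deps map[string][]string) []string {
	var postorder []string
	visited := map[string]bool{}
	var visit func(t string)
	visit = func(t string) {
		if visited[t] {
			return
		}
		visited[t] = true
		for _, parent := range deps[t] {
			visit(parent)
		}
		// appended after all its parents: parents-before-children order
		postorder = append(postorder, t)
	}
	for t := range deps {
		visit(t)
	}
	// reverse so children come before their parents
	for i, j := 0, len(postorder)-1; i < j; i, j = i+1, j-1 {
		postorder[i], postorder[j] = postorder[j], postorder[i]
	}
	return postorder
}

func main() {
	deps := map[string][]string{
		"payment": {"rental", "customer"},
		"rental":  {"customer"},
	}
	fmt.Println(truncateOrder(deps)) // [payment rental customer]
}
```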
As a user of the lino command, I want to disable foreign key constraints on a given table or ingress-descriptor, without re-enabling them immediately. I want to be able to re-enable the disabled constraints later.
The new command lino constraint will handle listing/enabling/disabling of foreign key constraints.
List constraints on foreign keys pointing to <table_name>
$ lino constraints list --table <table_name>
{"table": "ACTION", "constraint_name": "FK_ACTION_01"}
List constraints on foreign keys pointing to any table in the ingress-descriptor
$ lino constraints list -i <ingress-descriptor-name>
{"table": "ACTION", "constraint_name": "FK_ACTION_01"}
List and disable constraints on foreign keys pointing to <table_name>
$ lino constraints disable --table <table_name> > constraints.jsonl
1 constraint disabled
List and disable constraints on foreign keys pointing to any table in the ingress-descriptor
$ lino constraints disable -i <ingress-descriptor-name> > constraints.jsonl
1 constraint disabled
Enable constraints previously disabled
$ lino constraints enable < constraints.jsonl
1 constraint enabled
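A sketch of the DDL such a command could emit; since PostgreSQL has no DISABLE for foreign keys, disabling is modelled here as DROP plus a later re-ADD (Constraint is a hypothetical type):

```go
package main

import "fmt"

// Constraint describes one foreign key as it would be serialized to
// constraints.jsonl so it can be re-created later.
type Constraint struct {
	Table      string
	Name       string
	Definition string // e.g. "FOREIGN KEY (actor_id) REFERENCES actor(actor_id)"
}

// disableSQL drops the constraint (PostgreSQL-flavoured DDL).
func disableSQL(c Constraint) string {
	return fmt.Sprintf("ALTER TABLE %s DROP CONSTRAINT %s", c.Table, c.Name)
}

// enableSQL re-creates the constraint from its saved definition.
func enableSQL(c Constraint) string {
	return fmt.Sprintf("ALTER TABLE %s ADD CONSTRAINT %s %s", c.Table, c.Name, c.Definition)
}

func main() {
	c := Constraint{"ACTION", "FK_ACTION_01", "FOREIGN KEY (id) REFERENCES PARENT(id)"}
	fmt.Println(disableSQL(c))
	fmt.Println(enableSQL(c))
}
```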
Not all use cases need to export primary keys or foreign keys: primary keys and foreign keys are implicitly exported as nested JSON objects. Other use cases need to export all columns of the table.
We could add an option to specify the pull behavior.
Example :
version: v1
tables:
- name: public.actor
keys:
- actor_id
columns:
- name: "first_name"
type: string
- name: "last_name"
type: string
export: all
Originally posted by @youen in #33 (comment)
The ingress descriptor defines a cluster of two tables (country->city).
All other relations are disabled.
When using the command lino push truncate, this happens:
{"action":"truncate","table":"payment_p2007_01"}
{"action":"truncate","table":"payment_p2007_04"}
{"action":"truncate","table":"payment_p2007_06"}
{"action":"truncate","table":"rental"}
{"action":"truncate","table":"payment_p2007_03"}
{"action":"truncate","table":"payment_p2007_02"}
{"action":"truncate","table":"payment_p2007_05"}
{"action":"truncate","table":"payment"}
{"action":"truncate","table":"store"}
{"action":"truncate","table":"staff"}
{"action":"truncate","table":"customer"}
{"action":"truncate","table":"address"}
{"action":"truncate","table":"inventory"}
{"action":"truncate","table":"film_actor"}
{"action":"truncate","table":"city"}
{"action":"truncate","table":"film"}
{"action":"truncate","table":"film_category"}
{"action":"truncate","table":"category"}
{"action":"truncate","table":"language"}
{"action":"truncate","table":"actor"}
{"action":"truncate","table":"country"}
All of the tables mentioned in the ingress descriptor are impacted.
Change the following method so that it returns only tables that are part of an active relation (lookup: true).
Lines 45 to 82 in 14d1006
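A sketch of the intended filtering, keeping only tables reached through a relation with lookup: true (hypothetical types standing in for the ingress-descriptor model):

```go
package main

import "fmt"

// Relation is a hypothetical stand-in for an ingress-descriptor relation.
type Relation struct {
	Parent, Child string
	Lookup        bool
}

// activeTables returns the start table plus every table that takes part
// in an active (lookup: true) relation; disabled relations are skipped.
func activeTables(start string, relations []Relation) []string {
	seen := map[string]bool{start: true}
	tables := []string{start}
	for _, r := range relations {
		if !r.Lookup {
			continue
		}
		for _, t := range []string{r.Parent, r.Child} {
			if !seen[t] {
				seen[t] = true
				tables = append(tables, t)
			}
		}
	}
	return tables
}

func main() {
	rels := []Relation{{"country", "city", true}, {"city", "address", false}}
	fmt.Println(activeTables("country", rels)) // [country city]
}
```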
The goal of this proposal is to enhance LINO by adding support for DuckDB as a database source. DuckDB is a lightweight, embeddable analytical database that is gaining popularity for its speed and efficiency. Integrating DuckDB into LINO will provide users with the flexibility to extract relations and tables from DuckDB databases, expanding the range of supported databases.
Create extractor_duckdb.go to generate the SQL query that fetches all relations in the specified schema, using extractor_postgres.go as a reference.
// internal/infra/relation/extractor_duckdb.go
package relation

import "fmt"

// DuckDBDialect represents the DuckDB SQL dialect.
type DuckDBDialect struct{}

// SQL returns the SQL query to fetch all foreign-key relations in the named
// schema (a sketch: DuckDB exposes them through information_schema.table_constraints).
func (d DuckDBDialect) SQL(schema string) string {
	return fmt.Sprintf("SELECT constraint_name, table_name FROM information_schema.table_constraints WHERE constraint_type = 'FOREIGN KEY' AND table_schema = '%s'", schema)
}
Create extractor_duckdb.go to generate the SQL query that fetches all tables in the specified schema, using extractor_postgres.go as a reference.
// internal/infra/table/extractor_duckdb.go
package table

import "fmt"

// DuckDBDialect represents the DuckDB SQL dialect.
type DuckDBDialect struct{}

// SQL returns the SQL query to fetch all tables in the named schema.
func (d DuckDBDialect) SQL(schema string) string {
	return fmt.Sprintf("SELECT table_name FROM information_schema.tables WHERE table_schema = '%s'", schema)
}
Implement the SQLDialect interface in datadestination_duckdb.go to define the specifics of the DuckDB SQL dialect. See datadestination_postgres.go for guidance.
// internal/infra/push/datadestination_duckdb.go
package push

// DuckDBDialect represents the DuckDB SQL dialect.
type DuckDBDialect struct{}

// Implement the SQLDialect interface here.
// Example: func (d DuckDBDialect) CreateUpdateStatement() string { ... }
Implement the SQLDialect interface in datasource_duckdb.go to define the specifics of the DuckDB SQL dialect. See datasource_postgres.go for guidance.
// internal/infra/pull/datasource_duckdb.go
package pull

// DuckDBDialect represents the DuckDB SQL dialect.
type DuckDBDialect struct{}

// Implement the SQLDialect interface here.
// Example: func (d DuckDBDialect) CreateUpdateStatement() string { ... }
Once the above files are implemented, users will be able to add DuckDB as a source to LINO by running commands similar to those provided in the example for MySQL:
docker run -v /path/to/lino:/home/lino lino_lino dataconnector add source duckdb:///path/to/duckdb/database -s duckschema
docker run -v /path/to/lino:/home/lino lino_lino table extract source
This proposal aims to extend LINO's capabilities and offer users more options for data extraction. Your feedback and contributions to this feature request are highly appreciated.
Please feel free to open discussions and ask questions for further clarification.
Youen