npolar / marine-db
https://doi.org/10.21334/marine-db
2 "On-ice CTD-047", delete the one from station On-ice CTD-046, 28.04.2015
2 "N-ICE2015/SWN-068", delete the one from Refrozen lead thin 14.05.2015
2 "N-ICE2015/POC-530", delete the one from station Nutrient Experiment 4
2 "N-ICE2015/NUT-915", the one from station Nutrient experiment 4 can be deleted
2 "N-ICE2015/NUT-914", the one from station Nutrient experiment 4 can be deleted
2 "N-ICE2015/NUT-913", the one from station Nutrient experiment 4 can be deleted
2 "N-ICE2015/NUT-912", the one from station Nutrient experiment 4 can be deleted
2 "N-ICE2015/NUT-699", delete the one from station On-ice CTD-083
2 "N-ICE2015/NUT-698", delete the one from station On-ice CTD-083
2 "N-ICE2015/NUT-697", delete the one from station On-ice CTD-083
2 "N-ICE2015/NUT-670", delete the one from depth: 130-134 cm
2 "N-ICE2015/NUT-289", the one from station On-ice CTD-040 should be changed to NUT-389
2 "N-ICE2015/NUT-288", the one from station On-ice CTD-040 should be changed to NUT-388
2 "N-ICE2015/NUT-286", the one from station On-ice CTD-040 should be changed to NUT-386
2 "N-ICE2015/IAT-255", delete the one from station Nutrient Experiment 2
2 "N-ICE2015/IAT-230", the duplicate is real. Could the one from Ridge thin ice be called IAT-230b?
2 "N-ICE2015/IAT-229", the duplicate is real. Could the one from Ridge thin ice be called IAT-229b?
2 "N-ICE2015/FCF-109", can be deleted
2 "N-ICE2015/DOX-232", can be deleted
2 "N-ICE2015/DOC-200", can be deleted
2 "N-ICE2015/DOC-004", can be deleted
2 "N-ICE2015/CHL-872", the one from station Sediment trap hole can be deleted
2 "N-ICE2015/CHL-414", delete the one from Experimental Site UV-, 13.05.2015
2 "N-ICE2015/CHL-173", delete the one from Lance Bow, 13.03.2015
2 "N-ICE2015/CHL-105", delete the one from station D4, 10.03.2015
2 "N-ICE2015/BSI-058", delete the one from leg 2, 12.03.2015, Super Site Coring
2 "MOSJ2015/CHL-43", the one from V10 should be changed to CHL-48
2 "MOSJ2015/CHL-42", the one from V10 should be changed to CHL-47
2 "MOSJ2015/CHL-41", the one from V10 should be changed to CHL-46
2 "MOSJ2015/CHL-40", the one from V10 should be changed to CHL-45
2 "MOSJ2015/CHL-39", the one from V10 should be changed to CHL-44
$ cat data/master/sample/*2015*.ndjson | ndjson-filter '!/2015/.test(d.sample)' | ndjson-map '[d.expedition,d.sample]'
["N-ICE2015","CH4-349"]
["N-ICE2015","DIC417"]
["N-ICE2015","Ship_CTD_024"]
["N-ICE2015","O10-572"]
["N-ICE2015","O10-573"]
["N-ICE2015","O10-574"]
["N-ICE2015","O10-575"]
["N-ICE2015","O10-576"]
["N-ICE2015","O10-577"]
["N-ICE16","N-ICE16/NUT-562"]
["N-ICE17","N-ICE17/NUT-563"]
["N-ICE2015","On-ice_CTD-077_"]
["N-ICE2015","MAA-239cC"]
$ cat data/master/sample/*2014*.ndjson | ndjson-filter '!/2014/.test(d.sample)' | ndjson-map '[d.expedition,d.sample]'
["MOSJ2014","NUT0-77"]
["N-ICE2014","Handnet-01"]
Quite a few fieldNumbers are composite:
$ curl "https://v2-api.npolar.no/darwin-core/sample/_search?type=feed&page=..&and=expedition:GlacierFront2017&show=fieldNumber" | ndjson-filter 'd.fieldNumber.length != 7' | ndjson-map 'd.fieldNumber' | sort | uniq -c
1 "AMM-001H_AMM-002H_AMM-003H"
1 "AMM-004H_AMM-005H_AMM-006H"
1 "AMM-007H_AMM-008H_AMM-009H"
1 "AMM-010H_AMM-011H_AMM-012H"
1 "AMM-013H_AMM-014H_AMM-015H"
1 "AMM-016H_AMM-017H_AMM-018H"
1 "AMM-019H_AMM-020H_AMM-021H"
1 "AMM-022H_AMM-023H_AMM-024H"
1 "AMM-025H_AMM-026H_AMM-027H"
1 "AMM-028H_AMM-029H_AMM-030H"
1 "AMM-031H_AMM-032H_AMM-033H"
1 "AMM-034H_AMM-035H_AMM-036H"
1 "AMM-040H_AMM-041H_AMM-042H"
1 "AMM-043H_AMM-044H_AMM-045H"
1 "AMM-046H_AMM-047H_AMM-048H"
1 "AMM-049H_AMM-050H_AMM-051H"
1 "AMM-052H_AMM-053H_AMM-054H"
1 "AMM-058H_AMM-059H_AMM-060H"
1 "AMM-061H"
1 "AMM-062H"
1 "AMM-063H"
1 "AMM-111H_AMM-112H_AMM-113H"
1 "AMM-114H_AMM-115H_AMM-116H"
1 "CC2_R1_10M_CC2_R2_10M"
1 "CC4_R1_10M_CC4_R2_10M"
1 "CC5_R1_10M_CC5_R2_10M"
1 "CDO-45"
1 "CDO-46"
1 "CPN2_R1_10M_CPN2_R2_10M"
1 "CPN4_R1_10M;"
1 "CPN5_R1_5M"
1 "CPS2_R1_10M_CPS2_R2_10M"
1 "CPS4_R1_10M;"
1 "GS1_R1"
1 "GS2_R1"
1 "GS_R1_GS_R2"
1 "KC6_R1_10M_KC6_R2_10M"
1 "KC7_R1_10M_KC7_R2_10M"
1 "KPM2_R1_10M_KPM2_R2_10M"
1 "KPM4_R1_10M_KPM4_R1_10M"
1 "KPM5_R1_10M_KPM5_R2_10M"
1 "KPM5_R1_1M_KPM5_R2_1M"
1 "KPM5_R1_50M_KPM5_R2_50M"
1 "KPN2_R1_10M_KPN2_R2_10M"
1 "KPN4_R1_10M_KPN4_R2_10M"
1 "KPN5_R1_10M_KPN5_R2_10M"
1 "KPN6_R1_10M"
1 "KPN6_R1_30M"
1 "KPN6_R1_3M"
1 "KPNS2_R1_10M_KPS2_R2_10M"
1 "KPS2_R1_10M_KPS2_R2_10M"
1 "KPS4_R1_10M_KPS4_R2_10M"
1 "KPS5_R1_KPS5_R2"
1 "KPS6_R1_5M"
1 "MOSJNUT-068"
1 "MOSJNUT-069"
1 "MOSJNUT-070"
5 "SAL_TEMP_PH"
4 "UNDEFINED"
1 "URE-005_URE-006"
1 "URE-007_URE-008"
1 "URE-073_URE-074"
1 "URE-079_URE-088"
1 "URE-081_URE-082"
1 "URE-083;URE-084"
1 "URE-085_URE-086"
1 "URE-087_URE-088"
1 "URE-091_URE-092"
1 "URE-093_URE-094"
1 "URE-095_URE-096"
1 "URE-099,URE-100"
1 "URE-101_URE-102"
1 "URE-105_URE-106"
1 "URE-107_URE-108"
1 "URE-109_URE-110"
1 "URE-111_URE-112"
1 "URE-113_URE-114"
1 "URE-117_URE-118"
1 "URE-89_URE-090"
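The composite values above bundle several sample identifiers into one string, joined by `_`, `;` or `,`. A minimal sketch of pulling them apart, assuming the identifiers follow the pattern of three capital letters, a hyphen, digits and an optional letter suffix (as in AMM-001H or URE-083). The helper name `splitFieldNumber` is hypothetical, and station-style values such as "CC2_R1_10M_CC2_R2_10M" do not fit the pattern and are returned unchanged.

```javascript
// Hypothetical helper: extract individual sample identifiers from a composite fieldNumber.
// Assumption: identifiers look like AMM-001H or URE-083 (3 capitals, hyphen, digits,
// optional capital suffix). Values without such identifiers are returned as a single item.
const SAMPLE_ID = /[A-Z]{3}-\d+[A-Z]?/g;

function splitFieldNumber(fieldNumber) {
  const ids = fieldNumber.match(SAMPLE_ID);
  return ids === null ? [fieldNumber] : ids;
}

console.log(splitFieldNumber("AMM-001H_AMM-002H_AMM-003H"));
console.log(splitFieldNumber("URE-083;URE-084"));
console.log(splitFieldNumber("CC2_R1_10M_CC2_R2_10M"));
```

Matching on the identifier pattern rather than splitting on a separator avoids having to guess which of `_`, `;` and `,` was used in a given record.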
N-ICE2015 is a bit of detective work, so I need some time to finish it.
2017
37 "Sal_Temp_pH", to be deleted
8 "undefined", to be deleted
4 "KpN4_R1_10M; KpN4_R2_10M", the first is correct; the next 3 should be changed to "KpN2_R1_10M; KpN2_R2_10M", "KpM2_R1_10M; KpM2_R2_10M", "KpS2_R1_10M; KpS2_R2_10M", in this order
2 "R6_R1_S; R6_R2_S", I cannot find them in the Excel file, but they can probably be deleted
2 "R6_R1_M; R6_R2_M", I cannot find them in the Excel file, but they can probably be deleted
2 "R6_R1_B; R6_R2_B", I cannot find them in the Excel file, but they can probably be deleted
2 "MOSJ2017/DOX-009", double registration; one of them can be deleted
2 "MOSJ2017/DOX-008", double registration; one of them can be deleted
2 "MOSJ2017/DOX-007", double registration; one of them can be deleted
2 "MOSJ2017/DIC-091", the one from R4 should be deleted
2 "GlacierFront2017/SAL-386", the one from station KpM6 should be changed to SAL-387
2 "GlacierFront2017/PHT-033", the one from station CpS3 can be deleted
2 "GlacierFront2017/OXY-161", the one from station CpS4 should be OXI-163
2 "GlacierFront2017/OXY-160", the one from station CpS4 should be OXI-162
2 "GlacierFront2017/OXY-021", the one from station Kb5 should be OXY-022
2 "GlacierFront2017/DIC-079", the one from KpM1 can be deleted
2 "GlacierFront2017/DIC-078", the one from KpM1 can be deleted
2 "GlacierFront2017/DIC-077", the one from KpM1 can be deleted
2 "GlacierFront2017/DIC-076", the one from KpM1 can be deleted
2 "GlacierFront2017/DIC-033", I cannot see what is wrong
2 "GlacierFront2017/DIC-031", I cannot see what is wrong
2016
6 "undefined", to be deleted
2 "MOSJ2016/ZOT-063", the one from station R5 should be ZOT-067
2 "MOSJ2016/ZOT-062", the one from station R5 should be ZOT-066
2 "MOSJ2016/ZOT-061", the one from station R5 should be ZOT-065
2 "MOSJ2016/ZOT-018", double registration; one of them can be deleted
2 "MOSJ2016/ZOT-017", double registration; one of them can be deleted
2 "MOSJ2016/ZOT-016", double registration; one of them can be deleted
2 "MOSJ2016/ZOT-015", double registration; one of them can be deleted
2 "MOSJ2016/ZOT-014", double registration; one of them can be deleted
2 "MOSJ2016/PAB-070", the one from station R2 can be deleted
2 "MOSJ2016/MAA-045", sample numbers are shifted for station R5: MAA-045 should be MAA-046, MAA-046 should be MAA-047, and MAA-047 should be MAA-048
2 "MOSJ2016/MIT-015", I cannot see what is wrong
2 "MOSJ2016/CDO-054", sample numbers for station V12 have been shifted; all numbers should have been one lower (048=047, 049=048, 050=049, 051=050, 052=051, 053=052, 054=053)
2 "GlacierFront2016/NUT-243", sample numbers for station GF-04 have been shifted; all numbers should be two higher (042=044, 043=045, 044=046)
2 "GlacierFront2016/NUT-242", same as NUT-043
2014
2 "MOSJ2014/FCM-068", to be changed to FCM-052
2 "MOSJ2014/FCM-067", to be changed to FCM-053
2 "MOSJ2014/FCM-066", to be changed to FCM-054
2 "MOSJ2014/FCM-065", to be changed to FCM-055
2 "MOSJ2014/FCM-064", to be changed to FCM-056
2 "MOSJ2014/FCM-063", to be changed to FCM-057
2 "MOSJ2014/FCM-062", to be changed to FCM-058
2 "ICE2014/ZOT-076", the one from R1b can be deleted
Telonema
Telonema subtile
Kingdom: Chromista, Phylum: Telonemia
Ebria tripartite
Kingdom: Chromista, Phylum: Cercozoa, Class: Thecofilosea
Monosiga
Parvicobicula
Bicosta minor
Bicosta spinifera
Calliacantha natans
Diaphanoeca pedicellata
Monosiga marina
Polyfibula sphyrelata
Salpingoeca inquillata
-> class Choanoflagellatea
Samplelogs need fixing and/or gear schema needs to be updated
1879 ERROR invalid gear On-ice CTD
952 ERROR invalid gear Niskin
141 ERROR invalid gear NO-GEAR
90 ERROR invalid gear Multinet
81 ERROR invalid gear Swimnet 200 µm
50 ERROR invalid gear Handnet 20 μm
49 ERROR invalid gear CTD
47 ERROR invalid gear Divers
46 ERROR invalid gear Limnos
45 ERROR invalid gear Multinet 64 μm
35 ERROR invalid gear MIK
34 ERROR invalid gear Van veen grab
20 ERROR invalid gear MIK 1500 μm
15 ERROR invalid gear Ice core 9 cm
14 ERROR invalid gear WP2 200 μm
7 ERROR invalid gear WP3-1000µm
7 ERROR invalid gear WP2-200µm
6 ERROR invalid gear Hand
3 ERROR invalid gear CO2 chambers
3 ERROR invalid gear
cat data/master/sample-db.ndjson | ndjson-filter 'd.latitude > 90' | ndjson-map '[d.sample,d.latitude,d.longitude,d.depth_from]' | sort | uniq -c
39 ["N-ICE2015/SWN-098",91.914,12.2654,0]
39 ["N-ICE2015/SWN-099",91.914,12.2654,0]
39 ["N-ICE2015/SWN-100",91.914,12.2654,0]
39 ["N-ICE2015/SWN-101",91.914,12.2654,1]
39 ["N-ICE2015/SWN-102",91.914,12.2654,1]
39 ["N-ICE2015/SWN-103",91.914,12.2654,1]
39 ["N-ICE2015/SWN-104",91.914,12.2654,5]
39 ["N-ICE2015/SWN-105",91.914,12.2654,5]
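A latitude above 90 cannot be valid, so records like these could be caught by a range check before they reach sample-db. The following is a minimal sketch, assuming coordinates are decimal degrees stored under `latitude` and `longitude` as in the output above; the function name `coordinateErrors` is hypothetical.

```javascript
// Hypothetical validator: returns a list of coordinate problems for one sample record.
// Assumption: latitude and longitude are decimal degrees.
function coordinateErrors(d) {
  const errors = [];
  if (!Number.isFinite(d.latitude) || d.latitude < -90 || d.latitude > 90) {
    errors.push(`invalid latitude ${d.latitude}`);
  }
  if (!Number.isFinite(d.longitude) || d.longitude < -180 || d.longitude > 180) {
    errors.push(`invalid longitude ${d.longitude}`);
  }
  return errors;
}

console.log(coordinateErrors({ sample: "N-ICE2015/SWN-098", latitude: 91.914, longitude: 12.2654 }));
```

Because `Number.isFinite` rejects `null`, `undefined` and strings, the same check also flags records where a coordinate is missing or was never parsed as a number.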
Checked 218 names, found 205 in local taxonomy, unknown: 13
[ 'Acartia clausii AF',
'Cyclopina schneideri (cf.)',
'Eusirus (cf. holmi)',
'Frittilaria',
'Halitholus cirratus (cf.)',
'Isopoda Bopyridae',
'Jashnovia brevis',
'Oithona tlantica',
'Onisimus lotoralis',
'Triconia (=Oncaea)',
'Triconia (=Oncaea) borealis',
'Triconia conifera (cf.)',
'Typhloscolecidae (cf. Travisiopsis)' ]
{ scientificName: 'Triconia (=Oncaea) borealis' } 'Not accepted in WoRMS/GBIF and no local match > 0.8 similarity'
{ scientificName: 'Triconia (=Oncaea)' } 'Not accepted in WoRMS/GBIF and no local match > 0.8 similarity'
{ scientificName: 'Isopoda Bopyridae' } 'Not accepted in WoRMS/GBIF and no local match > 0.8 similarity'
{ unknown: 'Jashnovia brevis',
candidate:
{ scientificName: 'Jaschnovia brevis',
scientificNameAuthorship: '(Farran, 1936)',
similarity: 0.896551724137931 } }
{ scientificName: 'Typhloscolecidae (cf. Travisiopsis)' } 'Not accepted in WoRMS/GBIF and no local match > 0.8 similarity'
{ unknown: 'Halitholus cirratus (cf.)',
candidate:
{ scientificName: 'Halitholus cirratus',
scientificNameAuthorship: 'Hartlaub, 1913',
similarity: 0.8717948717948718 } }
{ unknown: 'Oithona tlantica',
candidate:
{ scientificName: 'Oithona atlantica',
scientificNameAuthorship: 'Farran, 1908',
similarity: 0.9655172413793104 } }
{ unknown: 'Onisimus lotoralis',
candidate:
{ scientificName: 'Onisimus litoralis',
scientificNameAuthorship: '(Krøyer, 1845)',
similarity: 0.875 } }
{ unknown: 'Cyclopina schneideri (cf.)',
candidate:
{ scientificName: 'Cyclopina schneideri',
scientificNameAuthorship: 'Scott T., 1904',
similarity: 0.8780487804878049 } }
{ unknown: 'Acartia clausii AF',
candidate:
{ scientificName: 'Acartia clausi',
scientificNameAuthorship: 'Giesbrecht, 1889',
similarity: 0.8888888888888888 } }
{ scientificName: 'Frittilaria' } 'Not accepted in WoRMS/GBIF and no local match > 0.8 similarity'
{ unknown: 'Triconia conifera (cf.)',
candidate:
{ scientificName: 'Triconia conifera',
scientificNameAuthorship: '(Giesbrecht, 1891)',
similarity: 0.8571428571428571 } }
{ scientificName: 'Eusirus (cf. holmi)' } 'Not accepted in WoRMS/GBIF and no local match > 0.8 similarity'
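The similarity values above are consistent with a Sørensen–Dice coefficient over character bigrams with whitespace removed: for example, 'Oithona tlantica' against 'Oithona atlantica' shares 14 bigrams out of 14 + 15, giving 28/29 ≈ 0.9655. The sketch below assumes that metric; the notes do not name the tool that produced the scores, and the function names are hypothetical.

```javascript
// Sørensen–Dice similarity on character bigrams, ignoring whitespace.
// Assumption: this is the metric behind the "similarity" values; the notes do not say so.
function bigramCounts(s) {
  const counts = new Map();
  for (let i = 0; i < s.length - 1; i++) {
    const bigram = s.slice(i, i + 2);
    counts.set(bigram, (counts.get(bigram) || 0) + 1);
  }
  return counts;
}

function diceSimilarity(a, b) {
  const x = a.replace(/\s+/g, "");
  const y = b.replace(/\s+/g, "");
  if (x === y) return 1;
  if (x.length < 2 || y.length < 2) return 0;
  const remaining = bigramCounts(x);
  let shared = 0;
  for (let i = 0; i < y.length - 1; i++) {
    const bigram = y.slice(i, i + 2);
    const left = remaining.get(bigram) || 0;
    if (left > 0) {
      remaining.set(bigram, left - 1); // each bigram of x can be matched only once
      shared++;
    }
  }
  return (2 * shared) / (x.length + y.length - 2);
}

// Hypothetical helper: best candidate from a local name list above a threshold.
function bestLocalMatch(unknown, names, threshold = 0.8) {
  let best = null;
  for (const scientificName of names) {
    const similarity = diceSimilarity(unknown, scientificName);
    if (similarity > threshold && (best === null || similarity > best.similarity)) {
      best = { scientificName, similarity };
    }
  }
  return best;
}

console.log(bestLocalMatch("Onisimus lotoralis", ["Onisimus litoralis", "Oithona atlantica"]));
```

A bigram metric tolerates single-letter typos well, which fits the misspellings listed above, but it also scores a name with a trailing "(cf.)" or stage code below 1, so qualifiers are better stripped before matching.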
./bin/csv-to-ndjson < data/deposit/taxonomy/phytoplankton-2017-12-22.csv | sort | uniq | ndjson-map 'd.species=d.species.replace(/\ssp\.$/, ""),d' | sort | uniq | wc -l
405 lines => 404 species => 394 after removing " sp."
267 names match against taxon-db
128 non-matches
To-add:
[ 'Chrysophyta',
'Dinophysis rotundata',
'Dinophyta',
'Favella meunieri',
'Gymnodinium pulchellum',
'Heterokontophyta',
'Myrionecta rubra',
'Nitzschia arctica',
'Pleurochrysis carterae',
'Prorocentrum minimum',
'Protoperidinium marielebourae',
'Scuticociliatida',
'Strobilidium spiralis',
'Strombidium spirale' ]
After { 'taxon-db': 641, candidates: 393, 'to-check': 77, added: 14 }
Unwanted #REF! in original data:
~/npolar/marine-db$ grep "REF!" -r data/deposit | wc -l
46
~/npolar/marine-db$ grep "REF!" -r data/input | wc -l
36
75 TW-ICE2017-handnet.tsv
347 TW-ICE2017-phytoplankton.tsv
This and other 2017 data is already published: https://doi.org/10.21334/npolar.2019.56c2cd62
I have also updated the sample log at N:\Forskning\Marindatabase\DATASET TO UPLOAD\SAMPLELOG
Arcex 2016, PHT-063: It was probably not entered because it lacked latitude and longitude. I have included latitude and longitude in the file "arcex_2016_samplelog.xlsx".
(Expedition: Arcex2016; Sample name: PHT-063; Station: Outside ice station; Latitude: 78.8851; Longitude: 23.2967; Sampling date: 24.05.2016 12:00; Gear: Niskin bottle; Sample type: Phytoplankton taxonomy; Responsible: Philipp Assmy)
ICE10-455: copy info from previous entry "ICE10-454", sampling depth 10m
MOSJ2014/PHT-077: copy info from previous entry "MOSJ2014/PHT-076", assumed sampling depth 5m
N-ICE2015/PHT-142 & PHT-143: copy info from previous entry "PHT-141", assumed sampling depth 15m & 25m. I have added the info to "n-ice_2015_leg3_4_samplelog.xlsx".
N-ICE2015/PHT-192: The expedition was missing, so it was probably not entered. I have included the expedition in "n-ice_2015_leg5_6_samplelog.xlsx".
ndjson-filter 'd.depth_bottom > 12000' | ndjson-map '[d.sample,d.latitude,d.longitude,d.depth_bottom]' | sort | uniq -c
43 ["06MAR134",78.928,7.782,550496]
43 ["06MAR136",79.033,10.897,325335]
43 ["06MAR145",79.018,10.293,282272]
43 ["06MAR146",79.009,10.2043,266256]
43 ["06MAR147",79.976,11.618,281316]
43 ["ALK09-171",79.04,11.13,250270]
43 ["ALK09-228",78.94,8.54,278352]
43 ["ALK09-229",78.9,7.77,1116352]
43 ["OTI03 LIP19",78.139667,12.857167,290300]
43 ["OTI03 LIP20",78.139667,12.857167,290300]
43 ["OTI04 LIP01",77.205833,29.836,196200]
Add
No match GBIF's WORMS for { scientificName: 'Gymnodinium gracile',
records: 9,
endOfRecords: true }
Best local match: { target: 'Gymnodinium gracilentum',
rating: 0.8947368421052632 }
No match GBIF's WORMS for { scientificName: 'Algirosphaera quadricornu',
records: 3,
endOfRecords: true }
Best local match: { target: 'Algirosphaera', rating: 0.6857142857142857 }
A surprisingly high number of occurrence records, over 55000, contain 0 as count (organismQuantity).
Absence data may also be valuable, but absence is normally not recorded, i.e. there is no consistent use of 0 for absent.
Occurrence lines: 146388
Missing sample event metadata: 115
38 Taxon occurrence refers to missing sample Arcex2016/PHT-063
36 Taxon occurrence refers to missing sample ICE10-455
5 Taxon occurrence refers to missing sample MOSJ2014/PHT-077
13 Taxon occurrence refers to missing sample N-ICE2015/PHT-142
18 Taxon occurrence refers to missing sample N-ICE2015/PHT-143
5 Taxon occurrence refers to missing sample N-ICE2015/PHT-192
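The check above amounts to an anti-join between occurrence records and the sample log. A minimal sketch, assuming both record types carry the sample identifier under the key `sample` (as in the `ndjson-map` calls elsewhere in these notes); the function name and the example records are illustrative, not real data.

```javascript
// Hypothetical anti-join: count occurrences whose sample is absent from the sample log.
// Assumption: both record types expose the sample identifier under the key "sample".
function missingSampleCounts(occurrences, samples) {
  const known = new Set(samples.map((s) => s.sample));
  const counts = new Map();
  for (const o of occurrences) {
    if (!known.has(o.sample)) {
      counts.set(o.sample, (counts.get(o.sample) || 0) + 1);
    }
  }
  return counts;
}

// Illustrative records only.
const exampleSamples = [{ sample: "ICE10-454" }];
const exampleOccurrences = [
  { sample: "ICE10-454" },
  { sample: "ICE10-455" },
  { sample: "ICE10-455" },
];

for (const [sample, n] of missingSampleCounts(exampleOccurrences, exampleSamples)) {
  console.log(`${n} Taxon occurrence refers to missing sample ${sample}`);
}
```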
1538 data/deposit/iopan/protist-biodiversity/mosj_2011.tsv
Complete list, except for N-ICE2015:
["Hornsund-818","Hornsund-820","Arcex2016"]
["Hornsund-820","Hornsund-821","Arcex2016"]
["Storfjorden-859","Storfjorden-860","Arcex2016"]
["Storfjorden-860","Storfjorden-861","Arcex2016"]
["Erik Eriksenstredet-875","Erik Eriksenstredet-877","Arcex2016"]
["Erik Eriksenstredet-877","Erik Eriksenstredet-876","Arcex2016"]
["Polar Front-911","Polar Front-912","Arcex2016"]
["Polar Front-912","Polar Front-913","Arcex2016"]
["Polar Front-929","Polar Front-930","Arcex2016"]
["Polar Front-930","Polar Front-931","Arcex2016"]
["CTD","R5c","MOSJ2012"]
["Kb1","Kb0","MOSJ2015"]
["1","2","UNIS-AB310-2000"]
["old docks","Gåsebu","01M"]
["STATION-MISSING","Kb3","2003UV"]
["Kb3","STATION-MISSING","2003UV"]
["STATION-MISSING","Kb3","2003UV"]
["Kb3","STATION-MISSING","2003UV"]
["Erik Eriksen Strait","Erik Erksen Strait","OTI2003"]
["Kb0","Kb1","BIODAFF-2004"]
["Kb1","Kb2","BIODAFF-2004"]
["Kb2","Kb3","BIODAFF-2004"]
["Kb3","Kb9","BIODAFF-2004"]
["Kb4","Kb5","BIODAFF-2004"]
["Kb5","Kb6","BIODAFF-2004"]
["Kb6","Kb7","BIODAFF-2004"]
["Kb7","Kb8","BIODAFF-2004"]
["Kb10","Kb11","BIODAFF-2004"]
["Kb11","Kb12","BIODAFF-2004"]
["Kb12","Kb13","BIODAFF-2004"]
["Kb13","Kb14","BIODAFF-2004"]
["Kb15","Kb16","BIODAFF-2004"]
["Kb16","Kb17","BIODAFF-2004"]
["Kb17","Kb18","BIODAFF-2004"]
["Kb18","Kb19","BIODAFF-2004"]
["Kb20","Kb21","BIODAFF-2004"]
["Kb21","Kb22","BIODAFF-2004"]
["Kb22","Kb23","BIODAFF-2004"]
["Kb3","Kb5","MariClim-2006"]
["MF3","MF7","Alkekonge-2009b"]
#samples: 24976
#events: 3188
#stationsWithDifferentNameButIdenticalTimeAndPosition: 40
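One way to produce a list like the one above is to group sampling events by expedition, time and position, then report every group that holds more than one station name. This is a sketch under assumptions: the keys `station`, `expedition`, `latitude` and `longitude` appear in these notes, but the timestamp key `datetime` is a guess, and the example values are made up.

```javascript
// Hypothetical check: stations that share time and position but differ in name.
// Assumption: the timestamp lives under "datetime"; that key is not taken from the schema.
function conflictingStations(events) {
  const groups = new Map();
  for (const e of events) {
    const key = JSON.stringify([e.expedition, e.datetime, e.latitude, e.longitude]);
    if (!groups.has(key)) groups.set(key, new Set());
    groups.get(key).add(e.station);
  }
  const conflicts = [];
  for (const [key, stations] of groups) {
    if (stations.size > 1) {
      conflicts.push({ expedition: JSON.parse(key)[0], stations: [...stations] });
    }
  }
  return conflicts;
}

// Illustrative values only, not real positions or times.
const exampleEvents = [
  { expedition: "MOSJ2015", station: "Kb1", datetime: "2015-07-20T10:00:00Z", latitude: 79.0, longitude: 11.4 },
  { expedition: "MOSJ2015", station: "Kb0", datetime: "2015-07-20T10:00:00Z", latitude: 79.0, longitude: 11.4 },
  { expedition: "MOSJ2015", station: "Kb2", datetime: "2015-07-21T09:00:00Z", latitude: 78.9, longitude: 11.9 },
];

console.log(conflictingStations(exampleEvents));
```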
NDJSON of updated taxon-db from #21
ndjson-cat data/deposit/taxonomy/taxon-db.json | ndjson-split > data/input/taxonomy/taxon-db.ndjson
./bin/ndjson-from-csv < data/deposit/taxonomy/functional-groups.tsv | ndjson-map '{name: d.species}' > /tmp/func-names.ndjson
cat data/input/taxonomy/taxon-db.ndjson | ndjson-map 'd.taxon || d.canonicalName' | sort | uniq | ndjson-map '{ name: d}' > /tmp/names.ndjson
~/npolar/marine-db$ ndjson-join --right 'd.name' /tmp/names.ndjson /tmp/func-names.ndjson | grep null
[null,{"name":"Coxiella pseudoannulata"}]
[null,{"name":"Gymnodinium gaelatum"}]
[null,{"name":"Gymnodinium gracilientum"}]
[null,{"name":"Karenia brevis"}]
[null,{"name":"Protherythropsis vigilans"}]
[null,{"name":"Chaetoceros convulutus"}]
[null,{"name":"Pseudo-nitzschia pseudodelicatisima"}]
Unknown (not in sample log)
"IAT-018"
"IAT-021"
"IAT-022"
"IAT-023"
"IAT-024"
"IAT-025"
"IAT-026"
"IAT-027"
"IAT-028"
"IAT-029"
"IAT-030"
"IAT-041"
"IAT-072"
"IAT-073"
"IAT-099"
"IAT-100"
"IAT-280"
conrad@nordfjellet:~/npolar/marine-db$ ./bin/sampling-events-mosjify | ndjson-map d.eventID | sort | uniq -cd
3 "08817bd2-4180-5cc3-9b45-6cdf015c891e"
2 "08d55c4d-fce4-5986-976f-2ad844bb0efd"
3 "0f04da54-3719-5ed6-a70e-efcc513f807f"
5 "13c9ac48-e4ee-528b-b4be-648178bbb1a4"
2 "1b645c10-5fee-5dc8-9a0d-8c008a25df47"
3 "1c86f0c0-ec76-5291-953e-b2b9edfe6235"
2 "20cf7938-6fc2-58d5-be69-789df6df61ba"
3 "30db2ee1-65f5-572b-b7d5-a03499e2e37d"
2 "37717336-59cc-5e30-95ba-51dbd88a23ca"
10 "3c6c288e-70db-5abc-9560-6f692c0ba848"
2 "3ef566f3-080e-5c80-9ad8-4a3d3d472741"
2 "43b08d3e-30ae-5333-a9c4-e27e5e90bb03"
2 "5cb4d680-956a-5186-a31a-18592bcdab64"
3 "60beb0a3-72bf-5f75-b33f-3665d3087547"
3 "6419d686-386d-56f0-b71e-a093509f4bc4"
2 "679b0ec0-730e-5530-ae31-e826cdd2daca"
4 "6c99d65c-aee9-5f0b-a2ec-66ef5d468919"
2 "73aa7509-fc6c-5f61-aee5-1629b582ef8f"
11 "83fcdd18-3d5b-501b-936c-51b9f1c7add1"
2 "8e465810-4c5e-58bd-8302-34fba5dc4dc2"
2 "938d1b8b-0f44-5ce3-85a9-23a37c3009c7"
2 "9680d820-ac54-58b7-bcec-fbb749e65670"
2 "9e35b381-4a24-5dfb-b51c-9dd50030fb57"
2 "a1617564-32d2-55a2-b14e-7e00f6a69a92"
2 "a44a2a6c-7a3e-59ed-bb0a-83f7bc0dba36"
2 "a4d1433f-ac1e-586a-a996-dc63279fc244"
2 "ad4aac1f-a7c2-56ed-9910-646635f5b4b1"
2 "b1f9c3d5-e817-59fe-be33-7d92454c9a45"
2 "b7015fc2-13f5-50f2-a6dd-dc626f6504a0"
2 "bb0793c8-7b4b-5a7f-acab-1d3ebed6b93e"
2 "bb41e464-2bac-58e3-a939-92144f863f0f"
3 "bb494ea6-62a9-57b3-97b7-3b0f7025fb63"
10 "be7e618d-8224-5b64-97c8-ecb8af50e73f"
6 "bf566495-0b65-5f39-8e94-7f854ce79d1a"
2 "c757b467-7bf0-5dd5-9013-649db3644e1b"
2 "d0eeca30-1280-5fb4-88d6-75d1313701e2"
2 "d4c841f1-33eb-57e4-8ac9-e4ec8738d192"
2 "d949eb5d-245e-56b7-870e-c01bb558f748"
2 "d9881d64-2331-51c1-8cb2-5bb2716c6b01"
2 "e0257498-4609-5a12-8973-747861eb886e"
2 "e02f6abf-a81e-5f9b-94d6-4c3582458865"
13 "e093485e-6401-5e8d-8ece-80036b88ec48"
2 "e30c519a-3163-52d3-b0bf-01d003dc4ecd"
16 "e55187c4-c0bc-5317-add4-842d0a0ecb63"
3 "e5e46509-d7be-57a9-8409-3e152389115a"
12 "ec211c2a-5eaa-588c-a919-ceeab87b6054"
2 "ecc061ab-7e82-5ac4-827b-dbf3c1c2f4ae"
2 "ed99dcbd-ea7a-53b6-b979-ad443b3a9141"
2 "f399a97c-4f34-58f0-903a-9c46135eab6f"
2 "f522dd1c-a2ae-515c-aed6-d93c2cdbd012"
"aff. X" means similar to, but distinct from X
aff. Telonema sp =>
"scientificName":"Eukaryota incertae sedis",
"identificationQualifier":" aff. Telonema sp."
$ wc -l data/deposit/iopan/protist-biodiversity/*ICE10*
3444 data/deposit/iopan/protist-biodiversity/konghau_database_completeICE10.csv
153 data/deposit/iopan/protist-biodiversity/konghau_database_completeICE10-handnet.csv
3597 total
Common: "Genus sp1/sp2" => "cf. sp1/sp2" ? Chaetoceros convolutus/concavicornis
Some problems:
"var." "Bacillaria paxillifer var.tumidula"
32 ["aff Catenula",null,null]
"Chrysophyceae cyst 1"
Cocal sphere
67 Spora
1 "Deformed Ciliophora with endosymbionts
1 ["Empty dinophyceae",null,"spores"]
1 ["Empty unknown",null,"cysts"]
1 ["Encysting protist on stem",null,null]
5 ["Fragilariopsis cylindrus/F. Reginae-jahniae",null,null]
99 Incertains taxa
5 ["Unknown",null,"cysts"]
2 ["Unknown",null,"spores"]
11 ["Unknown taxon",null,null]
Actual types: 62
$ cat data/master/sample-db.ndjson | ndjson-map 'd.sample_type' | sort | uniq -c | sort -rn
4784 null
2692 "Nutrients"
2001 "Pigments"
1928 "δ18 Oxygen"
1790 "DIC/AT"
1675 "Phytoplankton taxonomy"
1609 "POC/PON"
1485 "CDOM"
1355 "Mesozooplankton taxonomy"
1292 "Ammonium"
1063 "Flow cytometry"
843 "Salinity"
823 "Biogenic silica"
797 "Particulate absorption"
795 "Methane"
698 "Barium"
557 "CTD"
514 "DOC/TDN"
499 "HPLC pigments"
438 "Microplankton taxonomy"
362 "N2O/CH4/CO2"
287 "Iodine"
258 "Dissolved oxygen"
254 "HPLC Pigments"
222 "Macrozooplankton taxonomy"
216 "Chlorophyll"
209 "Ice algal taxonomy"
202 "Mycosporine-like aminoacids"
177 "Mycosporin-like aminoacids"
156 "Ice algae taxonomy"
129 "Amino Acids"
114 "Fluorescence exitation spectra"
100 "Handnet 20 µm"
86 "Lipids"
83 "Particulate absortion"
67 "Meiofauna taxonomy"
67 "Ice fauna taxonomy"
66 "Genomics"
61 "Zooplankton taxonomy"
59 "Stable isotopes"
59 "Handnet 20 μm"
56 "Particle absorption"
50 "POC/TDN"
41 "Zooplankton genetics"
37 "Zooplankton physiology"
37 "Sal_Temp_pH"
33 "δ13 Carbon"
32 "Benthos"
27 "Silicate"
26 "Genetics"
22 "Fluorescence excitation spectra"
20 "Bromoform"
14 "Uranium"
12 "Fish abundance"
11 "Urea"
11 "Ecotoxocology"
10 "Bacteria"
7 "Experiments"
6 "Silicon isotopes"
6 "Prokaryote genomics 0.22 μm"
6 "Amino acids"
5 "stable isotopes"
4 "Respiration"
4 "PAB"
4 "Eukaryote genomics 0.45 μm"
3 "Clione & Limacina"
2 "Limacina helicina"
2 "Fish"
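Several of the types above differ only in letter case ("HPLC pigments" and "HPLC Pigments") or, apparently, in which micro character is used ("Handnet 20 µm" and "Handnet 20 μm"). A minimal sketch of a grouping key that merges such variants; the function names are hypothetical, and misspellings such as "Particulate absortion" would still need a manual mapping.

```javascript
// Hypothetical grouping key for sample_type values.
// Folds case, trims whitespace and maps MICRO SIGN (U+00B5) to GREEK SMALL LETTER MU (U+03BC).
// It does not repair misspellings.
function sampleTypeKey(sampleType) {
  if (sampleType === null || sampleType === undefined) return null;
  return sampleType.trim().replace(/\u00b5/g, "\u03bc").toLowerCase();
}

// Sum [sample_type, count] pairs that share the same key.
function mergeCounts(counts) {
  const merged = new Map();
  for (const [sampleType, n] of counts) {
    const key = sampleTypeKey(sampleType);
    merged.set(key, (merged.get(key) || 0) + n);
  }
  return merged;
}

console.log(mergeCounts([
  ["HPLC pigments", 499],
  ["HPLC Pigments", 254],
  ["Handnet 20 \u00b5m", 100],
  ["Handnet 20 \u03bcm", 59],
]));
```

The key is only meant for grouping and reporting; the canonical spelling written back to the sample log remains a curation decision.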
ndjson-filter '/^Calanus/.test(d.taxon)' | ndjson-map 'd.stage' | sort | uniq -c | sort -rn
4910 "CV"
4583 "AF"
4517 "CIV"
4004 "CIII"
3088 "CII"
2938 "CI"
1796 "AM"
480 "AF/AM"
409 "CIV-CV"
229 "CI-CIII"
2 null
Field numbers in the data files are not the same as in the sample log: they are bare numbers and need these prefixes to match
Phytoplankton
Checked 235 names, found 91 in local taxonomy, unknown: 144 [!]
$ cat data/deposit/2016/MOSJ2016/mosj2016_phytoplankton_all_datajw.tsv | ./bin/csv-transform --ndjson | ndjson-map d.takson | sort |uniq | ./bin/gbif-validate-taxon
cat data/deposit/2016/MOSJ2016/mosj2016_phytoplankton_all_datajw.tsv | ./bin/dwc-occurrence-csv-transform |ndjson-filter 'd.errors' | ndjson-split 'd.errors' | sort | uniq
{"value":"aff. Telonema sp.","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"aff. Telonema sp.","dataPath":".scientificName","message":"should match pattern \"^[A-Z][a-z\\s-]\""}
{"value":"Bifagellates","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Calycomonas wulffii","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Chaetoceros convolutus f. trisetosa","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Chaetoceros holsaticus","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Chrysophycean","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Coccolithophores","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Contricibra","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"","dataPath":".scientificName","message":"should match pattern \"^[A-Z][a-z\\s-]\""}
{"value":"Dinohyceae","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Dinophycean","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Dunaliella salina","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Fecal pellets","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Fourfagellates","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Gyrodinium lahryma","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Heliozoa","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Incertae taxa","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Leprotintinnus","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Pelagococcus","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Spora","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Stenoneis inconspicua","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Strobilidium conicum","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Strobilidium spiralis","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Symbiont Mesodinium rubrum","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Telonema antarctica","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Tintinidae","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Tintinnus","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Uniflagellates","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
The "total database" contains:
~/npolar/marine-db$ cat data/deposit/iopan/protist-biodiversity/total_database_npi2009-2013.tsv | ./bin/csv-transform --ndjson | ndjson-map d.name| sort | uniq -c
1468 "ALK09"
961 "ALK10"
3424 "ICE10"
2579 "ICE12"
1109 "MER09.01"
1507 "MOSJ11"
1638 "MOSJ12"
2268 "MOSJ13"
The following are excluded, since there are alternate sources with more data.
After removal, we are left with:
~/npolar/marine-db$ cat data/input/iopan/2009-2012-2013-protist-biodiversity-iopan.ndjson| ndjson-map [d.year,d.expedition] | sort | uniq -c
1468 [2009,"Alkekonge-2009"]
1109 [2009,"MERCLIM-2009"]
961 [2010,"Alkekonge-2010"]
1638 [2012,"MOSJ2012"]
2268 [2013,"MOSJ2013"]
Split scientific names into an object holding name, identification qualifier, and life stage, using Darwin Core terms as keys
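A minimal sketch of such a split, using the Darwin Core terms `scientificName`, `identificationQualifier` and `lifeStage` as keys. It assumes qualifiers are "cf." or "aff." (optionally in parentheses) and that a trailing token such as AF, AM or CI to CV is a life stage; the function name is hypothetical. It covers only simple cases: a name like 'Eusirus (cf. holmi)' or a stage range like "CIV-CV" needs extra handling.

```javascript
// Hypothetical parser: split a raw name into Darwin Core style keys.
// Assumptions: qualifiers are "cf." or "aff."; a trailing AF, AM, AF/AM or CI..CV is a life stage.
const STAGE = /\s+(AF\/AM|AF|AM|CI{1,3}|CIV|CV)$/;
const QUALIFIER = /\(?\b(cf|aff)\b\.?\)?/;

function splitName(raw) {
  let name = raw.trim();
  const result = {};
  const stage = name.match(STAGE);
  if (stage) {
    result.lifeStage = stage[1];
    name = name.slice(0, stage.index);
  }
  const qualifier = name.match(QUALIFIER);
  if (qualifier) {
    result.identificationQualifier = qualifier[1] + ".";
    name = name.replace(QUALIFIER, " ");
  }
  // Collapse the whitespace left behind by the removed parts.
  result.scientificName = name.replace(/\s+/g, " ").trim();
  return result;
}

console.log(splitName("Acartia clausii AF"));
console.log(splitName("Halitholus cirratus (cf.)"));
```

The word boundaries around the qualifier keep epithets that merely start with the same letters, such as "affinis", from being read as "aff.".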
Two data files contributed for publication by Eva Leu in 2019, see #46 for some context (really just that API v1 contains metadata only for 74 phytoplankton samples from 2003/2004, all from Eva Leu)
In API v1 there is mention of 74 phytoplankton samples as early as 2003/2004, all from Eva Leu, but there is no data, just sample metadata.
All 1307 later phytoplankton samples in API v1 should simply be discarded, since we now sit on the original IOPAN data, see #44 and #45.
year
2009 (142) 2010 (301) 2011 (367) 2012 (288) 2013 (209)
expedition
ICE2011 (238) ICE2012 (224) MOSJ2013 (209) MOSJ2012 (193) Alkekonge-2010 (156) ICE2010 (145) Alkekonge-2009 (104) MERCLIM-2009 (38)
animal_group
Phytoplankton (1094) Microplankton (136) phytoplankton (70) Microplankton (6) Phytoplankton taxonomy (1)
The 2009-2013 protist biodiversity data contains 14954 records and about 352 unique scientific names (approximate because of whitespace differences and because two fields are used; Taxon_full contains 619 unique strings).
Of the original names, 308 were known and 37 were unknown.
{
"sample": "AMM-016h; AMM-017h; AMM-018h",
"expedition": "GlacierFront2017",
"station": "KpM6",
"sampled_from": "Helicopter",
"cast": 5,
"transect": "KpM",
"gps": 11,
"event": "340e8058-490c-5c11-8ef1-f05e847ea4b1"
}
If /[Ss]ymbiont/ is found in a scientificName
=> move scientificName into organismRemarks
=> set organismScope to "symbiont"
(this is safe since no taxa have [Ss]ymbiont in their name)
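The rule above can be sketched as a small transform on one occurrence record. The function name is hypothetical; note that the record is left without a scientificName after the move, since the rule does not say what should replace it.

```javascript
// Sketch of the symbiont rule; the function name is hypothetical.
// Records without "symbiont" in scientificName are returned unchanged.
function applySymbiontRule(occurrence) {
  if (!/[Ss]ymbiont/.test(occurrence.scientificName || "")) return occurrence;
  const { scientificName, ...rest } = occurrence;
  return { ...rest, organismRemarks: scientificName, organismScope: "symbiont" };
}

console.log(applySymbiontRule({ scientificName: "Symbiont Mesodinium rubrum" }));
console.log(applySymbiontRule({ scientificName: "Telonema subtile" }));
```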