marine-db's People

Contributors

cnrdh

marine-db's Issues

2015 samplelog corrections

2 "On-ice CTD-047", delete the one from station On-ice CTD-046, 28.04.2015
2 "N-ICE2015/SWN-068", delete the one from Refrozen lead thin 14.05.2015
2 "N-ICE2015/POC-530", delete the one from station Nutrient Experiment 4
2 "N-ICE2015/NUT-915", the one from station Nutrient experiment 4 can be deleted
2 "N-ICE2015/NUT-914", the one from station Nutrient experiment 4 can be deleted
2 "N-ICE2015/NUT-913", the one from station Nutrient experiment 4 can be deleted
2 "N-ICE2015/NUT-912", the one from station Nutrient experiment 4 can be deleted
2 "N-ICE2015/NUT-699", delete the one from station On-ice CTD-083
2 "N-ICE2015/NUT-698", delete the one from station On-ice CTD-083
2 "N-ICE2015/NUT-697", delete the one from station On-ice CTD-083
2 "N-ICE2015/NUT-670", delete the one from depth: 130-134 cm
2 "N-ICE2015/NUT-289", the one from station On-ice CTD-040 should be changed to NUT-389
2 "N-ICE2015/NUT-288", the one from station On-ice CTD-040 should be changed to NUT-388
2 "N-ICE2015/NUT-286", the one from station On-ice CTD-040 should be changed to NUT-386
2 "N-ICE2015/IAT-255", delete the one from station Nutrient Experiment 2
2 "N-ICE2015/IAT-230", the duplicate is real. Can the one from Ridge thin ice be called IAT-230b?
2 "N-ICE2015/IAT-229", the duplicate is real. Can the one from Ridge thin ice be called IAT-229b?
2 "N-ICE2015/FCF-109", can be deleted
2 "N-ICE2015/DOX-232", can be deleted
2 "N-ICE2015/DOC-200", can be deleted
2 "N-ICE2015/DOC-004", can be deleted
2 "N-ICE2015/CHL-872", the one from station Sediment trap hole can be deleted
2 "N-ICE2015/CHL-414", delete the one from Experimental Site UV-, 13.05.2015
2 "N-ICE2015/CHL-173", delete the one from Lance Bow, 13.03.2015
2 "N-ICE2015/CHL-105", delete the one from station D4, 10.03.2015
2 "N-ICE2015/BSI-058", delete the one from leg 2, 12.03.2015, Super Site Coring
2 "MOSJ2015/CHL-43", the one from V10 should be changed to CHL-48
2 "MOSJ2015/CHL-42", the one from V10 should be changed to CHL-47
2 "MOSJ2015/CHL-41", the one from V10 should be changed to CHL-46
2 "MOSJ2015/CHL-40", the one from V10 should be changed to CHL-45
2 "MOSJ2015/CHL-39", the one from V10 should be changed to CHL-44

Weird sample names

$ cat data/master/sample/*2015*.ndjson | ndjson-filter '!/2015/.test(d.sample)' | ndjson-map '[d.expedition,d.sample]'
["N-ICE2015","CH4-349"]
["N-ICE2015","DIC417"]
["N-ICE2015","Ship_CTD_024"]
["N-ICE2015","O10-572"]
["N-ICE2015","O10-573"]
["N-ICE2015","O10-574"]
["N-ICE2015","O10-575"]
["N-ICE2015","O10-576"]
["N-ICE2015","O10-577"]
["N-ICE16","N-ICE16/NUT-562"]
["N-ICE17","N-ICE17/NUT-563"]
["N-ICE2015","On-ice_CTD-077_"]
["N-ICE2015","MAA-239cC"]

$ cat data/master/sample/*2014*.ndjson | ndjson-filter '!/2014/.test(d.sample)' | ndjson-map '[d.expedition,d.sample]'
["MOSJ2014","NUT0-77"]
["N-ICE2014","Handnet-01"]

GlacierFront2017 events

Quite a few fieldNumbers are composite:

$ curl "https://v2-api.npolar.no/darwin-core/sample/_search?type=feed&page=..&and=expedition:GlacierFront2017&show=fieldNumber" | ndjson-filter 'd.fieldNumber.length != 7' | ndjson-map 'd.fieldNumber' | sort | uniq -c

      1 "AMM-001H_AMM-002H_AMM-003H"
      1 "AMM-004H_AMM-005H_AMM-006H"
      1 "AMM-007H_AMM-008H_AMM-009H"
      1 "AMM-010H_AMM-011H_AMM-012H"
      1 "AMM-013H_AMM-014H_AMM-015H"
      1 "AMM-016H_AMM-017H_AMM-018H"
      1 "AMM-019H_AMM-020H_AMM-021H"
      1 "AMM-022H_AMM-023H_AMM-024H"
      1 "AMM-025H_AMM-026H_AMM-027H"
      1 "AMM-028H_AMM-029H_AMM-030H"
      1 "AMM-031H_AMM-032H_AMM-033H"
      1 "AMM-034H_AMM-035H_AMM-036H"
      1 "AMM-040H_AMM-041H_AMM-042H"
      1 "AMM-043H_AMM-044H_AMM-045H"
      1 "AMM-046H_AMM-047H_AMM-048H"
      1 "AMM-049H_AMM-050H_AMM-051H"
      1 "AMM-052H_AMM-053H_AMM-054H"
      1 "AMM-058H_AMM-059H_AMM-060H"
      1 "AMM-061H"
      1 "AMM-062H"
      1 "AMM-063H"
      1 "AMM-111H_AMM-112H_AMM-113H"
      1 "AMM-114H_AMM-115H_AMM-116H"
      1 "CC2_R1_10M_CC2_R2_10M"
      1 "CC4_R1_10M_CC4_R2_10M"
      1 "CC5_R1_10M_CC5_R2_10M"
      1 "CDO-45"
      1 "CDO-46"
      1 "CPN2_R1_10M_CPN2_R2_10M"
      1 "CPN4_R1_10M;"
      1 "CPN5_R1_5M"
      1 "CPS2_R1_10M_CPS2_R2_10M"
      1 "CPS4_R1_10M;"
      1 "GS1_R1"
      1 "GS2_R1"
      1 "GS_R1_GS_R2"
      1 "KC6_R1_10M_KC6_R2_10M"
      1 "KC7_R1_10M_KC7_R2_10M"
      1 "KPM2_R1_10M_KPM2_R2_10M"
      1 "KPM4_R1_10M_KPM4_R1_10M"
      1 "KPM5_R1_10M_KPM5_R2_10M"
      1 "KPM5_R1_1M_KPM5_R2_1M"
      1 "KPM5_R1_50M_KPM5_R2_50M"
      1 "KPN2_R1_10M_KPN2_R2_10M"
      1 "KPN4_R1_10M_KPN4_R2_10M"
      1 "KPN5_R1_10M_KPN5_R2_10M"
      1 "KPN6_R1_10M"
      1 "KPN6_R1_30M"
      1 "KPN6_R1_3M"
      1 "KPNS2_R1_10M_KPS2_R2_10M"
      1 "KPS2_R1_10M_KPS2_R2_10M"
      1 "KPS4_R1_10M_KPS4_R2_10M"
      1 "KPS5_R1_KPS5_R2"
      1 "KPS6_R1_5M"
      1 "MOSJNUT-068"
      1 "MOSJNUT-069"
      1 "MOSJNUT-070"
      5 "SAL_TEMP_PH"
      4 "UNDEFINED"
      1 "URE-005_URE-006"
      1 "URE-007_URE-008"
      1 "URE-073_URE-074"
      1 "URE-079_URE-088"
      1 "URE-081_URE-082"
      1 "URE-083;URE-084"
      1 "URE-085_URE-086"
      1 "URE-087_URE-088"
      1 "URE-091_URE-092"
      1 "URE-093_URE-094"
      1 "URE-095_URE-096"
      1 "URE-099,URE-100"
      1 "URE-101_URE-102"
      1 "URE-105_URE-106"
      1 "URE-107_URE-108"
      1 "URE-109_URE-110"
      1 "URE-111_URE-112"
      1 "URE-113_URE-114"
      1 "URE-117_URE-118"
      1 "URE-89_URE-090"
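
A minimal sketch of how the composite fieldNumbers listed above might be split into individual IDs. The function name `splitFieldNumber` and the `SIMPLE_ID` pattern are hypothetical, inferred from the listing rather than from any documented naming rule. Underscores are only treated as separators when every part looks like a simple ID such as AMM-001H, because station-style names like KPN6_R1_10M use underscores internally and cannot be split safely without more context.

```javascript
// Pattern for "simple" sample IDs such as AMM-001H or URE-089.
// This is an assumption based on the listing, not a documented rule.
const SIMPLE_ID = /^[A-Z]+-\d+[A-Z]?$/;

function splitFieldNumber(fieldNumber) {
  // Drop stray trailing separators such as the ";" in "CPN4_R1_10M;".
  const cleaned = fieldNumber.trim().replace(/[;,\s]+$/, "");

  // Explicit separators (";" or ",") are unambiguous.
  if (/[;,]/.test(cleaned)) {
    return cleaned.split(/\s*[;,]\s*/).filter(Boolean);
  }

  // Split on "_" only when every part is a simple ID.
  const parts = cleaned.split("_");
  if (parts.length > 1 && parts.every((part) => SIMPLE_ID.test(part))) {
    return parts;
  }

  // Ambiguous or single value: return unchanged for manual review.
  return [cleaned];
}

console.log(splitFieldNumber("AMM-001H_AMM-002H_AMM-003H"));
console.log(splitFieldNumber("URE-083;URE-084"));
console.log(splitFieldNumber("CC2_R1_10M_CC2_R2_10M"));
```

Values such as "SAL_TEMP_PH" and "UNDEFINED" pass through unchanged, so they still need separate handling.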

Samplelog corrections for 2014, 2016 & 2017.

N-ICE2015 is a bit of detective work, so I need some time to finish it
2017
37 "Sal_Temp_pH", to be deleted
8 "undefined", to be deleted
4 "KpN4_R1_10M; KpN4_R2_10M", the first is correct; the next 3 should be changed to "KpN2_R1_10M; KpN2_R2_10M", "KpM2_R1_10M; KpM2_R2_10M", "KpS2_R1_10M; KpS2_R2_10M", in this order
2 "R6_R1_S; R6_R2_S", I cannot find them in the Excel file, but they can probably be deleted
2 "R6_R1_M; R6_R2_M", I cannot find them in the Excel file, but they can probably be deleted
2 "R6_R1_B; R6_R2_B", I cannot find them in the Excel file, but they can probably be deleted
2 "MOSJ2017/DOX-009", double registration, one of them can be deleted
2 "MOSJ2017/DOX-008", double registration, one of them can be deleted
2 "MOSJ2017/DOX-007", double registration, one of them can be deleted
2 "MOSJ2017/DIC-091", the one from R4 should be deleted
2 "GlacierFront2017/SAL-386", the one from station KpM6 should be changed to SAL-387
2 "GlacierFront2017/PHT-033", the one from station CpS3 can be deleted
2 "GlacierFront2017/OXY-161", the one from station CpS4 should be OXI-163
2 "GlacierFront2017/OXY-160", the one from station CpS4 should be OXI-162
2 "GlacierFront2017/OXY-021", the one from station Kb5 should be OXY-022
2 "GlacierFront2017/DIC-079", the one from KpM1 can be deleted
2 "GlacierFront2017/DIC-078", the one from KpM1 can be deleted
2 "GlacierFront2017/DIC-077", the one from KpM1 can be deleted
2 "GlacierFront2017/DIC-076", the one from KpM1 can be deleted
2 "GlacierFront2017/DIC-033", I cannot see what is wrong
2 "GlacierFront2017/DIC-031", I cannot see what is wrong

2016
6 "undefined", to be deleted
2 "MOSJ2016/ZOT-063", the one from station R5 should be ZOT-067
2 "MOSJ2016/ZOT-062", the one from station R5 should be ZOT-066
2 "MOSJ2016/ZOT-061", the one from station R5 should be ZOT-065
2 "MOSJ2016/ZOT-018", double registration, one of them can be deleted
2 "MOSJ2016/ZOT-017", double registration, one of them can be deleted
2 "MOSJ2016/ZOT-016", double registration, one of them can be deleted
2 "MOSJ2016/ZOT-015", double registration, one of them can be deleted
2 "MOSJ2016/ZOT-014", double registration, one of them can be deleted
2 "MOSJ2016/PAB-070", the one from station R2 can be deleted
2 "MOSJ2016/MAA-045", the sample numbers are shifted for station R5: MAA-045 should be MAA-046, MAA-046 should be MAA-047 & MAA-047 should be MAA-048
2 "MOSJ2016/MIT-015", I cannot see what is wrong
2 "MOSJ2016/CDO-054", the sample numbers for station V12 have been shifted; all numbers should be one lower (048=047, 049=048, 050=049, 051=050, 052=051, 053=052, 054=053)
2 "GlacierFront2016/NUT-243", the sample numbers for station GF-04 have been shifted; all numbers should be two higher (042=044, 043=045, 044=046)
2 "GlacierFront2016/NUT-242", same as NUT-043

2014
2 "MOSJ2014/FCM-068", should be changed to FCM-052
2 "MOSJ2014/FCM-067", should be changed to FCM-053
2 "MOSJ2014/FCM-066", should be changed to FCM-054
2 "MOSJ2014/FCM-065", should be changed to FCM-055
2 "MOSJ2014/FCM-064", should be changed to FCM-056
2 "MOSJ2014/FCM-063", should be changed to FCM-057
2 "MOSJ2014/FCM-062", should be changed to FCM-058
2 "ICE2014/ZOT-076", the one from R1b can be deleted
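
Several of the corrections above are plain offsets on the numeric part of a sample name (for example one lower for CDO at station V12, two higher for NUT at station GF-04). A minimal sketch of such a renumbering, assuming names end in a hyphen followed by a zero-padded number; `shiftSampleNumber` is a hypothetical helper, not part of the marine-db tooling. It must only be applied to the record from the station named in the correction, not to both duplicates.

```javascript
// Shift the trailing number of a sample name by `offset`,
// preserving the zero padding of the original.
function shiftSampleNumber(sample, offset) {
  const match = /^(.*-)(\d+)$/.exec(sample);
  if (!match) {
    throw new Error(`Unexpected sample name: ${sample}`);
  }
  const [, prefix, digits] = match;
  const shifted = Number(digits) + offset;
  if (shifted < 0) {
    throw new Error(`Offset ${offset} gives a negative number for ${sample}`);
  }
  return prefix + String(shifted).padStart(digits.length, "0");
}

console.log(shiftSampleNumber("MOSJ2016/CDO-054", -1)); // "MOSJ2016/CDO-053"
console.log(shiftSampleNumber("MOSJ2016/MAA-045", 1));  // "MOSJ2016/MAA-046"
```

When a whole run of numbers is shifted in place, the renamed records can temporarily collide with records that have not been renamed yet, so it is safer to compute all new names first and write them in one pass.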

Taxonomy corrections

Telonema
Telonema subtile
Kingdom: Chromista, Phylum: Telonemia

Ebria tripartita
Kingdom: Chromista, Phylum: Cercozoa, Class: Thecofilosea

Monosiga
Parvicorbicula
Bicosta minor
Bicosta spinifera
Calliacantha natans
Diaphanoeca pedicellata
Monosiga marina
Polyfibula sphyrelata
Salpingoeca inquillata

-> class Choanoflagellatea

Invalid gears

Samplelogs need fixing and/or gear schema needs to be updated

   1879 ERROR invalid gear On-ice CTD
    952 ERROR invalid gear Niskin
    141 ERROR invalid gear NO-GEAR
     90 ERROR invalid gear Multinet
     81 ERROR invalid gear Swimnet 200 µm
     50 ERROR invalid gear Handnet 20 μm
     49 ERROR invalid gear CTD
     47 ERROR invalid gear Divers
     46 ERROR invalid gear Limnos
     45 ERROR invalid gear Multinet 64 μm
     35 ERROR invalid gear MIK
     34 ERROR invalid gear Van veen grab
     20 ERROR invalid gear MIK 1500 μm
     15 ERROR invalid gear Ice core 9 cm
     14 ERROR invalid gear WP2 200 μm
      7 ERROR invalid gear WP3-1000µm
      7 ERROR invalid gear WP2-200µm
      6 ERROR invalid gear Hand
      3 ERROR invalid gear CO2 chambers
      3 ERROR invalid gear 
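
Part of the list above appears to come from two different code points for the micro sign ("µ", U+00B5, and "μ", U+03BC), so identical-looking gear names are counted separately. A minimal sketch of normalizing gear names before validating them; the `gear` field name and the allowed list passed in are assumptions, since the gear schema itself is not shown here.

```javascript
// Normalize a gear name: NFKC maps MICRO SIGN (U+00B5) to
// GREEK SMALL LETTER MU (U+03BC); whitespace is collapsed and trimmed.
function normalizeGear(gear) {
  return (gear ?? "").normalize("NFKC").replace(/\s+/g, " ").trim();
}

// Count records whose normalized gear is not in the allowed list.
function invalidGears(records, allowedGears) {
  const allowed = new Set(allowedGears.map(normalizeGear));
  const counts = new Map();
  for (const record of records) {
    const gear = normalizeGear(record.gear);
    if (!allowed.has(gear)) {
      counts.set(gear, (counts.get(gear) ?? 0) + 1);
    }
  }
  return counts;
}

// Hypothetical allowed list and records, for illustration only.
const allowed = ["Multinet 64 \u03bcm"];
const records = [
  { gear: "Multinet 64 \u00b5m" }, // micro sign variant, valid after normalization
  { gear: "NO-GEAR" },
  { gear: "" },
];
console.log(invalidGears(records, allowed));
```

Normalization does not reconcile spelling variants such as "WP2 200 μm" and "WP2-200µm"; those still need a mapping in the samplelogs or in the schema.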

Eight N-ICE2015 swimnet samples have latitudes > 90

cat data/master/sample-db.ndjson | ndjson-filter 'd.latitude > 90' | ndjson-map '[d.sample,d.latitude,d.longitude,d.depth_from]' | sort | uniq -c
39 ["N-ICE2015/SWN-098",91.914,12.2654,0]
39 ["N-ICE2015/SWN-099",91.914,12.2654,0]
39 ["N-ICE2015/SWN-100",91.914,12.2654,0]
39 ["N-ICE2015/SWN-101",91.914,12.2654,1]
39 ["N-ICE2015/SWN-102",91.914,12.2654,1]
39 ["N-ICE2015/SWN-103",91.914,12.2654,1]
39 ["N-ICE2015/SWN-104",91.914,12.2654,5]
39 ["N-ICE2015/SWN-105",91.914,12.2654,5]
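
A range check like the following could catch such values before they reach sample-db. This is a sketch only: `coordinateErrors` is a hypothetical helper, and it assumes `latitude` and `longitude` are numeric fields in decimal degrees, as in the records above.

```javascript
// Return a list of error messages for out-of-range or non-numeric coordinates.
function coordinateErrors(record) {
  const errors = [];
  const { latitude, longitude } = record;
  if (!Number.isFinite(latitude) || latitude < -90 || latitude > 90) {
    errors.push(`invalid latitude ${latitude}`);
  }
  if (!Number.isFinite(longitude) || longitude < -180 || longitude > 180) {
    errors.push(`invalid longitude ${longitude}`);
  }
  return errors;
}

console.log(coordinateErrors({ sample: "N-ICE2015/SWN-098", latitude: 91.914, longitude: 12.2654 }));
```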

Fix invalid scientific names in zooplankton

Checked 218 names, found 205 in local taxonomy, unknown: 13

[ 'Acartia clausii AF',
  'Cyclopina schneideri (cf.)',
  'Eusirus (cf. holmi)',
  'Frittilaria',
  'Halitholus cirratus (cf.)',
  'Isopoda Bopyridae',
  'Jashnovia brevis',
  'Oithona tlantica',
  'Onisimus lotoralis',
  'Triconia (=Oncaea)',
  'Triconia (=Oncaea) borealis',
  'Triconia conifera (cf.)',
  'Typhloscolecidae (cf. Travisiopsis)' ]

{ scientificName: 'Triconia (=Oncaea) borealis' } 'Not accepted in WoRMS/GBIF and no local match > 0.8 similarity'
{ scientificName: 'Triconia (=Oncaea)' } 'Not accepted in WoRMS/GBIF and no local match > 0.8 similarity'
{ scientificName: 'Isopoda Bopyridae' } 'Not accepted in WoRMS/GBIF and no local match > 0.8 similarity'
{ unknown: 'Jashnovia brevis',
  candidate:
   { scientificName: 'Jaschnovia brevis',
     scientificNameAuthorship: '(Farran, 1936)',
     similarity: 0.896551724137931 } }
{ scientificName: 'Typhloscolecidae (cf. Travisiopsis)' } 'Not accepted in WoRMS/GBIF and no local match > 0.8 similarity'
{ unknown: 'Halitholus cirratus (cf.)',
  candidate:
   { scientificName: 'Halitholus cirratus',
     scientificNameAuthorship: 'Hartlaub, 1913',
     similarity: 0.8717948717948718 } }
{ unknown: 'Oithona tlantica',
  candidate:
   { scientificName: 'Oithona atlantica',
     scientificNameAuthorship: 'Farran, 1908',
     similarity: 0.9655172413793104 } }
{ unknown: 'Onisimus lotoralis',
  candidate:
   { scientificName: 'Onisimus litoralis',
     scientificNameAuthorship: '(Krøyer, 1845)',
     similarity: 0.875 } }
{ unknown: 'Cyclopina schneideri (cf.)',
  candidate:
   { scientificName: 'Cyclopina schneideri',
     scientificNameAuthorship: 'Scott T., 1904',
     similarity: 0.8780487804878049 } }
{ unknown: 'Acartia clausii AF',
  candidate:
   { scientificName: 'Acartia clausi',
     scientificNameAuthorship: 'Giesbrecht, 1889',
     similarity: 0.8888888888888888 } }
{ scientificName: 'Frittilaria' } 'Not accepted in WoRMS/GBIF and no local match > 0.8 similarity'
{ unknown: 'Triconia conifera (cf.)',
  candidate:
   { scientificName: 'Triconia conifera',
     scientificNameAuthorship: '(Giesbrecht, 1891)',
     similarity: 0.8571428571428571 } }
{ scientificName: 'Eusirus (cf. holmi)' } 'Not accepted in WoRMS/GBIF and no local match > 0.8 similarity'
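
The similarity values in the log above are consistent with a Dice coefficient over character bigrams with whitespace removed (for example 28/29 for 'Oithona tlantica' against 'Oithona atlantica'). The log does not say which library produced them, so the following is a sketch under that assumption, with hypothetical function names, showing how a candidate above the 0.8 threshold could be picked from the local taxonomy.

```javascript
// Count character bigrams of a string after removing all whitespace.
function bigrams(text) {
  const compact = text.replace(/\s+/g, "");
  const counts = new Map();
  for (let i = 0; i < compact.length - 1; i++) {
    const pair = compact.slice(i, i + 2);
    counts.set(pair, (counts.get(pair) ?? 0) + 1);
  }
  return { counts, total: Math.max(compact.length - 1, 0) };
}

// Dice coefficient: 2 * shared bigrams / (bigrams in a + bigrams in b).
function diceSimilarity(a, b) {
  const first = bigrams(a);
  const second = bigrams(b);
  if (first.total === 0 || second.total === 0) {
    return a.replace(/\s+/g, "") === b.replace(/\s+/g, "") ? 1 : 0;
  }
  let shared = 0;
  for (const [pair, count] of second.counts) {
    shared += Math.min(count, first.counts.get(pair) ?? 0);
  }
  return (2 * shared) / (first.total + second.total);
}

// Best local candidate strictly above the threshold, or null if none.
function bestMatch(name, candidates, threshold = 0.8) {
  let best = null;
  for (const scientificName of candidates) {
    const similarity = diceSimilarity(name, scientificName);
    if (similarity > threshold && (best === null || similarity > best.similarity)) {
      best = { scientificName, similarity };
    }
  }
  return best;
}

console.log(bestMatch("Onisimus lotoralis", ["Onisimus litoralis", "Oithona atlantica"]));
```

Qualifiers such as "(cf.)" lower the score without indicating a different taxon, so stripping them before matching and keeping them as an identification qualifier may give cleaner results.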

Update phytoplankton taxonomy

./bin/csv-to-ndjson < data/deposit/taxonomy/phytoplankton-2017-12-22.csv | sort | uniq | ndjson-map 'd.species=d.species.replace(/\ssp\.$/, ""),d'  | sort | uniq | wc -l

405 lines => 404 species => 394 after removing " sp."
267 names match taxon-db
128 non-matches

To-add:

[ 'Chrysophyta',
'Dinophysis rotundata',
'Dinophyta',
'Favella meunieri',
'Gymnodinium pulchellum',
'Heterokontophyta',
'Myrionecta rubra',
'Nitzschia arctica',
'Pleurochrysis carterae',
'Prorocentrum minimum',
'Protoperidinium marielebourae',
'Scuticociliatida',
'Strobilidium spiralis',
'Strombidium spirale' ]

After { 'taxon-db': 641, candidates: 393, 'to-check': 77, added: 14 }

#REF!

Unwanted #REF! in original data:

~/npolar/marine-db$ grep "REF!" -r data/deposit | wc -l
46
~/npolar/marine-db$ grep "REF!" -r data/input | wc -l
36

Info on missing samples from sample log

I have also updated the samplelog at N:\Forskning\Marindatabase\DATASET TO UPLOAD\SAMPLELOG

Arcex 2016, PHT-063: It was probably not entered because it lacked lat & long. I have included lat and long in the file "arcex_2016_samplelog.xlsx".
(Expedition: Arcex2016; Sample name: PHT-063; Station: Outside ice station; Latitude: 78.8851; Longitude: 23.2967; Sampling date: 24.05.2016 12:00; Gear: Niskin bottle; Sample type: Phytoplankton taxonomy; Responsible: Philipp Assmy)

ICE10-455: copy info from previous entry "ICE10-454", sampling depth 10m
MOSJ2014/PHT-077: copy info from previous entry "MOSJ2014/PHT-076", assumed sampling depth 5m
N-ICE2015/PHT-142 & PHT-143: copy info from previous entry "PHT-141", assumed sampling depths 15m & 25m. I have added the info to "n-ice_2015_leg3_4_samplelog.xlsx"
N-ICE2015/PHT-192: The expedition was missing, so it was probably not entered. I have included the expedition in "n-ice_2015_leg5_6_samplelog.xlsx"

Too deep

ndjson-filter 'd.depth_bottom > 12000' | ndjson-map '[d.sample,d.latitude,d.longitude,d.depth_bottom]' | sort | uniq -c
43 ["06MAR134",78.928,7.782,550496]
43 ["06MAR136",79.033,10.897,325335]
43 ["06MAR145",79.018,10.293,282272]
43 ["06MAR146",79.009,10.2043,266256]
43 ["06MAR147",79.976,11.618,281316]
43 ["ALK09-171",79.04,11.13,250270]
43 ["ALK09-228",78.94,8.54,278352]
43 ["ALK09-229",78.9,7.77,1116352]
43 ["OTI03 LIP19",78.139667,12.857167,290300]
43 ["OTI03 LIP20",78.139667,12.857167,290300]
43 ["OTI04 LIP01",77.205833,29.836,196200]

MOSJ2017 phytoplankton taxonomy

Add

No match GBIF's WORMS for { scientificName: 'Gymnodinium gracile',
  records: 9,
  endOfRecords: true }
Best local match: { target: 'Gymnodinium gracilentum',
  rating: 0.8947368421052632 }

No match GBIF's WORMS for { scientificName: 'Algirosphaera quadricornu',
  records: 3,
  endOfRecords: true }
Best local match: { target: 'Algirosphaera', rating: 0.6857142857142857 }

What to do with 0 (organism absent)?

A surprisingly high number of occurrence records, over 55000, contain 0 as count (organismQuantity).

Absence data may also be valuable, but absence is normally not recorded, i.e. there is no consistent use of 0 for absent.

Taxon refers to missing sample

Occurrence lines: 146388
Missing sample event metadata: 115

 38 Taxon occurrence refers to missing sample Arcex2016/PHT-063
 36 Taxon occurrence refers to missing sample ICE10-455
  5 Taxon occurrence refers to missing sample MOSJ2014/PHT-077
 13 Taxon occurrence refers to missing sample N-ICE2015/PHT-142
 18 Taxon occurrence refers to missing sample N-ICE2015/PHT-143
  5 Taxon occurrence refers to missing sample N-ICE2015/PHT-192
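
The check behind this report amounts to an anti-join between occurrence records and sample metadata. A minimal sketch, assuming both record types carry the sample name in a `sample` field (the field name on the occurrence side is an assumption); `missingSamples` is a hypothetical helper.

```javascript
// Count occurrence records per sample name that has no matching sample event.
function missingSamples(occurrences, samples) {
  const known = new Set(samples.map((s) => s.sample));
  const counts = new Map();
  for (const occurrence of occurrences) {
    if (!known.has(occurrence.sample)) {
      counts.set(occurrence.sample, (counts.get(occurrence.sample) ?? 0) + 1);
    }
  }
  return counts;
}

// Illustrative records only.
const samples = [{ sample: "ICE10-454" }];
const occurrences = [
  { sample: "ICE10-454", taxon: "Telonema" },
  { sample: "ICE10-455", taxon: "Telonema" },
  { sample: "ICE10-455", taxon: "Monosiga" },
];
for (const [sample, count] of missingSamples(occurrences, samples)) {
  console.log(`${count} Taxon occurrence refers to missing sample ${sample}`);
}
```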

Stations with different names should not normally share time and position

Complete list, except for N-ICE2015:

["Hornsund-818","Hornsund-820","Arcex2016"]
["Hornsund-820","Hornsund-821","Arcex2016"]
["Storfjorden-859","Storfjorden-860","Arcex2016"]
["Storfjorden-860","Storfjorden-861","Arcex2016"]
["Erik Eriksenstredet-875","Erik Eriksenstredet-877","Arcex2016"]
["Erik Eriksenstredet-877","Erik Eriksenstredet-876","Arcex2016"]
["Polar Front-911","Polar Front-912","Arcex2016"]
["Polar Front-912","Polar Front-913","Arcex2016"]
["Polar Front-929","Polar Front-930","Arcex2016"]
["Polar Front-930","Polar Front-931","Arcex2016"]
["CTD","R5c","MOSJ2012"]
["Kb1","Kb0","MOSJ2015"]
["1","2","UNIS-AB310-2000"]
["old docks","Gåsebu","01M"]
["STATION-MISSING","Kb3","2003UV"]
["Kb3","STATION-MISSING","2003UV"]
["STATION-MISSING","Kb3","2003UV"]
["Kb3","STATION-MISSING","2003UV"]
["Erik Eriksen Strait","Erik Erksen Strait","OTI2003"]
["Kb0","Kb1","BIODAFF-2004"]
["Kb1","Kb2","BIODAFF-2004"]
["Kb2","Kb3","BIODAFF-2004"]
["Kb3","Kb9","BIODAFF-2004"]
["Kb4","Kb5","BIODAFF-2004"]
["Kb5","Kb6","BIODAFF-2004"]
["Kb6","Kb7","BIODAFF-2004"]
["Kb7","Kb8","BIODAFF-2004"]
["Kb10","Kb11","BIODAFF-2004"]
["Kb11","Kb12","BIODAFF-2004"]
["Kb12","Kb13","BIODAFF-2004"]
["Kb13","Kb14","BIODAFF-2004"]
["Kb15","Kb16","BIODAFF-2004"]
["Kb16","Kb17","BIODAFF-2004"]
["Kb17","Kb18","BIODAFF-2004"]
["Kb18","Kb19","BIODAFF-2004"]
["Kb20","Kb21","BIODAFF-2004"]
["Kb21","Kb22","BIODAFF-2004"]
["Kb22","Kb23","BIODAFF-2004"]
["Kb3","Kb5","MariClim-2006"]
["MF3","MF7","Alkekonge-2009b"]
#samples: 24976
#events: 3188
#stationsWithDifferentNameButIdenticalTimeAndPosition: 40
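
A check of this kind can be sketched by grouping events on expedition, time and position and reporting groups that contain more than one station name. The field names `time`, `latitude`, `longitude`, `station` and `expedition` are assumptions about the event records, and the function name is hypothetical.

```javascript
// Group events by expedition + time + position and return the groups
// in which more than one distinct station name occurs.
function stationsSharingTimeAndPosition(events) {
  const groups = new Map();
  for (const event of events) {
    const key = [event.expedition, event.time, event.latitude, event.longitude].join("|");
    if (!groups.has(key)) {
      groups.set(key, { expedition: event.expedition, stations: new Set() });
    }
    groups.get(key).stations.add(event.station);
  }
  return [...groups.values()]
    .filter((group) => group.stations.size > 1)
    .map((group) => ({ expedition: group.expedition, stations: [...group.stations] }));
}

// Illustrative events only; times and positions are made up.
const events = [
  { expedition: "MOSJ2015", station: "Kb1", time: "2015-07-20T10:00:00Z", latitude: 79.0, longitude: 11.4 },
  { expedition: "MOSJ2015", station: "Kb0", time: "2015-07-20T10:00:00Z", latitude: 79.0, longitude: 11.4 },
  { expedition: "MOSJ2015", station: "Kb2", time: "2015-07-20T14:00:00Z", latitude: 78.9, longitude: 11.9 },
];
console.log(stationsSharingTimeAndPosition(events));
```

Spelling variants of the same station (such as "Erik Eriksen Strait" and "Erik Erksen Strait" above) show up in the same report, which makes it useful for catching typos as well as copied metadata.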

Update functional groups

NDJSON of updated taxon-db from #21

ndjson-cat data/deposit/taxonomy/taxon-db.json | ndjson-split > data/input/taxonomy/taxon-db.ndjson

./bin/ndjson-from-csv < data/deposit/taxonomy/functional-groups.tsv | ndjson-map '{name: d.species}' > /tmp/func-names.ndjson

cat data/input/taxonomy/taxon-db.ndjson | ndjson-map 'd.taxon || d.canonicalName' | sort | uniq | ndjson-map '{ name: d}' > /tmp/names.ndjson
~/npolar/marine-db$ ndjson-join --right 'd.name' /tmp/names.ndjson /tmp/func-names.ndjson | grep null
[null,{"name":"Coxiella pseudoannulata"}]
[null,{"name":"Gymnodinium gaelatum"}]
[null,{"name":"Gymnodinium gracilientum"}]
[null,{"name":"Karenia brevis"}]
[null,{"name":"Protherythropsis vigilans"}]
[null,{"name":"Chaetoceros convulutus"}]
[null,{"name":"Pseudo-nitzschia pseudodelicatisima"}]

Non-unique eventIDs

conrad@nordfjellet:~/npolar/marine-db$ ./bin/sampling-events-mosjify  | ndjson-map d.eventID | sort | uniq -cd
      3 "08817bd2-4180-5cc3-9b45-6cdf015c891e"
      2 "08d55c4d-fce4-5986-976f-2ad844bb0efd"
      3 "0f04da54-3719-5ed6-a70e-efcc513f807f"
      5 "13c9ac48-e4ee-528b-b4be-648178bbb1a4"
      2 "1b645c10-5fee-5dc8-9a0d-8c008a25df47"
      3 "1c86f0c0-ec76-5291-953e-b2b9edfe6235"
      2 "20cf7938-6fc2-58d5-be69-789df6df61ba"
      3 "30db2ee1-65f5-572b-b7d5-a03499e2e37d"
      2 "37717336-59cc-5e30-95ba-51dbd88a23ca"
     10 "3c6c288e-70db-5abc-9560-6f692c0ba848"
      2 "3ef566f3-080e-5c80-9ad8-4a3d3d472741"
      2 "43b08d3e-30ae-5333-a9c4-e27e5e90bb03"
      2 "5cb4d680-956a-5186-a31a-18592bcdab64"
      3 "60beb0a3-72bf-5f75-b33f-3665d3087547"
      3 "6419d686-386d-56f0-b71e-a093509f4bc4"
      2 "679b0ec0-730e-5530-ae31-e826cdd2daca"
      4 "6c99d65c-aee9-5f0b-a2ec-66ef5d468919"
      2 "73aa7509-fc6c-5f61-aee5-1629b582ef8f"
     11 "83fcdd18-3d5b-501b-936c-51b9f1c7add1"
      2 "8e465810-4c5e-58bd-8302-34fba5dc4dc2"
      2 "938d1b8b-0f44-5ce3-85a9-23a37c3009c7"
      2 "9680d820-ac54-58b7-bcec-fbb749e65670"
      2 "9e35b381-4a24-5dfb-b51c-9dd50030fb57"
      2 "a1617564-32d2-55a2-b14e-7e00f6a69a92"
      2 "a44a2a6c-7a3e-59ed-bb0a-83f7bc0dba36"
      2 "a4d1433f-ac1e-586a-a996-dc63279fc244"
      2 "ad4aac1f-a7c2-56ed-9910-646635f5b4b1"
      2 "b1f9c3d5-e817-59fe-be33-7d92454c9a45"
      2 "b7015fc2-13f5-50f2-a6dd-dc626f6504a0"
      2 "bb0793c8-7b4b-5a7f-acab-1d3ebed6b93e"
      2 "bb41e464-2bac-58e3-a939-92144f863f0f"
      3 "bb494ea6-62a9-57b3-97b7-3b0f7025fb63"
     10 "be7e618d-8224-5b64-97c8-ecb8af50e73f"
      6 "bf566495-0b65-5f39-8e94-7f854ce79d1a"
      2 "c757b467-7bf0-5dd5-9013-649db3644e1b"
      2 "d0eeca30-1280-5fb4-88d6-75d1313701e2"
      2 "d4c841f1-33eb-57e4-8ac9-e4ec8738d192"
      2 "d949eb5d-245e-56b7-870e-c01bb558f748"
      2 "d9881d64-2331-51c1-8cb2-5bb2716c6b01"
      2 "e0257498-4609-5a12-8973-747861eb886e"
      2 "e02f6abf-a81e-5f9b-94d6-4c3582458865"
     13 "e093485e-6401-5e8d-8ece-80036b88ec48"
      2 "e30c519a-3163-52d3-b0bf-01d003dc4ecd"
     16 "e55187c4-c0bc-5317-add4-842d0a0ecb63"
      3 "e5e46509-d7be-57a9-8409-3e152389115a"
     12 "ec211c2a-5eaa-588c-a919-ceeab87b6054"
      2 "ecc061ab-7e82-5ac4-827b-dbf3c1c2f4ae"
      2 "ed99dcbd-ea7a-53b6-b979-ad443b3a9141"
      2 "f399a97c-4f34-58f0-903a-9c46135eab6f"
      2 "f522dd1c-a2ae-515c-aed6-d93c2cdbd012"

Handle "aff. X"

"aff. X" means similar to, but distinct from X

aff. Telonema sp. =>
"scientificName":"Eukaryota incertae sedis",
"identificationQualifier":"aff. Telonema sp."

Convert IOPAN protist data from ICE2010 into Darwin Core

$ wc -l data/deposit/iopan/protist-biodiversity/*ICE10*
  3444 data/deposit/iopan/protist-biodiversity/konghau_database_completeICE10.csv
   153 data/deposit/iopan/protist-biodiversity/konghau_database_completeICE10-handnet.csv
  3597 total

Protist taxonomy issues from N-ICE2015

Common: "Genus sp1/sp2" => "cf. sp1/sp2" ? Chaetoceros convolutus/concavicornis

Some problems:

"var." "Bacillaria paxillifer var.tumidula"
32 ["aff Catenula",null,null]
"Chrysophyceae cyst 1"
Cocal sphere
67 Spora
1 "Deformed Ciliophora with endosymbionts"
1 ["Empty dinophyceae",null,"spores"]
1 ["Empty unknown",null,"cysts"]
1 ["Encysting protist on stem",null,null]
5 ["Fragilariopsis cylindrus/F. Reginae-jahniae",null,null]
99 Incertains taxa
5 ["Unknown",null,"cysts"]
2 ["Unknown",null,"spores"]
11 ["Unknown taxon",null,null]

Create sample type vocab?

Actual types: 62

$ cat data/master/sample-db.ndjson | ndjson-map 'd.sample_type' | sort | uniq -c | sort -rn
   4784 null
   2692 "Nutrients"
   2001 "Pigments"
   1928 "δ18 Oxygen"
   1790 "DIC/AT"
   1675 "Phytoplankton taxonomy"
   1609 "POC/PON"
   1485 "CDOM"
   1355 "Mesozooplankton taxonomy"
   1292 "Ammonium"
   1063 "Flow cytometry"
    843 "Salinity"
    823 "Biogenic silica"
    797 "Particulate absorption"
    795 "Methane"
    698 "Barium"
    557 "CTD"
    514 "DOC/TDN"
    499 "HPLC pigments"
    438 "Microplankton taxonomy"
    362 "N2O/CH4/CO2"
    287 "Iodine"
    258 "Dissolved oxygen"
    254 "HPLC Pigments"
    222 "Macrozooplankton taxonomy"
    216 "Chlorophyll"
    209 "Ice algal taxonomy"
    202 "Mycosporine-like aminoacids"
    177 "Mycosporin-like aminoacids"
    156 "Ice algae taxonomy"
    129 "Amino Acids"
    114 "Fluorescence exitation spectra"
    100 "Handnet 20 µm"
     86 "Lipids"
     83 "Particulate absortion"
     67 "Meiofauna taxonomy"
     67 "Ice fauna taxonomy"
     66 "Genomics"
     61 "Zooplankton taxonomy"
     59 "Stable isotopes"
     59 "Handnet 20 μm"
     56 "Particle absorption"
     50 "POC/TDN"
     41 "Zooplankton genetics"
     37 "Zooplankton physiology"
     37 "Sal_Temp_pH"
     33 "δ13 Carbon"
     32 "Benthos"
     27 "Silicate"
     26 "Genetics"
     22 "Fluorescence excitation spectra"
     20 "Bromoform"
     14 "Uranium"
     12 "Fish abundance"
     11 "Urea"
     11 "Ecotoxocology"
     10 "Bacteria"
      7 "Experiments"
      6 "Silicon isotopes"
      6 "Prokaryote genomics 0.22 μm"
      6 "Amino acids"
      5 "stable isotopes"
      4 "Respiration"
      4 "PAB"
      4 "Eukaryote genomics 0.45 μm"
      3 "Clione & Limacina"
      2 "Limacina helicina"
      2 "Fish"
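
Before defining a vocabulary, it may help to list values that differ only in letter case, whitespace or the micro sign code point (for example "HPLC pigments" and "HPLC Pigments", or the two "Handnet 20 µm" entries). A minimal sketch with a hypothetical helper name; it does not catch misspellings such as "Particulate absortion" or "Mycosporin-like aminoacids", which would need a manual mapping or a similarity measure.

```javascript
// Group sample types that collapse to the same key after Unicode NFKC
// normalization, lowercasing and whitespace cleanup; return groups of variants.
function groupSampleTypeVariants(types) {
  const groups = new Map();
  for (const type of types) {
    if (type == null) continue; // skip null sample types
    const key = type.normalize("NFKC").toLowerCase().replace(/\s+/g, " ").trim();
    if (!groups.has(key)) groups.set(key, new Set());
    groups.get(key).add(type);
  }
  return [...groups.values()]
    .filter((variants) => variants.size > 1)
    .map((variants) => [...variants].sort());
}

console.log(
  groupSampleTypeVariants([
    "HPLC pigments",
    "HPLC Pigments",
    "Handnet 20 \u00b5m",
    "Handnet 20 \u03bcm",
    "Nutrients",
    null,
  ])
);
```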

Validate AeN2018707 protist taxonomy (new taxa)

Stage validation, and what of ova, etc.?

ndjson-filter '/^Calanus/.test(d.taxon)' | ndjson-map 'd.stage' | sort | uniq -c | sort -rn
4910 "CV"
4583 "AF"
4517 "CIV"
4004 "CIII"
3088 "CII"
2938 "CI"
1796 "AM"
480 "AF/AM"
409 "CIV-CV"
229 "CI-CIII"
2 null

MOSJ2016 taxonomy issues

Phytoplankton
Checked 235 names, found 91 in local taxonomy, unknown: 144 [!]

$ cat data/deposit/2016/MOSJ2016/mosj2016_phytoplankton_all_datajw.tsv | ./bin/csv-transform --ndjson | ndjson-map d.takson | sort |uniq | ./bin/gbif-validate-taxon 
cat data/deposit/2016/MOSJ2016/mosj2016_phytoplankton_all_datajw.tsv | ./bin/dwc-occurrence-csv-transform |ndjson-filter 'd.errors' | ndjson-split 'd.errors' | sort | uniq
{"value":"aff. Telonema sp.","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"aff. Telonema sp.","dataPath":".scientificName","message":"should match pattern \"^[A-Z][a-z\\s-]\""}
{"value":"Bifagellates","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Calycomonas wulffii","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Chaetoceros convolutus f. trisetosa","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Chaetoceros holsaticus","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Chrysophycean","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Coccolithophores","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Contricibra","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"","dataPath":".scientificName","message":"should match pattern \"^[A-Z][a-z\\s-]\""}
{"value":"Dinohyceae","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Dinophycean","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Dunaliella salina","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Fecal pellets","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Fourfagellates","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Gyrodinium lahryma","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Heliozoa","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Incertae taxa","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Leprotintinnus","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Pelagococcus","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Spora","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Stenoneis inconspicua","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Strobilidium conicum","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Strobilidium spiralis","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Symbiont Mesodinium rubrum","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Telonema antarctica","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Tintinidae","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Tintinnus","dataPath":".scientificName","message":"should be equal to one of the allowed values"}
{"value":"Uniflagellates","dataPath":".scientificName","message":"should be equal to one of the allowed values"}

Convert IOPAN protist data from "total database" 2009-2013 into Darwin Core

The "total database" contains:

~/npolar/marine-db$ cat data/deposit/iopan/protist-biodiversity/total_database_npi2009-2013.tsv | ./bin/csv-transform --ndjson | ndjson-map d.name| sort | uniq -c
   1468 "ALK09"
    961 "ALK10"
   3424 "ICE10"
   2579 "ICE12"
   1109 "MER09.01"
   1507 "MOSJ11"
   1638 "MOSJ12"
   2268 "MOSJ13"

The following are excluded, since there are alternate sources with more data.

  • ICE2010: (3597 lines in 2 files vs 3424): see #51
  • MOSJ2011: (1537 lines vs 1507): see #45
  • ICE2012 (2670 lines vs 2579 in "total database" :/): see #52

After removal, we are left with:

~/npolar/marine-db$ cat data/input/iopan/2009-2012-2013-protist-biodiversity-iopan.ndjson| ndjson-map [d.year,d.expedition] | sort | uniq -c
   1468 [2009,"Alkekonge-2009"]
   1109 [2009,"MERCLIM-2009"]
    961 [2010,"Alkekonge-2010"]
   1638 [2012,"MOSJ2012"]
   2268 [2013,"MOSJ2013"]

Eva Leu's PhD phytoplankton data

Two data files contributed for publication by Eva Leu in 2019, see #46 for some context (really just that API v1 contains metadata only for 74 phytoplankton samples from 2003/2004, all from Eva Leu)

Early years and data in API v1

In API v1 there is mention of 74 phytoplankton samples as early as 2003/2004, all from Eva Leu, but there is no data, just sample metadata.

All 1307 later phytoplankton samples in API v1 should simply be discarded, since we now have the original IOPAN data, see #44 and #45.

year
2009 (142) 2010 (301) 2011 (367) 2012 (288) 2013 (209)

expedition
ICE2011 (238) ICE2012 (224) MOSJ2013 (209) MOSJ2012 (193) Alkekonge-2010 (156) ICE2010 (145) Alkekonge-2009 (104) MERCLIM-2009 (38)

animal_group
Phytoplankton (1094) Microplankton (136) phytoplankton (70) Microplankton (6) Phytoplankton taxonomy (1)

Protist taxonomy interpretation 2009-2013

The 2009-2013 protist biodiversity data contains 14954 records and roughly 352 unique scientific names (approximate, since there are whitespace differences and two fields are used; Taxon_full contains 619 unique strings).

Of the original names, 308 were known and 37 were unknown.

Sample log entries with multiple IDs

{
	"sample": "AMM-016h; AMM-017h; AMM-018h",
	"expedition": "GlacierFront2017",
	"station": "KpM6",
	"sampled_from": "Helicopter",
	"cast": 5,
	"transect": "KpM",
	"gps": 11,
	"event": "340e8058-490c-5c11-8ef1-f05e847ea4b1"
}

Detect and treat symbionts

If /[Ss]ymbiont/ is found in a scientificName
=> move scientificName into organismRemarks
=> set organismScope to "symbiont"

(this is safe since no taxa have [Ss]ymbiont in their name)
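
The rule above can be sketched as a small transform on one occurrence record. `treatSymbiont` is a hypothetical function name; the sketch leaves scientificName unset after the move, since the rule does not say what should replace it, and it overwrites any existing organismRemarks.

```javascript
// Apply the symbiont rule to one occurrence record (returns a new object).
function treatSymbiont(record) {
  if (!/[Ss]ymbiont/.test(record.scientificName ?? "")) {
    return record; // no symbiont marker: leave the record untouched
  }
  const { scientificName, ...rest } = record;
  return {
    ...rest,
    organismRemarks: scientificName, // original name is moved here
    organismScope: "symbiont",
  };
}

console.log(treatSymbiont({ scientificName: "Symbiont Mesodinium rubrum", organismQuantity: 3 }));
console.log(treatSymbiont({ scientificName: "Mesodinium rubrum", organismQuantity: 3 }));
```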
