Comments (12)

visead commented on June 7, 2024

More explanation needed please.

from sead_change_control.

roger-mahler commented on June 7, 2024

See updates (commented out) in 20170101_DDL_BIBLIO_REFACTOR_MODEL.

roger-mahler commented on June 7, 2024

The linked script refactors the old bibliographic reference model. Three new fields are added to tbl_biblio, and a number of other fields and tables are deprecated.

We need to populate the new fields from the deprecated fields before the deprecated fields and tables are removed. These updates are commented out in the script because they have not been validated.

roger-mahler commented on June 7, 2024

Related to (or same as) #18.

visead commented on June 7, 2024

Love and I can look at this. I'd like to prioritize getting the bibliography online in the next few months, as this is critical for traceability.

Loer0022 commented on June 7, 2024

After scanning through tbl_biblio I discovered that 2126 records were missing data or contained the wrong year in the column "year_published"; however, the correct year is given in other columns (could this be an import issue?). This can be solved with a quick update query. This particular issue doesn't concern the end user, who will access either authors or full reference, both of which contain the correct year. It is more of a back-end documentation issue for the Bugs dataset references.

There is a clear difference in format between the Bugs and SEAD ingested references, which causes some issues with title and full reference. For example, the first 400 records lack information about the authors of the published texts; I believe some of these are MAL reports. This could easily be corrected by asking the MAL staff for a list, in Excel format, of the publications they have published online.

A few records have DOI and website information in the wrong columns.
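
A rough sketch of how the year fix could be done, in case it helps. It is not the validated update: it assumes the correct year can be read from a `full_reference` column (the thread only says "other columns"), uses an in-memory SQLite table as a stand-in for the real database, and the year pattern and the four toy rows are made up for illustration. Taking the first four-digit number is naive and could pick up a page number, so the result would need manual review.

```python
import re
import sqlite3

# Assumed pattern: a standalone four-digit year between 1500 and 2099.
YEAR_RE = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def year_from_reference(full_reference):
    """Return the first plausible year found in the reference text, or None."""
    match = YEAR_RE.search(full_reference or "")
    return int(match.group(1)) if match else None


def fix_years(conn):
    """Set year_published from full_reference where it is missing or differs."""
    rows = conn.execute(
        "SELECT biblio_id, year_published, full_reference FROM tbl_biblio"
    ).fetchall()
    changed = 0
    for biblio_id, year_published, full_reference in rows:
        year = year_from_reference(full_reference)
        if year is not None and year != year_published:
            conn.execute(
                "UPDATE tbl_biblio SET year_published = ? WHERE biblio_id = ?",
                (year, biblio_id),
            )
            changed += 1
    return changed


# Toy stand-in for tbl_biblio (column names other than year_published are assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_biblio ("
    "biblio_id INTEGER PRIMARY KEY, year_published INTEGER, full_reference TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_biblio VALUES (?, ?, ?)",
    [
        (1, None, "Author, A. (1987). Some title."),       # year missing
        (2, 1900, "Author, B. (2003). Another title."),    # wrong year
        (3, 1999, "Author, C. (1999). Already correct."),  # left untouched
        (4, None, "Report without a year."),               # nothing to extract
    ],
)
changed = fix_years(conn)
```

Rows where no year can be extracted are left as they are, so they can be listed separately and fixed by hand.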

visead commented on June 7, 2024

OK, there is a significant risk that the biblio records have not been attached to the correct datasets and sites during the import. We need to check this first, and possibly re-import all the bibliography.

There is a list of all MAL reports on the server at \MAL-data1\MAL Rapporter

Loer0022 commented on June 7, 2024

There are currently 469 biblio_ids without any connection to datasets. Six of these are test data that I assume remain in the database from an earlier stage.

384 of these are Bugs references and 79 are SEAD references.

EDIT: It appears that some of the SEAD records that are lacking references belong to the ceramic dataset. Check in with Mattias about these.

Bilbio_id with no connection.xlsx


I ran this query to check connections between records and other tables.

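-- Helper (commented out) that generates the UNION list used below.
-- string_agg takes the value first and the delimiter second.
--select string_agg('select ''' || table_name || ''' as table_name, biblio_id from ' || table_name, ' UNION' || Chr(10))
--from clearing_house.view_foreign_keys
--where schema_name = 'public'
--  and column_name = 'biblio_id'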
 
with all_biblio_references as (
                          select 'tbl_dataset_masters' as table_name, biblio_id from  tbl_dataset_masters UNION
                          select 'tbl_datasets' as table_name, biblio_id from  tbl_datasets UNION
                          select 'tbl_ecocode_systems' as table_name, biblio_id from  tbl_ecocode_systems UNION
                          select 'tbl_geochron_refs' as table_name, biblio_id from  tbl_geochron_refs UNION
                          select 'tbl_methods' as table_name, biblio_id from  tbl_methods UNION
                          select 'tbl_rdb_systems' as table_name, biblio_id from  tbl_rdb_systems UNION
                          select 'tbl_relative_age_refs' as table_name, biblio_id from  tbl_relative_age_refs UNION
                          select 'tbl_sample_group_references' as table_name, biblio_id from  tbl_sample_group_references UNION
                          select 'tbl_site_other_records' as table_name, biblio_id from  tbl_site_other_records UNION
                          select 'tbl_site_references' as table_name, biblio_id from  tbl_site_references UNION
                          select 'tbl_species_associations' as table_name, biblio_id from  tbl_species_associations UNION
                          select 'tbl_taxa_synonyms' as table_name, biblio_id from  tbl_taxa_synonyms UNION
                          select 'tbl_taxonomic_order_biblio' as table_name, biblio_id from  tbl_taxonomic_order_biblio UNION
                          select 'tbl_taxonomy_notes' as table_name, biblio_id from  tbl_taxonomy_notes UNION
                          select 'tbl_tephra_refs' as table_name, biblio_id from  tbl_tephra_refs UNION
                          select 'tbl_text_biology' as table_name, biblio_id from  tbl_text_biology UNION
                          select 'tbl_text_distribution' as table_name, biblio_id from  tbl_text_distribution UNION
                          select 'tbl_text_identification_keys' as table_name, biblio_id from  tbl_text_identification_keys
) select *
  from tbl_biblio b
  left join all_biblio_references a using (biblio_id)
  where a.biblio_id is null
  ORDER BY biblio_id
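
For anyone who wants to try the same check on a small scale, here is a minimal sketch of the pattern (union all referencing tables, left join, keep the rows with no match). It runs against an in-memory SQLite database rather than the SEAD PostgreSQL database, includes only two of the referencing tables, and the rows and the non-`biblio_id` column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl_biblio (biblio_id INTEGER PRIMARY KEY);
    CREATE TABLE tbl_datasets (dataset_id INTEGER PRIMARY KEY, biblio_id INTEGER);
    CREATE TABLE tbl_site_references (site_reference_id INTEGER PRIMARY KEY, biblio_id INTEGER);
    INSERT INTO tbl_biblio VALUES (1), (2), (3), (4);
    INSERT INTO tbl_datasets VALUES (10, 1), (11, NULL);
    INSERT INTO tbl_site_references VALUES (20, 3);
    """
)


def unreferenced_biblio_ids(conn, referencing_tables):
    """Return biblio_ids in tbl_biblio that none of the given tables refer to.

    Table names are interpolated into the SQL text, so they must come from a
    trusted, hard-coded list and never from user input.
    """
    union_sql = " UNION ".join(
        f"SELECT biblio_id FROM {table}" for table in referencing_tables
    )
    sql = f"""
        WITH all_biblio_references AS ({union_sql})
        SELECT b.biblio_id
        FROM tbl_biblio b
        LEFT JOIN all_biblio_references a ON a.biblio_id = b.biblio_id
        WHERE a.biblio_id IS NULL
        ORDER BY b.biblio_id
    """
    return [row[0] for row in conn.execute(sql)]


orphans = unreferenced_biblio_ids(conn, ["tbl_datasets", "tbl_site_references"])
```

In this toy data, biblio_id 1 and 3 are referenced and 2 and 4 are not. A NULL `biblio_id` in a referencing table never matches anything in the join, so it does not hide an unconnected record.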

Loer0022 commented on June 7, 2024

I'm adding a document to clarify what I meant in my previous comment, which was a bit confusing when I reread it. I will also check a sample set of biblio records and their connections to some other datasets and sites, and get back to you @visead

Tbl_biblio gitbub.xlsx

visead commented on June 7, 2024

Biblio test data from Bugsdata 20200302
TCountsheet.CountsheetName = Dataset name in SEAD

SELECT TBiblio.*, TSite.SiteCODE, TSite.SiteName, TSite.Country, TCountsheet.CountsheetCODE, TCountsheet.CountsheetName
FROM (TSite INNER JOIN TCountsheet ON TSite.SiteCODE = TCountsheet.SiteCODE) INNER JOIN (TBiblio INNER JOIN TSiteRef ON TBiblio.REFERENCE = TSiteRef.Ref) ON TSite.SiteCODE = TSiteRef.SiteCODE;
QSiteDataRefs.xlsx

SELECT TBiblio.*, TBiology.CODE, TBiology.Data
FROM TBiblio INNER JOIN TBiology ON TBiblio.REFERENCE = TBiology.Ref;
QRefsBiology.xlsx

SELECT TBiblio.*, TDistrib.CODE, TDistrib.Data
FROM TBiblio INNER JOIN TDistrib ON TBiblio.REFERENCE = TDistrib.Ref;
QRefsDistrib.xlsx

Loer0022 commented on June 7, 2024

I have checked all of the Bugs bibliography and it seems to check out. However, I have not been able to verify some of the newer additions, but I know that you and Roger have been handling those. Unfortunately I did not have time to go through the MAL section of the bibliography or any other additions from the newly ingested datasets. These need to be subject to QA at a later date.

I sent Roger an update that mainly concerns minor things in tbl_biblio, where the wrong year is given or where a DOI or URL is stored in the title column instead of its own column.

This update affects ca. 1800 records.

roger-mahler commented on June 7, 2024

Love's updates are handled in 20221205_DML_QUALITY_CONTROL_BIBLIO_UPDATES.
