Comments (12)
More explanation needed please.
from sead_change_control.
See updates (commented out) in 20170101_DDL_BIBLIO_REFACTOR_MODEL.
from sead_change_control.
The linked script refactors the old bibliographic reference model. Three new fields are added to tbl_biblio, and a number of other fields and tables are deprecated.
We need to populate the new fields from the deprecated fields before those fields and tables are removed. These updates are commented out in the script, since they have not been validated.
from sead_change_control.
Related to (or same as) #18.
from sead_change_control.
Love and I can look at this. I'd like to prioritize the bibliography for online in the next few months, as this is critical for traceability.
from sead_change_control.
After scanning through tbl_biblio I discovered that 2126 records were missing data or contained the wrong year in the column “year_published”; however, the correct year is given in other columns (could this be an import issue?). This can be solved using a quick update query. This particular issue doesn't concern the end user, who will access either authors or full reference, both of which contain the correct year; it is more of a back-end documentation issue for the Bugs dataset references.
There is a clear difference in format between the Bugs and SEAD ingested references, which causes some issues with title and full reference. For example, the first 400 records lack information about the authors of the published texts; I believe some of these are MAL reports. This could easily be corrected by asking the MAL staff for a list, in Excel format, of the publications that they have occasionally published online.
A few records have DOI and website information in the wrong columns.
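A minimal sketch of what such an update query could look like in PostgreSQL. It assumes, without confirmation from this thread, that the full reference is stored in a text column named full_reference, that year_published is a text column, and that the correct year appears in parentheses, as in "Author (1987) Title". The regular expression is only an illustration and would need checking against real rows.

```sql
-- Hypothetical sketch: the column name full_reference, the text type of
-- year_published, and the "(YYYY)" pattern are all assumptions.

-- 1. Preview the rows whose stored year disagrees with the year in the reference.
select biblio_id,
       year_published,
       substring(full_reference from '\((\d{4})[a-z]?\)') as year_in_reference
from tbl_biblio
where substring(full_reference from '\((\d{4})[a-z]?\)') is not null
  and year_published is distinct from substring(full_reference from '\((\d{4})[a-z]?\)');

-- 2. Apply the change inside a transaction so the result can be inspected first.
begin;

update tbl_biblio
set year_published = substring(full_reference from '\((\d{4})[a-z]?\)')
where substring(full_reference from '\((\d{4})[a-z]?\)') is not null
  and year_published is distinct from substring(full_reference from '\((\d{4})[a-z]?\)');

-- Check the reported row count, then replace rollback with commit if it looks right.
rollback;
```

If year_published turns out to be an integer column, the extracted value would need an explicit cast before comparison and assignment.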
from sead_change_control.
OK, there is a significant risk that the biblio records have not been attached to the correct datasets and sites during the import. We need to check this first, and possibly re-import all the bibliography.
There is a list of all MAL reports on the server at \MAL-data1\MAL Rapporter
from sead_change_control.
There are currently 469 biblio_id values without any connection to datasets. Six of these are test data that, I assume, remain in the database from an earlier stage.
384 of these are Bugs references and 79 are SEAD references.
EDIT: It appears that some of the SEAD records that are lacking references belong to the ceramic dataset. Check in with Mattias about these.
Bilbio_id with no connection.xlsx
I ran this query to check connections between records and other tables.
--select string_agg(Chr(10), 'select ''' || table_name || ''' as table_name, biblio_id from ' || table_name || ' UNION ')
--from clearing_house.view_foreign_keys
--where schema_name = 'public'
-- and column_name = 'biblio_id'
with all_biblio_references as (
select 'tbl_dataset_masters' as table_name, biblio_id from tbl_dataset_masters UNION
select 'tbl_datasets' as table_name, biblio_id from tbl_datasets UNION
select 'tbl_ecocode_systems' as table_name, biblio_id from tbl_ecocode_systems UNION
select 'tbl_geochron_refs' as table_name, biblio_id from tbl_geochron_refs UNION
select 'tbl_methods' as table_name, biblio_id from tbl_methods UNION
select 'tbl_rdb_systems' as table_name, biblio_id from tbl_rdb_systems UNION
select 'tbl_relative_age_refs' as table_name, biblio_id from tbl_relative_age_refs UNION
select 'tbl_sample_group_references' as table_name, biblio_id from tbl_sample_group_references UNION
select 'tbl_site_other_records' as table_name, biblio_id from tbl_site_other_records UNION
select 'tbl_site_references' as table_name, biblio_id from tbl_site_references UNION
select 'tbl_species_associations' as table_name, biblio_id from tbl_species_associations UNION
select 'tbl_taxa_synonyms' as table_name, biblio_id from tbl_taxa_synonyms UNION
select 'tbl_taxonomic_order_biblio' as table_name, biblio_id from tbl_taxonomic_order_biblio UNION
select 'tbl_taxonomy_notes' as table_name, biblio_id from tbl_taxonomy_notes UNION
select 'tbl_tephra_refs' as table_name, biblio_id from tbl_tephra_refs UNION
select 'tbl_text_biology' as table_name, biblio_id from tbl_text_biology UNION
select 'tbl_text_distribution' as table_name, biblio_id from tbl_text_distribution UNION
select 'tbl_text_identification_keys' as table_name, biblio_id from tbl_text_identification_keys
)
-- Anti-join: list the tbl_biblio rows that none of the tables above refers to.
select *
from tbl_biblio b
left join all_biblio_references a using (biblio_id)
where a.biblio_id is null
order by biblio_id;
from sead_change_control.
I'm adding a document to clarify what I meant in my previous comment, which was a bit confusing when I reread it. I will check a sample set of biblio records and their connections to some other datasets and sites and get back to you @visead
from sead_change_control.
Biblio test data from Bugsdata 20200302
TCountsheet.CountsheetName = Dataset name in SEAD
SELECT TBiblio.*, TSite.SiteCODE, TSite.SiteName, TSite.Country, TCountsheet.CountsheetCODE, TCountsheet.CountsheetName
FROM (TSite INNER JOIN TCountsheet ON TSite.SiteCODE = TCountsheet.SiteCODE) INNER JOIN (TBiblio INNER JOIN TSiteRef ON TBiblio.REFERENCE = TSiteRef.Ref) ON TSite.SiteCODE = TSiteRef.SiteCODE;
QSiteDataRefs.xlsx
SELECT TBiblio.*, TBiology.CODE, TBiology.Data
FROM TBiblio INNER JOIN TBiology ON TBiblio.REFERENCE = TBiology.Ref;
QRefsBiology.xlsx
SELECT TBiblio.*, TDistrib.CODE, TDistrib.Data
FROM TBiblio INNER JOIN TDistrib ON TBiblio.REFERENCE = TDistrib.Ref;
QRefsDistrib.xlsx
from sead_change_control.
I have checked all of the Bugs bibliography and the records seem to check out. However, I have not been able to verify some of the newer additions, but I know that you and Roger have been handling those. Unfortunately I did not have time to go through the MAL section of the bibliography or any other additions from the newly ingested datasets. These need to be subject to QA at a later date.
I sent Roger an update that mainly concerns minor issues in tbl_biblio, where the wrong year is given or where a DOI or URL is in the title column instead of its own column.
This update affects ca. 1800 records.
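For the later QA pass, records with a DOI or URL left in the title column could be listed with a query along these lines. This is only a sketch: the column name title comes from this thread, but the patterns are assumptions and will produce both false positives and misses.

```sql
-- Hypothetical sketch: the patterns below are illustrative, not validated.
select biblio_id, title
from tbl_biblio
where title ~* 'https?://'        -- a URL embedded in the title
   or title ~* 'www\.'            -- a web address without a scheme
   or title ~ '\m10\.\d{4,9}/'    -- something shaped like a DOI prefix
order by biblio_id;
```

The result is meant for manual review before any value is moved to another column.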
from sead_change_control.
Love's updates are handled in 20221205_DML_QUALITY_CONTROL_BIBLIO_UPDATES.
from sead_change_control.