polifonia-project / kg_data_transformation
Scripts and services to extract data from raw files (JSON, CSV, TSV) and create the Polifonia KG
In the sample ttl file the Harmonic Similarity with id harm_sim_08888_00111
contains only a single ChordProgression, whereas I expect to find two:
For reference, the source data is here:
KG_data_transformation/harmonic_sim_etl/data/harm_sym.json
Lines 44 to 67 in 8949220
Andrea will provide us with the similarity between lyrics.
Ontology in this issue, along with related discussion about the model.
?artistBirthPlaceCodeIRI a core:Country ;
rdfs:label ?artistBirthPlaceCodeIRI .
See here:
In the KG:
<https://w3id.org/polifonia/resource/Country/AR>
a core:Country ;
rdfs:label <https://w3id.org/polifonia/resource/Country/AR> .
Integrate KGs before:
New file here:
# line lyric
?lyricLineAIRI a mf:LyricLine ;
rdfs:label ?lineALabel ;
mf:isPartOf ?lyricsAIRI ;
mf:isLineLyricOfRecording ?recordingAIRI ;
# mf:hasLineNumber ?lineNumberA ;
cm:isInvolvedinLyricLineSimilarity ?lyricLineSimIRI .
?lyricLineSimIRI a cm:LyricLineSimilarity ;
cm:involvesLyricLine ?linelyricAIRI ;
cm:involvesLyricLine ?linelyricBIRI ;
The variable names ?lyricLineAIRI and ?linelyricAIRI differ, so the triple cm:involvesLyricLine is not materialized.
Binary relations like:
mp:Recording core:hasArtistBirthPlace core:Place
mp:Recording core:hasRecordingProcessSessionPlace core:Place
mp:Recording core:hasArtistCountryPlace core:Place
The three properties should have core:hasPlace as their super property in the ontology. For now, an edge like :recording_1 core:hasPlace :artist_birthplace will be added explicitly to the KG to speed things up for the sonar demo.
These properties will simplify queries.
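As a rough illustration of the explicit materialization described above, the sketch below adds one core:hasPlace edge for every edge that uses one of the three place properties. It works on plain (subject, predicate, object) tuples rather than on the actual transformation queries; the function name `materialize_has_place` and the tuple representation are assumptions made only for this example.

```python
# Hypothetical sketch: the property names come from the issue text,
# the tuple-based triple representation is an assumption.
PLACE_SUBPROPERTIES = {
    "core:hasArtistBirthPlace",
    "core:hasRecordingProcessSessionPlace",
    "core:hasArtistCountryPlace",
}


def materialize_has_place(triples):
    """Return the input triples plus one core:hasPlace edge
    for every edge that uses a place sub-property."""
    result = set(triples)
    for subject, predicate, obj in triples:
        if predicate in PLACE_SUBPROPERTIES:
            result.add((subject, "core:hasPlace", obj))
    return result


triples = {(":recording_1", "core:hasArtistBirthPlace", ":artist_birthplace")}
expanded = materialize_has_place(triples)
```

Once the super property is declared in the ontology and a reasoner is used, this explicit step would presumably no longer be needed.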
Running testbed-0.0.2 on KG output of query-0.0.4
* * * * * * * * * *
* * * KGT run * * *
* * * * * * * * * *
[*] reading testbed .\kg\test\testbed-0.0.2.json
[*] parsing KG .\kg\versions\polifonia-kg-places-0.0.4.ttl
+[*] Running Test 1: Expected Recordings Count
+[+] Test passed: expected rows count correct 725
-[*] Running Test 2: Expected Artist Count
-[!] Test failed: expected number of rows 103 found 49
-[*] Running Test 3: Expected Places Count
-[!] Test failed: expected number of rows 37 found 108
+[*] Running Test 4: Expected Sessions Count
+[+] Test passed: expected rows count correct 673
-[*] Running Test 5: Expected Song with 2 artists
-[!] Test failed: expected number of rows 14 found 9
+[*] Running Test 6: Expected Song with 3 artists
+[+] Test passed: expected rows count correct 1
+[*] Running Test 7: Expected Song with matching attributes (label, titleLabel, ...)
+[+] Test passed: expected rows [{'performerLabel': 'The Beatles', 'recordingTitleLabel': 'I Saw Her Standing There'}, {'performerLabel': 'Dietrich Fischer-Dieskau', 'recordingTitleLabel': 'Gerald Moore'}, {'performerLabel': 'Thomas Allen', 'recordingTitleLabel': 'Roger Vignoles'}] found rows [{'performerLabel': 'Dietrich Fischer-Dieskau', 'recordingTitleLabel': 'Gerald Moore'}, {'performerLabel': 'Thomas Allen', 'recordingTitleLabel': 'Roger Vignoles'}, {'performerLabel': 'The Beatles', 'recordingTitleLabel': 'I Saw Her Standing There'}]
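Test 7 in the log above passes although the expected and found rows are listed in a different order, which suggests an order-insensitive comparison. A minimal sketch of such a comparison follows; it is not the testbed's actual code, and the function name `rows_match` is an assumption.

```python
def rows_match(expected, found):
    """Compare two lists of result rows (dicts) while ignoring row order.
    Duplicated rows still count, so this is a multiset comparison."""
    def normalise(rows):
        # Turn each dict into a sorted tuple of items so rows become sortable.
        return sorted(tuple(sorted(row.items())) for row in rows)
    return normalise(expected) == normalise(found)


expected = [
    {"performerLabel": "The Beatles", "recordingTitleLabel": "I Saw Her Standing There"},
    {"performerLabel": "Thomas Allen", "recordingTitleLabel": "Roger Vignoles"},
]
found = list(reversed(expected))
same = rows_match(expected, found)
```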
There are duplicates in the raw data, as well as duplicates generated by the dataset transformations.
Harmonic transformation
"harmonicSimIRI": "https://w3id.org/polifonia/resource/HarmonicSimilarity/harm_sim_isophonics_173_isophonics_243_00002"
"harmonicSimIRI": "https://w3id.org/polifonia/resource/HarmonicSimilarity/harm_sim_isophonics_243_isophonics_173_00002"
The same instance appears twice. A URI generation policy that is not affected by symmetry would probably eliminate this, and it should be an easy solution.
Otherwise we would need to change the query so that it does not consider duplicates. Better not to go down that path.
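One way to make the identifier independent of the order of the two recordings is to sort the recording ids before building it. The sketch below assumes that the id is composed of a prefix, the two recording ids and a trailing suffix (treated here as an opaque string); the function name `canonical_similarity_id` is hypothetical.

```python
BASE = "https://w3id.org/polifonia/resource/HarmonicSimilarity/"


def canonical_similarity_id(recording_a, recording_b, suffix, prefix="harm_sim"):
    """Build a similarity id that is the same for (a, b) and (b, a)
    by putting the two recording ids in sorted order."""
    first, second = sorted([recording_a, recording_b])
    return f"{prefix}_{first}_{second}_{suffix}"


iri_ab = BASE + canonical_similarity_id("isophonics_173", "isophonics_243", "00002")
iri_ba = BASE + canonical_similarity_id("isophonics_243", "isophonics_173", "00002")
```

With this policy both orderings map to the same IRI, so the two entries shown above would collapse into a single instance.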
Lyric Lines Transformation:
{
"lyrSimId": "lyr_sim_isophonics_45_isophonics_288_178_179",
"compSimScore": null,
"humanSimScore": null,
"lineA": {
"lineLabel": "Ha da da, ha da da ahh",
"recordingId": "isophonics_45",
"lineNumber": "178",
"recordingName": "Don't Stop Me Now",
"artistName": "Queen"
},
"lineB": {
"lineLabel": "Ah da, ah da, ah da, ah da",
"recordingId": "isophonics_288",
"lineNumber": "178",
"recordingName": "Lovely Rita",
"artistName": "The Beatles"
}
},
{
"lyrSimId": "lyr_sim_isophonics_45_isophonics_208_195_196",
"compSimScore": null,
"humanSimScore": null,
"lineA": {
"lineLabel": "Ha da da, ha da da ahh",
"recordingId": "isophonics_45",
"lineNumber": "195",
"recordingName": "Don't Stop Me Now",
"artistName": "Queen"
},
"lineB": {
"lineLabel": "Ah-ah-ah, ah-ah-ahh",
"recordingId": "isophonics_208",
"lineNumber": "195",
"recordingName": "A Day in the Life",
"artistName": "The Beatles"
}
},
{
"lyrSimId": "lyr_sim_isophonics_45_isophonics_208_243_244",
"compSimScore": null,
"humanSimScore": null,
"lineA": {
"lineLabel": "Ha da da, ha da da ahh",
"recordingId": "isophonics_45",
"lineNumber": "243",
"recordingName": "Don't Stop Me Now",
"artistName": "Queen"
},
"lineB": {
"lineLabel": "Ah-ah-ah, ah-ah-ahh",
"recordingId": "isophonics_208",
"lineNumber": "243",
"recordingName": "A Day in the Life",
"artistName": "The Beatles"
}
},
"lineA": {
"lineLabel": "Ha da da, ha da da ahh",
"recordingId": "isophonics_45",
"lineNumber": "245",
"recordingName": "Don't Stop Me Now",
"artistName": "Queen"
},
These are probably not real duplicates, but the same phrase appearing several times in the same song. How do we handle them?
@andreamust what do you think? Just keep them there?
This probably shouldn't be changed in KG_data_transformation (good news), but rather filtered out in the KG2SONAR app transformation, if we want to filter at all. But according to which criteria?
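To make the question concrete, here is one possible criterion, offered only as a sketch: group similarity records that share the same recording and line text on both sides, and keep the list of ids in each group. The field names follow the JSON above; the function name `group_repeated_lines` and the choice of grouping key are assumptions, not an agreed policy.

```python
from collections import defaultdict


def group_repeated_lines(records):
    """Group similarity records by (recording A, text A, recording B, text B).
    Groups with more than one id are repeated phrases, not distinct pairs."""
    groups = defaultdict(list)
    for rec in records:
        key = (
            rec["lineA"]["recordingId"], rec["lineA"]["lineLabel"],
            rec["lineB"]["recordingId"], rec["lineB"]["lineLabel"],
        )
        groups[key].append(rec["lyrSimId"])
    return dict(groups)


# Abbreviated versions of the three records shown above.
records = [
    {"lyrSimId": "lyr_sim_isophonics_45_isophonics_288_178_179",
     "lineA": {"recordingId": "isophonics_45", "lineLabel": "Ha da da, ha da da ahh"},
     "lineB": {"recordingId": "isophonics_288", "lineLabel": "Ah da, ah da, ah da, ah da"}},
    {"lyrSimId": "lyr_sim_isophonics_45_isophonics_208_195_196",
     "lineA": {"recordingId": "isophonics_45", "lineLabel": "Ha da da, ha da da ahh"},
     "lineB": {"recordingId": "isophonics_208", "lineLabel": "Ah-ah-ah, ah-ah-ahh"}},
    {"lyrSimId": "lyr_sim_isophonics_45_isophonics_208_243_244",
     "lineA": {"recordingId": "isophonics_45", "lineLabel": "Ha da da, ha da da ahh"},
     "lineB": {"recordingId": "isophonics_208", "lineLabel": "Ah-ah-ah, ah-ah-ahh"}},
]
groups = group_repeated_lines(records)
```

Whether a group should then be reduced to a single similarity or kept as separate line-level occurrences is exactly the open question raised above.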
Add session label in the transformation rules.
See this example in the ttl file I am using for testing:
The HarmonicSimilarity harm_sim_00424_00339 contains the following ChordProgressions:
00339_Fmaj#_G#5_C#
00424_Fmaj_Cmaj_Fmaj
00339_C#_F#5_C#
00424_GmajB_Cmaj_GmajD
I assume this encodes a similarity based on:
00339_Fmaj#_G#5_C# <-- harmonically similar to --> 00424_Fmaj_Cmaj_Fmaj
00339_C#_F#5_C# <-- harmonically similar to --> 00424_GmajB_Cmaj_GmajD
The problem with this is that, while creating annotations for the sonar demo, I have no way of telling whether 00339_Fmaj#_G#5_C# is related to 00424_Fmaj_Cmaj_Fmaj or 00424_GmajB_Cmaj_GmajD (likewise for all other combinations).
I might be able to use the ordering of the ChordProgressions, but that feels brittle and hacky.
Could we consider restricting HarmonicSimilarities to a single pair of ChordProgressions (at least for the time being)?
If so, should this be done at the stage of