
kg_data_transformation's People

Contributors

4562448, andreamust, ccolonna, delfimpandiani, enridaga, fiorelaciroku

kg_data_transformation's Issues

Sample HarmSim file contains a HarmSim with a single ChordProgression

In the sample ttl file the Harmonic Similarity with id harm_sim_08888_00111 contains only a single ChordProgression, whereas I expect to find two:

cm:involvesChordProgression <https://w3id.org/polifonia/resource/ChordProgression/08888_AmajB_Fmaj_D> ;

For reference, the source data is here:

{
    "harm_sim_id": "harm_sim_08888_00111",
    "recordingAid": "08888",
    "recordingBid": "00111",
    "compSimScore": 0.5,
    "cp_matches": [
        {
            "cp_match_id": "00003",
            "humanSimScore": 4,
            "cpA": {
                "recording_IRI": "https://w3id.org/polifonia/resource/Recording/08888",
                "cp_id": "08888_AmajB_Fmaj_D",
                "start": "0:09:01.320000",
                "end": "0:10:03.700000"
            },
            "cpB": {
                "recording_IRI": "https://w3id.org/polifonia/resource/Recording/00111",
                "cp_id": "00111_C#_F#5_C#",
                "start": "0:01:01.206300",
                "end": "0:01:48.00000"
            }
        }
    ]
}
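
To make the expectation concrete, the ChordProgression IRIs that this record should produce can be listed straight from the JSON. This is only a sketch: `expected_chord_progressions` is a hypothetical helper, the base IRI is assumed from the triple quoted above, and percent-encoding of `#` is assumed from the IRIs seen in the ttl output.

```python
from urllib.parse import quote

# Assumed base IRI, copied from the triple quoted in this issue.
CP_BASE = "https://w3id.org/polifonia/resource/ChordProgression/"


def expected_chord_progressions(harm_sim: dict) -> list[str]:
    """Return the ChordProgression IRIs a HarmSim record should involve."""
    iris = []
    for match in harm_sim["cp_matches"]:
        for side in ("cpA", "cpB"):
            # Percent-encode characters such as '#' (assumption, see lead-in).
            iris.append(CP_BASE + quote(match[side]["cp_id"], safe=""))
    return iris


# Trimmed copy of the source record above (only the fields used here).
record = {
    "harm_sim_id": "harm_sim_08888_00111",
    "cp_matches": [
        {
            "cp_match_id": "00003",
            "cpA": {"cp_id": "08888_AmajB_Fmaj_D"},
            "cpB": {"cp_id": "00111_C#_F#5_C#"},
        }
    ],
}

iris = expected_chord_progressions(record)
for iri in iris:
    print(iri)
```

For this record the helper yields two IRIs, one per side of the match, which is what I expected to find in the ttl.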

Harmonic similarity RDF

Andrea will provide us with the similarity between lyrics.
The ontology is in this issue, along with the related discussion about the model.

Bug in lyrics line similarity transformation

    # line lyric
    ?lyricLineAIRI  a mf:LyricLine ;
                    rdfs:label ?lineALabel ;
                    mf:isPartOf ?lyricsAIRI ;
                    mf:isLineLyricOfRecording ?recordingAIRI ;
#                    mf:hasLineNumber ?lineNumberA ;                    
                    cm:isInvolvedinLyricLineSimilarity ?lyricLineSimIRI .

    ?lyricLineSimIRI a cm:LyricLineSimilarity ;
                    cm:involvesLyricLine ?linelyricAIRI ;
                    cm:involvesLyricLine ?linelyricBIRI ;

The same variable is spelled in two different ways:

?lyricLineAIRI
?linelyricAIRI

Because of this mismatch, the triple cm:involvesLyricLine is not materialized.
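
A cheap way to catch this kind of mismatch is to compare the variables used in the template with those bound in the WHERE clause. The sketch below assumes the snippet is part of a CONSTRUCT template (SPARQL skips template triples that contain an unbound variable). The `where` text is hypothetical, since the real WHERE clause is not shown in this issue, and `unbound_template_variables` is an illustrative name.

```python
import re

# Matches SPARQL variables written as ?name or $name.
VAR = re.compile(r"[?$](\w+)")


def strip_comments(text: str) -> str:
    """Drop lines that are entirely commented out with '#'."""
    return "\n".join(
        line for line in text.splitlines() if not line.lstrip().startswith("#")
    )


def unbound_template_variables(template: str, where: str) -> set[str]:
    """Return template variables that never appear in the WHERE clause."""
    used = set(VAR.findall(strip_comments(template)))
    bound = set(VAR.findall(strip_comments(where)))
    return used - bound


template = """
?lyricLineAIRI a mf:LyricLine ;
#    mf:hasLineNumber ?lineNumberA ;
    cm:isInvolvedinLyricLineSimilarity ?lyricLineSimIRI .
?lyricLineSimIRI a cm:LyricLineSimilarity ;
    cm:involvesLyricLine ?linelyricAIRI .
"""

# Hypothetical WHERE clause, for illustration only.
where = """
BIND(IRI("urn:example:line-a") AS ?lyricLineAIRI)
BIND(IRI("urn:example:sim") AS ?lyricLineSimIRI)
"""

missing = unbound_template_variables(template, where)
print(missing)
```

With these inputs only the misspelled `linelyricAIRI` is reported, which points directly at the triple that goes missing.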

Add binary relations to places

Binary relations like:

mp:Recording core:hasArtistBirthPlace core:Place
mp:Recording core:hasRecordingProcessSessionPlace core:Place
mp:Recording core:hasArtistCountryPlace core:Place

The three properties should have core:hasPlace as their super property in the ontology. For now, an edge like :recording_1 core:hasPlace :artist_birthplace will be added explicitly to the KG, to speed things up for the sonar demo.

These properties will simplify queries.
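
The explicit core:hasPlace edges could be derived from the three specific properties in a small post-processing step. This is a sketch only: triples are modelled as plain tuples of prefixed names rather than real RDF terms, and `materialize_has_place` is a hypothetical helper, not something that exists in the repository.

```python
# The three properties listed above, assumed to be sub-properties of core:hasPlace.
SUB_PROPERTIES_OF_HAS_PLACE = {
    "core:hasArtistBirthPlace",
    "core:hasRecordingProcessSessionPlace",
    "core:hasArtistCountryPlace",
}


def materialize_has_place(triples):
    """Return the input triples plus one core:hasPlace edge per specific edge."""
    triples = set(triples)
    inferred = {
        (s, "core:hasPlace", o)
        for s, p, o in triples
        if p in SUB_PROPERTIES_OF_HAS_PLACE
    }
    return triples | inferred


kg = {
    (":recording_1", "core:hasArtistBirthPlace", ":artist_birthplace"),
    (":recording_1", "rdfs:label", "Some recording"),
}

expanded = materialize_has_place(kg)
for triple in sorted(expanded):
    print(triple)
```

Once the super property is declared in the ontology, a reasoner or a rdfs:subPropertyOf-aware query could replace this step.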

Query with artist birthplace causes test_0.0.2 to fail

Running testbed-0.0.2 on KG output of query-0.0.4

* * * * * * *  * * *
 * * * KGT run * * *
* * * * * * *  * * *

[*] reading testbed .\kg\test\testbed-0.0.2.json
[*] parsing KG .\kg\versions\polifonia-kg-places-0.0.4.ttl

+[*] Running Test 1: Expected Recordings Count
+[+]     Test passed: expected rows count correct 725

-[*] Running Test 2: Expected Artist Count
-[!]     Test failed: expected number of rows 103 found 49

-[*] Running Test 3: Expected Places Count
-[!]     Test failed: expected number of rows 37 found 108

+[*] Running Test 4: Expected Sessions Count
+[+]     Test passed: expected rows count correct 673

-[*] Running Test 5: Expected Song with 2 artists
-[!]     Test failed: expected number of rows 14 found 9

+[*] Running Test 6: Expected Song with 3 artists
+[+]     Test passed: expected rows count correct 1

+[*] Running Test 7: Expected Song with matching attributes (label, titleLabel, ...)
+[+]     Test passed: expected rows [{'performerLabel': 'The Beatles', 'recordingTitleLabel': 'I Saw Her Standing There'}, {'performerLabel': 'Dietrich Fischer-Dieskau', 'recordingTitleLabel': 'Gerald Moore'}, {'performerLabel': 'Thomas Allen', 'recordingTitleLabel': 'Roger Vignoles'}] found rows [{'performerLabel': 'Dietrich Fischer-Dieskau', 'recordingTitleLabel': 'Gerald Moore'}, {'performerLabel': 'Thomas Allen', 'recordingTitleLabel': 'Roger Vignoles'}, {'performerLabel': 'The Beatles', 'recordingTitleLabel': 'I Saw Her Standing There'}]

Bug with duplicate values

There are duplicates in the raw data, as well as duplicates generated by the dataset transformations.

Harmonic transformation

"harmonicSimIRI": "https://w3id.org/polifonia/resource/HarmonicSimilarity/harm_sim_isophonics_173_isophonics_243_00002"
"harmonicSimIRI": "https://w3id.org/polifonia/resource/HarmonicSimilarity/harm_sim_isophonics_243_isophonics_173_00002"

The same instance appears twice, once for each ordering of the two recordings. A URI generation policy that is not affected by symmetry would probably eliminate this, and it would be an easy solution.
Otherwise we would need to change the query so that it ignores duplicates. Better not to go down that path.
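
One possible symmetry-independent policy is to sort the two recording ids before building the identifier. The sketch below assumes the id layout `harm_sim_<recordingA>_<recordingB>_<suffix>` seen in the two IRIs above, and treats the trailing `00002` as an opaque suffix. `harmonic_sim_id` is an illustrative name.

```python
def harmonic_sim_id(recording_a: str, recording_b: str, suffix: str) -> str:
    """Build an id that is the same for (A, B) and (B, A)."""
    # Sorting gives both orderings the same canonical form.
    first, second = sorted((recording_a, recording_b))
    return f"harm_sim_{first}_{second}_{suffix}"


id_ab = harmonic_sim_id("isophonics_173", "isophonics_243", "00002")
id_ba = harmonic_sim_id("isophonics_243", "isophonics_173", "00002")
print(id_ab)
print(id_ab == id_ba)
```

Both calls produce `harm_sim_isophonics_173_isophonics_243_00002`, so the two records collapse onto a single IRI. Plain string sorting is enough here because only consistency matters, not numeric order.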

Lyric Lines Transformation:



        {
            "lyrSimId": "lyr_sim_isophonics_45_isophonics_288_178_179",
            "compSimScore": null,
            "humanSimScore": null,
            "lineA": {
                "lineLabel": "Ha da da, ha da da ahh",
                "recordingId": "isophonics_45",
                "lineNumber": "178",
                "recordingName": "Don't Stop Me Now",
                "artistName": "Queen"
            },
            "lineB": {
                "lineLabel": "Ah da, ah da, ah da, ah da",
                "recordingId": "isophonics_288",
                "lineNumber": "178",
                "recordingName": "Lovely Rita",
                "artistName": "The Beatles"
            }
        },
            
        {
            "lyrSimId": "lyr_sim_isophonics_45_isophonics_208_195_196",
            "compSimScore": null,
            "humanSimScore": null,
            "lineA": {
                "lineLabel": "Ha da da, ha da da ahh",
                "recordingId": "isophonics_45",
                "lineNumber": "195",
                "recordingName": "Don't Stop Me Now",
                "artistName": "Queen"
            },
            "lineB": {
                "lineLabel": "Ah-ah-ah, ah-ah-ahh",
                "recordingId": "isophonics_208",
                "lineNumber": "195",
                "recordingName": "A Day in the Life",
                "artistName": "The Beatles"
            }
        },
        
        {
            "lyrSimId": "lyr_sim_isophonics_45_isophonics_208_243_244",
            "compSimScore": null,
            "humanSimScore": null,
            "lineA": {
                "lineLabel": "Ha da da, ha da da ahh",
                "recordingId": "isophonics_45",
                "lineNumber": "243",
                "recordingName": "Don't Stop Me Now",
                "artistName": "Queen"
            },
            "lineB": {
                "lineLabel": "Ah-ah-ah, ah-ah-ahh",
                "recordingId": "isophonics_208",
                "lineNumber": "243",
                "recordingName": "A Day in the Life",
                "artistName": "The Beatles"
            }
        },
            
            "lineA": {
                "lineLabel": "Ha da da, ha da da ahh",
                "recordingId": "isophonics_45",
                "lineNumber": "245",
                "recordingName": "Don't Stop Me Now",
                "artistName": "Queen"
            },

These are probably not real duplicates, but the same phrase appearing several times in the same song. How do we handle them?
@andreamust what do you think? Should we just keep them?

The good news is that this probably does not need to change in KG_data_transformation. If we want to filter them out, that belongs in the KG2SONAR app transformation. But according to which criteria?
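
As input for that discussion, here is one candidate criterion, sketched under assumptions: treat two matches as repeats of each other when both sides have the same recording and the same line text, ignoring line numbers. `group_repeated_matches` is a hypothetical helper and the records are trimmed copies of the ones above.

```python
from collections import defaultdict


def group_repeated_matches(matches):
    """Group lyric similarity ids by (recording, text) of both lines."""
    groups = defaultdict(list)
    for m in matches:
        key = (
            m["lineA"]["recordingId"], m["lineA"]["lineLabel"],
            m["lineB"]["recordingId"], m["lineB"]["lineLabel"],
        )
        groups[key].append(m["lyrSimId"])
    return dict(groups)


queen = {"lineLabel": "Ha da da, ha da da ahh", "recordingId": "isophonics_45"}
rita = {"lineLabel": "Ah da, ah da, ah da, ah da", "recordingId": "isophonics_288"}
day = {"lineLabel": "Ah-ah-ah, ah-ah-ahh", "recordingId": "isophonics_208"}

matches = [
    {"lyrSimId": "lyr_sim_isophonics_45_isophonics_288_178_179", "lineA": queen, "lineB": rita},
    {"lyrSimId": "lyr_sim_isophonics_45_isophonics_208_195_196", "lineA": queen, "lineB": day},
    {"lyrSimId": "lyr_sim_isophonics_45_isophonics_208_243_244", "lineA": queen, "lineB": day},
]

groups = group_repeated_matches(matches)
repeated = {key: ids for key, ids in groups.items() if len(ids) > 1}
print(repeated)
```

With this criterion the two "Don't Stop Me Now" / "A Day in the Life" matches fall into one group, and KG2SONAR could keep one representative or all of them. Whether that is the right criterion is still the open question.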

some HarmonicSimilarities contain 4 ChordProgressions

see this example in the ttl file I am using for testing:

<https://w3id.org/polifonia/resource/HarmonicSimilarity/harm_sim_00424_00339>
    a cm:HarmonicSimilarity ;
    cm:hasSimScore "0.9"^^<http://www.w3.org/2001/XMLSchema#float> ;
    cm:involvesChordProgression <https://w3id.org/polifonia/resource/ChordProgression/00339_Fmaj%23_G%235_C%23> ,
        <https://w3id.org/polifonia/resource/ChordProgression/00424_Fmaj_Cmaj_Fmaj> ,
        <https://w3id.org/polifonia/resource/ChordProgression/00339_C%23_F%235_C%23> ,
        <https://w3id.org/polifonia/resource/ChordProgression/00424_GmajB_Cmaj_GmajD> ;
    cm:involvesRecording <https://w3id.org/polifonia/resource/Recording/00339> ,
        <https://w3id.org/polifonia/resource/Recording/00424> .

The HarmonicSimilarity harm_sim_00424_00339 contains the following ChordProgressions:

  • 00339_Fmaj#_G#5_C#
  • 00424_Fmaj_Cmaj_Fmaj
  • 00339_C#_F#5_C#
  • 00424_GmajB_Cmaj_GmajD

I assume this encodes a similarity based on:

  • 00339_Fmaj#_G#5_C# <--harmonically similar to --> 00424_Fmaj_Cmaj_Fmaj
  • 00339_C#_F#5_C# <--harmonically similar to --> 00424_GmajB_Cmaj_GmajD

The problem with this is that, while creating annotations for the sonar demo, I have no way of telling whether 00339_Fmaj#_G#5_C# is related to 00424_Fmaj_Cmaj_Fmaj or 00424_GmajB_Cmaj_GmajD (likewise for all other combinations).

I might be able to use the ordering of the ChordProgressions, but that feels brittle and hacky.

Could we consider restricting HarmonicSimilarities to a single pair of ChordProgressions (at least for the time being)?
If so, should this be done at the stage of

  1. calculating HarmonicSimilarities @andreamust @jonnybluesman?
    OR
  2. KG data transformation @delfimpandiani @ccolonna @FiorelaCiroku?
    OR
  3. am I missing something and I should be able to deal with this at the Sonar data transformation step?
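
If option 2 is chosen, the split could look roughly like the sketch below: one similarity per entry in `cp_matches`, so that each resulting instance involves exactly one pair. It assumes the source JSON layout shown in the sample HarmSim issue. The pairing of chord progressions is the one I assumed above, the `cp_match_id` values are made up for illustration, and `split_into_pairs` and `pair_id` are hypothetical names.

```python
def split_into_pairs(harm_sim: dict) -> list[dict]:
    """Return one record per chord-progression pair in a HarmSim record."""
    pairs = []
    for match in harm_sim["cp_matches"]:
        pairs.append({
            # Hypothetical id scheme: parent id plus the match id.
            "pair_id": f'{harm_sim["harm_sim_id"]}_{match["cp_match_id"]}',
            "cpA": match["cpA"]["cp_id"],
            "cpB": match["cpB"]["cp_id"],
        })
    return pairs


# Illustrative record; match ids and pairing are assumptions.
harm_sim = {
    "harm_sim_id": "harm_sim_00424_00339",
    "cp_matches": [
        {
            "cp_match_id": "00001",
            "cpA": {"cp_id": "00424_Fmaj_Cmaj_Fmaj"},
            "cpB": {"cp_id": "00339_Fmaj#_G#5_C#"},
        },
        {
            "cp_match_id": "00002",
            "cpA": {"cp_id": "00424_GmajB_Cmaj_GmajD"},
            "cpB": {"cp_id": "00339_C#_F#5_C#"},
        },
    ],
}

pairs = split_into_pairs(harm_sim)
for pair in pairs:
    print(pair)
```

Each resulting record carries exactly two chord progressions, so the Sonar annotations would no longer depend on the ordering of the values in the ttl.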
