Audio signal identification

An audio signal identification algorithm based on spectral feature fingerprinting, proximate peak pairs and hash value matching.

The pipeline of this independent educational project was adapted from the well-known Shazam algorithm by Dr. Avery Wang.

Introduction

The basic procedure for identifying a short clip of music using a database of songs is as follows:

  1. Construct a database of spectral features for multiple full-length songs.
  2. When a clip is to be identified, calculate the corresponding features of the clip.
  3. Search the database for a match with the features of the clip.

The spectral features of each audio signal are characterized by the locations of local magnitude peaks in a spectrogram computed with the short-time Fourier transform (STFT). The frequencies and times of these peaks are stored as features and should be fairly robust to many possible forms of distortion, such as magnitude and phase errors in the frequency domain introduced by the recording process, or additive noise. A clip is matched to a song by considering all possible shifts in time and comparing the features of both. To mitigate the computational cost of matching a large number of features, the features are first simplified and preprocessed: pairs of peaks that are close in both time and frequency are identified, resulting in a table consisting of initial time (t1), edge time (t2), initial frequency (f1) and pair frequency (f2).
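
To make the peak-picking step concrete, the following is a minimal sketch using standard MATLAB functions (audioread, spectrogram, movmax); the window length, downsampling factor and neighborhood size are illustrative assumptions and not the settings of 'spectralFingerprint'.

    % Minimal peak-extraction sketch (parameters below are assumptions,
    % not the settings used by 'spectralFingerprint').
    [x, fs] = audioread('song.mp3');        % read the song
    x = mean(x, 2);                         % average the two channels
    x = x - mean(x);                        % subtract the mean
    x = downsample(x, 4);  fs = fs / 4;     % downsample (factor assumed)

    win = 1024;                             % STFT window length (assumed)
    [S, F, T] = spectrogram(x, hamming(win), win/2, win, fs);
    M = abs(S);                             % magnitude spectrogram

    % A bin is a local peak if it equals the maximum of its 5x5 neighborhood
    % and rises above a simple global floor.
    nbhd = movmax(movmax(M, 5, 1), 5, 2);
    isPeak = (M == nbhd) & (M > median(M(:)));

    [fIdx, tIdx] = find(isPeak);            % peak locations
    peakFreqs = F(fIdx);                    % peak frequencies (Hz)
    peakTimes = T(tIdx);                    % peak times (s)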

Each peak pair is then converted to a hash value formed from the vector (f1, f2, t2 - t1), which serves as an index into a hash table for later song matching. In this way, peak pairs with the same frequencies and the same separation in time are considered a match. The timing t1 and the songid are stored in the hash table.
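
As an illustration, one way to form such a hash is to quantize f1, f2 and t2 - t1 and pack them into a single integer; the bit widths below are assumptions, not the encoding used by 'size2pixel' and 'addHashTable'.

    % Illustrative hash packing (bit widths are assumptions, not the
    % project's encoding).
    f1Bin = 200;  f2Bin = 215;  dtBin = 12;       % quantized f1, f2, t2-t1

    % Pack the triple into one integer: 9 bits per frequency bin, 6 bits for dt.
    hashValue = bitor(bitor(bitshift(uint32(f1Bin), 15), ...
                            bitshift(uint32(f2Bin), 6)), uint32(dtBin));

    % The hash table maps hashValue -> (t1, songid) entries, e.g. with
    % containers.Map; matching pairs in a clip index the same bucket.
    hashTable = containers.Map('KeyType', 'uint32', 'ValueType', 'any');
    hashTable(hashValue) = [12.4, 7];             % [t1 in seconds, songid]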

When a clip is to be identified, its list of peak pairs is produced just as it would have been for a song in the database, and the hash table is searched for each pair in the clip. This produces a list of matches, each with different stored values of t1 and songid. Some of these matches will be accidental, either because the same peak pair occurred at another time or in another song, or because of a collision in the hash table. However, the correct song is expected to have a consistent timing offset from the clip: the difference between t1 for the song and t1 for the clip should be the same for all correct matches. Finally, the song with the most matches for a single timing offset is considered the best match.
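
The timing-offset test can be sketched as follows; the variable names and the use of mode over rounded offsets are illustrative assumptions rather than the exact logic of 'matchSegment'.

    % Illustrative scoring of one candidate song (not 'matchSegment' itself).
    % songT1 and clipT1 hold, for every hash hit against this song, the t1
    % stored in the database and the time of the same pair inside the clip.
    songT1 = [10.2, 33.5, 11.0, 15.7, 18.3];
    clipT1 = [ 0.1,  5.0,  0.9,  5.6,  8.2];

    offsets = round((songT1 - clipT1) * 10) / 10;  % quantize offsets to 0.1 s
    [bestOffset, score] = mode(offsets);           % most common offset, count

    fprintf('best offset %.1f s with %d consistent matches\n', bestOffset, score);
    % Repeating this per song and taking the largest score gives the best match.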

Pipeline procedure

  1. Import and preprocess the data - read the song using 'mp3SongRead', average the two channels, subtract the mean, and downsample.
  2. Extract the spectral fingerprint - compute the spectrogram of the song using 'spectralFingerprint'.
  3. Find proximal peak pairs - find the pairs of proximal local peaks in the spectrogram using the MATLAB function circshift (see the sketch after this list).
  4. Train the database - add the pairs to a hash table using 'size2pixel' and 'addHashTable'.
  5. Load HASHTABLE, SONGID and the settings created by 'buildDatabase' in steps 1-4.
  6. Prepare a clip of music for identification, either from a recording, a random song segment or an external audio file.
  7. Extract the list of frequency pairs from the clip in the same manner as in steps 1-3.
  8. Recover matches from the hash table - look up each pair of the clip in the hash table, calculate the time offsets, and sort them by song using 'matchSegment'.
  9. Identify the song with the most matches for a single consistent timing offset.
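
The circshift trick of step 3 can be sketched as below: shifting a logical peak mask by a small (frequency, time) offset and AND-ing it with the original flags every anchor peak that has a partner at that offset. The window limits and the random mask are illustrative assumptions, and edge wrap-around is ignored in this sketch.

    % Illustrative proximal-peak pairing with circshift (window limits are
    % assumptions; circular wrap-around at the edges is ignored here).
    isPeak = false(256, 400);                       % freq bins x time frames
    isPeak(sub2ind(size(isPeak), randi(256, 300, 1), randi(400, 300, 1))) = true;

    maxDf = 15;  maxDt = 30;                        % pairing window (assumed)
    pairs = [];                                     % rows: [f1, f2, t1, t2]
    for df = -maxDf:maxDf
        for dt = 1:maxDt                            % only look forward in time
            % After shifting by [-df, -dt], a partner peak at (f1+df, t1+dt)
            % lines up with its anchor at (f1, t1).
            hits = isPeak & circshift(isPeak, [-df, -dt]);
            [f1, t1] = find(hits);
            pairs = [pairs; f1, f1 + df, t1, t1 + dt]; %#ok<AGROW>
        end
    end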

Steps 1-4 are executed first in order to build a song database using the script 'buildDatabase'.

Steps 5-9 can then be executed using the script 'testProcess' in order to find a match between a clip and a song.

Note that different values of the spectral and time/frequency window parameters can be tested in order to study which combination yields the optimal time-frequency resolution for identification.
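
Such a study could, for example, loop over candidate STFT window lengths and record identification accuracy for each; the loop below is a hypothetical outline, and evaluateAccuracy is a placeholder rather than a function in this repository.

    % Hypothetical parameter sweep; evaluateAccuracy is a placeholder for
    % whatever accuracy measurement the study uses (not a repository function).
    windowLengths = [256, 512, 1024, 2048];      % candidate STFT window sizes
    accuracy = zeros(size(windowLengths));
    for k = 1:numel(windowLengths)
        % Rebuild the database and re-run identification with this window,
        % then store the fraction of clips identified correctly.
        accuracy(k) = evaluateAccuracy(windowLengths(k));
    end
    [bestAcc, idx] = max(accuracy);
    fprintf('best window length: %d (accuracy %.2f)\n', windowLengths(idx), bestAcc);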

Future work and improvements

  1. Testing a different kernel than the STFT, such as the DWT (discrete wavelet transform), for more robust results.
  2. Adding a threshold value using an adaptive filter for peak finding.
