
pkasela / sound-of-data


Pota

License: BSD 2-Clause "Simplified" License

Python 39.64% PigLatin 20.28% Shell 8.42% TeX 28.81% R 2.86%
hadoop pig pig-latin python riak-kv kafka kafka-streams neo4j cypher nlp

sound-of-data's Introduction

TODO LIST:

DataSet Link

Here is the link to the cleaned dataset, ready for Neo4j. You need a GitLab account.

Alternatively, use the GDrive link to the dataset.

Link for the presentation

Use the maria_dev account in the VM (HDP, not HDF) to recreate the database if needed.

Data Management

  • Import data
    • fix .tsv with get_data.py
    • Decide the tables and their attributes to keep
    • .tsv -> Pig -> clean .tsv (with JOIN and FILTER (GENERATE for Pig)) using PigCleaning.sh
    • clean .tsv -> Neo4j using neo4j_import.sh
    • index on the graph
    • constraint on the graph for unique gid of the entities
    • Scrape the MusicBrainz genres using the MusicBrainz API
    • Remove useless genres, such as audiobook, to reduce the size of the list
  • Tweet
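
The genre-pruning step above could be sketched as follows. This is a minimal illustration rather than the project's actual script: the function name `filter_genres`, the sample genre list, and the exclusion set are all hypothetical, and the real list would come from the MusicBrainz scrape.

```python
def filter_genres(genres, excluded):
    """Drop excluded genres (case-insensitive) and duplicates, keeping order."""
    excluded_normalized = {name.strip().lower() for name in excluded}
    seen = set()
    kept = []
    for genre in genres:
        key = genre.strip().lower()
        # Skip blanks, explicitly excluded genres, and repeats.
        if not key or key in excluded_normalized or key in seen:
            continue
        seen.add(key)
        kept.append(genre.strip())
    return kept


# Hypothetical sample standing in for the scraped genre list.
scraped = ["rock", "Audiobook", "jazz", "rock ", "hip hop"]
non_musical = {"audiobook"}  # assumed exclusion list

print(filter_genres(scraped, non_musical))  # ['rock', 'jazz', 'hip hop']
```

Normalizing case and whitespace before comparing keeps near-duplicate entries from inflating the list.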

Data Semantics

  • Tweet analysis
    • build the model(s) for the filter
    • analyze the performance of the model(s) (the performance is fairly good)
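
One way to measure the performance of a tweet filter is precision and recall on a hand-labeled sample. The sketch below assumes a trivial keyword-based filter as a stand-in for the real model(s), which this list does not describe. The keywords, sample tweets, and function names are hypothetical.

```python
def is_music_tweet(text, keywords):
    """Stand-in filter: flag a tweet if it contains any keyword as a whole word."""
    words = set(text.lower().split())
    return any(keyword in words for keyword in keywords)


def precision_recall(predictions, labels):
    """Compute (precision, recall) for boolean predictions against boolean labels."""
    pairs = list(zip(predictions, labels))
    true_pos = sum(1 for p, y in pairs if p and y)
    false_pos = sum(1 for p, y in pairs if p and not y)
    false_neg = sum(1 for p, y in pairs if not p and y)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical labeled sample: (tweet text, is it really about music?)
sample = [
    ("listening to the new album on repeat", True),
    ("great concert last night", True),
    ("traffic is terrible today", False),
    ("this song is stuck in my head", True),
    ("my album of holiday photos", False),
]
keywords = {"album", "song", "playlist"}  # assumed keyword list

predictions = [is_music_tweet(text, keywords) for text, _ in sample]
labels = [label for _, label in sample]
precision, recall = precision_recall(predictions, labels)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Reporting both numbers matters here: a filter that is too strict loses music tweets (low recall), while one that is too loose lets unrelated tweets through (low precision).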

Analysis

  • Bot removal
    • identify bot users and "ban" them (Botometer)
    • store the whitelist and blacklist with RiakDB
  • Interesting queries
  • Analysis
    • find the clusters of the most-tweeted musical words, i.e. the musical "communities"
    • find periodic cycles (morning, afternoon, evening, night)
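
The search for periodic cycles could start by bucketing tweet timestamps into the four periods named above. This is only a sketch: the hour boundaries (6, 12, 18) are an assumption, not something the list specifies, and the sample timestamps are made up.

```python
from collections import Counter
from datetime import datetime


def time_of_day(timestamp):
    """Map a datetime to one of four periods; the boundaries are assumed."""
    hour = timestamp.hour
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"  # 00:00 to 05:59


def count_by_period(timestamps):
    """Count how many timestamps fall in each period."""
    return Counter(time_of_day(ts) for ts in timestamps)


# Hypothetical tweet timestamps.
tweets_at = [
    datetime(2019, 1, 7, 8, 30),
    datetime(2019, 1, 7, 13, 5),
    datetime(2019, 1, 7, 21, 45),
    datetime(2019, 1, 8, 2, 10),
    datetime(2019, 1, 8, 9, 0),
]

for period, count in sorted(count_by_period(tweets_at).items()):
    print(period, count)
```

The same counts, grouped additionally by date, would feed directly into a per-day bar plot.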

Data Visualization

  • Plots: link to the website
    • One possible plot is a word cloud (perhaps shaped like something musical) link
    • Bar plot of the distribution density across the days (and time periods) link
  • Plot validation
    • by ourselves
    • by many other people

Presentation

  • complete the presentation

To mark a box as done, add an X inside the square brackets: [ ] -> [X]

sound-of-data's People

Contributors

angusfangus, moiraghif, pkasela, rcrvro
