mixedemotions / twitter_crawlers

MixedEmotions module that connects to the Twitter Stream API in order to retrieve Tweets regarding certain keywords or phrases

twitter_crawlers's Introduction

MixedEmotions

The MixedEmotions platform is a Big Data Toolbox for multilingual and multimodal emotion analysis. It is built around standalone Docker modules, with an orchestrator that links the modules into analysis workflows, utilising Mesos for scalable cloud deployment. Core capabilities include emotion extraction from text, audio and video, alongside many other capabilities such as sentiment analysis, social network analysis, entity detection and linking, and sophisticated data visualisation.

Contact

[email protected]

Citation

If you use any of the modules, please cite the following paper: http://ieeexplore.ieee.org/abstract/document/8269329/

Demonstration Site

http://mixedemotions.insight-centre.org/

Modules

The MixedEmotions platform comprises several modules that provide diverse functionalities. The toolbox focuses mainly on extracting emotions from different modalities: text, audio and video. However, it also provides other kinds of functionality, such as social network analysis, knowledge graphs, entity linking and many others.

Some functionalities are provided with open source modules. Others are provided with proprietary modules.

The modules are listed in the tables below, with links to resources where you can find more information about them.

Open Source Modules

| Id | Functionality | Modality | Language | Source | Download |
|---|---|---|---|---|---|
| m1 | Sentiment Extraction | Text | EN | github | Dockerhub |
| m2 | Sentiment Extraction | Text | EN, CS | github | Dockerhub |
| m4 | Sentiment Extraction | Text | EN, ES | github | Dockerhub |
| m5 | Emotion recognition | Text | EN | github | Dockerhub |
| m6 | Emotion recognition | Audio | EN | github | Dockerhub |
| m7 | Emotion recognition | Text | EN, ES, Multiple | github | Dockerhub |
| m8 | Entity Extraction | Text | ES | github | Dockerhub |
| m10 | Entity Extraction and Linking | Text | EN | github | Dockerhub |
| m13 | Topic Extraction | Text | ES | github | Dockerhub |
| m16 | Suggestion mining | Text | EN | github | Dockerhub |
| m20 | Twitter media crawler | Text | n/a | github | n/a |
| m21 | Fusion | Text/Audio/Video | n/a | github | Dockerhub |
| m22 | Social Network Analysis | graph | n/a | github | Dockerhub |
| m25 | Social semantic Knowledge graph | graph | n/a | github | Dockerhub |
| m27 | Emotion recognition from Video | Video | n/a | github | +info |
| m28 | Analytics module “Kibi” | - | - | github | kibi |
| m32 | Youtube crawler | Text/Video | n/a | github | n/a |

Proprietary Modules

| Id | Functionality | Modality | Language | Proprietary | More info |
|---|---|---|---|---|---|
| m3 | Sentiment Extraction | Text | EN, IT | ExpertSystem | +info |
| m9 | Entity Extraction | Text | IT, EN | ExpertSystem | +info |
| m11 | Topic Extraction | Text | IT, EN | ExpertSystem | +info |
| m12 | Topic Extraction | Text | EN | NUIG | +info |
| m15 | Entity Linking | Text | IT, EN | ExpertSystem | +info |
| m17 | Speech to text | Audio | EN | Phonexia | +info |
| m18 | Machine translation | Text | CS, ES, DE, IT | NUIG | +info |
| m23 | Emotion recognition from Audio | Audio | DE, EN, CS | Phonexia | +info |
| m24 | Recommendation engine | Text | EN | ExpertSystem | +info |
| m28 | Age estimation from audio | Audio | n/a | Phonexia | +info |
| m29 | Gender identification from audio | Audio | n/a | Phonexia | +info |

Custom Module

You can use your own REST service or your own Docker module within the platform. For how to use a Docker module within the platform, check the wiki.

Orchestrator

Additionally, an open source orchestrator has been developed as a starting point for using the MixedEmotions Toolbox. You can find it here.

More information

More information about the platform can be found on this project's wiki page.

Partners

| Partner | Country |
|---|---|
| NATIONAL UNIVERSITY OF IRELAND, GALWAY | Ireland |
| UNIVERSIDAD POLITECNICA DE MADRID | Spain |
| UNIVERSITAT PASSAU | Germany |
| EXPERT SYSTEM S.P.A. | Italy |
| PARADIGMA DIGITAL SL | Spain |
| VYSOKE UCENI TECHNICKE V BRNE | Czech Republic |
| SINDICE LIMITED | Ireland |
| DEUTSCHE WELLE | Germany |
| PHONEXIA SRO | Czech Republic |

Acknowledgement

This development has been funded by the European Union through the MixedEmotions Project (project number H2020 655632), as part of the RIA ICT 15 Big data and Open Data Innovation and take-up programme.

http://ec.europa.eu/research/participants/portal/desktop/en/opportunities/index.html

twitter_crawlers's People

Contributors

canademar, drevicko

twitter_crawlers's Issues

use recent version of Twarc

It looks like the crawler is using Twarc version 0.3.3.

The current version of Twarc on PyPI (1.0.9) has a different API, where a Twarc object has a filter method (with different/more parameters) instead of the stream method used here.

We can keep a branch with the older Twarc while the Paradigma demo site is still using it.
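
As a rough illustration of the difference described above, here is a minimal sketch of a wrapper that works with either API. The helper name `iter_tweets` is hypothetical, and the assumption that the older `stream` method takes the track string as its first argument is not confirmed by this repository; only the existence of `filter` on newer Twarc objects and `stream` on the older one comes from the issue text.

```python
def iter_tweets(client, keywords):
    """Return an iterator of tweets matching any of the given keywords.

    `client` is expected to be a Twarc object. With Twarc 1.x it could be
    created roughly like this (credentials are placeholders):
        from twarc import Twarc
        client = Twarc(consumer_key, consumer_secret,
                       access_token, access_token_secret)
    """
    # The Twitter streaming API takes a comma-separated list of track terms.
    track = ",".join(keywords)
    if hasattr(client, "filter"):
        # Newer Twarc: a filter method with a track keyword argument.
        return client.filter(track=track)
    # Older Twarc (0.3.x): the stream method mentioned in the issue.
    # Passing the track string positionally is an assumption.
    return client.stream(track)
```

A wrapper like this would let the crawler code stay the same on both branches, at the cost of hiding the extra parameters that the newer filter method accepts.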

crontab entry to ensure crawler restarts if something goes wrong

To ensure continuous crawling, here's a Linux crontab entry that checks every 5 minutes whether the crawler is running:

*/5 * * * * campaigns_twitter_crawler/start_project_twitter_crawler.sh start >> campaigns_twitter_crawler/cronlog.txt

This entry expects campaigns_twitter_crawler to be a subfolder of the user's home folder. If this works well, we should add it somewhere, either in the readme or as a comment in the respective launch scripts.
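
For the cron approach to be safe, the start command has to do nothing when the crawler is already alive. How the launch script actually checks this is not shown here; the following is only a sketch of one common way to do it, assuming a hypothetical PID file that the crawler writes on startup. The function name `is_running` and the PID-file convention are both assumptions.

```python
import os


def is_running(pid_file):
    """Return True if the process whose PID is stored in pid_file is alive."""
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable or malformed PID file: treat as not running.
        return False
    try:
        # Signal 0 performs the existence check without sending a signal.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

A start command built on this check would launch the crawler only when `is_running` returns False, so running it from cron every 5 minutes does not create duplicate crawlers.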

should we be flagging hash tags also?

The current twitter_crawler_project.py script does not flag when hash tags have been identified as synonyms. From my perspective, knowing whether a keyword has been detected as a hash tag or as a normal word is important, as they can have quite different meanings.

I'd like to change the code that keeps user mentions (e.g. @mrfoo) found in the tweet to also keep hash tags (currently only the word part of the tag is reported).

It'd mean two changes:

  • r"@%s\b" changed to r"[#@]%s\b"
  • "@%s" % keyword changed to "%s%s" % (word_match.group()[0], keyword)

Thoughts?
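
To make the proposal concrete, here is a small self-contained sketch of the two changes. The function `flag_keywords` and its plain-word fallback are hypothetical and do not reproduce the actual code in twitter_crawler_project.py; only the pattern `r"[#@]%s\b"` and the `word_match.group()[0]` prefix lookup come from the proposal above. Escaping the keyword with `re.escape` and matching case-insensitively are added assumptions.

```python
import re


def flag_keywords(text, keywords):
    """Report each keyword found in text, keeping a leading # or @ if present."""
    found = []
    for keyword in keywords:
        escaped = re.escape(keyword)
        # Proposed pattern: match the keyword as a hash tag or a user mention.
        word_match = re.search(r"[#@]%s\b" % escaped, text, re.IGNORECASE)
        if word_match:
            # The first character of the match is the '#' or '@' prefix.
            found.append("%s%s" % (word_match.group()[0], keyword))
        elif re.search(r"\b%s\b" % escaped, text, re.IGNORECASE):
            # Plain word occurrence, reported without a prefix.
            found.append(keyword)
    return found


if __name__ == "__main__":
    print(flag_keywords("Loving #python with @mrfoo and coffee",
                        ["python", "mrfoo", "coffee", "tea"]))
```

Note that in this sketch a keyword appearing both as a hash tag and as a plain word in the same tweet is reported only in its prefixed form.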
