josielmanzonni / socialcrawler

Python script to help with data mining from Twitter, Foursquare and Swarm.

Home Page: https://josielwirlino.github.io/SocialCrawler/

License: GNU General Public License v3.0

Python 99.76% Shell 0.24%

socialcrawler's Introduction

SocialCrawler

SocialCrawler is a Python package that helps collect data from Twitter and Foursquare.

It was created to make data mining from Twitter and Foursquare easier. (Linux only)

Install (generic way)

	$ python3 -m pip install SocialCrawler

How does it work?

Requirements

  • Python >= 3
  • setuptools
  • Foursquare developer credentials (if you want to work with Foursquare)
  • Twitter developer credentials (if you want to work with Twitter)
  • geckodriver installed and on $PATH (we ran into this problem when trying to run on Linux Mint and Kali)
    $ export PATH=$PATH:<geckodriver-path>
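
Before running the package, you may want to confirm that geckodriver really is reachable through $PATH. The sketch below uses only the Python standard library; `find_driver` and its `extra_dir` parameter are hypothetical names for illustration and are not part of SocialCrawler.

```python
import os
import shutil


def find_driver(name="geckodriver", extra_dir=None):
    """Return the full path of `name` if it can be found, else None.

    `extra_dir` is an optional directory searched in addition to PATH
    (hypothetical parameter, for illustration only).
    """
    search_path = os.environ.get("PATH", "")
    if extra_dir:
        search_path = search_path + os.pathsep + extra_dir
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    location = find_driver()
    if location is None:
        print("geckodriver not found; add its directory to PATH first")
    else:
        print("geckodriver found at", location)
```

If the function returns None, extend $PATH with the export command above and try again.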

Possibilities

  • Because the package uses tweepy to connect to Twitter, it can use the Twitter Stream API. You can therefore search based on:
    • delimited
    • stall_warnings
    • filter_level
    • language
    • follow
    • track
    • locations
    • count
    • with
    • replies
    • stringify_friend_ids

As shown in Stream Overview
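
As a rough illustration of how a few of these parameters are encoded, the sketch below builds the query values for `track`, `locations`, `language` and `follow` as comma-separated strings, which is the form the Stream API documentation describes. The function `build_filter_params` is a hypothetical helper; it is neither SocialCrawler's nor tweepy's interface, and it only prepares values without opening a connection.

```python
def build_filter_params(track=None, locations=None, languages=None, follow=None):
    """Build a dict of query parameters for a filtered stream request.

    track:     iterable of keywords or phrases
    locations: iterable of bounding boxes, each given as
               (west_lon, south_lat, east_lon, north_lat)
    languages: iterable of language codes such as "en" or "pt"
    follow:    iterable of numeric user IDs
    """
    params = {}
    if track:
        params["track"] = ",".join(track)
    if locations:
        coords = []
        for box in locations:
            if len(box) != 4:
                raise ValueError("each bounding box needs 4 coordinates")
            coords.extend(str(value) for value in box)
        params["locations"] = ",".join(coords)
    if languages:
        params["language"] = ",".join(languages)
    if follow:
        params["follow"] = ",".join(str(user_id) for user_id in follow)
    return params


if __name__ == "__main__":
    # Keyword filter plus one bounding box (longitude first, south-west corner first).
    print(build_filter_params(
        track=["swarmapp"],
        locations=[(-122.75, 36.8, -121.75, 37.8)],
        languages=["en"],
    ))
```

Parameters left as None are simply omitted from the result, so the same helper covers keyword-only, location-only and combined searches.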

  • Get check-ins shared on Twitter, or the check-ins from the last week.
    • If you have Foursquare credentials, you can also track data from specific locations, and more.
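
Check-ins shared on Twitter usually appear as a Swarm link inside the tweet. The sketch below pulls the check-in code out of such links with a regular expression. The URL shape `https://www.swarmapp.com/c/<code>` is an assumption, `extract_swarm_codes` is a hypothetical helper rather than a SocialCrawler function, and the text passed in is assumed to contain the expanded link rather than a shortened one.

```python
import re

# Assumed URL shape for a shared check-in: https://www.swarmapp.com/c/<code>
SWARM_URL = re.compile(r"https?://(?:www\.)?swarmapp\.com/c/([A-Za-z0-9]+)")


def extract_swarm_codes(text):
    """Return every check-in code found in `text`, in order of appearance."""
    return SWARM_URL.findall(text)


if __name__ == "__main__":
    tweet_text = "I'm at the library https://www.swarmapp.com/c/abc123XYZ"
    print(extract_swarm_codes(tweet_text))
```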

See Wiki!

  • v 0.1.0

    • fixed module class declaration
  • v 0.0.9

    • fixed syntax error and hacking method dir output
  • v 0.0.8

    • added selenium as a requirement, to make Foursquare requests through a browser (to avoid the rate limit); this may not work
    • updated ExtractorData to a full version (NewExtractorData) that retrieves (almost) complete VENUE info
    • removed urllib2 as a requirement
    • updated the run flow: there is now always a return value; just check whether a field is NULL, which means the data is missing
  • v 0.0.7

    • when a VENUE or FOURSQUARE request fails, the program thread waits 15 minutes before requesting again
    • added new exception handling
    • separated the Foursquare request and the venue request into two try-except blocks
    • fixed a bug when writing categorie_id (missing int to str conversion)
    • ExtractorData still lacks the option to use another file (one not created by Collector or CollectorV2) to query Foursquare (not available yet)
  • v 0.0.6

    • Formatted to PEP257 and PEP8 (almost)
    • Implemented ExtractorData: a simple way to get data from Foursquare using the Swarm URL code
    • Added HistoricalCollector.CollectorV2, which gets all data from the tweet JSON and saves it as a TSV file
    • Added to ExtractorData the option to use another file (one not created by Collector or CollectorV2) to query Foursquare (not available yet)
    • Added urllib2 as a requirement
  • v 0.0.5

    • Fixed a bug in the getStoredData function that allowed some parameters to be None
    • Updated the format of generated file names
    • Increased the wait time between requests from 15 minutes to 16. (Sometimes, when a request was retried after 15 minutes, the server responded that the 15 minutes had not yet elapsed.)
    • Updated the saved fields. All fields are now saved to a file in tab-separated (\t) format, as shown in the Wiki.
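
The wait-and-retry behaviour mentioned in the entries above (wait about 16 minutes after a failed request, then request again) can be sketched roughly as follows. This is not the package's actual code: `request_with_wait`, `max_attempts` and the injectable `sleep` argument are assumptions made for illustration.

```python
import time


def request_with_wait(request, wait_seconds=16 * 60, max_attempts=3, sleep=time.sleep):
    """Call `request()` and return its result.

    If the call raises, wait `wait_seconds` and try again, up to
    `max_attempts` calls in total. The last exception is re-raised
    when every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(wait_seconds)
```

Passing `sleep` as an argument keeps the sketch testable: a test can substitute a function that records the requested delay instead of actually pausing for 16 minutes.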
