
Social Media Data Analysis

This is a project for analysing archives of your own personal data as provided by social media sites. I (Caitlin Macleod) made this code for a talk at PyConAU 2018 called "Accessing and analysing your own social media data".

If you want to find out more about the talk, here are some links:

The talk explored what personal data is, how we can download it from social media sites (in particular Facebook and Twitter), and how I learnt some basic data analysis tools in Python to look into the information I received.

WARNING: Please share this data responsibly! While this project is about downloading and analysing your own personal data, the basic premise of social media sites is that you interact with other people, so your archive may contain private or sensitive information that others have shared with you. Those people may not appreciate having their personal data shared as a result of this project.

Please take this into account when storing and sharing your social media personal data 😎

Requirements

This project uses various parts of the Anaconda distribution. Anaconda is a distribution of Python and R together with tools for data science and machine learning. The key tools this project currently depends on are:

  • Python 3 or above - facebook-analysis.ipynb uses some features of Python that don't exist in Python 2 (e.g. os.path tools)
  • numpy
  • matplotlib
  • pandas
  • (not yet used) nltk - this is included in Anaconda, but its datasets need to be downloaded manually and take up a few GB of space. See NLTK 3.3 Documentation: Installing NLTK Data
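
If you want to confirm that your environment has everything the notebooks need, a quick check along these lines should run without errors in a notebook cell (this is just a convenience sketch, not part of the project):

    # Sanity-check the environment assumed by the notebooks.
    import sys

    assert sys.version_info[0] >= 3, "the notebooks assume Python 3"

    import numpy
    import pandas
    import matplotlib

    print("Python     ", sys.version.split()[0])
    print("numpy      ", numpy.__version__)
    print("pandas     ", pandas.__version__)
    print("matplotlib ", matplotlib.__version__)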

Personal data archives

This project requires data to look at: archives downloaded from your OWN accounts.

  • Facebook: please download JSON format data from http://facebook.com/your_information. The categories used are:
    • Likes and reactions (the facebook-analysis.ipynb code looks for this in {FACEBOOK_ARCHIVE_DIR}/likes_and_reactions/posts_and_comments.json)
    • Messages (looks for individual JSON files under {FACEBOOK_ARCHIVE_DIR}/messages/{conversation_name}/message.json)
  • Twitter: please download the tweets.csv file from https://twitter.com/settings/your_twitter_data
    • There's also an advertisers_list available from that menu - this project doesn't look at it yet, but it looks fun. (A short sketch of loading the files above follows this list.)
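
To make those paths concrete, here is a minimal loading sketch using the standard library and pandas. The archive folder names are taken from the example layout in the Setup section below; treat them, and the variable names, as placeholders for your own download.

    # Minimal sketch of loading the archive files described above.
    # Folder names follow the example layout in the Setup section; adjust them for your own archives.
    import json
    from pathlib import Path

    import pandas as pd

    FACEBOOK_ARCHIVE_DIR = Path("archive_dir/facebook-caitelatte-20180528")
    TWITTER_ARCHIVE_DIR = Path("archive_dir/twitter-caitelatte-20180528")

    # Facebook likes and reactions: a single JSON file.
    with open(FACEBOOK_ARCHIVE_DIR / "likes_and_reactions" / "posts_and_comments.json") as f:
        reactions = json.load(f)

    # Facebook messages: one message.json per conversation folder.
    message_files = sorted((FACEBOOK_ARCHIVE_DIR / "messages").glob("*/message.json"))

    # Twitter archive: tweets.csv loaded into a pandas DataFrame.
    tweets = pd.read_csv(TWITTER_ARCHIVE_DIR / "tweets.csv")

    print(len(message_files), "conversations,", len(tweets), "tweets")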

Setup

Download the archives from the desired social media websites into a subdirectory of this folder and unzip them if necessary.

I used a folder called archive_dir/ with a subdirectory for each Facebook and Twitter archive, tagged with the date it was downloaded. Paths to your directories can be specified in the first code block of each Jupyter Notebook.

Example directory layout:

  • Git repository folder: social-media-data/
    • Jupyter notebook files: facebook-analysis.ipynb, twitter.ipynb
    • Archives folder: archive_dir/
      • Customised archive folders: facebook-caitelatte-20180528/, twitter-caitelatte-20180528/

Running the code!

I was running this code using a Jupyter server. The Jupyter server needs to be started from the repository folder.

It's important that the Jupyter server is started from the correct folder as the notebooks use relative paths to the archive directories. Remember that you can customise the paths yourself inside the notebooks!

  1. Open a terminal or command prompt and run the following commands:

    cd (path to this social-media-data git repository)
    pwd
    # confirm that you are currently in the social-media-data repository
    jupyter-notebook
  2. The jupyter-notebook command should open a browser tab with the Jupyter Notebook's home directory. Open the facebook-analysis.ipynb or twitter.ipynb files.

    • Customise the variables in the first code block to point to the correct directories (starting from the Jupyter server's working directory).
    • Run the code blocks in order!
    • Have fun, change things, hopefully responsibly share cool fun stats about what you've found :D (one example of such a stat is sketched after this list)
      • ALSO PLEASE TELL ME IF YOU FOUND A PRIDE REACT IN YOUR FACEBOOK DATA 🏳️‍🌈
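
As one example of the kind of quick stat you could pull out, the sketch below counts how often each reaction type appears in the Facebook reactions file. The key names ("reactions", "data", "reaction") are assumptions based on the 2018-era export format, so check them against your own posts_and_comments.json and adjust as needed.

    # Hedged sketch: count reaction types in the Facebook archive.
    # The key names below are assumptions based on the 2018-era export; inspect your own file and adjust.
    import json
    from collections import Counter

    path = "archive_dir/facebook-caitelatte-20180528/likes_and_reactions/posts_and_comments.json"
    with open(path) as f:
        archive = json.load(f)

    counts = Counter()
    for entry in archive.get("reactions", []):
        for item in entry.get("data", []):
            reaction = item.get("reaction", {}).get("reaction")
            if reaction:
                counts[reaction] += 1

    for reaction, n in counts.most_common():
        print(reaction, n)
    # If "PRIDE" turns up here, Caitlin would love to hear about it!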
