dragosrotaru / ppeforfree

Collective sensemaking for mutual aid groups manufacturing PPE during COVID.

Home Page: https://ppeforfree.org

License: GNU General Public License v3.0

HTML 7.71% CSS 5.08% TypeScript 87.21%
ppe-initiatives

ppeforfree's People

Contributors

dragosrotaru, elijah-ward, epsom-software, kurtvan, mindoodoo, ollie-codeaid

ppeforfree's Issues

Processing Facebook Groups Posts

Pre-requisite: Scraping Facebook Groups Posts

See #7

Requirements

  • Don't commit data, private info, credentials, etc.
  • the script should output data as JSON to the data folder with name data/facebook-posts-[timestamp].json

How do we process the posts? IDK, up to you. Let's get creative. The purpose is to enable a megaphone feature or signal amplifier for the community. We've got to track links being shared (normalized of course, without query parameters) and measure the virality of the content. We will want to display it on a community board in an engaging way, contextualized with info about where it came from (maybe). Who posted it first? Etc.

This can get really interesting. Can we track the propagation of information through the community? How do we display the info? What sorting features would we want? Work closely with the DataViz contributor.
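One possible starting point for the link tracking described above, as a minimal sketch. The `SharedPost` and `LinkStats` shapes and the function names are hypothetical, timestamps are assumed to be Unix milliseconds, and "virality" is approximated here simply as how many posts carry the same normalized link plus their summed share counts.

```typescript
// Hypothetical minimal shape of a scraped post; only the fields used here.
type SharedPost = {
  link: string;
  groupID: string;
  createdAt: number; // assumed: Unix time in milliseconds
  shares: number;
};

// Hypothetical per-link summary for the community board.
type LinkStats = {
  link: string;
  postCount: number;
  totalShares: number;
  firstGroupID: string;
  firstSeenAt: number;
};

// Drop the query string and fragment so tracking parameters
// (e.g. fbclid) do not split one link into several entries.
function normalizeLink(raw: string): string {
  const url = new URL(raw);
  url.search = "";
  url.hash = "";
  return url.toString();
}

// Group posts by normalized link and record who posted each link first.
function summarizeLinks(posts: SharedPost[]): LinkStats[] {
  const byLink = new Map<string, LinkStats>();
  for (const post of posts) {
    const link = normalizeLink(post.link);
    const stats = byLink.get(link);
    if (stats === undefined) {
      byLink.set(link, {
        link,
        postCount: 1,
        totalShares: post.shares,
        firstGroupID: post.groupID,
        firstSeenAt: post.createdAt,
      });
      continue;
    }
    stats.postCount += 1;
    stats.totalShares += post.shares;
    if (post.createdAt < stats.firstSeenAt) {
      stats.firstSeenAt = post.createdAt;
      stats.firstGroupID = post.groupID;
    }
  }
  // Most widely posted links first.
  return Array.from(byLink.values()).sort((a, b) => b.postCount - a.postCount);
}
```

Sorting by `postCount` is only one choice; the board could just as well sort by `totalShares` or by `firstSeenAt`.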

Scraping Facebook Groups General Information

Note on FB Scraping, Data Privacy, Future Roadmap

See #5

Prerequisite: Seed Data

See #6

Requirements

  • Don't commit data, private info, credentials, etc.
  • write your script in a new folder "scripts/facebook-group-info-scraper"
  • use any language you want, preferably Python.
  • use conservative rate-limiting and a dynamic DOM renderer like Selenium or Puppeteer.
  • the script should get FB_USERNAME, FB_PASSWORD + MongoDB credentials via .env file
  • the script should get group ids from #6 (see comments)
  • the script saves data to local MongoDB instance (see schema below)

Scraping Facebook Groups General Information

We need data on all the Facebook groups in the community.

The data available on public FB groups (not including content like posts, pics, events, etc.) includes the fields listed below.

Note: I compiled this list by manually going through 2 FB group pages. Please go through a few more pages yourself to see whether some groups have more, less, or differing public data available, and we will update our schema.

  • id
  • name
  • isPublic
  • description
  • foundedOn
  • memberCount
  • adminCount
  • moderatorCount
  • memberCountIncreaseWeekly
  • postCountIncreaseMonthly
  • postCountIncreaseDaily
  • moderatorList, adminList, memberList, pageList (pages can be in a group! these are lists of ids)

We will not get any information about individuals other than their Facebook ID. This data is needed because we want to see how connected groups are (how many individuals they have in common), and we want to reach out to those individuals who are in a ton of groups! Very useful for coalition-building.
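As a rough sketch of how the member lists could surface the individuals who are in many groups: the `GroupMembers` shape, the function names, and the `minGroups` threshold below are hypothetical, and member IDs are treated as plain strings.

```typescript
// Hypothetical minimal shape: a group id plus its scraped member ids.
type GroupMembers = { id: string; memberList: string[] };

// Count how many groups each member id appears in.
function countGroupsPerMember(groups: GroupMembers[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const group of groups) {
    // De-duplicate first, in case the same id was scraped twice for one group.
    const uniqueMembers = Array.from(new Set(group.memberList));
    for (const memberID of uniqueMembers) {
      counts.set(memberID, (counts.get(memberID) ?? 0) + 1);
    }
  }
  return counts;
}

// Members that belong to at least `minGroups` groups, most connected first.
function mostConnectedMembers(
  groups: GroupMembers[],
  minGroups: number
): Array<{ memberID: string; groupCount: number }> {
  return Array.from(countGroupsPerMember(groups).entries())
    .filter(([, groupCount]) => groupCount >= minGroups)
    .map(([memberID, groupCount]) => ({ memberID, groupCount }))
    .sort((a, b) => b.groupCount - a.groupCount);
}
```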

Scraping Posts

I started a script in scripts/facebook-group-posts-scraper using this library:
https://github.com/kevinzg/facebook-scraper

It works well! But we NEED to collect the timestamp on all the posts. It doesn't work with 100% consistency, so you will have to troubleshoot. We will use this data to make a news aggregator and to keep an eye out for more data for coalition-building purposes.

How your script will store and normalize the data

Database will be MongoDB

Schema

type Group = {
  id: UUID,
  name: string,
  foundedOn: TimeStamp,
  public: boolean,
  description: string,
  memberCount: number,
  adminCount: number,
  moderatorCount: number,
  memberCountIncreaseWeekly: number,
  postCountIncreaseMonthly: number,
  postCountIncreaseDaily: number,
  memberList: UUID[],
  adminList: UUID[],
  moderatorList: UUID[],
  pageList: UUID[],
  scrapedAt: TimeStamp,
  scrapeID: UUID,
}

Misc

Random lib I found: https://github.com/ParvJain/Facebook-Group-Scraper (please look through)

Facebook Scraping, Data Privacy, Future Roadmap

As it stands:

  1. I will run the scrapers daily on my computer using my login credentials.
  2. Not all data scraped will be public. Processing scripts will produce publishable data that I will upload to the repo in JSON format for the front-end to use. This is v0.1 of our API.
  3. The original scraped data will be stored on my machine until we have cloud infra in place.
  4. Collaborators can get access to the full dataset by asking for it.

Scraping will be phased out by:

  • getting Public Group API access via FB Dev App program
  • getting Group Admins to "claim" their Facebook page on our site by authorizing our app.

Help / Get Involved

We should have a "help" page that offers multiple ways to engage:

  • Advocate - Share us on social media
  • Local Supplier - Join our Mailing List for updates (need to create a Mailchimp list)
  • Other - Contact Us (see #31)
  • Join the Team - (see on-boarding process v1.1 in map)

We need to clearly explain how these different types of website visitors can get engaged.

Processing Facebook Groups General Information

Pre-requisite: Scraping Facebook Groups General Information

See #8

Requirements

  • Don't commit data, private info, credentials, etc.
  • write your script in a new folder "scripts/facebook-groups-info-processor"
  • use any language you want, preferably Python or NodeJS/TypeScript (please).
  • the script should get MongoDB credentials via .env file
  • the script should output data as JSON to the data folder with name data/facebook-posts-[timestamp].json

How do we process the groups? IDK, up to you. Let's get creative. The purpose is to enable a community map and directory. Work closely with DataViz.

Make the Scraper Reliable

  • Implement partial scraping - recent members only, detect field change frequency, prioritize based on group size, history of growth
  • don't do batch jobs; let the scraper run as a cron job or background task
  • use counts vs list.length to detect issues
  • save stderr and scraper parameters
  • connect to existing browser or save session (no relogin)
  • make auto-scrolling work by waiting for network idle
  • create phony public and private groups for integration tests
  • deploy scraper on multiple personal computers (Scraping@Home V0.0.1)
  • use exponential backoff
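For the exponential backoff item, a minimal sketch could look like the following. The function names, the default base delay of 1 second, the 60 second cap, and the 5 attempt limit are all assumptions chosen for illustration, not values from the project.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Run `task`, retrying with growing delays until it succeeds
// or `maxAttempts` is reached, then rethrow the last error.
async function withBackoff<T>(
  task: () => Promise<T>,
  maxAttempts = 5,
  baseMs = 1000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      // No need to wait after the final failed attempt.
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelayMs(attempt, baseMs);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

A scraper step would be wrapped as `withBackoff(() => scrapeGroup(id))`, where `scrapeGroup` stands in for whatever function loads and parses one group page. Adding random jitter to the delay is a common refinement.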

Group Locations (Array of Cities) to Latitude and Longitude

  • Connect Google Maps Geocoding API
  • Create a function that takes an array of locations, geocodes them, and returns the latitude and longitude of the middle of the smallest jurisdiction that covers them all (are all the cities in the same county? same state? same country?)
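The "smallest jurisdiction that covers them all" step can be sketched separately from the geocoding call itself. The sketch below assumes each location has already been resolved into city, county, state, and country components; the `Place` and `Jurisdiction` types and the function name are hypothetical. The returned name would then be geocoded once more to get the latitude and longitude of its center.

```typescript
// Hypothetical shape of one already-geocoded location, most to least specific.
type Place = { city: string; county: string; state: string; country: string };

type Jurisdiction = {
  level: "city" | "county" | "state" | "country";
  name: string;
};

// Find the most specific level at which all places agree.
// Returns null for an empty list or when even the countries differ.
function smallestCommonJurisdiction(places: Place[]): Jurisdiction | null {
  if (places.length === 0) return null;
  const first = places[0];
  const levels: Array<keyof Place> = ["city", "county", "state", "country"];
  for (let i = 0; i < levels.length; i++) {
    // Compare this level and every broader one, so two cities with the
    // same name in different states are not treated as the same place.
    const broader = levels.slice(i);
    const allMatch = places.every((place) =>
      broader.every((level) => place[level] === first[level])
    );
    if (allMatch) {
      return {
        level: levels[i],
        name: broader.map((level) => first[level]).join(", "),
      };
    }
  }
  return null;
}
```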

Scraping Facebook Groups Posts

Note on FB Scraping, Data Privacy, Future Roadmap

See #5

Pre-requisite: Seed Data

See #6

Requirements

  • Don't commit data, private info, credentials, etc.
  • write your script in a new folder "scripts/facebook-group-posts-scraper"
  • use any language you want, preferably Python or NodeJS/TypeScript (please).
  • use conservative rate-limiting and a dynamic DOM renderer like Selenium or Puppeteer.
  • the script should get FB_USERNAME, FB_PASSWORD + MongoDB credentials via .env file
  • the script should get group ids from #6 (see comments)
  • the script saves data to local MongoDB instance (see schema below)

Scraping Facebook Groups Posts

We will use this data to make a news aggregator and to keep an eye out for more data for coalition-building purposes.

I started a script in scripts/facebook-group-posts-scraper using this library:
https://github.com/kevinzg/facebook-scraper

It works OK, but not with 100% consistency; you will have to troubleshoot and maybe edit the script.

How your script will store and normalize the data

Database will be MongoDB

Schema

type Post = {
  id: UUID,
  createdAt: TimeStamp,
  text: string,
  link: URL,
  likes: number,
  shares: number,
  comments: number,
  groupID: UUID,
  scrapedAt: TimeStamp,
  scrapeID: UUID,
}

Misc

Random lib I found: https://github.com/ParvJain/Facebook-Group-Scraper (please look through)

Visualizing Facebook Posts (News Feed)

Pre-requisites

See #9

We need a page on the website with a high-level view of all the Facebook posts and links being shared globally. This is the community MegaPhone or Signal Amplifier.

This could be as simple as a Reddit-style news feed. But we can also visualize where original ideas are coming from: which group leads the community by being the first to post fresh information, etc.
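One way to answer "which group posts fresh information first" is to count, per group, how many links it was the earliest to share. This is only a sketch: `FeedPost` and `firstPosterCounts` are hypothetical names, links are assumed to be normalized already, and timestamps are assumed to be Unix milliseconds.

```typescript
// Hypothetical minimal shape of a post for the feed; only the fields used here.
type FeedPost = { link: string; groupID: string; createdAt: number };

// For each group, count the links it posted before any other group did.
function firstPosterCounts(
  posts: FeedPost[]
): Array<{ groupID: string; firstPosts: number }> {
  // Find the earliest post for every link.
  const earliest = new Map<string, FeedPost>();
  for (const post of posts) {
    const seen = earliest.get(post.link);
    if (seen === undefined || post.createdAt < seen.createdAt) {
      earliest.set(post.link, post);
    }
  }
  // Credit each link to the group that posted it first.
  const counts = new Map<string, number>();
  for (const post of Array.from(earliest.values())) {
    counts.set(post.groupID, (counts.get(post.groupID) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .map(([groupID, firstPosts]) => ({ groupID, firstPosts }))
    .sort((a, b) => b.firstPosts - a.firstPosts);
}
```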

New Home Page

Our website is currently very cluttered and confusing because it drops visitors into the directory right away.

Prusa Scraper

The Prusa site runs on a GraphQL endpoint. We want to scrape user IDs (46k) and full group details (a few hundred). Don't publish this data. We are going to scrape the user IDs so we can send each user 1 message within the Prusa platform asking if they know of any local initiatives. Then we can process any links they reply with. If they are responsive, we provide them with a link to our site and thank them.

Visualizing Facebook Groups General Information

Pre-requisites

See #10

We need a page on the website with a high-level view of all the Facebook Groups.

Ideas for Views:

  • A Global Dashboard
  • A Map View (location needs to be coordinated with data-processing / scraping contributors)
  • A Table View
  • A Graph View with node size showing the number of members and edge length showing the number of members in common.
  • A Detail view with graphs over time showing an individual group's growth, etc.

Seed Data

OSCMS has a roster on Google Sheets. I have gone through it and grabbed every Facebook group (and page) ID (187 total). Roster:

https://docs.google.com/spreadsheets/d/1JH5uL3WW6PwvwFRe4wqXkheK0-jcGYqaPmb9J3Dr6Ac/edit?fbclid=IwAR3FX_xPe-bYbXQmjsXF5FUr7aISp27wGwHXuNIWzh92ScdQQSgVVrbixBo#gid=179139280

The data is available in data/facebook-group-ids-unclean.txt

Not all IDs are for FB groups though, so we need to pre-process them. Salty_Steve wrote a script to do that in Python, but it doesn't work for all of them. Maybe it just needs rate-limiting implemented. See scripts/facebook-group-id-validator.

Fix scripts/facebook-group-id-validator and produce a clean file of group IDs called data/facebook-group-ids.txt. This is critical, please manually check your work. We need clean data.

Facebook Group Node Graph

A graph (as in graph theory) where vertices represent Groups. The radius of a vertex represents the size of the group, and edge lengths represent the number of members these groups have in common (closer means more members in common).
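The edge data for such a graph could be derived from the member lists roughly as follows. This is a sketch under assumptions: `GroupNode`, `GraphEdge`, and `buildEdges` are hypothetical names, and the mapping from shared members to edge length (`maxLength / membersInCommon`) is just one simple choice that makes closer mean more members in common.

```typescript
// Hypothetical minimal shapes for the graph data.
type GroupNode = { id: string; memberList: string[] };
type GraphEdge = {
  source: string;
  target: string;
  membersInCommon: number;
  length: number;
};

// Build one edge for every pair of groups that share at least one member.
function buildEdges(groups: GroupNode[], maxLength = 100): GraphEdge[] {
  const edges: GraphEdge[] = [];
  for (let i = 0; i < groups.length; i++) {
    const members = new Set(groups[i].memberList);
    for (let j = i + 1; j < groups.length; j++) {
      // De-duplicate the other list before counting shared ids.
      const common = Array.from(new Set(groups[j].memberList)).filter((id) =>
        members.has(id)
      ).length;
      if (common === 0) continue; // no shared members, no edge
      edges.push({
        source: groups[i].id,
        target: groups[j].id,
        membersInCommon: common,
        // Assumed mapping: more shared members gives a shorter edge.
        length: maxLength / common,
      });
    }
  }
  return edges;
}
```

Comparing every pair of groups is quadratic in the number of groups, which should be acceptable for a few hundred groups.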

SEO, Social Media, Branding

  • name (bigger discussion will be had in Discord)
  • logo
  • secure socials
  • meta tags
  • Google Tag Manager
  • Google Analytics
  • Search Console
  • set up GitHub org
