skshub-data's Introduction

Sector Knowledge Sharing (SKS) Project Data

Overview

For a more detailed overview of this project, please see our SKS Hub Roadmap. The Roadmap includes a more detailed readme and lists the upcoming updates, features, and known bugs and issues of the SKS Hub.

You can also find our SKS Hub interface code in this repository and the back-end code in this repository.

This project: Ajah wants to explore what shared infrastructure could look like for the nonprofit sector. We are looking to build a common, open-source infrastructure for the information already available about the sector, focusing on existing open data sources about nonprofit organizations, grants, and evaluations. Our goal is to make this information more accessible and usable to anyone in the sector.

The CSVs found here will become the building blocks of a data infrastructure prototype for the nonprofit sector. We hope the prototype will show the benefits of open, sharable and organized data and allow different stakeholders to better understand the organizations that make up our sector and the activities and programs they run.

The data we are currently using comes from the federal government, and thus only allows us to have access to the grants given by the government. The prototype is meant to showcase how data infrastructure can help facilitate research for the sector. We hope that the success of the prototype will convince private foundations to share their data to gain a better picture of the philanthropic landscape in Canada.

  • Entities: This CSV contains information about individual organizations in the nonprofit sector. The data includes all organizations that are registered with the CRA as a Public Foundation, Private Foundation or Charitable Organization.
  • Activities: This CSV contains grant data from the federal Grants & Contributions database. This data allows us to understand more about the activities and programs carried out by organizations in the nonprofit sector.
  • Linkages: This CSV contains the corpus text of those organizations that had a listed website URL. The text was scraped directly from the organization's website. As we further develop the database, we hope to include other links connected to the organizations, such as social media handles and links.

Download

  • To download, navigate to the "data" folder above and select a CSV.
  • From there, click "Raw" (the page will load the raw CSV output)
  • In your browser, save the page with the extension ".csv"
    • For example, to download the entities data in Chrome, select 'File' > 'Save Page As' and type "entities.csv" as the filename before saving
  • Your download should complete in less than one minute
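Once a CSV has been saved, it can be read with any CSV library. The following is a minimal sketch using Python's standard `csv` module. The in-memory sample stands in for the downloaded file, and the column headers `BN` and `Name` are assumed to match the field names in the Data Dictionary below; the real headers may differ.

```python
import csv
import io


def read_rows(handle):
    """Return every row of an open CSV file as a dict keyed by column header."""
    return list(csv.DictReader(handle))


# Stand-in for: open("entities.csv", newline="", encoding="utf-8")
# The headers and values here are made up for illustration.
sample = io.StringIO("BN,Name\n123456789RR0001,Example Charity\n")
rows = read_rows(sample)
print(rows[0]["Name"])
```

To read the real file, pass the handle returned by `open("entities.csv", newline="", encoding="utf-8")` to `read_rows` instead of the sample.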

Data Sources

Canada Revenue Agency - 2019 List of charities

3.1 Identification

3.6 General Information

3.7 Financial Data

3.13 Compensation

3.2 Charity Contact Web Addresses

Treasury Board of Canada Secretariat - Proactive Disclosure - Grants and Contributions

Proactive Disclosure - Grants and Contributions

Data Dictionary

Entities

BN: Business Number issued by the Canada Revenue Agency (CRA) - [Dataset: 3.1 Identification & Field: BN]

FPE: Fiscal period end of the organization - [Dataset: 3.6 General Information & Field: FPE]

Focus Area: A category broadly defining the sector in which the organization operates, using the 3 program area codes concatenated into one - [Dataset: 3.6 General Information & Fields: Program #1 Desc, Program #2 Desc, Program #3 Desc]

Legal Status: Active or Inactive (organization has been wound-up, dissolved or terminated) - [Dataset: 3.6 General Information & Field: 1570]

Name: Legal name of organization - [Dataset: 3.1 Identification & Field: Legal Name]

Location municipality: City based on organization's mailing address - [Dataset: 3.1 Identification & Field: City]

Location Postal Code: Postal Code based on organization's mailing address - [Dataset: 3.1 Identification & Field: Postal Code]

Legal Designation Type: Designation code (Public Foundation, Private Foundation or Charitable Organization) - [Dataset: 3.1 Identification & Field: Designation]

Location Region: Province based on organization's mailing address - [Dataset: 3.1 Identification & Field: Province]

Location Country: Country based on organization's mailing address - [Dataset: 3.1 Identification & Field: Country]

Revenue: Organization's total revenue in the given fiscal year - [Dataset: 3.7 Financial Data & Field: 4700]

Employees: Number of permanent, full-time, compensated positions - [Dataset: 3.13 Compensation & Field: 300]

Website: URL of the organization’s website - [Dataset: 3.2 Charity Contact Web Addresses & Field: Contact URL]

Ent SKS ID: Unique ID for use by entities in the Sector Knowledge Sharing (SKS) project only - [Dataset: N/A & Field: Automatically assigned value]

Regulating Authority: Defines where the record was sourced from; all Canadian data is sourced from the CRA - [Dataset: N/A & Field: Automatically assigned value]

Revenue Currency: The currency of the revenue - [Dataset: N/A & Field: Pull from a standard ISO - CAD (124)]

Revenue Year: Year that the revenue given was reported - [Dataset: 3.6 General Information & Field: Year of the FPE]

Data Source: URL of where the data was obtained; it is the same for every organization and changes yearly - [Dataset: N/A & Field: Automatically assigned value]

Legal Status Date: Year that the legal status of the organization was recorded (year T3010 form was filed) - [Dataset: 3.6 General Information & Field: Year of the FPE]
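Two of the Entities fields above are derived rather than copied: Focus Area concatenates the three program descriptions, and Revenue Year and Legal Status Date take the year of the FPE. The sketch below shows one way these derivations could look. The `"; "` separator and the `YYYY-MM-DD` date format are assumptions, since the dictionary does not specify either, and both function names are hypothetical.

```python
from datetime import date


def build_focus_area(program_descs, separator="; "):
    """Join the non-empty program descriptions into one Focus Area string.

    The separator is an assumption; the source only says the values
    are concatenated into one.
    """
    parts = [desc.strip() for desc in program_descs if desc and desc.strip()]
    return separator.join(parts)


def fpe_year(fpe):
    """Return the year of a fiscal period end given as YYYY-MM-DD (assumed format)."""
    return date.fromisoformat(fpe).year


print(build_focus_area(["Food banks", "", "Housing"]))
print(fpe_year("2019-03-31"))
```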

Activities

Act SKS ID: Unique ID for use by activities in the Sector Knowledge Sharing (SKS) project only - [Field: Automatically assigned value]

Source ID: ID assigned to the record by the source authority; for the Grants and Contributions data, this is the reference number populated by each department - [Field: Ref Number]

Source Authority: Defines where the record was sourced from; all Canadian data is sourced from the Proactive Disclosure Grants and Contributions - [Field: Automatically assigned value]

Source URL: URL of where the data was obtained, which is the same for every organization - [Field: Automatically assigned value]

Grant Title: The title of the project or agreement that the recipient is undertaking (In cases where there is no title, the agreement number will be duplicated here) - [Field: Agreement Title]

Funding Amount: The dollar amount given to the organization as stated in the grant or contribution agreement - [Field: Agreement Value]

Funding Type: Indicates what the funding amount corresponds to. For Canadian data this is "Amount Applied For". - [Field: Automatically assigned value of "Amount Applied For"]

Funder: The distinct organization issuing the grant - [Field: Owner Org]

Recipient Organization: The legal name of the recipient organization - [Field: Recipient Legal Name]

Recipient ID: A unique legal identifier for the recipient organization; for the Grants and Contributions data, this is the Business Number issued by the Canada Revenue Agency (CRA) - [Field: Recipient Business Number]

Grant Description: The description explains why the recipient received funding, what the recipient is undertaking and describes the activities or objectives the recipient organization hopes to achieve with the funds - [Field: Description (English)]

Funder ID: The legal ID corresponding to the name of the Funder (currently blank)

Grant Region: Province based on recipient organization's mailing address - [Field: Recipient Province]

Grant Municipality: City based on recipient organization's mailing address - [Field: Recipient City]

Date: The assumed start of the agreement, or when the project is supposed to begin, as captured in the initial agreement - [Field: Agreement Start Date]

Date Type: Indicates the Date field reflects the "Agreement Start" date - [Field: Automatically assigned value of "Agreement Start"]

End Date: The assumed end of the agreement, or when the project is supposed to end, as captured in the initial agreement - [Field: Agreement End Date]

End Date Type: Indicates that the End Date field reflects the "Agreement End" date - [Field: Automatically assigned value of "Agreement End"]

Expected Results: The assumed final results the recipient organization aims to achieve with the given funds - [Field: Expected Results (English)]

Actual Results: This data is currently not available.

Program Name: The name of the program under which the funds are issued - [Field: Program Name (English)]

Ent SKS ID: If the activity is associated with an entity, this is the unique ID of the corresponding entity for use in the Sector Knowledge Sharing (SKS) project only
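The Recipient ID in Activities and the BN in Entities both hold a CRA Business Number, which is what allows an activity to carry an Ent SKS ID. The following is a rough sketch of that lookup, not the project's actual script. It assumes the column headers match the dictionary names and that the two Business Number fields share the same format; `link_by_bn` is a hypothetical name.

```python
def link_by_bn(activities, entities):
    """Attach an Ent SKS ID to each activity whose Recipient ID matches an entity BN.

    Activities with no matching BN get an empty Ent SKS ID.
    """
    # Build the lookup once so each activity is matched in constant time.
    bn_to_id = {
        e["BN"].strip().upper(): e["Ent SKS ID"]
        for e in entities
        if e.get("BN", "").strip()
    }
    linked = []
    for act in activities:
        key = (act.get("Recipient ID") or "").strip().upper()
        linked.append({**act, "Ent SKS ID": bn_to_id.get(key, "")})
    return linked


# Made-up sample records for illustration only.
entities = [{"BN": "123456789RR0001", "Ent SKS ID": "E1"}]
activities = [
    {"Act SKS ID": "A1", "Recipient ID": "123456789rr0001 "},
    {"Act SKS ID": "A2", "Recipient ID": ""},
]
for row in link_by_bn(activities, entities):
    print(row["Act SKS ID"], repr(row["Ent SKS ID"]))
```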

Linkages

External ID: Business Number issued by the Canada Revenue Agency (CRA) - [Dataset: 3.1 Identification & Field: BN]

Name: Legal name of the organization - [Dataset: 3.1 Identification & Field: Legal Name]

Website: URL of the organization’s website - [Dataset: 3.2 Charity Contact Web Addresses & Field: Contact URL]

Website Text: Text scraped from the organization’s website - [Done using a scraper]

Ent SKS ID: This is the unique ID of the corresponding entity for use in the Sector Knowledge Sharing (SKS) project only

skshub-data's People

Contributors

brittwitham, dani-ajah

skshub-data's Issues

Add recipient type field to activities data set

As a user of the platform, it is important to know the type of recipient that is receiving the grant as it allows me to better understand the kinds of activities being conducted.

The Grants and Contributions data contains a "recipient type" field that identifies whether grant recipients are Aboriginal recipients, for-profit organizations, government, international (non-government), nonprofit organizations and charities, individuals, or academia. This information could be useful for certain use cases and could help narrow down results. This field would be added to the activities dataset.

Develop the linkages data model

As a developer of the SKS project, I would like to have a clear and identified Linkages data model in order to continue my work on the Linkages CSV and integrate this data into the interface.

More work is required on developing & implementing the linkages data model. Currently the only part of this that is finished is the web scraper but it isn't fully integrated in a similar way to activities and entities, so it would be best to start up a "process_linkages.py" script or similar that operates similarly to the other data types.

Here is the work so far on the Linkages data model

Deliverable: Scripts that integrate the Linkages data model in a similar way to activities (process_activities.py) and entities (process_entities.py)

Explore finding and adding documents

As a user of the platform, having documents that describe in more depth the activities being conducted or give you more information about the organization would help to supplement the information from the data.

To go forward, a web scraper will be needed to search for and download these documents, then store them in something like an S3 bucket so they can be accessed from the hub. More work is required in planning this before work is started.

Here is the work done so far on incorporating documents into the data model

Overhaul and recreate data cleaning process for activities & entities

As a developer of the data, I would like to make sure that a robust data cleaning process is used so that all the data is clean when uploaded and no records are missing due to insufficient data cleaning efforts.

More details:
Up to 200k activity records are missing from the hub because they did not upload correctly to Postgres. They were therefore left out of the search engine, to avoid pages redirecting to nowhere. Instead, a robust data cleaning effort should make sure those 200k records can be uploaded correctly, ideally in bulk from one CSV rather than through the current process, which uploads them line by line and is particularly slow on the DigitalOcean-hosted database.

Deliverable: A fully cleaned & uploadable CSV of all 565k activities
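One possible shape for such a cleaning pass is sketched below: each row is either repaired or set aside, and the repaired rows are written to a single CSV suitable for a bulk load (for example with PostgreSQL's `COPY`). The set of required fields and the amount-parsing rule are assumptions for illustration, since the issue does not say why the records failed to upload.

```python
import csv
import io

# Assumed minimum set of fields a row needs before upload.
REQUIRED = ("Act SKS ID", "Grant Title", "Funding Amount")


def clean_row(row):
    """Return a cleaned copy of row, or None if it cannot be repaired."""
    cleaned = {key: (value or "").strip() for key, value in row.items()}
    if any(not cleaned.get(field) for field in REQUIRED):
        return None
    amount = cleaned["Funding Amount"].replace(",", "").replace("$", "")
    try:
        cleaned["Funding Amount"] = f"{float(amount):.2f}"
    except ValueError:
        return None
    return cleaned


def split_rows(rows):
    """Separate rows into (clean, rejected) so rejected ones can be reviewed."""
    clean, rejected = [], []
    for row in rows:
        result = clean_row(row)
        if result is None:
            rejected.append(row)
        else:
            clean.append(result)
    return clean, rejected


def to_csv(rows, fieldnames):
    """Serialize rows to one CSV string for a single bulk upload."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping the rejected rows instead of dropping them makes it possible to count what is still missing and to refine the cleaning rules until the rejected list is empty.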

Another minor consideration related to data cleaning:
In the particular case where the program data output is "Charity provided description when other program areas are not applicable", this output should be changed to "Not Available" (unless we can find this 'description' somewhere else?).
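The replacement described above could be expressed as a small helper like the following sketch. It assumes the placeholder appears as the whole value of the field; if it can appear alongside other program descriptions, the rule would need to be adjusted. The function name is hypothetical.

```python
PLACEHOLDER = "Charity provided description when other program areas are not applicable"
NOT_AVAILABLE = "Not Available"


def clean_focus_area(value):
    """Swap the placeholder text for "Not Available"; leave any other value as-is."""
    if value.strip().casefold() == PLACEHOLDER.casefold():
        return NOT_AVAILABLE
    return value
```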

Link organizations and activities via an organization's legal name

As a user of this platform, I want to know which organization conducted which activities and vice versa in order to have additional information about the activity or organization.

Currently, organizations and activities are linked only by their Business Number (BN). Because few records in the data currently have a BN, linking organizations and activities by legal name as well will increase the number of connected results.
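A simple starting point for name-based linking is to normalize both names before comparing them, so that differences in case, punctuation and spacing do not block a match. The sketch below illustrates the idea under stated assumptions: the column headers follow the Data Dictionary, the normalization rules are illustrative only, and `normalize_name` and `link_by_name` are hypothetical names. Exact matching on normalized names will still miss abbreviations and translated names, and two entities sharing a name would be ambiguous.

```python
import re


def normalize_name(name):
    """Lowercase, drop punctuation and collapse whitespace in a legal name."""
    lowered = name.casefold()
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    return " ".join(no_punct.split())


def link_by_name(activities, entities):
    """Fill in a missing Ent SKS ID by matching normalized legal names.

    Activities that already have an Ent SKS ID (for example from a BN
    match) are left unchanged.
    """
    name_to_id = {normalize_name(e["Name"]): e["Ent SKS ID"] for e in entities}
    linked = []
    for act in activities:
        if act.get("Ent SKS ID"):
            linked.append(dict(act))
            continue
        key = normalize_name(act.get("Recipient Organization", ""))
        linked.append({**act, "Ent SKS ID": name_to_id.get(key, "")})
    return linked
```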

Add linkages (website text) as 3rd CSV on Github data repository

As a user of the data repository for the SKS project, I would like to access the website text information to read it and conduct further analysis.

The CSV should contain these fields: Organization legal name, Business number, website URL, website text and SKS hub ID

Create a scraper for documents and websites

As a developer of the SKS project, creating a scraper to search for and find documents or website URLs would help to populate the linkages and documents datasets.

This could be one multipurpose scraper or two separate scrapers.

  • We need to find more organizations' websites and then scrape their website text (This could also be an identifier of websites - take a list of names and find a website.)
  • We need to potentially find documents
