skshub-data's Issues

Link organizations and activities via an organization's legal name

As a user of this platform, I want to know which organization conducted which activities (and vice versa) so that I have additional context about the activity or organization.

Currently, organizations and activities are only linked via their Business Number (BN). Because relatively few records in the data have a BN, also linking organizations and activities via their legal name would increase the number of connected results.
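
A minimal sketch of how such a legal-name link could work, assuming the data is handled as CSVs in pandas; the file names and column names (legal_name, recipient_legal_name, sks_hub_id) are placeholders, not the project's actual schema:

```python
# Hedged sketch: link activities to organizations on a normalized legal name.
# File names and column names are assumptions, not the hub's real schema.
import pandas as pd

def normalize_name(name: str) -> str:
    """Lowercase, trim, and collapse whitespace so near-identical names match."""
    return " ".join(str(name).lower().split())

orgs = pd.read_csv("entities.csv")
acts = pd.read_csv("activities.csv")

orgs["name_key"] = orgs["legal_name"].map(normalize_name)
acts["name_key"] = acts["recipient_legal_name"].map(normalize_name)

# Keep the Business Number link where it exists; fall back to the legal-name key.
linked = acts.merge(orgs[["name_key", "sks_hub_id"]], on="name_key", how="left")
print(f"{linked['sks_hub_id'].notna().sum()} of {len(linked)} activities matched")
```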

Overhaul and recreate data cleaning process for activities & entities

As a developer of the data, I would like a robust data cleaning process that ensures all the data is clean when uploaded, and that no records go missing due to insufficient data cleaning efforts.

More details:
There are up to 200k activity records missing from the hub: they did not upload to Postgres correctly and were therefore left out of the search engine (to avoid pages redirecting to nowhere). A robust data cleaning effort is needed so that those 200k records upload correctly, ideally in bulk from a single CSV rather than the current line-by-line process, which is particularly slow on the DigitalOcean-hosted database.

Deliverable: A fully cleaned & uploadable CSV of all 565k activities
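
A minimal sketch of the bulk-load step, assuming the cleaned CSV is loaded into Postgres with COPY via psycopg2; the table name, column list, connection string, and file name are all assumptions:

```python
# Hedged sketch: bulk-load the cleaned activities CSV with Postgres COPY
# instead of row-by-row INSERTs. Table/column names and the connection
# string are assumptions, not the hub's actual schema.
import psycopg2

COPY_SQL = """
    COPY activities (activity_id, legal_name, business_number, description)
    FROM STDIN WITH (FORMAT csv, HEADER true)
"""

with psycopg2.connect("postgresql://user:password@localhost:5432/skshub") as conn:
    with conn.cursor() as cur, open("activities_clean.csv", encoding="utf-8") as f:
        cur.copy_expert(COPY_SQL, f)
    conn.commit()
```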

Another minor consideration related to data cleaning:
There is a particular case where the 'all programs' data point outputs "Charity provided description when other program areas are not applicable"; this output should be changed to "Not Available" (unless we can find this description somewhere else?).
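
A minimal sketch of that replacement as one step in the cleaning script, assuming pandas; the column name "all_programs" is an assumption:

```python
# Hedged sketch: swap the placeholder text for "Not Available" during cleaning.
# The column name "all_programs" is an assumption about the activities CSV.
import pandas as pd

PLACEHOLDER = "Charity provided description when other program areas are not applicable"

acts = pd.read_csv("activities.csv")
acts["all_programs"] = acts["all_programs"].replace(PLACEHOLDER, "Not Available")
acts.to_csv("activities_clean.csv", index=False)
```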

Add recipient type field to activities dataset

As a user of the platform, I want to know the type of recipient receiving each grant, as it allows me to better understand the kinds of activities being conducted.

The Grants and Contributions data contains a "recipient type" field that identifies whether the grant recipient is: Aboriginal recipients, for-profit organizations, government, international (non-government), non-profit organizations and charities, individuals, or academia. This information could be useful for certain use cases and could help narrow down results. The field would be added to the activities dataset.
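
A minimal sketch of carrying that field over, assuming both datasets are CSVs that share a reference-number column; the join key ("ref_number") and column names are assumptions about both files:

```python
# Hedged sketch: copy the recipient type from the Grants and Contributions
# extract into the activities dataset. "ref_number" and the column names are
# assumptions, not the real file layout.
import pandas as pd

acts = pd.read_csv("activities.csv")
gc = pd.read_csv(
    "grants_and_contributions.csv", usecols=["ref_number", "recipient_type"]
)

acts = acts.merge(gc, on="ref_number", how="left")
acts["recipient_type"] = acts["recipient_type"].fillna("Not Available")
acts.to_csv("activities_with_recipient_type.csv", index=False)
```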

Add linkages (website text) as a 3rd CSV in the GitHub data repository

As a user of the data repository for the SKS project, I would like to access the website text information to read it and conduct further analysis.

The CSV should contain these fields: Organization legal name, Business number, website URL, website text and SKS hub ID
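
A minimal sketch of writing that CSV, assuming the scraped records are available as dictionaries; the exact column-header spellings are placeholders:

```python
# Hedged sketch: write the linkages CSV with the five fields listed above.
# Header spellings and the shape of the scraped records are assumptions.
import csv

FIELDS = [
    "organization_legal_name",
    "business_number",
    "website_url",
    "website_text",
    "sks_hub_id",
]

def write_linkages_csv(records, path="linkages.csv"):
    """records: an iterable of dicts keyed by FIELDS."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)
```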

Develop the linkages data model

As a developer of the SKS project, I would like to have a clear and identified Linkages data model in order to continue my work on the Linkages CSV and integrate this data into the interface.

More work is required to develop and implement the linkages data model. Currently the only finished piece is the web scraper, and it is not yet integrated in the same way as activities and entities, so it would be best to start a "process_linkages.py" script (or similar) that operates like the scripts for the other data types.

Here is the work so far on the Linkages data model

Deliverable: Scripts that integrate the Linkages data model in a similar way to activities (process_activities.py) and entities (process_entities.py)
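
A hypothetical skeleton for such a script; the real process_activities.py and process_entities.py may be structured differently, and the function names, column names, and cleaning steps shown here are assumptions only:

```python
# process_linkages.py -- hypothetical skeleton only. The real activities/entities
# scripts may differ; column names and cleaning steps here are assumptions.
import pandas as pd

def load_raw_linkages(path: str) -> pd.DataFrame:
    """Read the scraped website-text records."""
    return pd.read_csv(path)

def clean_linkages(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows missing a URL or legal name and trim the scraped text."""
    df = df.dropna(subset=["website_url", "organization_legal_name"])
    return df.assign(website_text=df["website_text"].str.strip())

def main() -> None:
    df = clean_linkages(load_raw_linkages("linkages_raw.csv"))
    df.to_csv("linkages.csv", index=False)

if __name__ == "__main__":
    main()
```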

Create a scraper for documents and websites

As a developer of the SKS project, I would like a scraper that searches for and finds documents or website URLs, to help populate the linkages and documents datasets.

This could be one multipurpose scraper or two separate scrapers:

  • We need to find more organizations' websites and then scrape their website text (this could also serve as a website identifier: take a list of organization names and find each one's website); a sketch of the scraping step follows this list.
  • We need to potentially find documents
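
A minimal sketch of the website-text half, assuming requests and beautifulsoup4; finding a URL from an organization name (e.g. via a search API) is left out, and the User-Agent string is a placeholder:

```python
# Hedged sketch: fetch a known organization URL and pull out its visible text.
import requests
from bs4 import BeautifulSoup

def scrape_website_text(url: str, timeout: int = 15) -> str:
    resp = requests.get(url, timeout=timeout, headers={"User-Agent": "sks-hub-scraper"})
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    # Drop non-content tags, then collapse whitespace in what remains.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    return " ".join(soup.get_text(separator=" ").split())

if __name__ == "__main__":
    print(scrape_website_text("https://example.org")[:500])
```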

Explore finding and adding documents

As a user of the platform, I would like documents that describe the activities being conducted in more depth, or that give more information about the organization, to supplement the information in the data.

To go forward, a web scraper will be needed to search for and download these documents, then store them somewhere like an S3 bucket so they can be accessed from the hub. More planning is required before this work is started.
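
A minimal sketch of the storage step, assuming boto3 and an S3 (or S3-compatible) bucket; the bucket name and key layout are placeholders:

```python
# Hedged sketch: push a downloaded document into an S3 bucket keyed by the
# organization's SKS hub ID. Bucket name and key layout are assumptions.
import os
import boto3

def upload_document(local_path: str, sks_hub_id: str, bucket: str = "sks-hub-documents") -> str:
    s3 = boto3.client("s3")
    key = f"documents/{sks_hub_id}/{os.path.basename(local_path)}"
    s3.upload_file(local_path, bucket, key)
    return key  # store this key alongside the record so the hub can link to it
```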

Here is the work done so far on incorporating documents into the data model
