skshub-data's Issues
Link organizations and activities via an organization's legal name
As a user of this platform, I want to know which organization conducted which activities and vice versa in order to have additional information about the activity or organization.
Currently, organizations and activities are linked only by their Business Number (BN). Because few records in the data currently have a BN, also linking organizations and activities by legal name will increase the number of connected results.
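A minimal sketch of how name-based linking could work, assuming records are plain dictionaries. The keys `id`, `bn` and `legal_name`, the suffix list, and both function names are hypothetical and not taken from the existing scripts. The BN is tried first, and the normalized legal name is used only as a fallback.

```python
import re
import unicodedata

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "incorporated", "ltd", "limited", "corp", "corporation"}


def normalize_legal_name(name: str) -> str:
    """Return a comparison key: lowercase, accent-free, punctuation-free."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name or "")
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace every run of non-alphanumeric characters with a single space.
    words = re.sub(r"[^a-z0-9]+", " ", unaccented.lower()).split()
    # Drop trailing legal-form suffixes such as "Inc." or "Ltd.".
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


def link_activities(entities, activities):
    """Pair each activity with an entity: BN first, legal name as fallback."""
    by_bn = {e["bn"]: e for e in entities if e.get("bn")}
    by_name = {}
    for e in entities:
        key = normalize_legal_name(e.get("legal_name"))
        if key:  # skip entities whose name normalizes to nothing
            by_name[key] = e
    links = []
    for a in activities:
        entity = by_bn.get(a.get("bn"))
        if entity is None:
            entity = by_name.get(normalize_legal_name(a.get("legal_name")))
        links.append((a["id"], entity["id"] if entity else None))
    return links
```

Exact matching on a normalized key is deliberately conservative. Fuzzy matching would link more records but would also need a review step for false positives.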
Overhaul and recreate data cleaning process for activities & entities
As a developer of the data, I would like a robust data cleaning process that ensures all the data is clean when uploaded and that no records go missing because of insufficient cleaning.
More details:
There are up to 200k activity records missing from the hub because they did not upload correctly to Postgres and were therefore left out of the search engine (to avoid pages redirecting to nowhere). A robust data cleaning effort is needed so that those 200k records can be uploaded correctly, ideally in bulk from one CSV instead of the current process, which uploads them line by line and is particularly slow on the DigitalOcean-hosted database.
Deliverable: A fully cleaned & uploadable CSV of all 565k activities
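One possible shape for the cleaning step, sketched under assumptions: the column names `activity_id` and `activity_name` are placeholders for the real activities schema, and the only checks shown are whitespace trimming, NUL-byte removal (PostgreSQL text values cannot contain NUL characters) and required-field validation. Rejected rows are returned with the reason rather than dropped, so missing records stay visible.

```python
import csv
import io

# Hypothetical required columns; the real activities schema may differ.
REQUIRED = ("activity_id", "activity_name")


def clean_row(row):
    """Trim whitespace and remove NUL bytes from every value in a row."""
    return {k: (v or "").replace("\x00", "").strip() for k, v in row.items()}


def split_rows(rows, required=REQUIRED):
    """Return (clean, rejected); rejected items carry the missing columns."""
    clean, rejected = [], []
    for row in rows:
        cleaned = clean_row(row)
        missing = [col for col in required if not cleaned.get(col)]
        if missing:
            rejected.append((cleaned, missing))
        else:
            clean.append(cleaned)
    return clean, rejected


def to_csv(rows, fieldnames):
    """Serialize the clean rows to a single CSV string with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

If the output is saved to a file, PostgreSQL's `COPY ... FROM ... WITH (FORMAT csv, HEADER)` (or psql's `\copy`) is one way to load it in a single bulk operation instead of row by row.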
Another minor consideration that is related to data cleaning:
In one particular case, the program data point is output as "Charity provided description when other program areas are not applicable". This output should be changed to "Not Available" (unless we can find this 'description' somewhere else?).
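The replacement described above could be sketched as a small helper. The function name is hypothetical, and the comparison ignores case and extra whitespace on the assumption that the placeholder may not always be formatted identically.

```python
PLACEHOLDER = "Charity provided description when other program areas are not applicable"
NOT_AVAILABLE = "Not Available"


def clean_program_area(value: str) -> str:
    """Replace the placeholder text with "Not Available"; keep other values."""
    # Collapse internal whitespace and ignore case before comparing.
    normalized = " ".join(value.split()).casefold()
    if normalized == PLACEHOLDER.casefold():
        return NOT_AVAILABLE
    return value
```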
Consider impact of adding data from other countries
Add README file to SKS Hub Roadmap and update the READMEs on the other related repos
Add recipient type field to activities data set
As a user of the platform, it is important to know the type of recipient that is receiving the grant as it allows me to better understand the kinds of activities being conducted.
The Grants and Contributions data contains a "recipient type" field that identifies whether the grant recipients are: Aboriginal recipients, for-profit organizations, government, international (non-government), nonprofit organizations and charities, individuals, or academia. This information could be useful for certain use cases and could help narrow down results. This field would be added to the activities dataset.
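As an illustration of how the new field could narrow down results, the sketch below models the seven categories listed above as an enum and filters activities by it. The `recipient_type` key and the function names are assumptions, and the labels are matched as free text because the encoding used in the source dataset is not specified here.

```python
from enum import Enum


class RecipientType(Enum):
    ABORIGINAL = "Aboriginal recipients"
    FOR_PROFIT = "For-profit organizations"
    GOVERNMENT = "Government"
    INTERNATIONAL = "International (non-government)"
    NONPROFIT = "Nonprofit organizations and charities"
    INDIVIDUAL = "Individuals"
    ACADEMIA = "Academia"


def parse_recipient_type(label):
    """Map a free-text label to a RecipientType, or None if unrecognized."""
    key = " ".join((label or "").split()).casefold()
    for recipient_type in RecipientType:
        if recipient_type.value.casefold() == key:
            return recipient_type
    return None


def filter_by_recipient_type(activities, wanted):
    """Keep only the activities whose recipient type equals `wanted`."""
    return [
        a for a in activities
        if parse_recipient_type(a.get("recipient_type")) is wanted
    ]
```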
Produce documentation
Add linkages (website text) as 3rd CSV on Github data repository
As a user of the data repository for the SKS project, I would like to access the website text information to read it and conduct further analysis.
The CSV should contain these fields: Organization legal name, Business number, website URL, website text and SKS hub ID
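A minimal sketch of writing and reading such a file with Python's standard `csv` module. The snake_case column names are an assumed rendering of the five fields above. Website text usually contains commas and line breaks, so the example relies on the module's quoting rather than manual string joining.

```python
import csv
import io

# Assumed column names for the five fields listed in the issue.
LINKAGE_FIELDS = [
    "organization_legal_name",
    "business_number",
    "website_url",
    "website_text",
    "sks_hub_id",
]


def write_linkages(rows, stream):
    """Write linkage rows to a text stream, header first."""
    writer = csv.DictWriter(stream, fieldnames=LINKAGE_FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def read_linkages(stream):
    """Read linkage rows back as a list of dictionaries."""
    return [dict(row) for row in csv.DictReader(stream)]
```

When writing to a real file, opening it with `newline=""` (as the `csv` documentation recommends) keeps embedded line breaks in `website_text` intact.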
Develop the linkages data model
As a developer of the SKS project, I would like to have a clear and identified Linkages data model in order to continue my work on the Linkages CSV and integrate this data into the interface.
More work is required on developing & implementing the linkages data model. Currently the only part of this that is finished is the web scraper but it isn't fully integrated in a similar way to activities and entities, so it would be best to start up a "process_linkages.py" script or similar that operates similarly to the other data types.
Here is the work so far on the Linkages data model
Deliverable: Scripts that integrate the Linkages data model in a similar way to activities (process_activities.py) and entities (process_entities.py)
Create a scraper for documents and websites
As a developer of the SKS project, I would like a scraper that searches for and finds documents or website URLs in order to help populate the linkages and documents datasets.
This could be one multipurpose scraper or two separate scrapers.
- We need to find more organizations' websites and then scrape their website text (This could also be an identifier of websites - take a list of names and find a website.)
- We need to potentially find documents
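The text-extraction half of the first point could be sketched with the standard library's `html.parser`. This is not the existing web scraper. It assumes the page HTML has already been downloaded, and it simply keeps visible text while skipping `script`, `style` and `noscript` content. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, ignoring script, style and noscript content."""

    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self._chunks.append(" ".join(data.split()))

    def text(self):
        return " ".join(self._chunks)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document as one string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Finding an organization's website from its name is a separate problem and is not covered by this sketch.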
Explore finding and adding documents
As a user of the platform, I would like documents that describe the activities being conducted in more depth, or that give more information about the organization, in order to supplement the information from the data.
To go forward, a web scraper will be needed to search for and download these documents, then store them in something like an S3 bucket so they can be accessed from the hub. More work is required in planning this before work is started.
Here is the work done so far on incorporating documents into the data model
Investigate making a French-compatible data model
As a developer of the data models, it would be good to have a plan of how we could attempt to make a French-compatible data model and potentially an interface down the line.
Fix the "Back to Results" button