opensource-observer / oso
Measuring the impact of open source software
Home Page: https://opensource.observer
License: Apache License 2.0
We should be using the Supabase dev database when developing, not working off prod.
package.json scripts for starting up a local database, plus README instructions on how to use it
N/A
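The package.json scripts mentioned above might look roughly like this — a sketch that assumes the Supabase CLI is installed; the script names are illustrative, not decided:

```json
{
  "scripts": {
    "db:start": "supabase start",
    "db:stop": "supabase stop",
    "db:reset": "supabase db reset"
  }
}
```

`supabase start` spins up the local dev stack in Docker, so day-to-day development never touches prod.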
A 1-pager that concisely proposes a science retrofunding program
https://github.com/hypercerts-org/hypercerts/blob/main/os-observer/src/events/github_events.py
Thanks Carl for getting this started!
This needs to be rewritten as idempotent TypeScript functions that can be plugged into the os-observer CLI and library. See here for how that's set up: hypercerts-org/hypercerts#741
If we write it in TS connected to this harness, it should automatically work in the following environments:
Sticking to Python
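For the TypeScript option, the idempotent shape might look like the sketch below. All names here (Event, ingestEvents, the in-memory store) are illustrative stand-ins, not the real harness API:

```typescript
// Sketch of an idempotent ingest function: events are keyed by a stable
// natural id (e.g. a commit SHA), so re-running the same batch leaves the
// store unchanged instead of duplicating rows.
interface Event {
  id: string;        // stable natural key, e.g. commit SHA
  timestamp: number;
}

function ingestEvents(store: Map<string, Event>, batch: Event[]): number {
  let inserted = 0;
  for (const e of batch) {
    if (!store.has(e.id)) {
      store.set(e.id, e);
      inserted++;
    }
  }
  return inserted; // number of new rows; 0 on a re-run
}
```

Because re-runs are no-ops, a scheduled job can safely re-fetch overlapping time windows.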
Currently starting with commit histories to the main branch in a project repo:
Next steps:
Sometimes (e.g. for common ML libraries) README files or documentation will link to the academic papers that inspired the project.
It would be nice to measure citations between software and papers as easily as we measure citations among papers.
TBD
TBD
Would love for our data fetching scripts to run regularly in GitHub Actions CI.
This includes:
N/A
It should be as easy to integrate the selling of OSS impact certificates as it is to sell carbon credits today (which many retailers do as part of their checkout experience).
TBD
TBD
I'll be talking about hypercerts for science at the Metascience conference.
https://metascience.info/
Prep the talk
We should produce a README badge for GitHub repos that conveys a project's impact. Some proposals of what it might show:
This number should convey social meaning, ideally more than GitHub stars do today.
A dynamic image server that renders the badge based on reverse dependency data.
TBD
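The dynamic image server could start from something like this sketch — a pure function that renders an SVG badge string from a reverse-dependency count. The function name, layout, and number formatting are assumptions for illustration:

```typescript
// Render an impact badge as an SVG string from reverse-dependency data.
// A real server would compute `dependents` from the dependency graph and
// serve this with Content-Type: image/svg+xml.
function renderImpactBadge(label: string, dependents: number): string {
  // Compact large counts shields-style, e.g. 12400 -> "12.4k"
  const value =
    dependents >= 1000 ? `${(dependents / 1000).toFixed(1)}k` : `${dependents}`;
  return [
    `<svg xmlns="http://www.w3.org/2000/svg" height="20" width="160">`,
    `<rect width="160" height="20" fill="#555"/>`,
    `<text x="6" y="14" fill="#fff">${label}: ${value}</text>`,
    `</svg>`,
  ].join("");
}
```

Keeping the renderer pure makes it trivial to cache the output per repo.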
A 1-2 page description that captures the objectives, key activities and success metrics of this initiative between now and FtC
Wilbur Takeaways: https://docs.google.com/document/d/19HFUijQ5LbwV6zoMyHAlx_f-nFVhkotFmD1GuVW2j3Q/edit
Want the CLI to be able to scan the EventSourcePointer table and fetch all data that's marked for autocrawl.
a new yargs command called autocrawl
that automatically scans the EventSourcePointer table, selects all rows marked for autocrawl, and runs the corresponding command with its args.
This can be run in a scheduled GitHub actions job periodically
N/A
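The selection logic at the heart of autocrawl could look like this sketch. The EventSourcePointer field names and the command-runner callback are assumptions; only the table name comes from the issue:

```typescript
// Select rows marked for autocrawl and hand each to a runner. In the real
// CLI the runner would dispatch to the matching yargs command; here it is
// injected so the logic stays testable.
interface EventSourcePointer {
  id: number;
  autocrawl: boolean;  // assumed flag column
  command: string;     // assumed: which fetcher to run
  args: string[];      // assumed: args for that fetcher
}

function runAutocrawl(
  rows: EventSourcePointer[],
  run: (command: string, args: string[]) => void
): number {
  const selected = rows.filter((r) => r.autocrawl);
  for (const r of selected) run(r.command, r.args);
  return selected.length; // how many crawls were kicked off
}
```

A scheduled GitHub Actions job would then just invoke the CLI entry point on a cron trigger.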
We'll probably want to move the Sentry integration behind this.
It also makes it easy to add FullStory, Amplitude, etc.
N/A
This is going to take a bit more thought, but we might want to cache some pre-computed results to reduce the load on render.
As a naive strawman, we should check if we can run queries that group by date.
If not, should we pre-compute that?
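As a naive illustration of the pre-computation being discussed, bucketing raw event timestamps into per-day counts (which could then be cached) might look like this; the function and field names are illustrative:

```typescript
// Group event timestamps (ms since epoch) into per-day counts — the kind
// of aggregate a "group by date" query would produce, precomputed so the
// frontend doesn't scan raw events on every render.
function countEventsByDay(timestamps: number[]): Map<string, number> {
  const buckets = new Map<string, number>();
  for (const ts of timestamps) {
    const day = new Date(ts).toISOString().slice(0, 10); // "YYYY-MM-DD" (UTC)
    buckets.set(day, (buckets.get(day) ?? 0) + 1);
  }
  return buckets;
}
```

Whether this lives as a database materialization or an application-level cache is exactly the open question above.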
We probably want to wait on this until after #59.
TBD
TBD
This is already in progress, just filing an issue to track it
Please comment on this issue with pre-made analyses that could be interesting (okay to try and cull later)
To get it started:
Exploring reverse dependency data is probably pretty useful just as a standalone UX.
We should enumerate some design ideas, but here are some to start:
TBD
For a particular piece of software, we might want to see if it has certain capabilities, which could be:
These could be measured via unit tests or integration tests in GitHub actions
TBD
We're now using Prisma as our ORM. We have been running migrations locally against prod, which isn't great.
This issue depends on finishing #51 first.
Let's have GitHub actions run migrations via CI/CD
N/A
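A CI/CD migration job might look roughly like the workflow below. The workflow name, trigger, and secret name are assumptions; `prisma migrate deploy` is the Prisma command intended for non-interactive CI environments:

```yaml
name: migrate
on:
  push:
    branches: [main]
jobs:
  migrate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: yarn install
      # Applies pending migrations without generating new ones (safe for CI)
      - run: yarn prisma migrate deploy
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
```

This keeps prod credentials out of developers' hands entirely: migrations are authored locally against the dev database and only applied to prod by CI.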
Remove those 2 columns from the schema.
We should be able to infer the args from the Artifact relation
Keep as is
Currently ingesting CSV snapshots that utilize Zerion's interface.
Next steps:
We should first look to see if this data set already exists. If not, we'll need to build it ourselves and store it in our database
First comment on this issue with resources that might be related
Then design a schema for how we'd want to store this data in our database.
TBD
There are two things we care about extracting from scientific papers:
Using some NLP/ML solution for pulling out that data
TBD
They've done a really good job of exposing package versioned dependencies via API
Maintained by Google
Not a decision, but a placeholder for discussion. Yarn now takes a full 2 minutes in the monorepo.
Some considerations to take into account in this decision:
Migrate from yarn to pnpm
Stick with yarn
Go with npm
Getting
We want to be able to easily migrate the database to new schemas as we iterate on data sources
Jason recommended Prisma
https://www.prisma.io/
https://prisma-client-py.readthedocs.io/en/stable/
TypeORM
Basic UI:
This can be done pretty easily with something like Plotly, once that database is in place.
We can use tracing libraries or code coverage libraries to see what pathways of code are actually being run. When you combine that with reporting infrastructure, you can create a heatmap of which lines of code are used more than others.
Not a perfect measure of importance, but could be interesting. Thanks Nick for the idea!
Need to brainstorm other ways to get at academic value pathways
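The heatmap idea above could start from merging per-run line-hit counts into an aggregate. The report shape below (line number to hit count) is an assumption for illustration, not any specific coverage tool's format:

```typescript
// Merge per-run line-hit counts (as a coverage/tracing tool might report
// them) into one aggregate map of line -> total hits: the raw data for a
// "which lines actually run" heatmap.
type LineHits = Record<number, number>;

function mergeCoverage(runs: LineHits[]): LineHits {
  const heatmap: LineHits = {};
  for (const run of runs) {
    for (const [line, hits] of Object.entries(run)) {
      const n = Number(line);
      heatmap[n] = (heatmap[n] ?? 0) + hits;
    }
  }
  return heatmap;
}
```

Combined with reporting infrastructure collecting runs from many users, the aggregate approximates which code paths matter in practice.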
There's an official embed API
https://help.tableau.com/current/api/embedding_api/en-us/docs/embedding_api_about.html
Worth seeing if that can work