
The backend for the deno.land/x service

Home Page: https://deno.land/x

License: MIT License

Makefile 0.16% TypeScript 62.74% Dockerfile 0.48% Shell 2.58% HCL 34.03%
deno

deno_registry2's Introduction

deno_registry2

This is the backend for the deno.land/x service.

Limits

There are a few guidelines / rules that you should follow when publishing a module:

  • Please only register module names that you will actually use.
  • Do not squat names. If you do, we might transfer the name to someone who makes better use of it.
  • Do not register names which contain trademarks that you do not own.
  • Do not publish modules containing illegal content.

In addition to these guidelines there are also hard limits:

  • You cannot publish more than 3 different modules from a single repository source.
  • You cannot publish more than 15 modules from a single GitHub account or organization.

If you need an increase to these quotas, please reach out to [email protected].
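As a rough illustration only, the hard limits above could be enforced with a check like the following sketch; the `ModuleEntry` shape and the `canRegister` helper are hypothetical, not the registry's actual code:

```typescript
// Hypothetical enforcement of the hard limits: at most 3 modules per
// repository source and at most 15 per GitHub account or organization.

interface ModuleEntry {
  name: string;
  owner: string; // GitHub account or organization
  repo: string;  // repository name
}

const MAX_MODULES_PER_REPO = 3;
const MAX_MODULES_PER_OWNER = 15;

function canRegister(existing: ModuleEntry[], candidate: ModuleEntry): boolean {
  // Count how many modules already point at the same repo / same owner.
  const sameRepo = existing.filter(
    (m) => m.owner === candidate.owner && m.repo === candidate.repo,
  ).length;
  const sameOwner = existing.filter((m) => m.owner === candidate.owner).length;
  return sameRepo < MAX_MODULES_PER_REPO && sameOwner < MAX_MODULES_PER_OWNER;
}
```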

Requirements

Preparing Docker

Make sure to follow the official instructions to log in to ECR via the Docker CLI; this is needed to push the images used by the Lambda deployment to ECR.

aws ecr get-login-password --region region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com

Preparing MongoDB Atlas

  1. Create an API key on MongoDB Atlas. The API key should have sufficient privileges to create a new project and configure it afterwards.

Deploy

  1. Install aws CLI.
  2. Sign in to aws by running aws configure
  3. Install Terraform version 0.13 or higher
  4. Copy terraform/terraform.tfvars.example to terraform/terraform.tfvars
  5. Modify terraform/terraform.tfvars: set mongodb_atlas_org_id to your MongoDB Atlas organization ID, and update mongodb_atlas_private_key and mongodb_atlas_public_key with the API key you created earlier.
  6. Move to the terraform/ directory and comment out the backend section in the meta.tf file (important for a first-time apply)
  7. Run the following steps:
terraform init
terraform plan -var-file terraform.tfvars -out plan.tfplan
terraform apply plan.tfplan
aws s3 ls | grep 'terraform-state' # take note of your tf state bucket name
# before the final step, go back and uncomment the backend section from step 6
terraform init -backend-config "bucket=<your-bucket-name>" -backend-config "region=<aws-region>"

Setting up MongoDB

Terraform automatically provisions a MongoDB cluster in a separate project.

  1. In the newly created MongoDB cluster, create a database called production.
  2. In this database, create a collection called modules.
  3. In this collection, create a new Atlas Search index named default with the mapping defined in indexes/atlas_search_index_mapping.json.
  4. In this collection, create a new index named by_owner_and_repo as defined in indexes/modules_by_owner_and_repo.json.
  5. In this collection, create a new index named by_is_unlisted_and_star_count as defined in indexes/modules_by_is_unlisted_and_star_count.json.
  6. In this database, create a collection called builds.
  7. In this collection, create a new unique index named by_name_and_version as defined in indexes/builds_by_name_and_version.json.

Teardown

Before destroying your staging environment, make sure to:

  1. run terraform state pull to make a local copy of your state file
  2. comment out the backend section of the meta.tf file
  3. re-initialize your terraform workspace by running terraform init -backend-config "region=<aws-region>"
  4. empty your S3 buckets, otherwise the destroy will fail

You can then run terraform destroy to completely remove your staging environment.

Development

To run tests locally, make sure you have Docker and docker-compose installed. Then run:

make test

deno_registry2's People

Contributors

bartlomieju, christophgysin, crowlkats, kitsonk, lucacasonato, piscisaureus, ry, ultirequiem, wperron


deno_registry2's Issues

Support for adding modules from specific GitHub branches?

I'd like to convert some of my npm packages so they're Deno-compatible and publish them on https://deno.land/x. The way I'd like to do that is by creating a deno branch for the Deno version of the package -- that would hopefully allow me to keep both the Node and Deno versions in sync.

Is there a way to add a specific branch of a GitHub repository to the registry?

Improve module search

Successor to #56

Problem

  1. The search results are not biased by star count. Some examples:
    • term: discord: the highest-starred module only appears in place 6.
  2. There are no filtering or alternative sorting capabilities (last uploaded at, supports typescript, no external imports)
  3. Search is not always very relevant to what was searched.
  4. No filtering by module tags

Solution

  1. We need to include more data in the documents being searched (github tags, last upload date, and other stats/metrics).
  2. Use a better search engine. I would like to use MeiliSearch because of the search customizability and API (cc @erlend-sh). We need to find a way to host it / get access to a hosted instance. This would also involve pushing updates to the documents in the database to the search engine. This can be accomplished relatively easily.
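To make the star bias from point 1 of the problem concrete, here is a hypothetical scoring sketch. The weights, the log dampening, and the `SearchHit` shape are all assumptions for illustration, not the proposed implementation:

```typescript
// Hypothetical ranking sketch: blend a text-relevance score with a
// star-count bias so popular modules rank higher for equal relevance.

interface SearchHit {
  name: string;
  textScore: number; // relevance from the search engine, e.g. 0..1
  stars: number;
}

function biasedScore(hit: SearchHit, starWeight = 0.3): number {
  // log1p dampens the star count so a 10k-star module does not
  // drown out every more relevant but less popular result.
  const starScore = Math.log1p(hit.stars) / Math.log1p(100_000);
  return (1 - starWeight) * hit.textScore + starWeight * starScore;
}

function rank(hits: SearchHit[]): SearchHit[] {
  return [...hits].sort((a, b) => biasedScore(b) - biasedScore(a));
}
```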

Support for Multi Registry Module Publishing

Hi Ryan, William and Luca.

Could you think of a simple and sustainable option on how module developers could choose additional module registries during the webhook / release based publishing process?

I think many of us want to publish our modules e.g. on https://deno.land/ AND on https://nest.land/

Both mechanisms are super straightforward from my perspective. At the same time, I'd find it even cooler if deno.land offered an option to "register registries" from which the module publisher could choose where to publish the module in addition to deno.land.
The process behind your webhook could, imo, call the corresponding APIs of the other chosen registries.

This could be an additional incentive for module publishers to use deno.land as their primary registry, which I believe is great for the many consumers who would then have a promising single point of entry to start their search for specific features / Deno modules.

Kind regards
Michael

Can you include the owner name or repository name in the GET /modules response?

About GET /modules

https://github.com/denoland/deno_registry2/blob/a6adde77090a64d7cf1bc594493b6ecd34d671b9/API.md#response-1

Reason

The current response content cannot uniquely identify the repository on GitHub.

I'm having trouble

I've previously created a ranking page based on database.json, and I want to switch to the current modules API, but I cannot determine repository details from the module contents.

Maybe

It may be good to create an API that returns detailed module info.

Breaking Denoify

Hi,

The new mechanism is indeed better, but it breaks Denoify, which was heavily reliant on database.json.
The new REST API does allow us to list third-party modules, but it does not allow getting the source GitHub repo or the latest tag (which is now the fallback version).

Is there any plan to make the module registry consumable by third-party services, or do I have to find workarounds?

Regards

Publishing is getting stuck in analyzing_dependencies

A build with these options is getting hung up (build times out after 5 minutes) in analyzing_dependencies:

{
  "type": "github",
  "moduleName": "functional",
  "repository": "sebastienfilion/functional",
  "ref": "v0.5.0",
  "version": "v0.5.0",
  "subdir": "library/"
}

The build ID is 5f591efe00791e5b001a6c92.

I am investigating.

Uploading some modules fails

Currently some file uploads fail with the error error trying to connect: Connection reset by peer (os error 104). I am not 100% sure what causes this. I assume it is because we are sending requests to S3 too quickly.

Previously we also had issues with DNS rate limiting. I was able to mostly resolve this by introducing an upload rate limit of 6 files a second.

I think both issues could be tackled by reusing the HTTP connections to S3. This is something that needs to be fixed upstream in Deno. It should also increase the overall speed of uploading, because many uploads can be multiplexed onto a single HTTP/2 connection and the client side rate limit can be removed.
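The client-side rate limit mentioned above (6 files a second) could look roughly like this sliding-window sketch. The class and its API are illustrative, not the registry's actual code; the clock is injected so the behavior can be tested deterministically:

```typescript
// Hypothetical sliding-window rate limiter: at most `maxPerWindow`
// uploads may start within any `windowMs` span.

class RateLimiter {
  private timestamps: number[] = [];
  constructor(
    private readonly maxPerWindow: number,
    private readonly windowMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  /** Returns true if an upload may start right now, recording it if so. */
  tryAcquire(): boolean {
    const t = this.now();
    // Drop timestamps that have fallen out of the window.
    this.timestamps = this.timestamps.filter((ts) => t - ts < this.windowMs);
    if (this.timestamps.length >= this.maxPerWindow) return false;
    this.timestamps.push(t);
    return true;
  }
}
```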

Initial release

This is a summary of things that need to be done for the initial release:

  • Validate requests coming from GitHub (#7)
  • Search API (#11)
  • base path parameter (required for std) (#12)
  • Get std working (version prefix) (#13)
  • Upload all database.json modules with all released versions as a batch job
  • Setup Cloudflare to proxy api.deno.land to the AWS API Gateway domain
  • Add limits (#19)

Moderation filters

We should automatically moderate the names of modules people are uploading. I think we can start with these three steps (ordered by priority):

  1. Add a list of reserved module names that cannot be registered automatically. Easiest would be a JSON file with an array of disallowed names. (@lucacasonato)
  2. Check any new module name against a list of 'bad' words. We need to find a list to use (https://www.cs.cmu.edu/~biglou/resources/bad-words.txt is not good as it blocks everyday words like color, queer, or africa). (up for grabs)
  3. Disallow any module name that has a Levenshtein distance of less than 3 to any existing module name, bad word, or reserved module name. (up for grabs)
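Step 3 can be sketched with a standard dynamic-programming Levenshtein distance. The helper names and the combined blocklist argument are illustrative:

```typescript
// Classic Levenshtein edit distance via dynamic programming.
function levenshtein(a: string, b: string): number {
  const dp: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = 0; i <= a.length; i++) dp[i][0] = i;
  for (let j = 0; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,        // deletion
        dp[i][j - 1] + 1,        // insertion
        dp[i - 1][j - 1] + cost, // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// `taken` would be the union of existing, reserved, and banned names.
function isNameAllowed(candidate: string, taken: string[], minDistance = 3): boolean {
  return taken.every((name) => levenshtein(candidate, name) >= minDistance);
}
```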

About the ranking system

[screenshot: deno.land/x third-party module listing]

The first 8 third-party repos do not provide type definitions, are not indexed by a mod.ts file, most do not have Deno-specific instructions, and some have no README.md at all.

It must be a put-off for people wanting to use the third-party module repository.
Besides, those modules steal the spot of legit Deno modules.

Maybe the star-based ranking system should feature a penalty system for modules that are not "Denoish" enough.

Regards

Roadmap

Stage 0

  • Usability improvements (#33 and #34)
  • Add integration tests (#43)

Stage 1

  • Track download counts (#57)
  • Display download counts on website as a graph (like crates.io)
  • Run deno info --json on ts/js files in the repository to get a dependency graph. Store in $NAME/versions/$VERSION/meta/deps.json. This tree should be a multi entry point graph with nodes and edges, not a simple tree. (#62)
  • Basic moderation. Do not allow people to register modules with "bad" words in the name or description. (Presumably there is some list somewhere one can match against.) (#58)
  • Make deno info --json work with more modules (eg modules using decorators)
  • Let registry quota be increased per user / org

Stage 2

  • Run deno fmt on all files in repo and store list of failed files in $NAME/versions/$VERSION/meta/analysis.json
  • Run deno lint --json on all files in repo and store list of failed files + the diagnostics in $NAME/versions/$VERSION/meta/analysis.json
  • Run deno check on all files in repo and store list of failed files in $NAME/versions/$VERSION/meta/analysis.json
  • Run other checks on repository (LICENCE & README.md file there, are dependencies pinned to specific version) and store this in $NAME/versions/$VERSION/meta/analysis.json.
  • Better search (#69)

Future

  • Run deno doc --json on all files in the repo and store the output in $NAME/versions/$VERSION/docs/$FILEPATH.json
  • Create & store a type-stripped version of all files in $NAME/versions/$VERSION/stripped/$FILEPATH.json
  • A proper API
  • Support for GitLab and BitBucket

The total count is different from the total count of hits.

About GET /modules: total_count is different from the combined length of all results (※ it only returns 100 at a time, so the results must be combined).

Deno.test("fetchAll hit count sum EQ fetchOne total count", async () => {
  const module = await fetchOne(); // Expect total_count
  const modules = await fetchAll(); // Actual results, fetched 100 at a time and flat-mapped
  assert(modules.every((m) => m.success === true)); // Pass

  const sum = modules.map((m) => m.data.results.length)
    .reduce(
      (accumulator, currentValue) => accumulator + currentValue,
      0,
    );
  assertEquals(sum, module.data.total_count); // Fail
});

test result:

[Diff] Actual / Expected
-   919
+   925

Source for the fetchAll and fetchOne functions:
https://github.com/yoshixmk/deno-x-ranking/blob/master/src/repositories/registry_repository.ts#L19-L44
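For reference, a paging helper in the spirit of the reporter's fetchAll might look like the sketch below. The page shape mirrors the described API (total_count plus up to 100 results per page), and the fetcher is injected, so nothing here actually hits api.deno.land:

```typescript
// Hypothetical paging helper: keeps requesting pages until a short
// page signals the end of the result set.

interface ModulesPage {
  total_count: number;
  results: { name: string }[];
}

async function fetchAllModules(
  fetchPage: (page: number, limit: number) => Promise<ModulesPage>,
  limit = 100,
): Promise<{ name: string }[]> {
  const all: { name: string }[] = [];
  for (let page = 1; ; page++) {
    const { results } = await fetchPage(page, limit);
    all.push(...results);
    if (results.length < limit) break; // short page => last page
  }
  return all;
}
```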

Async module upload

Currently modules are uploaded synchronously when GitHub sends a webhook. This is not great because if an upload takes longer than 10 seconds, GitHub will not display the response (which might be an error) in the webhook display. It will just say that the webhook timed out.

To stop this from happening we should do the following when receiving the GitHub webhook:

  1. Check that the repository is linked to the given name (and link it if not already linked)
  2. Check that the version has not already been uploaded
  3. Check that the options passed (subdir and version_prefix) are valid.
  4. Add a database entry in a builds collection that contains the repo, registry name, ref, version, and subdir.
  5. Add a event to an SQS queue with the ID of this database entry.
  6. Respond to the webhook with a website link where a user can track the upload progress (and see potential errors).
    • This link would be something in the form of: https://deno.land/x/-/status/5dj4hde32a093d3. The page should display if the publishing is queued, has started, succeeded, or has an error (with the error).
    • This could later also be added as a comment to the commit that the release was created from.

At this point the GitHub webhook is done. Asynchronously triggered by the SQS event, a lambda spins up. It does the following:

  1. Get the database entry with the ID from the SQS event
  2. Update the status in the database to started.
  3. Do the cloning and publishing.
  4. Update the status to success or failure depending on what happened, and add a message about what exactly the issue was (or what warnings there were).
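The lifecycle in the two lists above implies a small status machine. This sketch is illustrative: the status names are taken from the steps, but the transition table is an assumption:

```typescript
// Hypothetical build status machine: queued -> started -> success | failure.

type BuildStatus = "queued" | "started" | "success" | "failure";

const transitions: Record<BuildStatus, BuildStatus[]> = {
  queued: ["started"],          // SQS-triggered lambda picks the build up
  started: ["success", "failure"], // cloning/publishing finishes either way
  success: [],                  // terminal
  failure: [],                  // terminal
};

function canTransition(from: BuildStatus, to: BuildStatus): boolean {
  return transitions[from].includes(to);
}
```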

Add smoke tests

After deployment to AWS, we should run some smoke tests to verify that everything works (mainly search).

Invalid encoding of sqs messages

Currently all modules are failing to publish - 5 are currently in the queue waiting to be published.

SQS messages are not correctly encoded. {"buildID":"5f4e2efc003bedbe00f27b79"} turns into %7B%22buildID%22%3A%225f4e2efc003bedbe00f27b79%22%7D . This is an upstream issue with https://deno.land/x/[email protected]. I will be fixing this issue upstream shortly.
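The mangling can be reproduced in a few lines: percent-encoding the JSON body as if it were a URL component yields exactly the payload shown above. A consumer could decode defensively until the upstream fix lands; the parseMessage helper here is hypothetical:

```typescript
// Reproduce the encoding bug: the JSON message body was percent-encoded
// as if it were a URL component.

const body = JSON.stringify({ buildID: "5f4e2efc003bedbe00f27b79" });
const mangled = encodeURIComponent(body); // what the buggy client sent

/** Defensive parse: percent-decode first if the body looks encoded. */
function parseMessage(raw: string): unknown {
  const text = raw.startsWith("%7B") ? decodeURIComponent(raw) : raw;
  return JSON.parse(text);
}
```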

scoped urls

Just wondering if it would make sense to have modules scoped to avoid name clashes?

https://deno.land/x/USER/IDENTIFIER@VERSION/FILE_PATH
instead of
https://deno.land/x/IDENTIFIER@VERSION/FILE_PATH
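For illustration, parsing the proposed scoped form could look like this sketch; the ModuleRef shape and the regex are assumptions, not an existing API:

```typescript
// Hypothetical parser for the proposed scoped URL form:
// https://deno.land/x/USER/IDENTIFIER@VERSION/FILE_PATH

interface ModuleRef {
  user: string;
  identifier: string;
  version: string;
  filePath: string;
}

function parseScoped(url: string): ModuleRef | null {
  const m = url.match(
    /^https:\/\/deno\.land\/x\/([^/@]+)\/([^/@]+)@([^/]+)\/(.+)$/,
  );
  if (!m) return null;
  return { user: m[1], identifier: m[2], version: m[3], filePath: m[4] };
}
```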

Track download counts

Goal

We want to have a graph of module download counts like crates.io has. This means that we should have a list of the download counts per module (or module version, or per file) per day.

How to implement

Through discussions with @wperron on Discord we came up with two relatively simple solutions:

  1. SQS + Lambda + MongoDB
    • Have the CloudFlare Worker that serves the raw files add an event to a SQS queue every time it serves a file.
    • Have an AWS Lambda take events out of this queue in batches of 500 and persist them into MongoDB.
    • Create an API endpoint that serves the download count per module (or per module version or per file).
  2. Kinesis Firehose + S3 + Lambda + MongoDB
    • Have the CloudFlare Worker that serves the raw files add an event to a Kinesis Firehose stream every time it serves a file.
    • Have Kinesis Firehose persist this data into S3 as batches
    • Have a cron triggered Lambda that takes batches out of S3 and persists them into MongoDB
    • Create an API endpoint that serves the download count per module (or per module version or per file).
  3. Cloudflare Workers + Cloudflare Analytics API + MongoDB

I am personally more in favour of solution 1 because I feel it is relatively simple to set up (haven't used Kinesis Firehose before).

I prefer option 3 if we have access to the Cloudflare Logpull API. You need to be an enterprise customer to make use of it though.
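Whichever pipeline is chosen, the batch writer ends up bucketing raw events per module per UTC day before persisting them. A hypothetical sketch of that aggregation (the DownloadEvent shape is an assumption):

```typescript
// Hypothetical per-module, per-day aggregation for the download graph.

interface DownloadEvent {
  module: string;
  timestamp: number; // Unix ms
}

function countPerDay(events: DownloadEvent[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of events) {
    // Bucket by UTC calendar day, e.g. "2020-08-01".
    const day = new Date(e.timestamp).toISOString().slice(0, 10);
    const key = `${e.module}/${day}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```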

Decisions to make

  • Track download counts per module or per module version or per file?
  • SQS or Kinesis Firehose

Can one add JavaScript project to Third Party Modules?

What are requirements here? Is mod.ts mandatory?

I'm trying to add a module with only JavaScript files. So far I'm importing it through the GitHub raw file URL, which works.

All I'm getting is:

No uploaded versions

This module name has been reserved for a repository, but no versions have been uploaded yet. Modules that do not upload a version within 30 days of registration will be removed.

I created the webhook and the first tags/releases after creating the entry on deno.land/x.

Does it have to be TypeScript? I'm waiting for TypeScript version 4.0 to be ready in order to update the project to TS.

CloudWatch Alarms

Set up a few basic CloudWatch Alarms on a couple of critical metrics (for instance, Lambda failures). I suggest also creating an SNS topic to trigger with those alerts so that the maintainers who have access to the PROD AWS account can subscribe to it.

Module Search Does Not Reflect Value

I believe the search functionality could be improved somewhat, to give results more specific to the value typed into the input field.

For example, take this scenario from a few minutes ago:

  1. Go to https://deno.land/x/
  2. Type "ssh" into the search bar
  3. The results don't seem to match anything related to ssh (I was hoping someone had made an ssh client).

What I would expect to see are modules related to ssh, or if none exist (which I don't think they do), no modules should display.

kebab-case > camelCase/snake_case

I would have hoped Deno would force kebab case, because then we could just create the same package names that we use for NPM.

But since that's not the case (get it? 😛) and Deno seems to force either snake case or camel case, I'm conflicted about which to choose.

I've seen both snake and camel case packages, but it seems like there are more snake case packages for Deno over camel case ones.

In a perfect world we would force just one case, because then it's consistent and you don't get things like e.g. deno.land/x/caseConverter and deno.land/x/case_converter from 2 different authors, with people getting confused the whole time about which they're using.

Any advice on which case we best use?

Module appeared after failed webhook

Hi all,

my apologies if this is the wrong place for this issue.

I wanted to list a module, three_4_deno, however I never got past the "Add the webhook" section of "Adding a module via deno.land/x".

Nevertheless the module has still managed to become listed, albeit leading to a page saying that it doesn't exist.

When I initially tried adding the module, I realized that my repo name and module name were different, which I think is what created the issue; however, after renaming my GitHub repo, I come up against the following error on my webhook:

{"success":false,"error":"module name is registered to a different repository"}

For the moment, I'm deleting the original repo and trying again with a different module name. Hopefully that'll sort it.

Configure a dead letter queue for the builds queue

Configure a second SQS queue to act as a dead letter queue for bad builds. This would avoid a situation where failing events start piling up in the main queue forever and drive the cost of lambda executions up for no reason. There's an example available in the official docs.

There shouldn't be anything else to modify other than the template.yaml; SQS manages the dispatch of messages to the DLQ on its own.

As far as cost is concerned, in a best case scenario where no builds are ever failing or timing out there won't be any difference on the end-of-month bill since SQS only charges by the amount of PUT requests on the queue. In a worst case scenario where a significant number of builds are failing this will actually lower the end-of-month bill because it will avoid reprocessing those events, and thus reduce the Lambda execution costs.

Integration tests

This project is not well tested at the moment. We need integration tests for:

TODO:

  • ping event success
  • ping event bad name
  • ping event no name
  • ping event too many modules per repo
  • ping event registered to different repo
  • create event success
  • create event bad name
  • create event no name
  • create event too many modules per repo
  • create event not a tag
  • create event version prefix
  • create event subdir success
  • create event subdir invalid
  • create event version already exists
  • trigger build success
  • /builds/:id success
  • /builds/:id no id
  • /builds/:id not found
  • /modules success
  • /modules limit & page out of bounds

These tests should be able to run on CI without needing to deploy to AWS. We can emulate S3 with minio, and SQS using localstack elasticmq. MongoDB can just be run locally.

Limit for modules registered by one user

There should be a default limit of 5 modules registered per user or GitHub org. Limit increases can be requested from me. The list of user accounts with limits should be stored in the database.

Support for non-git hosted files?

Are there any plans to support packages that are published to npm or nest or their own S3 storage?

Sometimes the published package is the result of compilation, and it isn't nice to have compiled files inside git, as it makes the git changelog unnecessarily large and fills git diffs with junk.

For instance, Bevry maintains about 30 packages now that support Deno, Node, and Web Browsers, via make-deno-edition. This is done by having source code that is compiled to multiple editions that target different environments. Only the source code is committed to git, and not the compiled editions. The compiled code is published via CI to npm, which is then available via many CDNs. This editions technique allows multiple targets to be supported out of the box, without complicated setups by the consumer, and guaranteed support by the publisher.

A prime example of this is the caterpillar package, whose source code is published to multiple editions, with a deno example.

Cannot change sub directory

Hi,
I uploaded my library github.com/mesqueeb/is-what to Deno.
But after making a tag I saw this error in the web hook:

{"success":false,"info":"provided sub directory is not valid as it does not end with a /"}

I remember I wrote dist for the location of the distribution folder, and I guess I needed to write dist/ instead?
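The check behind that error can be sketched as follows; this is a hypothetical reconstruction for illustration, and the registry's actual validation rules may differ:

```typescript
// Hypothetical subdir validation: the option must be a relative
// directory path ending in "/". Returns an error message or null.

function validateSubdir(subdir: string): string | null {
  if (subdir === "") return null; // no subdir is fine
  if (subdir.startsWith("/")) {
    return "provided sub directory is not valid as it starts with a /";
  }
  if (!subdir.endsWith("/")) {
    return "provided sub directory is not valid as it does not end with a /";
  }
  return null; // valid
}
```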

How do I change this sub directory now?

When I open my repository on Deno it just says there are no versions:
https://deno.land/x/is_what

Is there any way I can login to the Deno website to edit my repository settings?

Module not published - Failed to run dependency analysis.

Dear Denoland team,

I've just created a module name here:
https://deno.land/x/deno_react_minimal_frontend

And setup a webhook on my repository.
However, some hours after the first release, the module is not showing up.
The latest publish status page, at the following link, shows the message: "Published module. Failed to run dependency analysis."
https://deno.land/status/5f3a03db00d00ffe00127d06

Please help check what should I do next.
Best Regards,
Chakrit W.
