
The backend for the deno.land/x service

Home Page: https://deno.land/x

License: MIT License

Makefile 0.16% TypeScript 62.74% Dockerfile 0.48% Shell 2.58% HCL 34.03%
deno

deno_registry2's Introduction

deno_registry2

This is the backend for the deno.land/x service.

Limits

There are a few guidelines / rules that you should follow when publishing a module:

  • Please only register module names that you will actually use.
  • Do not squat names. If you do, we might transfer the name to someone who makes better use of it.
  • Do not register names which contain trademarks that you do not own.
  • Do not publish modules containing illegal content.

In addition to these guidelines there are also hard limits:

  • You cannot publish more than 3 different modules from a single repository source.
  • You cannot publish more than 15 modules from a single GitHub account or organization.

If you need an increase to these quotas, please reach out to [email protected].
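As a rough illustration only, the hard limits above could be enforced with a check like the following sketch; the `ModuleEntry` shape and the `canRegister` helper are hypothetical, not the registry's actual code:

```typescript
// Hypothetical enforcement of the hard limits: at most 3 modules per
// repository source and at most 15 per GitHub account or organization.

interface ModuleEntry {
  name: string;
  owner: string; // GitHub account or organization
  repo: string;  // repository name
}

const MAX_MODULES_PER_REPO = 3;
const MAX_MODULES_PER_OWNER = 15;

function canRegister(existing: ModuleEntry[], candidate: ModuleEntry): boolean {
  // Count how many modules already point at the same repo / same owner.
  const sameRepo = existing.filter(
    (m) => m.owner === candidate.owner && m.repo === candidate.repo,
  ).length;
  const sameOwner = existing.filter((m) => m.owner === candidate.owner).length;
  return sameRepo < MAX_MODULES_PER_REPO && sameOwner < MAX_MODULES_PER_OWNER;
}
```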

Requirements

Preparing Docker

Make sure to follow the official instructions to log in to ECR via the Docker CLI; this is needed to push the images used by the Lambda deployment to ECR.

aws ecr get-login-password --region region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com

Preparing MongoDB Atlas

  1. Create an API key on MongoDB Atlas. The API key should have sufficient privileges to create a new project and configure it afterwards.

Deploy

  1. Install aws CLI.
  2. Sign in to aws by running aws configure
  3. Install Terraform version 0.13 or higher
  4. Copy terraform/terraform.tfvars.example to terraform/terraform.tfvars
  5. Modify terraform/terraform.tfvars: set mongodb_atlas_org_id to your MongoDB Atlas organization ID, and update mongodb_atlas_private_key and mongodb_atlas_public_key with the API key you created earlier.
  6. Move to the terraform/ directory and comment out the backend section in the meta.tf file (important for a first-time apply)
  7. Run the following steps:
terraform init
terraform plan -var-file terraform.tfvars -out plan.tfplan
terraform apply plan.tfplan
aws s3 ls | grep 'terraform-state' # take note of your tf state bucket name
# before the final step, go back and uncomment the backend section from step 6
terraform init -backend-config "bucket=<your-bucket-name>" -backend-config "region=<aws-region>"

Setting up MongoDB

Terraform automatically provisions a MongoDB cluster in a separate project.

  1. In the newly created MongoDB cluster, create a database called production.
  2. In this database, create a collection called modules.
  3. In this collection, create a new Atlas Search index named default with the mapping defined in indexes/atlas_search_index_mapping.json.
  4. In this collection, create a new index named by_owner_and_repo as defined in indexes/modules_by_owner_and_repo.json.
  5. In this collection, create a new index named by_is_unlisted_and_star_count as defined in indexes/modules_by_is_unlisted_and_star_count.json.
  6. In this database, create a collection called builds.
  7. In this collection, create a new unique index named by_name_and_version as defined in indexes/builds_by_name_and_version.json.

Teardown

Before destroying your staging environment, make sure to:

  1. run terraform state pull to make a local copy of your state file
  2. comment out the backend section of the meta.tf file
  3. re-initialize your terraform workspace by running terraform init -backend-config "region=<aws-region>"
  4. empty your S3 buckets, otherwise the destroy will fail

You can then run terraform destroy to completely remove your staging environment.

Development

To run tests locally, make sure you have Docker and docker-compose installed. Then run:

make test

deno_registry2's People

Contributors

bartlomieju, christophgysin, crowlkats, kitsonk, lucacasonato, piscisaureus, ry, ultirequiem, wperron


deno_registry2's Issues

Support for adding modules from specific GitHub branches?

I'd like to convert some of my npm packages so they're Deno-compatible and publish them on https://deno.land/x. The way I'd like to do that is by creating a deno branch for the Deno version of the package -- that would hopefully allow me to keep both the Node and Deno versions in sync.

Is there a way to add a specific branch of a GitHub repository to the registry?

Improve module search

Successor to #56

Problem

  1. The search results are not biased by star count. Some examples:
    • term: discord: the highest-starred module only appears in place 6.
  2. There are no filtering or alternative sorting capabilities (last uploaded at, supports typescript, no external imports)
  3. Search is not always very relevant to what was searched.
  4. No filtering by module tags

Solution

  1. We need to include more data in the documents being searched (github tags, last upload date, and other stats/metrics).
  2. Use a better search engine. I would like to use MeiliSearch because of the search customizability and API (cc @erlend-sh). We need to find a way to host it / get access to a hosted instance. This would also involve pushing updates to the documents in the database to the search engine. This can be accomplished relatively easily.
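To make the star bias from point 1 of the problem concrete, here is a hypothetical scoring sketch. The weights, the log dampening, and the `SearchHit` shape are all assumptions for illustration, not the proposed implementation:

```typescript
// Hypothetical ranking sketch: blend a text-relevance score with a
// star-count bias so popular modules rank higher for equal relevance.

interface SearchHit {
  name: string;
  textScore: number; // relevance from the search engine, e.g. 0..1
  stars: number;
}

function biasedScore(hit: SearchHit, starWeight = 0.3): number {
  // log1p dampens the star count so a 10k-star module does not
  // drown out every more relevant but less popular result.
  const starScore = Math.log1p(hit.stars) / Math.log1p(100_000);
  return (1 - starWeight) * hit.textScore + starWeight * starScore;
}

function rank(hits: SearchHit[]): SearchHit[] {
  return [...hits].sort((a, b) => biasedScore(b) - biasedScore(a));
}
```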

Support for Multi Registry Module Publishing

Hi Ryan, William and Luca.

Could you think of a simple and sustainable option on how module developers could choose additional module registries during the webhook / release based publishing process?

I think many of us want to publish our modules e.g. on https://deno.land/ AND on https://nest.land/

Both mechanisms are super straightforward from my perspective. At the same time, I'd find it even cooler if deno.land offered an option to "register registries" from which the module publisher could choose where to publish the module in addition to deno.land.
The process behind your webhook could, imo, call the corresponding APIs of the other chosen registries.

This could be an additional incentive for module publishers to use deno.land as their primary registry, which I believe is great for the many consumers who would then have a promising single point of entry to start their search for specific features / Deno modules.

Kind regards
Michael

Can you include the owner name or repository name in the GET /modules response?

About GET /modules

https://github.com/denoland/deno_registry2/blob/a6adde77090a64d7cf1bc594493b6ecd34d671b9/API.md#response-1

Reason

The current response content cannot uniquely identify the repository on GitHub.

I'm having trouble

I've previously created a ranking page based on database.json, and I want to switch to the current modules API, but I cannot determine repository details from the module contents.

Maybe

It may be good to create an API that returns detailed module info.

Breaking Denoify

Hi,

The new mechanism is indeed better, but it breaks Denoify, which was heavily reliant on database.json.
The new REST API does allow us to list third-party modules, but it does not allow getting the source GitHub repo or the latest tag (which is now the fallback version).

Is there any plan to make the module registry consumable by third-party services, or do I have to find workarounds?

Regards

Publishing is getting stuck in analyzing_dependencies

A build with these options is getting hung up (build times out after 5 minutes) in analyzing_dependencies:

{
  "type": "github",
  "moduleName": "functional",
  "repository": "sebastienfilion/functional",
  "ref": "v0.5.0",
  "version": "v0.5.0",
  "subdir": "library/"
}

The build ID is 5f591efe00791e5b001a6c92.

I am investigating.

Uploading some modules fails

Currently some file uploads fail with the error error trying to connect: Connection reset by peer (os error 104). I am not 100% sure what causes this. I assume it is because we are sending requests to S3 too quickly.

Previously we also had issues with DNS rate limiting. I was able to mostly resolve this by introducing an upload rate limit of 6 files a second.

I think both issues could be tackled by reusing the HTTP connections to S3. This is something that needs to be fixed upstream in Deno. It should also increase the overall speed of uploading, because many uploads can be multiplexed onto a single HTTP/2 connection and the client side rate limit can be removed.
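The client-side rate limit mentioned above (6 files a second) could look roughly like this sliding-window sketch. The class and its API are illustrative, not the registry's actual code; the clock is injected so the behavior can be tested deterministically:

```typescript
// Hypothetical sliding-window rate limiter: at most `maxPerWindow`
// uploads may start within any `windowMs` span.

class RateLimiter {
  private timestamps: number[] = [];
  constructor(
    private readonly maxPerWindow: number,
    private readonly windowMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  /** Returns true if an upload may start right now, recording it if so. */
  tryAcquire(): boolean {
    const t = this.now();
    // Drop timestamps that have fallen out of the window.
    this.timestamps = this.timestamps.filter((ts) => t - ts < this.windowMs);
    if (this.timestamps.length >= this.maxPerWindow) return false;
    this.timestamps.push(t);
    return true;
  }
}
```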

Initial release

This is a summary of things that need to be done for the initial release:

  • Validate requests coming from GitHub (#7)
  • Search API (#11)
  • base path parameter (required for std) (#12)
  • Get std working (version prefix) (#13)
  • Upload all database.json modules with all released versions as a batch job
  • Setup Cloudflare to proxy api.deno.land to the AWS API Gateway domain
  • Add limits (#19)

Moderation filters

We should automatically moderate the names of modules people are uploading. I think we can start with these three steps (ordered by priority):

  1. Add a list of reserved module names that cannot be registered automatically. Easiest would be a JSON file with an array of disallowed names. (@lucacasonato)
  2. Check any new module name against a list of 'bad' words. We need to find a list to use (https://www.cs.cmu.edu/~biglou/resources/bad-words.txt is not good as it blocks everyday words like color, queer, or africa). (up for grabs)
  3. Disallow any module name that has a Levenshtein distance of less than 3 to any existing module name, bad word, or reserved module name. (up for grabs)
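Step 3 can be sketched with a standard dynamic-programming Levenshtein distance. The helper names and the combined blocklist argument are illustrative:

```typescript
// Classic Levenshtein edit distance via dynamic programming.
function levenshtein(a: string, b: string): number {
  const dp: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = 0; i <= a.length; i++) dp[i][0] = i;
  for (let j = 0; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,        // deletion
        dp[i][j - 1] + 1,        // insertion
        dp[i - 1][j - 1] + cost, // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// `taken` would be the union of existing, reserved, and banned names.
function isNameAllowed(candidate: string, taken: string[], minDistance = 3): boolean {
  return taken.every((name) => levenshtein(candidate, name) >= minDistance);
}
```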

About the ranking system

[screenshot: deno.land/x third-party module listing]

The first 8 third-party repos do not provide type definitions, are not indexed by a mod.ts file, most do not have Deno-specific instructions, and some have no README.md at all.

It must be a put-off for people wanting to use the third-party module repository.
Besides, those modules steal the spot of legit Deno modules.

Maybe the star-based ranking system should feature a penalty system for modules that are not "Denoish" enough.

Regards

Roadmap

Stage 0

  • Usability improvements (#33 and #34)
  • Add integration tests (#43)

Stage 1

  • Track download counts (#57)
  • Display download counts on website as a graph (like crates.io)
  • Run deno info --json on ts/js files in the repository to get a dependency graph. Store in $NAME/versions/$VERSION/meta/deps.json. This tree should be a multi entry point graph with nodes and edges, not a simple tree. (#62)
  • Basic moderation. Do not allow people to register modules with "bad" words in the name or description. (Presumably there is some list somewhere one can match against.) (#58)
  • Make deno info --json work with more modules (eg modules using decorators)
  • Let registry quota be increased per user / org

Stage 2

  • Run deno fmt on all files in repo and store list of failed files in $NAME/versions/$VERSION/meta/analysis.json
  • Run deno lint --json on all files in repo and store list of failed files + the diagnostics in $NAME/versions/$VERSION/meta/analysis.json
  • Run deno check on all files in repo and store list of failed files in $NAME/versions/$VERSION/meta/analysis.json
  • Run other checks on repository (LICENCE & README.md file there, are dependencies pinned to specific version) and store this in $NAME/versions/$VERSION/meta/analysis.json.
  • Better search (#69)

Future

  • Run deno doc --json on all files in the repo and store the output in $NAME/versions/$VERSION/docs/$FILEPATH.json
  • Create & store a type-stripped version of all files in $NAME/versions/$VERSION/stripped/$FILEPATH.json
  • A proper API
  • Support for GitLab and BitBucket

The total count is different from the total count of hits.

About GET /modules: total_count is different from the combined length of all results (※ it only returns 100 at a time, so the results must be combined).

Deno.test("fetchAll hit count sum EQ fetchOne total count", async () => {
  const module = await fetchOne(); // Expect total_count
  const modules = await fetchAll(); // Actual results, fetched 100 at a time and flat-mapped
  assert(modules.every((m) => m.success === true)); // Pass

  const sum = modules.map((m) => m.data.results.length)
    .reduce(
      (accumulator, currentValue) => accumulator + currentValue,
      0,
    );
  assertEquals(sum, module.data.total_count); // Fail
});

test result:

[Diff] Actual / Expected
-   919
+   925

Source for the fetchAll and fetchOne functions:
https://github.com/yoshixmk/deno-x-ranking/blob/master/src/repositories/registry_repository.ts#L19-L44
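For reference, a paging helper in the spirit of the reporter's fetchAll might look like the sketch below. The page shape mirrors the described API (total_count plus up to 100 results per page), and the fetcher is injected, so nothing here actually hits api.deno.land:

```typescript
// Hypothetical paging helper: keeps requesting pages until a short
// page signals the end of the result set.

interface ModulesPage {
  total_count: number;
  results: { name: string }[];
}

async function fetchAllModules(
  fetchPage: (page: number, limit: number) => Promise<ModulesPage>,
  limit = 100,
): Promise<{ name: string }[]> {
  const all: { name: string }[] = [];
  for (let page = 1; ; page++) {
    const { results } = await fetchPage(page, limit);
    all.push(...results);
    if (results.length < limit) break; // short page => last page
  }
  return all;
}
```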

Async module upload

Currently modules are uploaded synchronously when GitHub sends a webhook. This is not great because if an upload takes longer than 10 seconds, GitHub will not display the response (which might be an error) in the webhook display. It will just say that the webhook timed out.

To stop this from happening we should do the following when receiving the GitHub webhook:

  1. Check that the repository is linked to the given name (and link it if not already linked)
  2. Check that the version has not already been uploaded
  3. Check that the options passed (subdir and version_prefix) are valid.
  4. Add a database entry in a builds collection that contains the repo, registry name, ref, version, and subdir.
  5. Add a event to an SQS queue with the ID of this database entry.
  6. Respond to the webhook with a website link where a user can track the upload progress (and see potential errors).
    • This link would be something in the form of: https://deno.land/x/-/status/5dj4hde32a093d3. The page should display if the publishing is queued, has started, succeeded, or has an error (with the error).
    • This could later also be added as a comment to the commit that the release was created from.

At this point the GitHub webhook is done. Asynchronously triggered by the SQS event, a lambda spins up. It does the following:

  1. Get the database entry with the ID from the SQS event
  2. Update the status in the database to started.
  3. Do the cloning and publishing.
  4. Update the status to success or failure depending on what happened, and add a message about what exactly the issue was (or what warnings there were).
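The lifecycle in the two lists above implies a small status machine. This sketch is illustrative: the status names are taken from the steps, but the transition table is an assumption:

```typescript
// Hypothetical build status machine: queued -> started -> success | failure.

type BuildStatus = "queued" | "started" | "success" | "failure";

const transitions: Record<BuildStatus, BuildStatus[]> = {
  queued: ["started"],          // SQS-triggered lambda picks the build up
  started: ["success", "failure"], // cloning/publishing finishes either way
  success: [],                  // terminal
  failure: [],                  // terminal
};

function canTransition(from: BuildStatus, to: BuildStatus): boolean {
  return transitions[from].includes(to);
}
```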

Add smoke tests

After deployment to AWS, we should run some smoke tests to verify that everything works (mainly search).

Invalid encoding of sqs messages

Currently all modules are failing to publish - 5 are currently in the queue waiting to be published.

SQS messages are not correctly encoded. {"buildID":"5f4e2efc003bedbe00f27b79"} turns into %7B%22buildID%22%3A%225f4e2efc003bedbe00f27b79%22%7D . This is an upstream issue with https://deno.land/x/[email protected]. I will be fixing this issue upstream shortly.
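The mangling can be reproduced in a few lines: percent-encoding the JSON body as if it were a URL component yields exactly the payload shown above. A consumer could decode defensively until the upstream fix lands; the parseMessage helper here is hypothetical:

```typescript
// Reproduce the encoding bug: the JSON message body was percent-encoded
// as if it were a URL component.

const body = JSON.stringify({ buildID: "5f4e2efc003bedbe00f27b79" });
const mangled = encodeURIComponent(body); // what the buggy client sent

/** Defensive parse: percent-decode first if the body looks encoded. */
function parseMessage(raw: string): unknown {
  const text = raw.startsWith("%7B") ? decodeURIComponent(raw) : raw;
  return JSON.parse(text);
}
```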

scoped urls

Just wondering if it would make sense to have modules scoped to avoid name clashes?

https://deno.land/x/USER/IDENTIFIER@VERSION/FILE_PATH
instead of
https://deno.land/x/IDENTIFIER@VERSION/FILE_PATH
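For illustration, parsing the proposed scoped form could look like this sketch; the ModuleRef shape and the regex are assumptions, not an existing API:

```typescript
// Hypothetical parser for the proposed scoped URL form:
// https://deno.land/x/USER/IDENTIFIER@VERSION/FILE_PATH

interface ModuleRef {
  user: string;
  identifier: string;
  version: string;
  filePath: string;
}

function parseScoped(url: string): ModuleRef | null {
  const m = url.match(
    /^https:\/\/deno\.land\/x\/([^/@]+)\/([^/@]+)@([^/]+)\/(.+)$/,
  );
  if (!m) return null;
  return { user: m[1], identifier: m[2], version: m[3], filePath: m[4] };
}
```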

Track download counts

Goal

We want to have a graph of module download counts like crates.io has. This means that we should have a list of the download counts per module (or module version, or per file) per day.

How to implement

Through discussions with @wperron on Discord we came up with two relatively simple solutions:

  1. SQS + Lambda + MongoDB
    • Have the CloudFlare Worker that serves the raw files add an event to a SQS queue every time it serves a file.
    • Have an AWS Lambda take events out of this queue in batches of 500 and persist them into MongoDB.
    • Create an API endpoint that serves the download count per module (or per module version or per file).
  2. Kinesis Firehose + S3 + Lambda + MongoDB
    • Have the CloudFlare Worker that serves the raw files add an event to a Kinesis Firehose stream every time it serves a file.
    • Have Kinesis Firehose persist this data into S3 as batches
    • Have a cron triggered Lambda that takes batches out of S3 and persists them into MongoDB
    • Create an API endpoint that serves the download count per module (or per module version or per file).
  3. Cloudflare Workers + Cloudflare Analytics API + MongoDB

I am personally more in favour of solution 1 because I feel it is relatively simple to set up (haven't used Kinesis Firehose before).

I prefer option 3 if we have access to the Cloudflare Logpull API. You need to be an enterprise customer to make use of it though.
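Whichever pipeline is chosen, the batch writer ends up bucketing raw events per module per UTC day before persisting them. A hypothetical sketch of that aggregation (the DownloadEvent shape is an assumption):

```typescript
// Hypothetical per-module, per-day aggregation for the download graph.

interface DownloadEvent {
  module: string;
  timestamp: number; // Unix ms
}

function countPerDay(events: DownloadEvent[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of events) {
    // Bucket by UTC calendar day, e.g. "2020-08-01".
    const day = new Date(e.timestamp).toISOString().slice(0, 10);
    const key = `${e.module}/${day}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```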

Decisions to make

  • Track download counts per module or per module version or per file?
  • SQS or Kinesis Firehose

Can one add JavaScript project to Third Party Modules?

What are requirements here? Is mod.ts mandatory?

I'm trying to add a module with only JavaScript files. So far I'm importing it through the GitHub raw file URL, which works.

All I'm getting is:

No uploaded versions

This module name has been reserved for a repository, but no versions have been uploaded yet. Modules that do not upload a version within 30 days of registration will be removed.

I created the webhook and the first tags/releases after creating the entry on deno.land/x.

Does it have to be TypeScript? I'm waiting for TypeScript version 4.0 to be ready in order to update the project to TS.

CloudWatch Alarms

Set up a few basic CloudWatch Alarms on a couple of critical metrics (for instance, Lambda failures). I suggest also creating an SNS topic to trigger with those alerts so that the maintainers who have access to the PROD AWS account can subscribe to it.

Module Search Does Not Reflect Value

I believe the search functionality could be improved somewhat, to give results more specific to the value typed into the input field.

For example, take this scenario from a few minutes ago:

  1. Go to https://deno.land/x/
  2. Type "ssh" into the search bar
  3. The results don't seem to match anything related to ssh (I was hoping someone had made an ssh client).

What I would expect to see are modules related to ssh, or if none exist (which I don't think they do), no modules should display.

kebab-case > camelCase/snake_case

I would have hoped Deno would force kebab case, because then we could just create the same package names that we use for NPM.

But since that's not the case (get it? 😛) and Deno seems to force either snake case or camel case, I'm conflicted about which to choose.

I've seen both snake and camel case packages, but it seems like there are more snake case packages for Deno over camel case ones.

In a perfect world we would force just one case, because then it's consistent and you don't get things like e.g. deno.land/x/caseConverter and deno.land/x/case_converter from 2 different authors, with people getting confused the whole time about which they're using.

Any advice on which case we best use?

Module appeared after failed webhook

Hi all,

my apologies if this is the wrong place for this issue.

I wanted to list a module, three_4_deno, however I never got past the "Add the webhook" section of "Adding a module via deno.land/x".

Nevertheless the module has still managed to become listed, albeit leading to a page saying that it doesn't exist.

When I initially tried adding the module, I realized that my repo name and module name were different, which I think is what created the issue; however, after renaming my GitHub repo, I come up against the following error on my webhook:

{"success":false,"error":"module name is registered to a different repository"}

For the moment, I'm deleting the original repo and trying again with a different module name. Hopefully that'll sort it.

Configure a dead letter queue for the builds queue

Configure a second SQS queue to act as a dead letter queue for bad builds. This would avoid a situation where failing events start piling up in the main queue forever and drive the cost of lambda executions up for no reason. There's an example available in the official docs.

There shouldn't be anything else to modify other than the template.yaml; SQS manages the dispatch of messages to the DLQ on its own.

As far as cost is concerned, in a best case scenario where no builds are ever failing or timing out there won't be any difference on the end-of-month bill since SQS only charges by the amount of PUT requests on the queue. In a worst case scenario where a significant number of builds are failing this will actually lower the end-of-month bill because it will avoid reprocessing those events, and thus reduce the Lambda execution costs.

Integration tests

This project is not well tested at the moment. We need integration tests for:

TODO:

  • ping event success
  • ping event bad name
  • ping event no name
  • ping event too many modules per repo
  • ping event registered to different repo
  • create event success
  • create event bad name
  • create event no name
  • create event too many modules per repo
  • create event not a tag
  • create event version prefix
  • create event subdir success
  • create event subdir invalid
  • create event version already exists
  • trigger build success
  • /builds/:id success
  • /builds/:id no id
  • /builds/:id not found
  • /modules success
  • /modules limit & page out of bounds

These tests should be able to run on CI without needing to deploy to AWS. We can emulate S3 with minio, and SQS using localstack elasticmq. MongoDB can just be run locally.

Limit for modules registered by one user

There should be a default limit of 5 modules registered per user or GitHub org. Limit increases can be requested from me. The list of user accounts with limits should be stored in the database.

Support for non-git hosted files?

Are there any plans to support packages that are published to npm or nest or their own S3 storage?

Sometimes the published package is the result of compilation, and it isn't nice to have compiled files inside git, as it makes the git changelog unnecessarily large and fills git diffs with junk.

For instance, Bevry maintains about 30 packages now that support Deno, Node, and Web Browsers, via make-deno-edition. This is done by having source code that is compiled to multiple editions that target different environments. Only the source code is committed to git, and not the compiled editions. The compiled code is published via CI to npm, which is then available via many CDNs. This editions technique allows multiple targets to be supported out of the box, without complicated setups by the consumer, and guaranteed support by the publisher.

A prime example of this is the caterpillar package, whose source code is published to multiple editions, with a deno example.

Cannot change sub directory

Hi,
I uploaded my library github.com/mesqueeb/is-what to Deno.
But after making a tag I saw this error in the web hook:

{"success":false,"info":"provided sub directory is not valid as it does not end with a /"}

I remember I wrote dist for the location of the distribution folder, and I guess I needed to write dist/ instead?
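The check behind that error can be sketched as follows; this is a hypothetical reconstruction for illustration, and the registry's actual validation rules may differ:

```typescript
// Hypothetical subdir validation: the option must be a relative
// directory path ending in "/". Returns an error message or null.

function validateSubdir(subdir: string): string | null {
  if (subdir === "") return null; // no subdir is fine
  if (subdir.startsWith("/")) {
    return "provided sub directory is not valid as it starts with a /";
  }
  if (!subdir.endsWith("/")) {
    return "provided sub directory is not valid as it does not end with a /";
  }
  return null; // valid
}
```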

How do I change this sub directory now?

When I open my repository on Deno it just says there are no versions:
https://deno.land/x/is_what

Is there any way I can login to the Deno website to edit my repository settings?

Module not published - Failed to run dependency analysis.

Dear Denoland team,

I've just created a module name here:
https://deno.land/x/deno_react_minimal_frontend

And setup a webhook on my repository.
However, some hours after the first release, the module is not showing up.
The latest publish status page, at the following link, shows the message: "Published module. Failed to run dependency analysis."
https://deno.land/status/5f3a03db00d00ffe00127d06

Please help check what should I do next.
Best Regards,
Chakrit W.
