zkat / pacote

programmatic npm package and metadata downloader (moved!)

Home Page: https://github.com/npm/pacote

License: MIT License

Topics: npm, package-management

pacote's Introduction

pacote

NOTE: This repo has moved to https://github.com/npm/pacote and only exists for archival purposes.

pacote is a Node.js library for downloading npm-compatible packages. It supports all package specifier syntax that npm install and its ilk support. It transparently caches anything needed to reduce excess operations, using cacache.

Install

$ npm install --save pacote

Example

const pacote = require('pacote')

pacote.manifest('pacote@^1').then(pkg => {
  console.log('package manifest for registry pkg:', pkg)
  // { "name": "pacote", "version": "1.0.0", ... }
})

pacote.extract('http://hi.com/pkg.tgz', './here').then(() => {
  console.log('remote tarball contents extracted to ./here')
})

Features

Contributing

The pacote team enthusiastically welcomes contributions and project participation! There's a bunch of things you can do if you want to contribute! The Contributor Guide has all the information you need for everything from reporting bugs to contributing entire new features. Please don't hesitate to jump in if you'd like to, or even ask us questions if something isn't clear.

API

> pacote.manifest(spec, [opts])

Fetches the manifest for a package. Manifest objects are similar to, and based on, the package.json for that package, but with pre-processed and limited fields. The object has the following shape:

{
  "name": PkgName,
  "version": SemverString,
  "dependencies": { PkgName: SemverString },
  "optionalDependencies": { PkgName: SemverString },
  "devDependencies": { PkgName: SemverString },
  "peerDependencies": { PkgName: SemverString },
  "bundleDependencies": false || [PkgName],
  "bin": { BinName: Path },
  "_resolved": TarballSource, // different for each package type
  "_integrity": SubresourceIntegrityHash,
  "_shrinkwrap": null || ShrinkwrapJsonObj
}

Note that depending on the spec type, some additional fields might be present. For example, packages from registry.npmjs.org have additional metadata appended by the registry.

Example
pacote.manifest('pacote@1.0.0').then(pkgJson => {
  // fetched `package.json` data from the registry
})

> pacote.packument(spec, [opts])

Fetches the packument for a package. A packument is the registry-level metadata document for a project: it includes dist-tag and version information covering all of a package's available versions, rather than any specific one, and may include additional metadata not usually available through the individual version manifests.

It generally looks something like this:

{
  "name": PkgName,
  "dist-tags": {
    "latest": VersionString,
    [TagName]: VersionString,
    ...
  },
  "versions": {
    [VersionString]: Manifest,
    ...
  }
}

Note that depending on the spec type, some additional fields might be present. For example, packages from registry.npmjs.org have additional metadata appended by the registry.

Example
pacote.packument('pacote').then(pkgJson => {
  // fetched package versions metadata from the registry
})

> pacote.extract(spec, destination, [opts])

Extracts package data identified by <spec> into a directory named <destination>, which will be created if it does not already exist.

If opts.digest is provided and the data it identifies is present in the cache, extract will bypass most of its operations and go straight to extracting the tarball.

Example
pacote.extract('pacote@1.0.0', './woot', {
  digest: 'deadbeef'
}).then(() => {
  // Succeeds as long as `pacote@1.0.0` still exists somewhere. Network and
  // other operations are bypassed entirely if `digest` is present in the cache.
})

> pacote.tarball(spec, [opts])

Fetches package data identified by <spec> and returns the data as a buffer.

This API has two variants:

  • pacote.tarball.stream(spec, [opts]) - Same as pacote.tarball, except it returns a stream instead of a Promise.
  • pacote.tarball.toFile(spec, dest, [opts]) - Like pacote.tarball, but writes the data directly to dest, creating any required directories along the way.
Example
pacote.tarball('pacote@1.0.0', { cache: './my-cache' }).then(data => {
  // data is the tarball data for pacote@1.0.0
})

> pacote.tarball.stream(spec, [opts])

Same as pacote.tarball, except it returns a stream instead of a Promise.

Example
const fs = require('fs')
pacote.tarball.stream('pacote@1.0.0')
.pipe(fs.createWriteStream('./pacote-1.0.0.tgz'))

> pacote.tarball.toFile(spec, dest, [opts])

Like pacote.tarball, but instead of returning data directly, writes the data to dest, creating any required directories along the way.

Example
pacote.tarball.toFile('pacote@1.0.0', './pacote-1.0.0.tgz')
.then(() => { /* pacote tarball written directly to ./pacote-1.0.0.tgz */ })

> pacote.prefetch(spec, [opts])

THIS API IS DEPRECATED. USE pacote.tarball() INSTEAD

Fetches package data identified by <spec>, usually for the purpose of warming up the local package cache (with opts.cache). It does not return anything.

Example
pacote.prefetch('pacote@1.0.0', { cache: './my-cache' }).then(() => {
  // ./my-cache now has both the manifest and tarball for `pacote@1.0.0`.
})

> pacote.clearMemoized()

This utility function can be used to force pacote to release its references to any memoized data in its various internal caches. It might help free some memory.

pacote.manifest(...).then(() => pacote.clearMemoized())

> options

pacote accepts the options for npm-registry-fetch as-is, with a couple of additional pacote-specific ones:
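
For instance, an npm-registry-fetch option like registry rides along with pacote-specific ones in the same opts object. A minimal sketch (the registry shown is just the public default; the cache path is arbitrary):

```js
const pacote = require('pacote')

pacote.manifest('pacote@^1', {
  registry: 'https://registry.npmjs.org/', // npm-registry-fetch option
  cache: './my-cache',                     // shared cache location
  'full-metadata': true                    // pacote-specific (see below)
}).then(manifest => console.log(manifest.name, manifest.version))
```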

opts.dirPacker

Expects a function that takes a single argument, dir, and returns a ReadableStream that outputs packaged tarball data. Used when creating tarballs for package specs that are not already packaged, such as git and directory dependencies. The default opts.dirPacker does not execute prepare scripts, even though npm itself does.
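
A minimal sketch of a replacement packer, assuming npm-packlist's synchronous API plus the tar package (the same pieces the default packer is built on):

```js
const packlist = require('npm-packlist')
const tar = require('tar')

const opts = {
  // dir is an unpacked package directory; return a stream of tarball data.
  dirPacker (dir) {
    const files = packlist.sync({ path: dir })
    // npm tarballs conventionally nest all entries under package/
    return tar.create({ cwd: dir, prefix: 'package/', gzip: true }, files)
  }
}
```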

opts.enjoy-by
  • Alias: opts.enjoyBy, opts.before
  • Type: Date-able
  • Default: undefined

If passed in, will be used while resolving to filter the versions for registry dependencies such that versions published after opts.enjoy-by are not considered -- as if they'd never been published.
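
For example (a sketch; any Date-able value should work):

```js
// Resolve pacote@^1 as if nothing had been published after 2017-06-01.
pacote.manifest('pacote@^1', { 'enjoy-by': '2017-06-01' })
  .then(manifest => console.log(manifest.version))
```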

opts.include-deprecated
  • Alias: opts.includeDeprecated
  • Type: Boolean
  • Default: false

If false, deprecated versions will be skipped when selecting from registry range specifiers. If true, deprecations do not affect version selection.
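
A sketch (some-pkg is a stand-in name):

```js
// Allow a deprecated release to satisfy the range.
pacote.manifest('some-pkg@^1', { 'include-deprecated': true })
```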

opts.full-metadata
  • Type: Boolean
  • Default: false

If true, the full packument will be fetched when doing metadata requests. By default, pacote only fetches the summarized packuments, also called "corgis".
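
For example (a sketch):

```js
// Fetch the full packument, which carries fields (e.g. readme, time)
// that the summarized "corgi" document omits.
pacote.packument('pacote', { 'full-metadata': true })
  .then(doc => console.log(Object.keys(doc)))
```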

opts.tag
  • Alias: opts.defaultTag
  • Type: String
  • Default: 'latest'

Package version resolution tag. When processing registry spec ranges, this option is used to determine what dist-tag to treat as "latest". For more details about how pacote selects versions and how tag is involved, see the documentation for npm-pick-manifest.
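
For example (a sketch; assumes the package has a next dist-tag):

```js
// Resolve whatever the `next` dist-tag points at instead of `latest`.
pacote.manifest('pacote', { tag: 'next' })
  .then(manifest => console.log(manifest.version))
```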

opts.resolved
  • Type: String
  • Default: null

When fetching tarballs, this option can be passed in to skip registry metadata lookups when downloading tarballs. If the string is a file: URL, pacote will try to read the referenced local file before attempting to do any further lookups. This option does not bypass integrity checks when opts.integrity is passed in.
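
For example (a sketch; the integrity value is a placeholder):

```js
// Fetch the tarball from a known resolved URL (e.g. recorded in a
// lockfile) without a registry metadata lookup; integrity is still
// verified when opts.integrity is given.
pacote.tarball('pacote@1.0.0', {
  resolved: 'https://registry.npmjs.org/pacote/-/pacote-1.0.0.tgz',
  integrity: 'sha512-...' // placeholder hash
})
```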

opts.where
  • Type: String
  • Default: null

Passed as an argument to npm-package-arg when resolving spec arguments. Used to determine the base path from which local path specs are resolved.
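
For example (a sketch; the paths are illustrative):

```js
// Resolve a relative directory spec against an explicit project root.
pacote.manifest('file:../my-dep', { where: '/path/to/project' })
```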

pacote's People

Contributors

agy, alexsey, andreeib, andreineculau, armandocanals, bridgear, calebsander, colinrotherham, dryganets, evocateur, iarna, imsnif, isaacs, jozemlakar, jviotti, keithamus, larsgw, mcibique, mistydemeo, nlkluth, redonkulus, rmg, strugee, stsvilik, tgandrews, xqin, zarenner, zkat

pacote's Issues

feature: "safe mode" for extraction

While the CLI probably doesn't need to worry about this much except in case of catastrophe, there's some user tooling that could really benefit from pacote's default mode refusing to overwrite directory contents on extract.

opts.extractOverwrite should be required for anyone who targets a directory that: A. exists, B. has any contents in it.

It's ok to be racy about this. If two processes shove things in one dir at the same time, so be it. This feature is primarily to protect against what is bound to be a common footgun for users using straight-up pacote (it literally just bit me and I don't wanna even).

If you think this is an interesting bug to pick up, this is what I think is generally the direction to go in:

  • add the extractOverwrite option to lib/util/opt-check -- you won't be able to read it otherwise.

  • add a conditional readdirAsync() call early on in extract.js, before most other work is done. (note: readdirAsync is basically const readdirAsync = BB.promisify(fs.readdir), by convention).

  • If the resulting listing has any items in it, throw an error with a useful code and an explanation of what the user tried to do -- include a mention of opts.extractOverwrite in the error message so it's discoverable.

  • If opts.extractOverwrite is true, bypass the fs.readdirAsync call entirely with a BB.resolve().

Feel free to do it your way, too, if you find a better alternative. The goal is to prevent users from accidentally writing into things they didn't intend to write to.
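
A minimal sketch of that flow, assuming Bluebird (BB, as elsewhere in the codebase) and the proposed opts.extractOverwrite flag; the error code here is made up:

```js
const BB = require('bluebird')
const fs = require('fs')

const readdirAsync = BB.promisify(fs.readdir)

function assertSafeDest (dest, opts) {
  if (opts.extractOverwrite) { return BB.resolve() }
  return readdirAsync(dest).then(entries => {
    if (entries.length) {
      const err = new Error(
        `refusing to extract into non-empty ${dest}; pass opts.extractOverwrite to override`
      )
      err.code = 'EEXTRACTDEST' // hypothetical code
      throw err
    }
  }).catch(err => err.code === 'ENOENT', () => {}) // missing dir is fine
}
```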

Skip specifier parsing if we already got a specifier object

Right now, we call realize-package-specifier whenever we call either pacote.tarball or pacote.manifest or pacote.prefetch. While this is convenient for our own testing, and potentially standalone (read: non-npm) uses of the library, the CLI will basically always have a Result object to pass in -- then we can skip the whole parsing process, and we don't make the CLI construct a bullshit pretend-specifier-string like it does in a bunch of places right now.

Implement generic git handler

This one should definitely use git proper for any requests. And probably some sort of specialized caching technique (since it's not just gonna go through the http client)

Can this be made to work in the browser please? :-)

Some days ago on twitter https://twitter.com/serapath/status/856908380731916288

Now I just stumbled upon the module.

It seems it currently does not work in the browser, but if it would, that would be awesome, because I would love to use it.

Other than that - one feature I'd love: using it to prompt a user for a token so that it's possible to actually publish data to npm from the browser (think: an in-browser JavaScript IDE).

I would also try to implement it myself, but I don't know what kind of requests I would need to make, how I can learn about them, or whether it's even possible given CORS settings.

Distinguish between pacote-bugs, user errors, and server issues

Right now, there's a bunch of errors that get spit out by various deps and such that we use.

The CLI eventually has to make its own decisions about these, but it would be super handy to distinguish between three main error categories:

  • Things the user did wrong (and can fix): auth errors, bad arg or opt syntax, 404s, etc.
  • Things pacote code fucked up with: basically any unexpected conditions
  • Abnormal conditions: bad data, missing content, network timeouts, filesystem failures

This might be a really great first step towards having much richer error reporting data for CLI users to consume -- especially user-level conditions that we might be able to be very specific about.

Add ARCHITECTURE.md

Once the codebase stabilizes a bit, I want to write an ARCHITECTURE.md document that gives an overview of how the project is structured, the purpose of various components, overall design concepts that are good to remember which might not be too obvious, etc.

Better auth support

The current auth stuff is kinda janky. Figure out an auth mechanism that generalizes well and make configuring it more straightforward. :waves-hands:

Move caching code straight into cacache

Honestly, everything in that cache wrapper is a simple call straight to cacache. The main addition of importance here is memoization, and that can reasonably be moved straight into cacache for convenience.

Once the http client has been refactored, it'll probably account for most of the cache-related calls. That client, then, can just call cacache instead of maintaining even more code.

Switch to new npm-package-arg

npm/npm-package-arg#21 will eventually get merged, and it involves a pretty major API change for that library. pacote should switch to using that instead when parsing non-object specs, and expect npa output objects to be the values passed in as spec objects.

Implement local tarball handler

This one should be super simple! Add the tarball to the cache pretty much directly! Manifests, again, will need to get picked up during extraction, though :(

Cannot install git dependency from Bitbucket Server

Steps to reproduce: npm install with the following line in package.json ([REDACTED] is a Bitbucket Server host; this dependency works in npm 4)

"circular-list": "git+ssh://git@[REDACTED]/circular-list.git#v1.0.2",

output:

npm ERR! code 128
npm ERR! Command failed: /usr/local/bin/git clone --depth=1 -q -b v1.0.2^{} ssh://git@[REDACTED]/circular-list.git /Users/matthew.brennan/.npm/_cacache/tmp/git-clone-d9115048
npm ERR! warning: templates not found /var/folders/0m/smmrszcj367g1ds3nkjrv2y42l6kl3/T/pacote-git-template-tmp/git-clone-410ac485
npm ERR! fatal: Remote branch v1.0.2^{} not found in upstream origin
npm ERR!


`npm version`:

```json
{ npm: '5.0.0',
  ares: '1.10.1-DEV',
  http_parser: '2.7.0',
  icu: '57.1',
  modules: '48',
  node: '6.9.2',
  openssl: '1.0.2j',
  uv: '1.9.1',
  v8: '5.1.281.88',
  zlib: '1.2.8' }
```

cloned from npm/npm#16789

performance instrumentation

Add various bits of noteworthy performance analytics to be collected on the fly, and log them out as things complete.

Some ideas:

  • manifest fetch time
  • tarball fetch time
  • extract time
  • number of finalize-manifest tarball extractions

http://npm.im/request-capture-har might be of use for the network part of this
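
A rough sketch of collecting one of these on the fly (the wrapper is illustrative, not pacote API):

```js
// Wrap an operation and log its wall-clock duration on success.
function timed (label, run) {
  const start = process.hrtime()
  return run().then(result => {
    const [s, ns] = process.hrtime(start)
    console.log('%s: %dms', label, Math.round(s * 1e3 + ns / 1e6))
    return result
  })
}

timed('manifest fetch time', () => pacote.manifest('pacote'))
```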

Support ECONNRESET recovery

So with the http client, it's possible for a request to die mid-stream. Right now, that just kinda implodes and starts the process over. Instead, we should emit reset events on retries. For bonus points, the client should handle http Range requests, which would avoid that reset on http retries -- so the stream can start over exactly where it left off!

Range requests are often supported OOTB by various http servers, and we can just check if our Range was accepted (by looking for Content-Range) and otherwise do the full reset. This should be cool!

Implementing this, though, very likely requires ripping open npm-registry-client, which I guess we should be doing anyway.
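
A sketch of the resume logic using make-fetch-happen directly: a 206 response with Content-Range means our Range was honored, anything else means a full reset.

```js
const fetch = require('make-fetch-happen')

function resumeOrReset (url, bytesSoFar) {
  return fetch(url, {
    headers: { Range: `bytes=${bytesSoFar}-` }
  }).then(res => {
    if (res.status === 206 && res.headers.get('Content-Range')) {
      return { res, offset: bytesSoFar } // resume where we left off
    }
    return { res, offset: 0 } // Range ignored: do the full reset
  })
}
```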

Cache the work in `finalize-manifest`

Currently, lib/finalize-manifest not only "fills out" and standardizes the manifest format, but might also potentially request and extract a tarball to make sure _shasum, _shrinkwrap, and bin have the right data in them.

All that heavy lifting of extracting package tarballs during the manifest stage, though, isn't cached at all.

A custom cache key of some sort should be added such that we can cache the results of completeFromTarball only when a tarball extraction is needed. Don't risk hitting the disk unless we really have to. The results of that function can also be memoized, in case we have multiple requests for it.

Include deprecation information

Deprecation warnings in npm are weird: right now, it's the CLI's cache that takes care of this. It's probably not a good idea for the cache itself to be responsible for this. At the same time, deprecation information is passed to the CLI in headers -- so pacote would have to know about them already.

So, do the following: Add a _deprecated: Bool field to the finalized manifest based on that registry header. Let the npm installer take care of the rest.
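
A sketch of that finalization step, assuming the deprecation notice is available on the raw registry version metadata as a deprecated field (the helper name is made up):

```js
function addDeprecation (manifest, rawRegistryData) {
  // rawRegistryData.deprecated, when present, is the deprecation message.
  manifest._deprecated = rawRegistryData.deprecated || false
  return manifest
}
```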

Authorization header is forwarded on redirect to different host

@simonua @zkat
Some npm registries redirect to another host for package tarball downloads. For example, Microsoft VSTS redirects to Azure Blob.

pacote (or possibly make-fetch-happen/node-fetch) appears to forward authorization headers on a redirect to another host, unlike previous versions of npm. In the specific case of Azure Blob, these credentials are invalid (the correct token is provided in the URI querystring), and an Authorization header must not be present.
This results in an error like:

npm ERR! 400 Authentication information is not given in the correct format. Check the value of Authorization header.

More generally, from a security perspective forwarding credentials (by default, at least) to another host isn't great.
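
A sketch of the behavior being asked for (a hypothetical helper, not the actual make-fetch-happen internals):

```js
const { URL } = require('url')

// Copy request headers for a redirect, dropping credentials whenever
// the redirect crosses to a different host.
function redirectHeaders (headers, fromUrl, toUrl) {
  const out = Object.assign({}, headers)
  if (new URL(fromUrl).host !== new URL(toUrl).host) {
    delete out.authorization
  }
  return out
}
```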

revalidate tarballs on checksum failure

The cache can get into various states where cached data will no longer pass integrity checks. When this happens, the prefetch or extract call fails because local data's bad.

So. Try again.
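
A sketch of that retry, assuming integrity failures surface with cacache's EINTEGRITY error code (clearCachedData is hypothetical):

```js
const pacote = require('pacote')

function extractWithRevalidation (spec, dest, opts) {
  return pacote.extract(spec, dest, opts).catch(err => {
    if (err.code !== 'EINTEGRITY') { throw err }
    // Local data is bad: drop it and refetch once from the network.
    return clearCachedData(spec, opts) // hypothetical helper
      .then(() => pacote.extract(spec, dest, opts))
  })
}
```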

Document 'refacotr' tag

Both Pacote and Cacache are using this label, but it's not documented in either CONTRIBUTING guide.

Return ETARGET on missing manifest?

npm currently returns an ETARGET error whenever it tries and fails to fetch package metadata: https://github.com/npm/npm/blob/1067febf1875c92d6498ede7c0b20012a0c33d30/lib/fetch-package-metadata.js#L154-L162

For the sake of better general compatibility, should pacote return the same type of error? I've been thinking that the current style of just chucking out ENOENT is bound to cause problems if there's different types of ENOENTs coming from different parts of pacote or cacache.

Manifest cache should be skipped if compatible version not found

If npm has a manifest cached, but fails to find a matching version in a given manifest, it will assume a cache miss and try a full request. See https://github.com/npm/npm/blob/1067febf1875c92d6498ede7c0b20012a0c33d30/lib/fetch-package-metadata.js#L146-L152

This can cause some annoying issues when, for example, someone tries to bump their local version shortly after publishing -- their next install will take some period of time (depending on opts.maxAge) before the manifest request expires and gets re-requested.

This can probably be implemented right into https://github.com/zkat/pacote/blob/latest/lib/registry/manifest.js. The general idea would be to have pacote.manifest() try the usual case of a requested version being found, and after pickManifest, try the request + manifest picking just one more time, after busting the cache.

To cache bust, two things will be needed: one, a way for a cache to invalidate a specific key and nothing else (on disk), and another to bust the memoized version of that key. That can be added to lib/cache/index.js.
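
A sketch of that flow, with hypothetical fetchPackument/bustCache helpers around the real npm-pick-manifest (which throws ETARGET when nothing matches):

```js
const pickManifest = require('npm-pick-manifest')

function manifestWithRetry (spec, opts) {
  return fetchPackument(spec, opts) // hypothetical cached fetch
    .then(packument => pickManifest(packument, spec.fetchSpec, opts))
    .catch(err => {
      if (err.code !== 'ETARGET') { throw err }
      // The cached packument may be stale: bust disk + memo caches and
      // try the request + manifest picking exactly one more time.
      return bustCache(spec, opts) // hypothetical
        .then(() => fetchPackument(spec, opts))
        .then(packument => pickManifest(packument, spec.fetchSpec, opts))
    })
}
```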

Normalize and standardise manifests

pacote should normalize metadata fields to only the things the CLI might need, and standardize those across the different sources. Should also run the manifest through normalize-package-data.
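
A sketch of the standardization pass using the real normalize-package-data (which mutates its argument in place); the trimmed field list is illustrative, not the final one:

```js
const normalize = require('normalize-package-data')

function standardizeManifest (raw) {
  normalize(raw) // fix/fill standard fields (name, version, bin, ...)
  return {
    name: raw.name,
    version: raw.version,
    dependencies: raw.dependencies || {},
    bin: raw.bin || null
  }
}
```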

Start benchmark suite

pacote is built for performance. Performance is meaningless without benchmarks and profiling. So. We need benchmarks.

There should be benchmarks for each of the supported types (note: only registry ones are needed for 0.2.0), with small, medium, and large packages (including some variation for number of files vs size of individual files). All of these for both manifest-fetching and extraction.

We should make sure all the benchmarks run hit the following cases too, for each of the groups described above:

  • no shrinkwrap, tarball extract required
  • no shrinkwrap, but with pkg._hasShrinkwrap === false (so no extract)
  • has shrinkwrap, with alternative fetch (so, an endpoint, git-host, etc)
  • has shrinkwrap, tarball extract required
  • cached data, no memoization (lib/cache exports a _clearMemoized() fn for this purpose)
  • memoized manifest data (tarballs are not memoized)
  • cached data for package needing shrinkwrap fetch
  • memoized data for package needing shrinkwrap fetch
  • stale cache data (so, 304s)
  • concurrency of 50-100 for all of the above, to check for contention and starvation (this is usually what the CLI will set its concurrency to).

https://npm.im/benchmark does support async stuff and seems like a reasonable base to build this suite upon.

Marking this as starter because while it's likely to take some time to write, you need relatively little context to be able to write some baseline benchmarks for the above cases. The actual calls are literally all variations of pacote.manifest() and pacote.extract() calls: that's the level these benchmarks should run at, rather than any internals. At least for now.

I would also say that comparing benchmark results across different versions automatically is just a stretch goal, because the most important bit is to be able to run these benchmarks at all.
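
A starting-point sketch of one case using benchmark.js's deferred mode (the spec and cache path are arbitrary):

```js
const Benchmark = require('benchmark')
const pacote = require('pacote')

new Benchmark.Suite()
  .add('manifest: registry, small pkg', {
    defer: true,
    fn (deferred) {
      pacote.manifest('pacote@^1', { cache: './bench-cache' })
        .then(() => deferred.resolve())
    }
  })
  .on('cycle', event => console.log(String(event.target)))
  .run({ async: true })
```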

Add CONTRIBUTING.md

I would really like to have a straightforward CONTRIBUTING.md file folks can check out when they open up this repo -- hacking on pacote is a fairly streamlined thing, and it shouldn't need much explaining. This, combined with the starter tag I'm slapping on stuff, should be a huge help in getting outside contributions <3

offline mode

support an enforced offline mode which errors if any network requests are attempted and tries to use the cache as much as possible
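
A sketch of the guard (ENOTCACHED mirrors the code make-fetch-happen uses for only-if-cached misses; the helper is illustrative):

```js
// Somewhere in the fetch layer: refuse to touch the network.
function fetchOffline (spec, cached) {
  if (cached) { return cached }
  const err = new Error(`cannot fetch ${spec} in offline mode: not cached`)
  err.code = 'ENOTCACHED'
  throw err
}
```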

Fill in `bin` directories for manifests

Related to #17, another thing that we need to do in order to have a complete manifest is to fill in the bin field if there is a directories entry with bin in it, excluding anything that starts with a . in the bin dir. This isn't needed at all if there's already a bin field, or if there's no directories field.
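
A sketch of that rule, assuming we already have a file listing from the tarball:

```js
const path = require('path')

function fillBin (manifest, files) {
  // Only needed when there's no bin but directories.bin is set.
  if (manifest.bin || !(manifest.directories && manifest.directories.bin)) {
    return manifest
  }
  const binDir = manifest.directories.bin.replace(/\/+$/, '')
  manifest.bin = {}
  for (const file of files) {
    const name = path.basename(file)
    // Skip dotfiles in the bin dir, per the issue above.
    if (file.indexOf(binDir + '/') === 0 && name[0] !== '.') {
      manifest.bin[name] = file
    }
  }
  return manifest
}
```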

Correct minor spelling mistake in CONTRIBUTING

In CONTRIBUTING.md the second paragraph reads

Please make sure to read the relevant section before making your contribution! It will make it a lot easier for us maintainers to make the most of it and smooth out the experience fo all involved. 💚

"fo all involved" should be changed to "of all involved"!

Check out the section on Contributing Documentation to discover how to make this contribution!

🌞

Split caching request client into a separate module or package

So lib/registry/request has very little about it that is actually registry-specific. It's probably a good idea to fork this out as a standalone server-agnostic caching client (as opposed to how it currently does a bit of registry-specific work mixed in). This should make the client usable for other things like remote tarballs and grabbing git stuff directly.

Implement remote tarball handler

This one should be super straightforward on the tarball side, but will probably need some munging on the manifest front because we need to grab the manifests (probably mid-stream!) from the tarball.

Note: this would allow cached tarball downloads, and it can probably lean right on registry/request.js. manifest can probably be done by a dummy manifest with a _resolved field, and then expand finalize-manifest to fill in the rest of the manifest from the package.json in the tarball :)

Bulk request fetching

bulk stuff is way faster than stream-based stuff. Should do #25 before doing this just so we have solid numbers on the difference, and stream stuff needs to still exist because we want to be able to handle multi-gig files.

Ideally, pacote.extract and pacote.prefetch would only use streams for particularly big packages. pacote.manifest should always use bulk requests.

Write tests for git deps

There's basically no coverage on them, but they might be a bit tricky to write due to having to set up and launch git daemons. Once the mocking utility's done, though, things should go much faster.

This really took a bite out of general coverage for the library so it's best to try and get this done sooner rather than later :\

Better config system

opt-check was a pretty basic option handling mechanism but I'm not feeling great about it: it silently fails if it gets unexpected options (and those options are later requested), it doesn't support types or any sort of verification for options, and it just assumes that everything is gonna want all the options.

But, as it turns out, the individual things we call each expect different subsets of options. It might be nice to have an opts mechanism where every layer can specify what exactly it wants and needs, so it's easy to see what's using what, and at what level -- especially stuff we're passing to dependencies like cacache, which have a bunch of their own opts!

Implement git handlers

This one's a tricky one. There are many ways to get the contents of a git repo. The best way for pacote to handle git is by judiciously picking which of these to use depending on what's being requested, and trying to avoid a full git clone at all costs.

Semver range dependencies should be resolved according to npm/npm#15308. These can be resolved with a git ls-remote.
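
A sketch of that resolution step, shelling out to git ls-remote and matching tags with the semver package (tag normalization is simplified):

```js
const { execFile } = require('child_process')
const semver = require('semver')

function resolveGitSemver (repo, range, cb) {
  // -t lists tag refs; annotated tags also appear as refs/tags/<tag>^{}
  execFile('git', ['ls-remote', '-t', repo], (err, stdout) => {
    if (err) { return cb(err) }
    const versions = stdout.trim().split('\n')
      .map(line => line.split('\t')[1])
      .filter(ref => ref && !ref.endsWith('^{}'))
      .map(ref => ref.replace('refs/tags/', '').replace(/^v/, ''))
      .filter(tag => semver.valid(tag))
    cb(null, semver.maxSatisfying(versions, range))
  })
}
```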

These are the possible ways I've found so far to get either full package data, or subsets, and associated caveats:

fullfat clone

$ git clone https://github.com/npm/npm

  • will work pretty much universally
  • may need to try fallback protocols for hosted git
  • is bloody slow (npm/npm: 13s, zkat/cacache: 1.7s (so it won't be so bad on small repos))
  • needs its own entire caching scheme to retain repos, if desired (can be postponed?)

shallow clone

$ git clone https://github.com/npm/npm --depth=1 -b <named-ref>

  • Only works for HEAD or named refs. Any commit hash that has an existing remote ref will work, too.
  • Will always work for semver-range git, because all tags are named refs. 🎉
  • much much faster than git clone for larger repos but still fairly heavy. (npm/npm: 4.35s, zkat/cacache: 1.52s)
  • github folks might not like us very much if we do this a lot?
  • can tar + cache directly and remove cloned dir from tmp
  • I thought I could fall back to a HEAD clone + git fetch but alas that is also not possible without a named ref.

git archive

$ git archive --format=tar.gz --prefix=package/ --remote=https://github.com/npm/npm <committish>

  • direct tarball download
  • BIG CAVEAT: must be manually enabled server-side. github does not enable it. Seems to fail very quickly when not enabled, though.
  • works on non-hosted git when enabled
  • uses regular git authentication mechanism
  • Probably a really fantastic idea for private corporate git servers
  • package/ prefix can be added with --prefix= option.
  • There is also a terrifying monstrosity that lets you fetch individual files, but I document this here purely for the horror value. It's not worth it.

hosted git tarballs

$ curl -LO https://github.com/npm/npm/archive/<committish>.tgz

  • github caches these! they can be pretty fast! (npm/npm: 0.6s, zkat/cacache: 0.39s)
  • I have no idea right now how to authenticate these for private repos
  • can target any committish (not just named refs)
  • only available on hosted git types supported by hosted-git-info
  • can lean on pacote's existing http caching mechanisms, transparently
  • contents will not be inside package/ by default, so we need to manually add a level

individual direct file download

$ curl -LO https://raw.githubusercontent.com/npm/npm/<committish>/package.json

  • only for fetching package.json and npm-shrinkwrap.json
  • can't even fill out bin when directories.bin is there.
  • only works on hosted gits that support it.
  • allows fast filling out of some manifests without having to fetch a full tarball.

remote ref lookup

$ git ls-remote https://github.com/npm/npm -t -h '*'

  • Fetches a full ref list from a remote
  • Not as fast as you'd think (npm/npm: 0.83s, zkat/cacache: 0.37s)
  • Useful for finding named refs (for semver support or to possibly avoid a full clone)
  • No speed difference between -t -h and -t. The former will use more RAM but increase chances of a non-semver committish matching. The latter will be smaller and be all that's needed for a semver ref.

When I think about implementing this

everybody panic

feature: switch to node-tar@2

There's cool stuff happening over in node-tar land. Once it's solid and released, let's shove it in pacote and bask in its absurdly performant glory.

Write user guide

pacote should have a step-by-step guide on how to use it. This is probably pretty straightforward, since the API surface is relatively small. Still, it's good to have this.

pacote.prefetch

The npm installer currently has 3 major stages relevant to pacote -- but only uses two, right now.

While it's nice to stream end-to-end in a single step with pacote.extract, the whole point of having a multi-stage installer is to take advantage of the predictability and isolation that come from having discrete steps. pacote.extract, as it turns out, does a lot -- and it's used in the extract step of the installer.

There's another step that's been commented out for a while now: fetch, which is intended to be the stage where npm will actually go out into the network to grab any tarballs.

So, the proposal here is to add another toplevel API function to pacote: pacote.prefetch. Its job should be purely to warm up pacote's cache and allow pacote.extract to bypass the cache index and always extract tarballs by digest, since it'll know those tarballs are present.

The nice thing is all the code that pacote.prefetch would need is already there: simply calling the appropriate tarball.js handler and draining the stream into the void (stream.on('data', function () {})) is good enough to put it together. All of that code is already in pacote.extract. 🎉
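
A sketch of that glue (tarballStream stands in for the hypothetical handler lookup):

```js
function prefetch (spec, opts) {
  return new Promise((resolve, reject) => {
    tarballStream(spec, opts)   // hypothetical tarball.js handler lookup
      .on('data', () => {})     // drain into the void; the cache layer
                                // tees data to disk as it flows through
      .on('error', reject)
      .on('end', resolve)
  })
}
```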

Integrate cacache@6

cacache 6 involves some big changes! Most notably, changing a bunch of stuff to be Promise-based, a new on-disk format, and moving all the memoization code out of pacote and back into cacache itself.

As part of this integration, pacote itself should be updated to use Promise, the lib/cache code should be torn out, and cacache should be used directly.

This is gonna have to start before cacache@6 itself is tagged because I really wanna know the API changes are good and we don't need to move anything else in there.

cache fallback for offline modes

preferOffline and offline both change the fetch mechanism to, in one way or another, lean towards maximizing use of the local caches.

There is an issue, though, where package metadata may have been fetched into the cache, but a corresponding matching latest package may not have been downloaded. In these cases, it turns out, we may in fact have a semver-compatible tarball available in the cache that at least offline could fall back to in this particular corner case.

This case, though, is probably pretty rare. I think. Just an idea out there that's pretty low-priority. And I might be wrong about how rare that situation really is.

opts.extraHeaders for extensible header-passing

npm-registry-client shoots out all these special headers itself. A lot of them have to do more with specific npm features than just fetching packages.

Move these headers out of pacote and into a single opts.extraHeaders opt in npm.

Examples:

  • npm-scope
  • npm-in-ci
  • referer
  • user-agent (?)
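
What the proposed opt might look like from npm's side (a sketch; this opt doesn't exist yet):

```js
pacote.manifest('pacote', {
  extraHeaders: {
    'npm-scope': '@myorg',
    'npm-in-ci': 'false',
    referer: 'install pacote'
  }
})
```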

cache invalidation for finalized manifests

there's currently a finalized manifest caching scheme that is keyed off pkg._resolved, assuming it's a globally-unique, immutable key: this is not the case.

Perhaps it would be good to add the _shasum for a tarball to that name, and skip cache reads if we don't have a shasum either in pkg._shasum or opts.digest. Shove the hashAlgorithm in there too for good measure.
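
A sketch of such a key (the helper and key format are made up):

```js
function finalizedCacheKey (pkg, opts) {
  const digest = pkg._shasum || opts.digest
  if (!digest) { return null } // no digest: skip cache reads entirely
  const algo = opts.hashAlgorithm || 'sha1'
  return `pacote:finalized-manifest:${pkg._resolved}:${algo}:${digest}`
}
```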
