Spawned from a comment on <a href="https://github.com/nodejs/node/issues/18233#issueco

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

--loader hooks for require(),about nodejs/modules

Comments (31)

giltayar commented on May 23, 2024 5

The migration path is CJS => ESM. Whatever needs to be implemented to ease this transition is important. In this case, implementing loader hooks will help moving code from ESM => CJS. Neat, but not terribly important, IMO.

I would prefer using our resources elsewhere.

Actually, having loader hooks only in ESM may make the transition go faster! (people will move to ESM just to have all the goodies that loader hooks will bring...)

from modules.

ljharb commented on May 23, 2024 4

ftr, I'm totally fine with getters and setters on module.exports not working - I find that an absurd pattern and nothing like it should be possible in ESM anyways. CJS mocking is certainly achievable without getters/setters.

from modules.

mcollina commented on May 23, 2024 2

I think this is a must have. We should provide a simple enough mechanism from APMs to hook in both esm and cjs.

cc @watson

from modules.

mcollina commented on May 23, 2024 2

I do not see a short-term future where a Node.js application will not a have a commonjs module in its dependency tree. Having support for cjs loader hooks will provide a safe transition to ESM.
Most enterprise Node.js deployments include an APM solution, which is currently based on monkey patching.
Without cjs hooks, any apm vendor should have two solutions: one for cjs and one for esm. In fact, not supporting this will slow the adoption curve.

from modules.

ljharb commented on May 23, 2024 1

I think if it’s important enough to warrant first-class support in ESM, it’s important enough to have it in CJS. I also think the inverse is true.

from modules.

giltayar commented on May 23, 2024 1

@mcollina - wouldn't the existing solutions they have work with their existing CJS modules? If they do, and they should, why bother migrating to another, albeit newer and better, solution?

Having said that, it would be nice to have a solution that works similarly and is configured similarly. But I would say that it's a nice to have, not a must have. If, in terms of code, it's not a big thing, then, yes. But if it requires significant work, I would prefer investing energy elsewhere.

Also, could you explain what "APM" means? 😳

from modules.

evanplaice commented on May 23, 2024 1

I see two glaring omission in the recommended implementation for loader hooks

A directory containing package.json will have a termination when crossing the directory.

This assumes that the package boundaries will be exposed by each import statement. ESM doesn't support this functionality due to the nature of facades.

Use Case 1:

rxjs/operators/index.js

export { flat } from './flat';
export { flatMap } from './flatMap';

rxjs/index.js

export { flat, flatMap } from 'rxjs/operators';

app.js

import { flat, flatMap } from 'rxjs';

Which would fire, entry, root, or operators hook; if more than one fire, then in what order?

Use Case 2:

At some point, ESM will have full native support in browsers and Node, eliminating the need to transpile down to ES5. Despite that fact, bundling will still be in wide use where code size and number of requests matters.

Take for instance the following article:
https://medium.com/friendship-dot-js/i-peeked-into-my-node-modules-directory-and-you-wont-believe-what-happened-next-b89f63d21558

It's not uncommon for users of NPM packages to express their disdain for the overly bloated and unnecessarily deep dependency trees that come with some packages. One solution to this approach is for a lib to provide optimized production bundles that tree shake any and all unnecessary code.

If/when ESM is reaches full compatibility between Node and browsers, it'll be more common to see FE workflows bleed into universal modules. Assuming a directory/package structure will remain static may work for CJS workflows but fails to take into consideration how ESM will be used.

I don't really have any solutions to these problems. ESM supports some features that favor new approaches that aren't backward-compatible with CJS.

From what I've seen in the front-end ecosystem, the facade design pattern provides a lot of benefits for library development (ie abstracting API from structure, hiding internals) over the traditional way of doing things (ie deep linking). It's already starting to show up in commonly used libraries.

from modules.

bmeck commented on May 23, 2024 1

@evanplaice I don't really understand your first point since it is laid out depending on package boundaries, but no package.json files are in your minimal case. Per execution order, it is fairly well defined in nodejs/node#18233 .

Which would fire, entry, root, or operators hook; if more than one fire, then in what order?

This appears to be having some names that are missing in the example given.

I'm going to rewrite the question as "What hooks fire when, assuming all the given files have a package boundary in the directory they are located in, and app.js is the entry point?". I'm going to put a leading / in all the paths for my clarity on specifier vs path.

/package.json resolve('rxjs', '/app.js')

I assume ends up resolving to /rxjs/index.js given the example

/rxjs/package.json resolve('./index.js', '/rxjs')

I assume this leaves the path unchanged. That ends up the first resolve and now the loader gets to loading rxjs/index.js.

/rxjs/package.json resolve('rxjs/operators', '/rxjs/index.js')

I assume ends up resolving to /rxjs/operators/index.js given the example

/rxjs/operators/package.json resolve('./index.js', '/rxjs/operators/')

I assume this leaves the path unchanged. That ends up the second resolve and now the loader gets to loading ./flat and ./flatMap.

/rxjs/operators/package.json resolve('./flat', '/rxjs/operators/index.js')

If this doesn't cross a package boundary so is only handled by this loader.

/rxjs/operators/package.json resolve('./flatMap', '/rxjs/operators/index.js')

If this doesn't cross a package boundary so is only handled by this loader.

I'm still unclear on the question and example, but hopefully that gives a clear idea of the order of operations.

For point 2, that article is satire but true. I'm not sure I understand how this relates to supporting --loader hooks to require though.

from modules.

mikesamuel commented on May 23, 2024 1

@bmeck Agreed.

my threat model treats user loaders as trusted code.

from modules.

bmeck commented on May 23, 2024

@ljharb within reason, ESM was not made with CJS interop as a first class citizen. I would prefer if something is only viable in CJS and not ESM it not be put as a priority.

from modules.

devsnek commented on May 23, 2024

@giltayar APM = Application performance management, mostly hooking into node core to time stuff and catch errors

from modules.

bmeck commented on May 23, 2024

There are also other common workflows like Mocking or testing that use similar workflows of replacing a module with a virtual/temporary one.

from modules.

ljharb commented on May 23, 2024

It's absolutely essential to any adoption of ESM that things like proxyquire and jest.mock and others have a way to function - iow, there needs to be a way to "mock out" a default or named import.

Supporting this natively in node for ESM is a must; if we can also support it in a similar fashion for CJS, that's very valuable for a migration path - iow, if half my modules are CJS and the other half ESM, i'd want to mock out a foo whether it's required or imported.

from modules.

bmeck commented on May 23, 2024

@ljharb that seems fine, but various things like setters/getters on the bindings are not possible still even though CJS mocking commonly does that to properties off module.exports. If we have specific cases where they don't work (like having getter/setter named exports) in both I'd prefer not to have the functionality exposed.

from modules.

zackschuster commented on May 23, 2024

@bmeck for my own benefit, this is similar to/drawing from npm's loaderhooks spec?

from modules.

bmeck commented on May 23, 2024

@zackschuster this existed prior to theirs so idk which should be seen as drawing from which.

from modules.

zackschuster commented on May 23, 2024

@bmeck ah, i didn't know that. it sounded similar to their proposal so i wanted to make sure 😄 thanks!

from modules.

bmeck commented on May 23, 2024

@zackschuster theirs is basically an exact replica of Node's, there are some changes to discuss around problems with the hooks and APIs. In particular Node's intentionally does not require/allow runtime mutation of the loader semantics except by the loader instrumenting it themselves.

from modules.

zackschuster commented on May 23, 2024

@bmeck ah, hence the APM references. this is making more sense to me, thank you for taking the time to explain things.

from modules.

mikesamuel commented on May 23, 2024

I have two use cases for loader hooks that covers CJS modules and would be nice to have for ESMs.

Per "Dynamic Bundling", I'd like to be able to write (parent, child) on every load to build a module dependency graph during testing which I later prune and use to generate bundle metadata.

In production, I use the derived metadata to whitelist loads and check resource integrity.

Right now, I'm hacking updateChildren(parent, children, scan) in lib/module.js.

Both steps require some configuration, so it'd be nice to be able to provision a loader with flags or a pointer to a config file.

I'd also like to experiment with ways to allow restricting access to modules with APIs that are prone to misuse.
Sometimes we design very powerful APIs that require a certain level of expertise to use correctly,
and then wrap those in APIs meant for general use which make it harder to footgun one's self.

In other contexts, we've had success by adding static checks in linkers and build systems to enable a workflow where we carefully review and whitelist the wrapper modules to make sure they use the misuse-prone APIs correctly, then disallow direct access by other modules.

I'd like to do a POC in a node context.

I believe it would be sufficient to

be notified on the end of a module load so we can lock down out-of-band loads of sensitive modules like entries in the module cache
be able to veto before a loader introduces a module to a parent based on a predicate over (parent, child).

(1) might be sufficient because there are tricks we can do with proxy objects and stack introspection but each of those have their own problems.

from modules.

bmeck commented on May 23, 2024

@mikesamuel can you clarify

be able to veto before a loader introduces a module to a parent based on a predicate over (parent, child).

Say I have a loader that wants to do this behavior. It is given ~ (request_string, parent_url) and it returns (url). At what point are trying to stop propagation of the import and produce an error?

Currently loaders do not automatically ever call the "parent" loader. So if this is the only loader, it could cause import to fail by:

throwing an Error manually
returning an invalid URL that cannot be loaded

from modules.

bmeck commented on May 23, 2024

If we put the loaders which are already somewhat isolated into a thread we could use a single async resolve for both module systems. The hooks would need to define what mechanism is being used to load things however so that it can switch on various differences like if the specifier is a URL component for import or a path for require.

from modules.

mikesamuel commented on May 23, 2024

@bmeck

Say I have a loader that wants to do this behavior. It is given ~ (request_string, parent_url) and
it returns (url). At what point are trying to stop propagation of the import and produce an error?

Currently loaders do not automatically ever call the "parent" loader. So if this is the only loader,
it could cause import to fail by:

I'm still trying to wrap my head around all the things that 6.1.2 lets you do.

How does the lack of implicit parent loader chaining affect things?
Is that a concern for vm contexts?

Not chaining on a vetoed resolve seems like it would fail safe, but if an error from a loader
causes a fallback to a sibling loader then I'd prefer option (3) below.

throwing an Error manually

returning an invalid URL that cannot be loaded

I would be happy with either of those or even just

Substitutes a powerless module, e.g. one that exports nothing or
a stub that is API compatible but which has no side effects.

I think any solution needs to prevent:

Prevent module initialization code from running
Log sufficient context to make it clear what module didn't load, where the resolution started, and that it was vetoed because of a whitelist constraint.

I'll defer to people who better understand the loading process on whether the runtime should cache the fact that a module load failed and which solution achieves the right balance.

Logs need not reflect multiple attempts to load the same module identifier.

from modules.

bmeck commented on May 23, 2024

@mikesamuel I think the problem might be that you linked to a spec that is no longer being developed to land in browsers or node. Node hooks are minimal and documented at https://nodejs.org/dist/latest-v9.x/docs/api/esm.html

from modules.

mikesamuel commented on May 23, 2024

@bmeck Sorry, I'm an idiot.

That node hook doc makes no mention of "parent loaders." Should I be looking at your "RFC: Per Package Loader Hooks #18233"?

Loaders therefore need to have a concept of a parent loader hooks to defer to, or to ignore.

from modules.

bmeck commented on May 23, 2024

@mikesamuel you can look at that or nodejs/node#18914

from modules.

mikesamuel commented on May 23, 2024

@bmeck.

Thanks. I see the loop at https://github.com/bmeck/node/blob/6b94c9ccbfe70773001260ade15a60f81d583a17/lib/internal/process/modules.js#L56-L111

It looks like each userLoader gets to specify a factory that receives the previous links in the loader chain.

IIUC, earlier resolvers get to control what URL reaches later loaders, but the last loader gets to ultimately decide whether a load resolves to anything.

An error in a resolver would cause line 71 to resolve to a broken promise, so would propagate through the await in the next one.

So it looks like either of your options below would work.

throwing an Error manually

returning an invalid URL that cannot be loaded

An early user loader could defeat either by conspiring with a late user loader but my threat model treats user loaders as trusted code.

from modules.

bmeck commented on May 23, 2024

@mikesamuel can you explain the early loader and late loader vector? Early loaders can already rewrite any loader being setup during bootstrapping.

from modules.

mikesamuel commented on May 23, 2024

@bmeck

If an early loader can find a fixed point URL -- one that the middle loaders return unchanged -- then it can store the specifier in a side table it shares with the late loader and return the fixed point URL.

The middle loaders never see the specifier in the side table.

The late loader can then retrieve the value that the early loader squirreled away from the table.

There side table is just one kind of shared state that serves to make the communication independent of promise resolution order.

// conspiracy.js
exports const urlSideTable = new Map();

// early loader
import urlSideTable from './conspiracy.js'
let counter = 0
export async function resolve(specifier, parentModuleURL, defaultResolver) {
  const substitute = `url-that-other-loaders-will-approve-${counter++}`
  urlSideTable.set(substitute, specifier)
  return {
    url: new URL(substitute).href,
    format: 'esm'
  };
}

// late-loader.js
import urlSideTable from './conspiracy.js'
export async function resolve(specifier, parentModuleURL, defaultResolver) {
  const original = urlSideTable.get(specifier)
  return /* some computation over original */
}

from modules.

bmeck commented on May 23, 2024

@mikesamuel I've thought about completely isolating their global space before, but don't think loaders can be seen as untrusted since you need to set them via CLI flags or environment variables on startup. My suggestion is that you can have your secure loader prevent all the following loaders from sharing the global space by putting them into a vm.Context if you want.

from modules.

bmeck commented on May 23, 2024

going to table this for now, we can come back later and reopen if desired.

from modules.

--loader hooks for require() about modules HOT 31 CLOSED

Comments (31)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs