Comments (5)

dpvc commented on June 2, 2024

Thanks for the report. I've looked into it and here is what is going on:

Because the number of entities (like &copy;) is fairly large, MathJax only has a limited number of them built in, and those are the ones designed for use in mathematics (e.g., &odot; and &ge;). The others are stored in external files that are loaded as needed. In your case, &copy; is not on the default list, so MathJax will try to load it when you use it.
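As a rough illustration of this split, the sketch below models a two-tier lookup: a small built-in table of math entities, plus external tables grouped by the first letter of the entity name (the per-letter grouping is inferred from the "c" file mentioned in the error). All names and data here (BUILT_IN, EXTERNAL_FILES, lookupEntity) are hypothetical and are not MathJax's internals.

```javascript
// Tier 1 (hypothetical): a few math entities that are always available.
const BUILT_IN = new Map([
  ['ge', '\u2265'],   // ≥
  ['odot', '\u2299']  // ⊙
]);

// Tier 2 (hypothetical): other entities, grouped into "files" by first letter.
const EXTERNAL_FILES = new Map([
  ['c', new Map([['copy', '\u00A9'], ['cent', '\u00A2']])]
]);

// Letters whose file has already been loaded; starts empty.
const loadedFiles = new Set();

// Returns { char } if the entity can be translated right now,
// { needs: letter } if its file must be loaded first,
// or { unknown: true } if the loaded file does not define it.
function lookupEntity(name) {
  if (BUILT_IN.has(name)) return { char: BUILT_IN.get(name) };
  const file = name.charAt(0);
  if (!loadedFiles.has(file)) return { needs: file };
  const table = EXTERNAL_FILES.get(file) || new Map();
  return table.has(name) ? { char: table.get(name) } : { unknown: true };
}

console.log(lookupEntity('ge'));   // prints { char: '≥' }
console.log(lookupEntity('copy')); // prints { needs: 'c' }
```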

In a browser, this loading is asynchronous, so it requires some careful hand-shaking to manage the delay while waiting for the file to arrive. The code surrounding the call that translates the entity must be aware that this delay can occur (and so must any code surrounding that, and so on).
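The knock-on effect can be seen in a few lines of plain JavaScript. In this sketch all names are hypothetical and a timer stands in for fetching a file; once the innermost step has to wait, every caller up the chain must either await it or receive a Promise instead of a result.

```javascript
// Hypothetical entity files; the timer stands in for the delay of fetching one.
const FILES = new Map([['c', new Map([['copy', '\u00A9']])]]);

function loadEntityFile(letter) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(FILES.get(letter) || new Map()), 10);
  });
}

// Translating an entity now has to wait for its file...
async function translateEntity(name) {
  const table = await loadEntityFile(name.charAt(0));
  return table.get(name);
}

// ...so the code that calls it must be async as well...
async function parseText(text) {
  const parts = text.split(/&(\w+);/); // odd indices hold entity names
  for (let i = 1; i < parts.length; i += 2) {
    const ch = await translateEntity(parts[i]);
    parts[i] = ch === undefined ? `&${parts[i]};` : ch; // keep unknown entities as-is
  }
  return parts.join('');
}

// ...and a caller that does not await gets a Promise, not a string.
const result = parseText('&copy; 2024');
console.log(result instanceof Promise); // prints true
result.then((s) => console.log(s));     // prints "© 2024"
```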

In Node applications, since there is no browser DOM, MathJax uses a lightweight implementation of the browser DOM called the LiteDOM. When MathJax tries to determine which handler to use for a particular document, it asks each handler it knows about whether it can handle the document. The LiteDOM handler tries to parse the document, and if that succeeds, MathJax uses that handler.

But while parsing a document containing &copy;, the LiteDOM has to load the external file for the entities beginning with "c" (the file referenced in the error message you cite). That involves the asynchronous file loading described above, but the code that is looking for the handler doesn't expect the check to be asynchronous, and that leads to the crash that you are seeing.
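The mismatch can be modelled in a small sketch. Everything here is hypothetical (a toy parser, retry error, and handler check, not MathJax's real classes), and how MathJax actually propagates the failure internally is an assumption: the handler check is synchronous and treats any parse failure as "I can't handle this", while the parser can only signal "wait for a file" by throwing.

```javascript
// Toy external entity file for "c"; it starts out unloaded.
const externalFiles = new Map([['c', new Map([['copy', '\u00A9']])]]);
const loaded = new Set();

// Hypothetical error meaning "retry once this promise resolves".
class RetryError extends Error {
  constructor(file, promise) {
    super(`entity file "${file}" must be loaded first`);
    this.retry = promise;
  }
}

// Synchronous toy parser: translates &name; or throws RetryError if its file is missing.
function parse(text) {
  return text.replace(/&(\w+);/g, (match, name) => {
    const file = name.charAt(0);
    if (!loaded.has(file)) {
      // Start the (simulated) load, then bail out of the synchronous parse.
      throw new RetryError(file, Promise.resolve().then(() => loaded.add(file)));
    }
    const table = externalFiles.get(file);
    return table && table.has(name) ? table.get(name) : match;
  });
}

// Synchronous handler check: any exception is read as "not my document".
function handlesDocument(text) {
  try {
    parse(text);
    return true;
  } catch (err) {
    return false; // the RetryError, and the load it started, are ignored here
  }
}

// Handler selection: with no other handler registered, a failed check is fatal.
function findHandler(text) {
  if (handlesDocument(text)) return 'html';
  throw new Error("Can't find handler for document");
}

console.log(findHandler('<p>plain text</p>')); // prints "html"
try {
  findHandler('<p>&copy; 2024</p>');
} catch (err) {
  console.log(err.message); // prints "Can't find handler for document"
}
```

In this model, the same call succeeds once the simulated load has finished, which is the sense in which the check would need to be retried asynchronously rather than failing outright.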

That certainly is a bug, and I will have to think about how best to handle it.

In the meantime, one way to work around the problem is to load all the entities up front so that the definition of &copy; is already available when you need it. That can be done by adding

require('mathjax-full/js/util/entities/all.js');

to the tex2svg-page program after the line that loads AllPackages. That will allow the program to process the file. (I'm assuming you are using the copy in the direct subdirectory.)
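In terms of a toy model, the workaround makes every table available before the synchronous parse starts, so there is never anything to wait for. The names and data below (ENTITY_FILES, preloadAllEntities, translateSync) are hypothetical; in MathJax itself the preloading is what the all.js require does.

```javascript
// Toy per-letter entity files (hypothetical data, not MathJax's files).
const ENTITY_FILES = {
  c: { copy: '\u00A9', cent: '\u00A2' },
  r: { reg: '\u00AE' }
};

const table = new Map();

// Preload: merge every file into the lookup table before any parsing happens.
function preloadAllEntities() {
  for (const file of Object.values(ENTITY_FILES)) {
    for (const [name, ch] of Object.entries(file)) table.set(name, ch);
  }
}

// Synchronous translation: never needs to wait, because everything is loaded.
function translateSync(text) {
  return text.replace(/&(\w+);/g, (match, name) => (table.has(name) ? table.get(name) : match));
}

preloadAllEntities();
console.log(translateSync('&copy; 2024 &reg; &nope;')); // prints "© 2024 ® &nope;"
```

The cost is loading entity data that a given page may never use; for a one-off command-line conversion that is probably a reasonable trade.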

This does lead to a second issue, however: the output of the program will no longer contain the entity, but the actual Unicode character instead. This is because, just as in a browser, the LiteDOM translates entities to characters while it is parsing the file. This is necessary because you could have something like

When $x &lt; y$, we have ...

in your file, and the &lt; needs to be converted to <. It is even possible that you have

When &#x24;x &lt; y&#x24;, we have...

and the &#x24; must be converted to $ before MathJax looks for math delimiters. So the entity translation is an important step in the processing of the page. That means &copy; will be translated to © during the parsing of the document, and will be output as a unicode character, not a named entity, in the final result.
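To make the ordering concrete, here is a minimal decoder sketch that handles named entities from a small table plus decimal and hexadecimal numeric references. It is not the LiteDOM's parser (the NAMED table and decodeEntities name are hypothetical), but it shows why the $ delimiters only become visible after decoding.

```javascript
// A tiny, hypothetical named-entity table; a real one has thousands of entries.
const NAMED = new Map([['lt', '<'], ['gt', '>'], ['amp', '&'], ['copy', '\u00A9']]);

// Convert a numeric reference, leaving out-of-range values untouched.
function fromCode(n, match) {
  return n <= 0x10FFFF ? String.fromCodePoint(n) : match;
}

// Decode &name;, &#DDDD; and &#xHHHH; references; leave unknown names untouched.
function decodeEntities(text) {
  return text.replace(/&(?:#x([0-9a-fA-F]+)|#([0-9]+)|([A-Za-z][A-Za-z0-9]*));/g,
    (match, hex, dec, name) => {
      if (hex !== undefined) return fromCode(parseInt(hex, 16), match);
      if (dec !== undefined) return fromCode(parseInt(dec, 10), match);
      return NAMED.has(name) ? NAMED.get(name) : match;
    });
}

const raw = 'When &#x24;x &lt; y&#x24;, we have...';
console.log(raw.includes('$'));                 // prints false
console.log(decodeEntities(raw));               // prints "When $x < y$, we have..."
console.log(decodeEntities(raw).includes('$')); // prints true
```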

If you want to have &copy; in the final result, you will have to convert back from unicode characters to entities. That can be done by adding

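// Encode a character as a hexadecimal numeric entity, e.g. © -> &#xA9;
function toEntity(c) {
  return '&#x' + c.codePointAt(0).toString(16).toUpperCase() + ';';
}

// Escape &, <, > as usual, then encode every non-ASCII character.
// The u flag keeps characters outside the BMP together as one code point.
const LiteParser = require('mathjax-full/js/adaptors/lite/Parser.js').LiteParser;
LiteParser.prototype.protectHTML = function (text) {
  return text.replace(/&/g, '&amp;')
             .replace(/</g, '&lt;')
             .replace(/>/g, '&gt;')
             .replace(/[^\u0000-\u007E]/gu, toEntity);
};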

just after the require() statement I gave you above. That will cause all non-ASCII characters to be rendered as entities, but they will be numeric entities (like &#xA9;), not named entities (like &copy;).
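Because protectHTML only does string replacement, the escaping can be checked on its own, without MathJax. This standalone copy (the names encodeForOutput and toNumericEntity are hypothetical) uses codePointAt and the u regex flag, so that a character outside the Basic Multilingual Plane becomes a single numeric reference instead of two surrogate halves.

```javascript
// Encode one character (a full code point) as a hexadecimal numeric entity.
function toNumericEntity(c) {
  return '&#x' + c.codePointAt(0).toString(16).toUpperCase() + ';';
}

// Escape &, <, > first, so the & of the generated entities is not re-escaped,
// then encode everything outside printable ASCII.
function encodeForOutput(text) {
  return text.replace(/&/g, '&amp;')
             .replace(/</g, '&lt;')
             .replace(/>/g, '&gt;')
             .replace(/[^\u0000-\u007E]/gu, toNumericEntity);
}

console.log(encodeForOutput('\u00A9 2024 <b>')); // prints "&#xA9; 2024 &lt;b&gt;"
console.log(encodeForOutput('\u{1D400}'));       // prints "&#x1D400;"
```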

To get named entities, you can do something like this:

const entityName = {
  0xA9 : 'copy'
};

// Use these in place of the earlier toEntity() and protectHTML() definitions.
function toEntity(c) {
  const n = c.codePointAt(0);
  return '&' + (entityName[n] || '#x' + n.toString(16).toUpperCase()) + ';';
}

const LiteParser = require('mathjax-full/js/adaptors/lite/Parser.js').LiteParser;
LiteParser.prototype.protectHTML = function (text) {
  return text.replace(/&/g, '&amp;')
             .replace(/</g, '&lt;')
             .replace(/>/g, '&gt;')
             .replace(/[^\u0000-\u007E]/gu, toEntity);
};

where you list the entities that you want to turn back into named entities. It would also be possible to generate the entityName list from the original name-to-character mapping using

const entities = require('mathjax-full/js/util/Entities.js').entities;
const entityName = {};
Object.keys(entities).forEach((name) => entityName[entities[name].codePointAt(0)] = name);

rather than giving the list yourself. If you want to include (unencoded) Unicode in your document, then you might need to adjust the regex in the last replace() call in the protectHTML function to exclude the characters you don't want encoded, or make the toEntity() function more sophisticated so that it only encodes the characters that you want it to.
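Two details are worth handling when building the reverse map. Several names can map to the same character (in HTML5, for example, both copy and COPY name ©), so the loop needs a rule for which name wins; and a predicate can limit which characters get encoded at all. The sketch below uses a small hypothetical sample table in place of MathJax's entities object, keeps the first name seen for each code point, skips multi-character values, and leaves a hypothetical set of characters (here é) unencoded.

```javascript
// Hypothetical sample of a name -> character table (stand-in for the real one).
const sampleEntities = { copy: '\u00A9', COPY: '\u00A9', reg: '\u00AE', eacute: '\u00E9' };

// Reverse map: code point -> name, keeping the first name seen for each character.
const entityName = {};
for (const [name, ch] of Object.entries(sampleEntities)) {
  if ([...ch].length !== 1) continue; // skip entities that expand to several characters
  const cp = ch.codePointAt(0);
  if (!(cp in entityName)) entityName[cp] = name;
}

// Characters that should stay as literal Unicode in the output (an assumption for this example).
const keepLiteral = new Set(['\u00E9']);

function toEntity(c) {
  if (keepLiteral.has(c)) return c;
  const n = c.codePointAt(0);
  return '&' + (entityName[n] || '#x' + n.toString(16).toUpperCase()) + ';';
}

console.log('caf\u00E9 \u00A9 \u00AE \u2260'.replace(/[^\u0000-\u007E]/gu, toEntity));
// prints "café &copy; &reg; &#x2260;"
```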

In any case, I think you can get the results you want this way.

from mathjax-demos-node.

dkumor commented on June 2, 2024

Thanks for the detailed response! I am happy to allow unicode characters in the output, so the replacement code is not necessary for me.

However, I have tried the suggestion of adding require('mathjax-full/js/util/asyncLoad/node.js'); here: https://github.com/mathjax/MathJax-demos-node/blob/master/direct/tex2svg-page#L36, but this did not fix the issue of crashing on &copy;. Here is the exact code used:

tex2svg-page modified code
#! /usr/bin/env -S node -r esm

/*************************************************************************
 *
 *  direct/tex2svg-page
 *
 *  Uses MathJax v3 to convert all TeX in an HTML document.
 *
 * ----------------------------------------------------------------------
 *
 *  Copyright (c) 2018 The MathJax Consortium
 *
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 *  Unless required by applicable law or agreed to in writing, software
 *  distributed under the License is distributed on an "AS IS" BASIS,
 *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *  See the License for the specific language governing permissions and
 *  limitations under the License.
 */

//
//  Load the packages needed for MathJax
//
const mathjax = require('mathjax-full/js/mathjax.js').mathjax;
const TeX = require('mathjax-full/js/input/tex.js').TeX;
const SVG = require('mathjax-full/js/output/svg.js').SVG;
const liteAdaptor = require('mathjax-full/js/adaptors/liteAdaptor.js').liteAdaptor;
const RegisterHTMLHandler = require('mathjax-full/js/handlers/html.js').RegisterHTMLHandler;

const AllPackages = require('mathjax-full/js/input/tex/AllPackages.js').AllPackages;
require('mathjax-full/js/util/asyncLoad/node.js');

//
//  Get the command-line arguments
//
var argv = require('yargs')
    .demand(1).strict()
    .usage('$0 [options] file.html > converted.html')
    .options({
        em: {
            default: 16,
            describe: 'em-size in pixels'
        },
        ex: {
            default: 8,
            describe: 'ex-size in pixels'
        },
        packages: {
            default: AllPackages.sort().join(', '),
            describe: 'the packages to use, e.g. "base, ams"'
        },
        fontCache: {
            default: 'global',
            describe: 'cache type: local, global, none'
        }
    })
    .argv;

//
//  Read the HTML file
//
const htmlfile = require('fs').readFileSync(argv._[0], 'utf8');

//
//  Create DOM adaptor and register it for HTML documents
//
const adaptor = liteAdaptor({fontSize: argv.em});
RegisterHTMLHandler(adaptor);

//
//  Create input and output jax and a document using them on the content from the HTML file
//
const tex = new TeX({packages: argv.packages.split(/\s*,\s*/)});
const svg = new SVG({fontCache: argv.fontCache, exFactor: argv.ex / argv.em});
const html = mathjax.document(htmlfile, {InputJax: tex, OutputJax: svg});

//
//  Typeset the document
//
html.render();

//
//  Output the resulting HTML
//
console.log(adaptor.outerHTML(adaptor.root(html.document)));

The error message is:

node_modules/mathjax-full/js/core/HandlerList.js:1
Error: Can't find handler for document

Unfortunately I am not too familiar with the MathJax code - is something more than the require needed?

dpvc commented on June 2, 2024

Sorry, my fault. I copied the wrong line. It should have been

require('mathjax-full/js/util/entities/all.js');

I will change it in the original message, in case anyone else looks for the solution here.

dkumor commented on June 2, 2024

Wonderful! This worked great! One little annoyance, which was easy to work around, is that tex2svg-page drops the <!DOCTYPE html> declaration, which makes Chrome render the page in quirks mode. I just made it prepend that declaration to the output file.
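A minimal version of that workaround might look like the sketch below. The helper name withDoctype is hypothetical, and the check is a simple case-insensitive prefix test rather than a full parse.

```javascript
// Prepend an HTML5 doctype unless the output already starts with one.
function withDoctype(html) {
  return /^\s*<!doctype/i.test(html) ? html : '<!DOCTYPE html>\n' + html;
}

console.log(withDoctype('<html><body>x</body></html>'));
// prints "<!DOCTYPE html>" followed by "<html><body>x</body></html>" on the next line
```

In tex2svg-page, the last line would then become console.log(withDoctype(adaptor.outerHTML(adaptor.root(html.document))));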

For my purposes, this issue is solved. I will leave it open since, as I understand it, it does expose a bug, but feel free to close it once it is no longer useful.

Thank you very much for your help!

dpvc commented on June 2, 2024

Thanks for confirming that it worked for you.

Good luck with your project.
