tricinel / highlight-words
Split a piece of text into multiple chunks based on a search query, allowing you to highlight the matches afterwards.
License: MIT License
Hi there!
First of all, I want to thank you for this library - really nice API and options for splitting text into chunks! Super helpful for a lot of different use cases.
I wanted to test the waters about a potential option maxLength, allowing for aborting the matching process early, improving performance when matching small parts of large text content.
It could work as follows:
const chunks = highlightWords({
  // Double quotes, because the text itself contains apostrophes
  text: "General Kenobi, years ago you served my father in the Clone Wars. Now he begs you to help him in his struggle against the Empire. I regret that I am unable to present my father's request to you in person, but my ship has fallen under attack and I'm afraid my mission to bring you to Alderaan has failed. I have placed information vital to the survival of the Rebellion into the memory systems of this R2 unit. My father will know how to retrieve it. You must see this droid safely delivered to him on Alderaan. This is our most desperate hour. Help me, Obi-Wan Kenobi, you're my only hope.",
  query: 'o',
  clipBy: 3,
  // Return an array of chunks, with a total string length of 70 characters or less
  maxLength: 70,
});
I would suggest that the algorithm always end on a non-matching chunk, so that there is always context around the matches on either side.
Workaround: I filter the chunks after the matching has taken place, but by that point the performance cost has already been incurred:
let resultLength = 0;
let resultLengthWithinLimit = true;
const chunks = highlightWords({
  text: content,
  query: searchQuery,
  clipBy: 3,
}).filter((chunk) => {
  if (!resultLengthWithinLimit) return false;
  resultLength += chunk.text.length;
  // Start filtering out chunks at 90 characters (approximately
  // 2 lines at 320px width and 14px font size), after the first
  // non-matching chunk is encountered.
  if (resultLength > 90 && !chunk.match) {
    resultLengthWithinLimit = false;
  }
  return true;
});
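To illustrate what aborting early could look like, here is a minimal sketch that does not use highlight-words at all. It assumes a plain-string, case-insensitive query and produces chunks lazily with a generator, so the search stops as soon as the consumer stops asking for chunks. The names iterateChunks and takeUntilLength are hypothetical, and the stopping rule mirrors the workaround above (stop at the first non-matching chunk once the limit is exceeded) rather than a strict "70 characters or less" cap.

```javascript
// Hypothetical sketch, not the library's implementation.
// Lazily yields { text, match } chunks for a plain-string query.
// Assumes lowercasing does not change the string length.
function* iterateChunks(text, query) {
  const haystack = text.toLowerCase();
  const needle = query.toLowerCase();
  let cursor = 0;
  while (needle.length > 0 && cursor < text.length) {
    const hit = haystack.indexOf(needle, cursor);
    if (hit === -1) break;
    if (hit > cursor) yield { text: text.slice(cursor, hit), match: false };
    yield { text: text.slice(hit, hit + needle.length), match: true };
    cursor = hit + needle.length;
  }
  if (cursor < text.length) yield { text: text.slice(cursor), match: false };
}

// Collects chunks until the running length exceeds maxLength and the
// current chunk is non-matching, so the result ends with context.
function takeUntilLength(chunkIterator, maxLength) {
  const result = [];
  let total = 0;
  for (const chunk of chunkIterator) {
    result.push(chunk);
    total += chunk.text.length;
    if (total > maxLength && !chunk.match) break; // the generator is not resumed
  }
  return result;
}

const preview = takeUntilLength(iterateChunks('aXbXcXdXe', 'x'), 3);
console.log(preview.map((chunk) => chunk.text).join('|'));
```

Because the loop breaks out of the generator, the rest of the text is never searched, which is where the saving over the filter-based workaround would come from.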
Ah I was just thinking more about my use case (more info here: #6), and I was thinking maybe it would help to have an option clipByLength in addition to clipBy, which would clip with an ellipsis not by number of words but by number of characters?
Alternative: keep clipBy for the actual number value, and add an option clipByType with possible values of "words" and "characters" (defaults to "words")
Motivation: With only the clipBy option, users may run into problems with super long "words" such as URLs that don't get clipped reasonably.
const text = 'My dog is a very good boy and is always eating his lunch.';
const chunks = highlightWords({
text,
query: 'is',
clipByLength: 7
});
Value for chunks:
[
  {
    "text": "My dog ",
    "match": false
  },
  {
    "text": "is",
    "match": true
  },
  {
    "text": " a very ... oy and ",
    "match": false
  },
  {
    "text": "is",
    "match": true
  },
  {
    "text": " always ... ating h",
    "match": false
  },
  {
    "text": "is",
    "match": true
  },
  {
    "text": " lunch.",
    "match": false
  }
]
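The proposed character-based clipping can be approximated today as a post-processing step. The sketch below is only an illustration: clipMiddle and clipChunksByLength are hypothetical helpers, not part of highlight-words, and they assume that every non-matching chunk keeps its first and last N characters with " ... " in between.

```javascript
// Hypothetical helper: keep `length` characters at each end of the text.
function clipMiddle(text, length, ellipsis = ' ... ') {
  if (text.length <= length * 2) return text; // nothing to clip
  return text.slice(0, length) + ellipsis + text.slice(text.length - length);
}

// Applies character-based clipping to non-matching chunks only.
function clipChunksByLength(chunks, length) {
  return chunks.map((chunk) =>
    chunk.match ? chunk : { ...chunk, text: clipMiddle(chunk.text, length) }
  );
}

// Unclipped chunks for the example sentence and the query 'is'.
const unclipped = [
  { text: 'My dog ', match: false },
  { text: 'is', match: true },
  { text: ' a very good boy and ', match: false },
  { text: 'is', match: true },
  { text: ' always eating h', match: false },
  { text: 'is', match: true },
  { text: ' lunch.', match: false },
];

console.log(clipChunksByLength(unclipped, 7));
```

A built-in option would presumably treat the first and last chunks differently (keeping only the side that touches a match), which this sketch does not attempt.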
Hi @tricinel !
Just upgraded to Node.js ESM ("type": "module" in package.json) and highlight-words is breaking:
import highlightWords from 'highlight-words';
// Error on line below: highlightWords is not a function
highlightWords({
text: 'The quick brown fox jumped over the lazy dog',
query: 'over'
});
Workaround
import highlightWords from 'highlight-words';
highlightWords.default({
text: 'The quick brown fox jumped over the lazy dog',
query: 'over'
});
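Until the package's exports are fixed, the workaround can be wrapped so that calling code works with either module shape. This is a small sketch; resolveDefault is a hypothetical helper name, and it assumes the import yields either the function itself or an object with a callable default property.

```javascript
// Hypothetical interop helper: returns the callable export whether the
// import resolved to the function itself or to a { default: fn } wrapper.
function resolveDefault(mod) {
  if (typeof mod === 'function') return mod;
  if (mod && typeof mod.default === 'function') return mod.default;
  throw new TypeError('Module does not expose a callable default export');
}

// Intended usage (commented out so this block runs without the package):
// import highlightWordsModule from 'highlight-words';
// const highlightWords = resolveDefault(highlightWordsModule);
// highlightWords({ text: 'The quick brown fox', query: 'quick' });
```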
https://codesandbox.io/s/highlight-words-vanilla-forked-qcgv7l?file=/src/index.js
This is an unexpected result.
Line 15 in 8183ed5
Should also include RegExp
The highlight-words library is flagged as high severity for Math.random in our vulnerability scan. Can this please be fixed soon and released in a new version?
That probably means you (or a plugin you are using) are using the native JS Math.random(), which has been deemed insecure. You can replace those calls with the Web Crypto API described here: https://developer.mozilla.org/en-US/docs/Web/API/Crypto/getRandomValues
We are using Material-React-Table, and this is the only high-severity warning we got, because highlight-words is used by Material-React-Table. No other library used by Material-React-Table gave any warning. What you commented makes sense, but if you can replace Math.random(), it will make our scan clean. We mention it only because this is the only warning we got when using Material-React-Table, and companies sometimes refuse to use a library just because of such a warning, even though it's not anything significant. See the attached screenshot.
Is it possible to use a regular expression in the query param?
Regards.
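For reference, this is roughly what regular-expression matching looks like when done by hand. The sketch is independent of highlight-words and says nothing about what the library's query option accepts; splitByRegExp is a hypothetical function that produces chunks in the same { text, match } shape.

```javascript
// Hypothetical sketch: split text into { text, match } chunks using a RegExp.
function splitByRegExp(text, pattern) {
  // matchAll requires the global flag, so add it if it is missing.
  const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
  const regex = new RegExp(pattern.source, flags);
  const chunks = [];
  let cursor = 0;
  for (const found of text.matchAll(regex)) {
    if (found[0].length === 0) continue; // ignore empty matches
    if (found.index > cursor) {
      chunks.push({ text: text.slice(cursor, found.index), match: false });
    }
    chunks.push({ text: found[0], match: true });
    cursor = found.index + found[0].length;
  }
  if (cursor < text.length) chunks.push({ text: text.slice(cursor), match: false });
  return chunks;
}

console.log(splitByRegExp('The quick brown fox', /quick|fox/i));
```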
Hey @tricinel, hope things are good with you!
I wanted to see whether you'd be open to also returning a Text Fragment (blog post) in addition to the chunks matched in a string.
E.g.
Modules in Web for usage in the URL below, for matching and scrolling to that part of the page:
details%20about%20the-,API,-%2C%20see%20Experimenting%20with for usage in the URL below, for matching a specific instance of API in the page:
Ideas for implementation can be inspired by the doGenerateFragment function in fragment-generation-utils.js from the text-fragments-polyfill package:
If we have a lot of content to parse, we might incur a performance penalty.
Questions to answer:
Relates to #6
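As a starting point for discussion, the text directive itself is straightforward to assemble once the prefix, match and suffix are known; the hard part (choosing context that makes the match unique on the page) is what doGenerateFragment addresses and is not attempted here. buildTextFragment and its parameter names are hypothetical, following the prefix-,textStart,textEnd,-suffix layout of the text fragment syntax.

```javascript
// Percent-encode one term. encodeURIComponent leaves "-" untouched, but a
// dash is syntax inside a text directive, so it is encoded by hand.
function encodeTerm(term) {
  return encodeURIComponent(term).replace(/-/g, '%2D');
}

// Hypothetical helper: builds ":~:text=[prefix-,]textStart[,textEnd][,-suffix]".
function buildTextFragment({ prefix, textStart, textEnd, suffix }) {
  const parts = [];
  if (prefix) parts.push(`${encodeTerm(prefix)}-`);
  parts.push(encodeTerm(textStart));
  if (textEnd) parts.push(encodeTerm(textEnd));
  if (suffix) parts.push(`-${encodeTerm(suffix)}`);
  return `:~:text=${parts.join(',')}`;
}

console.log(
  buildTextFragment({
    prefix: 'details about the',
    textStart: 'API',
    suffix: ', see Experimenting with',
  })
);
```

The result is meant to be appended after the "#" of a page URL.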
Hi @tricinel! Happy new year! Hope you are well.
I have been experimenting with the "module": "Node16" option in tsconfig.json recently and found the declaration file for highlight-words cannot be found when using these options:
Demo on StackBlitz (run yarn tsc in the terminal to get the error):
https://stackblitz.com/edit/node-gwywi8?file=package.json,tsconfig.json,yarn.lock
The error message:
$ yarn tsc
index.ts:1:28 - error TS7016: Could not find a declaration file for module 'highlight-words'. '/home/projects/node-gwywi8/node_modules/highlight-words/dist/highlight-words.mjs' implicitly has an 'any' type.
Try `npm i --save-dev @types/highlight-words` if it exists or add a new declaration (.d.ts) file containing `declare module 'highlight-words';`
1 import highlightWords from 'highlight-words';
~~~~~~~~~~~~~~~~~
Found 1 error in index.ts:1
error Command failed with exit code 2.
tsconfig.json
{
  "compilerOptions": {
    "module": "node16",
    "moduleResolution": "node16",
    "noEmit": true,
    "strict": true
  },
  "include": ["index.ts"]
}
package.json
{
  "name": "node-starter",
  "version": "0.0.0",
  "type": "module",
  "dependencies": {
    "highlight-words": "^1.2.1",
    "typescript": "^4.9.4"
  }
}
index.ts
import highlightWords from 'highlight-words';
console.log(highlightWords);
Hi, nice lib!
Just wondering if this works with the Internet Explorer browser?
Hello, I use the library for a search field and the matching algorithm ignores diacritics (for example apple matches ápplè), but the highlighting generated from this library does not produce a match. An option to ignore diacritics when generating the chunks would be nice, since the input cannot just be stripped as the outputted chunks of course will be stripped as well.
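One way to get diacritic-insensitive highlighting without losing the original characters is to search a folded copy of the text and map match positions back to the original string. The sketch below is a standalone illustration, not an existing highlight-words option; foldWithMap and highlightIgnoringDiacritics are hypothetical names, and it assumes a plain-string, case-insensitive query.

```javascript
// Builds a lowercase, accent-free copy of `text` plus a map from each index
// in the folded copy to the index of the original character it came from.
function foldWithMap(text) {
  let folded = '';
  const map = [];
  let offset = 0;
  for (const ch of text) {
    // NFD splits "á" into "a" + combining accent; \p{M} strips the accent.
    const base = ch.normalize('NFD').replace(/\p{M}/gu, '').toLowerCase();
    for (let i = 0; i < base.length; i++) map.push(offset);
    folded += base;
    offset += ch.length;
  }
  map.push(text.length); // sentinel so a match at the very end maps correctly
  return { folded, map };
}

function highlightIgnoringDiacritics(text, query) {
  const { folded, map } = foldWithMap(text);
  const needle = foldWithMap(query).folded;
  const chunks = [];
  let cursor = 0; // position in the original text
  let from = 0; // position in the folded text
  while (needle.length > 0) {
    const hit = folded.indexOf(needle, from);
    if (hit === -1) break;
    const start = map[hit];
    const end = map[hit + needle.length];
    if (start > cursor) chunks.push({ text: text.slice(cursor, start), match: false });
    if (end > start) chunks.push({ text: text.slice(start, end), match: true });
    cursor = Math.max(cursor, end);
    from = hit + needle.length;
  }
  if (cursor < text.length) chunks.push({ text: text.slice(cursor), match: false });
  return chunks;
}

console.log(highlightIgnoringDiacritics('An \u00e1ppl\u00e8 a day', 'apple'));
```

The chunks contain slices of the original text, so the accents survive in the output even though they are ignored during matching.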