axa-group / nlp.js
An NLP library for building bots, with entity extraction, sentiment analysis, automatic language identification, and more.
License: MIT License
Add a library that generates API documentation from JSDoc comments, add the task to package.json, generate a first version of the documentation (it will surely need refinement), and upload it.
Remember to add the documentation path to .npmignore, because the docs are not needed in the npm package.
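As a sketch of the task wiring (the tool choice and output path are assumptions, not decided in this issue; `jsdoc` is just one candidate generator), the package.json script could look like:

```json
{
  "scripts": {
    "docs": "jsdoc -r -d docs lib"
  }
}
```

The matching .npmignore entry would then be the same `docs` path.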
Is your feature request related to a problem? Please describe.
Japanese currently only works with katakana, because the Natural stemmer only supports katakana.
Describe the solution you'd like
Support Katakana, Hiragana and Jōyō Kanji; perhaps it can be achieved with a Kanji -> Hiragana -> Katakana translation.
Describe alternatives you've considered
Doing complete stemming on hiragana and katakana. Synonyms over Kanji. Translating to base romaji.
Describe the bug
Some languages that should work with language detection are not working. The problem seems to be that franc uses one set of language codes and this package uses another.
To Reproduce
const { Language } = require('node-nlp');
const language = new Language();
console.log(language.guessBest('你叫什么名字?')); // Returns `undefined`
Expected behavior
Should show the language is Chinese.
Additional context
zh (or zho) is the language code used in lib/language/languages.json, but franc uses cmn to represent Mandarin Chinese. Updating the 3-character code to match the one from franc seems to work.
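One possible direction is a small translation table from franc's codes to the codes used in languages.json. This is only a sketch; the function name and the table entries are illustrative, not the library's actual data:

```javascript
// Map franc's ISO 639-3 output to the codes used by this package.
// Only illustrative entries are shown; the real table would cover
// every language whose codes differ between the two sources.
const francToLocal = {
  cmn: 'zho', // Mandarin Chinese
};

// Normalize a code coming from franc before looking it up
// in lib/language/languages.json.
function normalizeFrancCode(code) {
  return francToLocal[code] || code;
}

console.log(normalizeFrancCode('cmn')); // 'zho'
console.log(normalizeFrancCode('eng')); // 'eng' (unchanged)
```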
Is browser support on the roadmap? I have tried to use this module in an Angular 6 application.
Hello, I've built an app that constantly trains new data.
However, it seems I cannot reliably use the NLP server while a train()/save() is in progress.
What is the best solution to overcome this issue?
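One common workaround is to train a second instance in the background and atomically swap it in once training finishes, so the serving instance is never mid-train. This is a sketch of the general pattern only, not node-nlp's API; `buildAndTrainManager` is a hypothetical function standing in for your own training code:

```javascript
// Double-buffering: serve from `active` while a candidate trains.
let active = null;

// Hypothetical: builds a fresh manager and trains it on current data.
async function buildAndTrainManager(data) {
  // ... create manager, addDocument for each item, await train() ...
  return { process: (utterance) => `trained on ${data.length} items` };
}

async function retrain(data) {
  const candidate = await buildAndTrainManager(data);
  active = candidate; // swap is a single assignment: no torn state
}

// Requests always hit a fully trained instance (or none yet).
function handleRequest(utterance) {
  if (!active) throw new Error('model not ready');
  return active.process(utterance);
}

retrain(['a', 'b']).then(() => console.log(handleRequest('hello')));
// prints 'trained on 2 items'
```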
Describe the solution you'd like
Describe alternatives you've considered
Load lists of synonyms that are available on GitHub (like WordNet)
It's not urgent
Describe the bug
When indicating toCity and date, and after replying to the fromCity question, the answer misses fromCity even if the context is complete.
Same bug when indicating fromCity + date.
It may be related to the builtin (date) working as the final slot? The order in addSlot shouldn't matter (?)
To Reproduce
[After upgrading to 2.1.1 (I spent 3h pulling my hair out with 2 different versions of node-nlp on the same machine)]
I can't speak for the MSBot, but the adaptation to the terminal exposed this behavior:
travel to mpl today
bot> From where you are traveling?
bcn
bot>You want to travel from to mpl today
travel from bcn today
bot> Where do you want to go?
mpl
{ date: 'today', fromCity: 'bcn', toCity: 'mpl' }
bot> You want to travel from bcn to today
Expected behavior
You want to travel from bcn to mpl today
That only works for:
travel to mpl
bcn
today
and
travel from bcn
mpl
today
Additional info
now
is correctly interpreted as a datetime entity, but it does not show in the final answer AND it removes toCity (shifting?):
now
undefined (say(result))
{ toCity: 'mpl', fromCity: 'bcn', datetime: 'now', date: 'now' } (say(context))
bot> You want to travel from to mpl
I wish I could help more in the future and hope QT helps 😄
Describe the bug
It looks like the entity is set using builtin entity recognition only.
Therefore, you can't extract what you want using a regex.
To Reproduce
NER:
Using the example, {{hashtag}} converts to #proudtobeaxa
Expected behavior
It should extract proudtobeaxa
as %hashtag%, since the group doesn't include "#" in the NER regex group.
Additional question
Is there any way to extract 2 groups from a regex like /\b\#(\w+)[, ]\#(\w+)\b/ig
into %hashtag1% %hashtag2%?
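As a point of reference outside the library, plain JavaScript can pull both capture groups out of such a regex with matchAll. This is just standard regex behavior, not node-nlp's NER; the regex shown is a slight variant of the one in the question:

```javascript
// Extract both hashtag capture groups from an utterance.
const re = /\B#(\w+)[, ]+#(\w+)\b/g;
const text = 'Loving it #proudtobeaxa, #nlpjs today';

const tags = [];
for (const match of text.matchAll(re)) {
  tags.push(match[1], match[2]); // the groups, without the leading '#'
}

console.log(tags); // [ 'proudtobeaxa', 'nlpjs' ]
```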
Describe the bug
Currently a context object can be provided to NlpManager.process(). The NLG answers can be conditional, based on conditions over context variables. This is described in the tests and in the Excel file provided with the tests ("It should use context if conversation id is provided"):
https://github.com/axa-group/nlp.js/blob/master/test/recognizer/recognizer.test.js
Also, the Microsoft Recognizer automatically creates a context manager, so when inside a conversation it adds the last retrieved entities into the conversation.
This allows something like the following. Suppose that you have two intents:
This makes it possible to get a conversation like:
user> Who is Spiderman?
bot> Peter Parker
user> Which are his powers?
bot> Super-agility
As you can see, the second question does not contain the name of the hero, because it is automatically stored in the context, so the user can continue the conversation without having to repeat it in each question.
Is your feature request related to a problem? Please describe.
A domain is a logical grouping of intents under a common topic.
This also opens the way to having prebuilt domains. For example, a domain can be "personality", and with one single line of code the bot will be prefilled with common personality questions and answers. This will be discussed and developed in another topic.
Describe the solution you'd like
When adding an intent, a domain can be specified. If not specified, it will be "default", so the name of the domain is optional. As internally it can be a characteristic of an intent, there is no need to add a class and refactor the NLP manager load and save. So it will simply be a property of each intent.
Describe alternatives you've considered
Another alternative is to have it as a tree, so the NLP Manager has a Domain Manager, and each Domain Manager contains the intents. This would be useful if Domain were a more complex class, with more properties or methods, but that is not the case, and the KISS principle must be followed.
brain.js was updated from 1.4.5 to 1.4.6. This version is covered by your current version range, and after updating it in your project the build failed.
brain.js is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.
Adds toFunction()
to RNNTimeStep and a number of fixes to do with hidden layers in recurrent nets.
There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.
Your Greenkeeper Bot 🌴
Describe the bug
Named Entity Recognition is not working as expected:
To Reproduce
const { NlpManager } = require('node-nlp');
const manager = new NlpManager({ languages: ['en'] });
manager.addRegexEntity('mail', /\b(\w[-._\w]*\w@\w[-._\w]*\w\.\w{2,3})\b/ig);
manager.addNamedEntityText('location', 'barcelona', ['en'], ['Barcelona', 'Barna']);
manager.addNamedEntityText('location', 'madrid', ['en'], ['Madrid']);
const result = manager.process('en', 'My mail is [email protected] and i live in madrid', {});
console.log(result);
Expected behavior
Currently it returns:
[ { start: 11,
end: 20,
levenshtein: 0,
accuracy: 1,
option: 'barcelona',
sourceText: 'Barcelona',
entity: 'location',
utteranceText: 'barcelona' },
{ start: 11,
end: 33,
accuracy: 1,
sourceText: '[email protected]',
utteranceText: '[email protected]',
entity: 'mail' } ]
It should return:
[ { start: 11,
end: 33,
accuracy: 1,
sourceText: '[email protected]',
utteranceText: '[email protected]',
entity: 'mail' },
{ start: 48,
end: 53,
levenshtein: 0,
accuracy: 1,
option: 'madrid',
sourceText: 'Madrid',
entity: 'location',
utteranceText: 'madrid' }
]
Describe the bug
The string "Thu, Nov 1, 2018 at 5:06 PM" is not recognised as a date/datetime, despite being a very common way to describe a date. Ideally I would like to be able to recognise the whole datetime, but even getting the date from "Thu, Nov 1, 2018" would be helpful.
As a subset of this problem, I have noticed that dates in the format "7 Nov 2018" are not recognised either.
To Reproduce
Steps to reproduce the behavior:
import { NerManager } from "node-nlp";

const manager = new NerManager();

(async () => {
  const results = await manager.findEntities(
    "This message was sent: Thu, Nov 1, 2018 at 5:06 PM, and I expect a reply before 7 Nov 2018"
  );
  const dates = results.filter(entity => entity.entity === "date");
  console.log(dates); // []
})();
Expected behavior
Output something like:
[
{
"start": 0,
"end": 9,
"len": 10,
"accuracy": 0.95,
"sourceText": "Thu, Nov 1, 2018 at 5:06 PM",
"utteranceText": "Thu, Nov 1, 2018 at 5:06 PM",
"entity": "date",
"resolution": {
"type": "date",
"timex": "2018-11-01",
"strValue": "2018-11-01",
"date": "2018-11-01T17:06:00.000Z"
}
},
{
"start": 0,
"end": 9,
"len": 10,
"accuracy": 0.95,
"sourceText": "7 Nov 2018",
"utteranceText": "7 Nov 2018",
"entity": "date",
"resolution": {
"type": "date",
"timex": "2018-11-07",
"strValue": "2018-11-07",
"date": "2018-11-07T00:00:00.000Z"
}
}
]
Screenshots
NA
Desktop (please complete the following information):
Additional context
The rest of the entity extraction works like a charm (numbers, emails, etc.).
I would be happy to help out on this if somebody could point me in the right direction 😀
Describe the bug
https://github.com/axa-group/nlp.js/blob/master/docs/nlp-classifier.md
It must be updated to include Tamil (ta), making 27 languages instead of 26.
Describe the bug
The NLU benchmark is currently returning a score of 0.89 instead of 0.90. This is due to the fact that manager.train() is called synchronously and not awaited, so the process starts before training ends. The fix is simply to await the manager.train();
To Reproduce
Steps to reproduce the behavior:
Is your feature request related to a problem? Please describe.
Training time can be reduced by using worker_threads (when available, based on the Node version).
Describe the solution you'd like
Computing the thetas does a gradient descent that is independent for each classification label, so the calculation of each theta could theoretically be executed in a thread:
https://github.com/axa-group/nlp.js/blob/master/lib/math/mathops.js#L165
I would like to know how I can isolate a specific entity. I don't know if it's a bug, but I would like to isolate an entity in my intent with this pattern:
'%BOOK% %PAGE_START% %PARAGRAPH_START%'
In the result I have PAGE_START duplicated and PARAGRAPH_START duplicated:
...
"intent": "[BOOK] search_paragraph",
"domain": "default",
"score": 0.9987136557407928,
"entities": [
{
"start": 0,
"end": 2,
"len": 3,
"levenshtein": 0,
"accuracy": 1,
"option": "DAILY_PLANET",
"sourceText": "Daily",
"entity": "BOOK",
"utteranceText": "dai"
},
{
"start": 4,
"end": 4,
"len": 1,
"levenshtein": 0,
"accuracy": 1,
"option": "1",
"sourceText": "2",
"entity": "PAGE_START",
"utteranceText": "2"
},
{
"start": 6,
"end": 6,
"len": 1,
"levenshtein": 0,
"accuracy": 1,
"option": "1",
"sourceText": "3",
"entity": "PAGE_START",
"utteranceText": "3"
},
{
"start": 4,
"end": 4,
"len": 1,
"levenshtein": 0,
"accuracy": 1,
"option": "1",
"sourceText": "2",
"entity": "PARAGRAPH_START",
"utteranceText": "2"
},
{
"start": 6,
"end": 6,
"len": 1,
"levenshtein": 0,
"accuracy": 1,
"option": "1",
"sourceText": "3",
"entity": "PARAGRAPH_START",
"utteranceText": "3"
}
],
...
I would like to have only 3 entities in the response (not the duplicated PAGE_START and PARAGRAPH_START).
How can I achieve that? Is it a bug?
Hi, how do I do Q&A from a dataset like SQuAD? I have a Turkish dataset; I will change the configuration for Turkish, but I don't know how to do it. Do you have any Q&A script for training data from the SQuAD dataset? Or how do I do it?
Is your feature request related to a problem? Please describe.
Currently the documentation is only in the README.md, and as it grows, it is becoming bigger and harder to read.
Describe the solution you'd like
Create a docs folder, and split documentation into smaller md files with partial information, and the README.md should contain the basic information to work with the library (install, basic usage, license...), and a Table of Contents pointing to the correct md file and hash.
DON'T READ, scroll to the end
Describe the bug
Replying with a city name is not correctly interpreted.
To Reproduce
See previous bug report. Same code. No error.
i want to travel today to London
From where you are traveling?{"locale":"en","localeIso2":"en","language":"English","utterance":"i want to travel today to London","classification":[{"label":"travel","value":1}],"intent":"travel","domain":"default","score":1,"entities":[{"start":17,"end":21,"len":5,"accuracy":0.95,"sourceText":"today","utteranceText":"today","entity":"date","resolution":{"type":"date","timex":"2018-10-15","strValue":"2018-10-15","date":"2018-10-15T00:00:00.000Z"}},{"type":"afterLast","start":26,"end":31,"len":6,"accuracy":0.99,"sourceText":"London","utteranceText":"London","entity":"toCity"}],"sentiment":{"score":-0.275,"comparative":-0.03928571428571429,"vote":"negative","numWords":7,"numHits":1,"type":"senticon","language":"en"},"srcAnswer":"From where you are traveling?","answer":"From where you are traveling?","slotFill":{"localeIso2":"en","intent":"travel","entities":[{"start":17,"end":21,"len":5,"accuracy":0.95,"sourceText":"today","utteranceText":"today","entity":"date","resolution":{"type":"date","timex":"2018-10-15","strValue":"2018-10-15","date":"2018-10-15T00:00:00.000Z"}},{"type":"afterLast","start":26,"end":31,"len":6,"accuracy":0.99,"sourceText":"London","utteranceText":"London","entity":"toCity"}],"answer":"You want to travel from to London today","srcAnswer":"You want to travel from {{ fromCity }} to {{ toCity }} {{ date }}","currentSlot":"fromCity"}}. :( (-0.275)
Barcelona
Sorry, I don't understand, {"locale":"en","localeIso2":"en","language":"English","utterance":"Barcelona","classification":[{"label":"travel","value":0.5}],"intent":"None","domain":"default","score":1,"entities":[],"sentiment":{"score":0,"comparative":0,"vote":"neutral","numWords":1,"numHits":0,"type":"senticon","language":"en"}}.
Expected behavior
Record the reply as the city name and give the final answer You want to travel from {{ fromCity }} to {{ toCity }} {{ date }}
Additional context
Updated node to 8.12. No more regex error.
Should I add city name entities?
It looks like currentSlot is forgotten after output.
Maybe my code flushes the memory state (waiting for input to fill the slot).
Here's the generated model:
{
"settings": {
"fullSearchWhenGuessed": true,
"useNlg": true,
"useNeural": true
},
"languages": [
"en"
],
"intentDomains": {
"travel": "default"
},
"nerManager": {
"settings": {},
"threshold": 0.8,
"builtins": [
"Number",
"Ordinal",
"Percentage",
"Age",
"Currency",
"Dimension",
"Temperature",
"DateTime",
"PhoneNumber",
"IpAddress",
"Boolean",
"Email",
"Hashtag",
"URL"
],
"namedEntities": {
"fromCity": {
"type": "trim",
"name": "fromCity",
"localeFallback": {
"*": "en"
},
"locales": {
"en": {
"conditions": [
{
"type": "between",
"options": {
"skip": [
"travel"
]
},
"leftWords": [
"from"
],
"rightWords": [
"to"
],
"regex": "/(?<= from )(.*)(?= to )/gi"
},
{
"type": "afterLast",
"options": {
"skip": [
"travel"
]
},
"words": [
"from"
]
}
]
}
}
},
"toCity": {
"type": "trim",
"name": "toCity",
"localeFallback": {
"*": "en"
},
"locales": {
"en": {
"conditions": [
{
"type": "between",
"options": {
"skip": [
"travel"
]
},
"leftWords": [
"to"
],
"rightWords": [
"from"
],
"regex": "/(?<= to )(.*)(?= from )/gi"
},
{
"type": "afterLast",
"options": {
"skip": [
"travel"
]
},
"words": [
"to"
]
}
]
}
}
}
}
},
"slotManager": {
"travel": {
"toCity": {
"intent": "travel",
"entity": "toCity",
"mandatory": true,
"locales": {
"en": "Where do you want to go?"
}
},
"fromCity": {
"intent": "travel",
"entity": "fromCity",
"mandatory": true,
"locales": {
"en": "From where you are traveling?"
}
},
"date": {
"intent": "travel",
"entity": "date",
"mandatory": true,
"locales": {
"en": "When do you want to travel?"
}
}
}
},
"classifiers": [
{
"language": "en",
"docs": [
{
"intent": "travel",
"utterance": [
"i",
"want",
"to",
"travel",
"from",
"fromciti",
"to",
"tociti",
"date"
]
}
],
"features": {
"i": 1,
"want": 1,
"to": 2,
"travel": 1,
"from": 1,
"fromciti": 1,
"tociti": 1,
"date": 1
},
"logistic": {
"observations": {
"travel": [
[
1,
2,
3,
4,
5,
6,
7
]
]
},
"labels": [
"travel"
],
"observationCount": 1
},
"useNeural": true,
"neuralClassifier": {
"settings": {
"config": {
"activation": "leaky-relu",
"hiddenLayers": [],
"learningRate": 0.1,
"errorThresh": 0.0005
}
},
"classifierMap": {}
}
}
],
"responses": {
"en": {
"travel": [
{
"response": "You want to travel from {{ fromCity }} to {{ toCity }} {{ date }}"
}
]
}
}
}
We are close to publishing version 2.0.0, with async process, transformations, slot filling... and many more features.
How should we evolve to version 3.0.0? From my point of view, evolving to a monorepo with lerna could be positive.
Pros:
Cons:
What do you think?
Hi,
Is your feature request related to a problem? Please describe.
Is it possible to use addRegexEntity in the manager to create a wildcard?
NlpManager produces false positives with score 1
Sample code:
const { NlpManager } = require('node-nlp');
const manager = new NlpManager({
languages: ['de'],
});
manager.addDocument('de', 'ich will auto kaufen', 'buy');
(async () => {
await manager.train();
console.log(await manager.process('ich will auto kaufen'));
console.log(await manager.process('ich habe hunger'));
})();
Result with version 2.0.2:
{ locale: 'de', localeIso2: 'de', language: 'German', utterance: 'ich will auto kaufen', classification: [ { label: 'buy', value: 0.9975597509481738 } ], intent: 'buy', domain: 'default', score: 0.9975597509481738, entities: [], sentiment: { score: 0, comparative: 0, vote: 'neutral', numWords: 4, numHits: 0, type: 'senticon', language: 'de' } } { locale: 'de', localeIso2: 'de', language: 'German', utterance: 'ich habe hunger', classification: [ { label: 'buy', value: 0.8180665881599193 } ], intent: 'buy', domain: 'default', score: 0.8180665881599193, entities: [], sentiment: { score: -0.0565, comparative: -0.018833333333333334, vote: 'negative', numWords: 3, numHits: 1, type: 'senticon', language: 'de' } }
Result with the current version:
{ locale: 'de', localeIso2: 'de', language: 'German', utterance: 'ich will auto kaufen', classification: [ { label: 'buy', value: 1 } ], intent: 'buy', domain: 'default', score: 1, entities: [], sentiment: { score: 0, comparative: 0, vote: 'neutral', numWords: 4, numHits: 0, type: 'senticon', language: 'de' } } { locale: 'de', localeIso2: 'de', language: 'German', utterance: 'ich habe hunger', classification: [ { label: 'buy', value: 1 } ], intent: 'buy', domain: 'default', score: 1, entities: [], sentiment: { score: -0.0565, comparative: -0.018833333333333334, vote: 'negative', numWords: 3, numHits: 1, type: 'senticon', language: 'de' } }
Hello,
I have built an app where users constantly train new intents/utterances and answers into the model, and the model constantly retrains itself every 4 minutes.
I have noticed that the scores of the utterances that users ask keep decreasing.
For example, one user created an utterance/intent for "What is your favorite color?". Typing the exact utterance used to return a score of ~0.9. Now it returns ~0.3.
What is the cause of this, and how can I reliably solve this problem?
Is your feature request related to a problem? Please describe.
It's not a problem, but it would be much easier to maintain the project with Prettier (fewer diffs, a single code style).
Describe the solution you'd like
Add prettier, plus the lint-staged/husky packages, to format code on commit.
What about migrating the codebase to TS? Not the whole codebase at once, but at least add support for tsc to make it possible to add types later.
Is your feature request related to a problem? Please describe.
The .nlp files are very big. Most of the space is consumed by the classification matrix of each label and the weights matrix. The classification matrix has 0/1 values per cell, where each cell position represents a feature, so it can be represented in a more compact way.
Describe the solution you'd like
Currently the classification matrix of each label is stored like this (imagine 20 features):
[ [ 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ],
[ 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ],
[ 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 ] ]
The more features, the more zeros there will be in each vector, so the bigger the model, the more space is saved.
The proposal is to store each vector as an object. Imagine that the features are labeled "feat0", "feat1", ...
Then the previous matrix can be stored as:
[ { feat3: 1, feat4: 1, feat9: 1 },
{ feat3: 1, feat4: 1, feat8: 1, feat9: 1 },
{ feat4: 1, feat9: 1, feat19: 1 } ]
This compression should be done in the save() method of the NLP Manager, along with a decompression from object back to vector. For the decompression to be faster, avoid the use of indexOf on an array of features; a dictionary object that relates feature name to position should be used instead. The idea for the decompression is to generate a zero vector with length equal to the number of features, and then put ones in the positions of the features.
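A minimal sketch of the proposed compression and decompression (the feature names and shapes are illustrative; the real save() integration would differ):

```javascript
// Compress a 0/1 vector into a sparse object keyed by feature name.
function compressVector(vector, featureNames) {
  const obj = {};
  vector.forEach((value, i) => {
    if (value) obj[featureNames[i]] = 1;
  });
  return obj;
}

// Decompress using a name -> position dictionary (no indexOf).
function decompressVector(obj, positionByName, numFeatures) {
  const vector = new Array(numFeatures).fill(0);
  Object.keys(obj).forEach(name => {
    vector[positionByName[name]] = obj[name];
  });
  return vector;
}

const names = ['feat0', 'feat1', 'feat2', 'feat3', 'feat4'];
const positions = {};
names.forEach((name, i) => { positions[name] = i; });

const compressed = compressVector([0, 0, 0, 1, 1], names);
console.log(compressed); // { feat3: 1, feat4: 1 }
console.log(decompressVector(compressed, positions, 5)); // [ 0, 0, 0, 1, 1 ]
```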
Is your feature request related to a problem? Please describe.
Not related to a specific problem. It would be awesome to have a way of importing and exporting models from sources other than files. Examples: from a persistent database or from memory.
Describe the solution you'd like
I imagine that the following class methods could work:
NlpManager.import(data) takes a string, parses it as JSON, and incorporates it into the class.
NlpManager.export() returns a JSON string (or maybe a plain object) that can be saved anywhere (a database, a variable, etc.).
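A sketch of the general pattern, not node-nlp's internals (the state fields here are placeholders), showing how the two methods could mirror each other:

```javascript
// Generic export/import pattern: serialize the trained state to a
// string, and rebuild an equivalent instance from that string later.
class Model {
  constructor() {
    this.settings = {};
    this.intents = {};
  }

  export() {
    // A plain-object snapshot of everything needed to rebuild.
    return JSON.stringify({ settings: this.settings, intents: this.intents });
  }

  import(data) {
    const parsed = JSON.parse(data);
    this.settings = parsed.settings;
    this.intents = parsed.intents;
  }
}

const a = new Model();
a.intents.greet = ['hello', 'hi'];

const b = new Model();
b.import(a.export()); // b now mirrors a, no filesystem involved
console.log(b.intents.greet); // [ 'hello', 'hi' ]
```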
Describe alternatives you've considered
I've considered extending the original NlpManager class. In fact, I've done it (check it out).
Although my alternative definitely works, I feel like this is a feature that would be very useful if it were integrated into the library. Also, any additional changes in how nlp.js handles models would probably break any class extension if not considered correctly.
Additional context
GIST: Extending NlpManager to add .export() and .import()
I can definitely submit a PR with the .import() and .export() methods in NlpManager, but I figured it would be better to submit an issue first in case this is something you've already considered, or I'm missing something.
Thank you for this library, it's pretty awesome! 👋
When I try the example script, I get this error:
mrpeker@MrPeker-MacBook-Air ~/Desktop/nlp.js-master/examples/console-bot node index.js 1 ↵ 3354 02:08:15
Say something!
hello
/Users/mrpeker/Desktop/nlp.js-master/examples/console-bot/index.js:51
if (result.sentiment.score !== 0) {
^
TypeError: Cannot read property 'score' of undefined
at Interface.rl.on (/Users/mrpeker/Desktop/nlp.js-master/examples/console-bot/index.js:51:26)
at Interface.emit (events.js:180:13)
at Interface._onLine (readline.js:285:10)
at Interface._normalWrite (readline.js:433:12)
at ReadStream.ondata (readline.js:144:10)
at ReadStream.emit (events.js:180:13)
at addChunk (_stream_readable.js:274:12)
at readableAddChunk (_stream_readable.js:261:11)
at ReadStream.Readable.push (_stream_readable.js:218:10)
at TTY.onread (net.js:581:20)
Describe the bug
When guessing a very short utterance, the classification is correct, but the language is guessed as Spanish, hence not giving the English reply.
User: [email protected]
AxaBot:
Sorry, I don't understand, {"locale":"es","localeIso2":"es","language":"Spanish","utterance":"[email protected]","classification":[{"label":"email2","value":0.9819053927264487},{"label":"email","value":0.6188665942300879},{"label":"realname","value":0.12743325382186388},{"label":"whois","value":0.08484683107101253},{"label":"whereis","value":0.08484683107101253},{"label":"hashtag","value":0.029910769295364337}],"intent":"email2","domain":"default","score":0.9819053927264487,"entities":[{"start":0,"end":14,"accuracy":1,"sourceText":"[email protected]","utteranceText":"[email protected]","entity":"mail"}],"sentiment":{"score":0,"comparative":0,"vote":"neutral","numWords":3,"numHits":0,"type":"senticon","language":"es"}}.
Expected behavior
Default to any chosen language, or better, be able to force the language for this input (once we have guessed the preferred language from prior talks and written it to the user's settings).
Describe the bug
Out of memory error when training on a public set. Full error message below.
To Reproduce
Download the train.csv file, unzip it, and adapt it to XLS (NER is empty):
NLP:
intent | language | utterance
EAP | en | id26305, This process, however, afforded me no means of ascertaining the dimensions of my dungeon; as I might make its circuit, and return to the point whence I set out, without being aware of the fact; so perfectly uniform seemed the wall.
... (19568 lines!)
NLG:
intent | condition | language | response
EAP | | en | EAP
MWS | | en | MWS
HPL | | en | HPL
Expected behavior
A trained model saved to model.nlp
More globally
Could we have an example test script using XLS?
**Full error message**
Training, please wait..
<--- Last few GCs --->
[5192:000FA808] 298302 ms: Mark-sweep 666.6 (736.9) -> 666.6 (736.9) MB, 596.3 / 0.0 ms allocation failure GC in old space requested
[5192:000FA808] 298985 ms: Mark-sweep 666.6 (736.9) -> 666.6 (720.9) MB, 682.9 / 0.0 ms last resort GC in old space requested[5192:000FA808] 299666 ms: Mark-sweep 666.6 (720.9) -> 666.6 (720.4) MB, 681.5 / 0.0 ms last resort GC in old space requested
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 048961D9 <JSObject>
1: /* anonymous */(aka /* anonymous */) [E:\disctut\node_modules\node-nlp\lib\nlp\nlp-classifier.js:~195] [pc=0C2E84FB](this=33C0417D <undefined>,srcToken=01E45131 <String[7]: id03416>)
2: arguments adaptor frame: 3->1
3: forEach(this=18768569 <JSArray[35184]>)
4: tokensToNeural [E:\disctut\node_modules\node-nlp\lib\nlp\nlp-classifier.js:195] [bytecode=01F6C4AD offset=153](this=061E773...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
1: node_module_register
2: v8::internal::Factory::NewFixedArray
3: v8::internal::HashTable<v8::internal::SeededNumberDictionary,v8::internal::SeededNumberDictionaryShape>::IsKey
Describe the bug
The language support documentation is not up to date with the latest builtin entity extraction documentation.
Language support: https://github.com/axa-group/nlp.js/blob/master/docs/language-support.md
Builtin entity extraction: https://github.com/axa-group/nlp.js/blob/master/docs/builtin-entity-extraction.md
Is your feature request related to a problem? Please describe.
For machine learning, I need to tokenize and stem words.
Describe the solution you'd like
Create a function to tokenize and stem.
Describe alternatives you've considered
NaN
Additional context
My project https://github.com/ran-j/ChatBotNodeJS/blob/master/routes/index.js#L40
Is your feature request related to a problem? Please describe.
Currently a logistic regression classifier is used. Logistic regression provides a great way to classify, but requires a CPU-intensive process to train.
A Bayes classifier, on the other hand, doesn't need a CPU-intensive process to train, so it can grow progressively without consuming time.
Why is a Bayes classifier useful?
While a frontend is being developed, users will be able to add intents and try the bot on the fly. If a logistic regression classifier is used while the user teaches the bot, too much time will be consumed, because the user will train very often. One solution is to use a Bayes classifier while teaching, and then train with a logistic regression classifier when deploying.
Also evaluate: is it possible to combine a logistic regression classifier with a Bayes classifier to get better accuracy? Example: I have 30 intents trained with the logistic regression classifier, and I add a new intent. When I write an utterance, it passes through the Bayes classifier; if it is identified as an intent that has been retrained, continue with Bayes; if it is identified as an intent already trained with the LRC, then pass it through the LRC.
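For reference, the reason Bayes training is cheap: a multinomial Naive Bayes classifier only keeps token counts, so adding an utterance costs O(tokens) with no gradient descent. A minimal sketch (illustrative only, not the classifier this library would ship):

```javascript
// Minimal multinomial Naive Bayes over whitespace tokens.
class TinyBayes {
  constructor() {
    this.counts = {};      // label -> token -> count
    this.totals = {};      // label -> total tokens seen
    this.docs = {};        // label -> number of utterances
    this.vocab = new Set();
  }

  // "Training" is just counting, so it is incremental and fast.
  add(label, utterance) {
    this.counts[label] = this.counts[label] || {};
    this.totals[label] = this.totals[label] || 0;
    this.docs[label] = (this.docs[label] || 0) + 1;
    for (const token of utterance.toLowerCase().split(/\s+/)) {
      this.counts[label][token] = (this.counts[label][token] || 0) + 1;
      this.totals[label] += 1;
      this.vocab.add(token);
    }
  }

  classify(utterance) {
    const tokens = utterance.toLowerCase().split(/\s+/);
    const totalDocs = Object.values(this.docs).reduce((a, b) => a + b, 0);
    let best = null;
    let bestScore = -Infinity;
    for (const label of Object.keys(this.counts)) {
      // log prior + sum of log likelihoods with Laplace smoothing
      let score = Math.log(this.docs[label] / totalDocs);
      for (const token of tokens) {
        const count = this.counts[label][token] || 0;
        score += Math.log((count + 1) / (this.totals[label] + this.vocab.size));
      }
      if (score > bestScore) { bestScore = score; best = label; }
    }
    return best;
  }
}

const bayes = new TinyBayes();
bayes.add('greet', 'hello there');
bayes.add('bye', 'goodbye see you');
console.log(bayes.classify('hello friend')); // 'greet'
```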
Describe the bug
There is an exception being thrown from lib/math/mathops.js:113 (Unable to find minimum) when training on certain data sets.
To Reproduce
const { NlpManager } = require('./lib');
const manager = new NlpManager({ languages: ['nb'] });
manager.addDocument('nb', 'foo', 'foo');
const input = "Orkanen Florence treffer fredag østkysten av USA som en såkalt kategori 1-orkan."
manager.addDocument('nb', input, 'bar');
manager.train();
It seems to be triggered by the digits in the input: e.g. removing the digit 1 fixes it.
I noticed that some utterances such as "hey" never get properly trained into the model. Therefore, I can never correctly get the intent for these utterances.
So far I have discovered that "hey" is one such word.
Are there other utterances that never get trained into the model? If so, what are they? What are the rules that determine which utterances get ignored?
Is your feature request related to a problem? Please describe.
Currently we are using the Recognizers-Text suite for default named entity extraction. Giving the user the choice between it and duckling is a good idea.
Describe the solution you'd like
In the ner-manager, when retrieving the entities, it should be done based on configuration. If duckling is configured, then duckling should be called, and the answer translated to the nlp.js format.
This also means that this part must be done asynchronously, because it will make a request to duckling.
Additional context
https://github.com/facebook/duckling
Is your feature request related to a problem? Please describe.
As we want to open the development to integration with APIs (for example, Duckling in issue #15), the process method must be asynchronous, while currently it is synchronous.
Describe the solution you'd like
Async/await is welcome for better syntax.
Another approach is to have NerManager.findEntities and NlpManager.process with Sync and Async versions (following Node standards, only the sync version should have the suffix).
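The Node convention mentioned here, sketched with a hypothetical findEntities (not the library's real implementation), looks like:

```javascript
// Node convention: the async form keeps the plain name, the sync form
// gets the "Sync" suffix (like fs.readFile / fs.readFileSync).
function findEntitiesSync(utterance) {
  // ... synchronous, local extraction only (hashtags, as a stand-in) ...
  return utterance.split(/\s+/).filter(w => w.startsWith('#'));
}

async function findEntities(utterance) {
  // Free to call remote services (e.g. duckling) before resolving.
  return findEntitiesSync(utterance);
}

findEntities('see #nlpjs and #nodejs').then(found => console.log(found));
// [ '#nlpjs', '#nodejs' ]
```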
Is your feature request related to a problem? Please describe.
Currently we can know the entities related to a given intent by the text in the intent, and the intent structure will have the names of the entities so they can be extracted.
The idea is to have slot filling, that is: for each entity we should know whether it is mandatory or optional, and if it is mandatory, for each language we should have the question that the chatbot should ask the user when the slot is not filled.
Example: we have the intent travel and 2 entities: location and date. There are four types of conversation:
The user fills all the slots
user> I want to travel tomorrow to London
bot> Ok, preparing your travel to London for 27/08/2028
The user fills date slot:
user> I want to travel tomorrow
bot> What is your destination?
user> London
bot> Ok, preparing your travel to London for 27/08/2028
The user fills location slot:
user> I want to travel to London
bot> When do you want to travel to London?
user> I want to travel tomorrow
bot> Ok, preparing your travel to London for 27/08/2028
The user does not fill any slot
user> I want to travel
bot> What is your destination?
user> I want to go to London
bot> When do you want to travel to London?
user> I want to travel tomorrow
bot> Ok, preparing your travel to London for 27/08/2028
Describe the solution you'd like
I think this can be implemented as a slot manager: a new hard entity slot should be defined, and inside the NLP manager an implementation of the slot manager is provided. Questions in the slots should be templated so they can use things from the context (as in the example when it says When do you want to travel to London).
Also, the Microsoft Bot Framework Recognizer will need hard work so that, when an answer is received with slots to fill, it is able to take control of the dialog by inserting a new artificial prompt or a new dialog state. About where the logic should be implemented, I think it should be sent to the NLP manager as contextualized information. The reason is: imagine that, as in the previous example, the user does not fill any slot, so 2 slot questions are pending; if the logic for both questions lives in the recognizer, and in the first question the user answers both slots, the recognizer will not know it and will ask for the date even though the user already provided it.
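The conversation flows above boil down to a small loop: extract entities from every turn, then ask for the first missing mandatory slot. This is a sketch under assumed shapes, with entity extraction faked by a lookup so the control flow stays visible:

```javascript
// Slot-filling loop: keep asking for the first missing mandatory slot.
const slots = {
  location: { mandatory: true, question: 'What is your destination?' },
  date: { mandatory: true, question: 'When do you want to travel?' },
};

// Stand-in for real NER: map known words to entities.
function extractEntities(utterance) {
  const found = {};
  if (/london/i.test(utterance)) found.location = 'London';
  if (/tomorrow/i.test(utterance)) found.date = 'tomorrow';
  return found;
}

// Returns the next question, or the final answer once all slots fill.
// Note it absorbs every entity in the turn, so answering two slots at
// once skips the second question (the case discussed above).
function handleTurn(context, utterance) {
  Object.assign(context, extractEntities(utterance));
  for (const [name, slot] of Object.entries(slots)) {
    if (slot.mandatory && !(name in context)) return slot.question;
  }
  return `Ok, preparing your travel to ${context.location} for ${context.date}`;
}

const context = {};
console.log(handleTurn(context, 'I want to travel tomorrow'));
// 'What is your destination?'
console.log(handleTurn(context, 'London'));
// 'Ok, preparing your travel to London for tomorrow'
```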
Describe the bug
I can run the example and model.nlp is created, but there's no classification, nor answer:
No error is thrown.
To Reproduce
win 10, node.js 8.9.3, npm 5.5.1
npm i node-nlp
await is not working, changed to use .then()
(see screenshot)
Expected behavior
Same result as the example in the readme.
Describe the bug
copy-pasting-adapting test example for slot throws:
SyntaxError: Invalid regular expression: /(?<= from )(.*)(?= to )/: Invalid group
To Reproduce
Copy paste Slot example.
Adapt to use npm package:
const fs = require('fs');
const { NlpManager } = require('node-nlp');
const modelName = './model.nlp';
const threshold = 0.7;
const nlpManager = new NlpManager();
nlpManager.addLanguage('en');
const fromEntity = nlpManager.addTrimEntity('fromCity');
fromEntity.addBetweenCondition('en', 'from', 'to', { skip: ['travel'] });
fromEntity.addAfterLastCondition('en', 'from', { skip: ['travel'] });
const toEntity = nlpManager.addTrimEntity('toCity');
toEntity.addBetweenCondition('en', 'to', 'from', { skip: ['travel'] });
toEntity.addAfterLastCondition('en', 'to', { skip: ['travel'] });
nlpManager.slotManager.addSlot('travel', 'toCity', true, {
en: 'Where do you want to go?',
});
nlpManager.slotManager.addSlot('travel', 'fromCity', true, {
en: 'From where you are traveling?',
});
nlpManager.slotManager.addSlot('travel', 'date', true, {
en: 'When do you want to travel?',
});
nlpManager.addDocument(
'en',
'I want to travel from %fromCity% to %toCity% %date%',
'travel'
);
nlpManager.addAnswer(
'en',
'travel',
'You want to travel from {{ fromCity }} to {{ toCity }} {{ date }}'
);
if (fs.existsSync(modelName)) {
nlpManager.load(modelName);
} else {
//nlpManager.loadExcel(excelName);
nlpManager.train();
nlpManager.save(modelName);
}
run and you get:
E:\disctut\node_modules\node-nlp\lib\ner\regex-named-entity.js:113
return new RegExp(str.slice(1, index), str.slice(index + 1));
^
SyntaxError: Invalid regular expression: /(?<= from )(.*)(?= to )/: Invalid group
at new RegExp (<anonymous>)
at Function.str2regex (E:\disctut\node_modules\node-nlp\lib\ner\regex-named-entity.js:113:12)
at languages.forEach.language (E:\disctut\node_modules\node-nlp\lib\ner\trim-named-entity.js:74:33)
at Array.forEach (<anonymous>)
at TrimNamedEntity.addBetweenCondition (E:\disctut\node_modules\node-nlp\lib\ner\trim-named-entity.js:65:15)
at Object.<anonymous> (E:\disctut\server.js:56:12)
at Module._compile (module.js:635:30)
at Object.Module._extensions..js (module.js:646:10)
at Module.load (module.js:554:32)
at tryModuleLoad (module.js:497:12)
at Function.Module._load (module.js:489:3)
at Function.Module.runMain (module.js:676:10)
at startup (bootstrap_node.js:187:16)
at bootstrap_node.js:608:3
BEFORE model is saved.
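The SyntaxError above most likely comes from the lookbehind assertion `(?<= from )`: the V8 engine bundled with Node 8 does not support lookbehind by default, so constructing the regex throws "Invalid group" (Node 10+ supports it). A quick self-contained check, as a sketch:

```javascript
// Detect whether this Node runtime supports lookbehind assertions,
// which the generated pattern /(?<= from )(.*)(?= to )/ requires.
// On Node 8 the RegExp constructor throws "Invalid group".
let supportsLookbehind = true;
try {
  // Build the pattern from a string so older parsers do not choke
  // on the literal at parse time.
  new RegExp('(?<= from )(.*)(?= to )');
} catch (e) {
  supportsLookbehind = false;
}

console.log(supportsLookbehind); // true on Node 10+, false on Node 8
```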
Expected behavior
working bot 😄
But I guess I messed up again somewhere in my adaptation?
Additional
How to add memory slots to XLS?
Is your feature request related to a problem? Please describe.
Currently the named entity extraction is done in three layers:
This comes with several problems: the Recognizer text suite returns the units, metrics, dimensions, etc. translated into the target language, with no option to keep them in English to provide a common interface in code. As a result, the code contains a dictionary structure like:
initializeDictionary() {
this.dictionary = {
Año: 'Year',
Mes: 'Month',
Día: 'Day',
Semana: 'Week',
Ans: 'Year',
Mois: 'Month',
Semaines: 'Week',
Jour: 'Day',
Ano: 'Year',
Mês: 'Month',
Dia: 'Day',
};
}
This is far from done, so what we have to do is continue this work with other entities, completing this dictionary. We should also keep this dictionary as a JSON file, perhaps split by language, to make it more maintainable and to avoid possible collisions (the same word existing in different languages with different meanings).
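The per-language split suggested above could look like the following sketch, which reuses the entries from the dictionary shown earlier. The locale keys and helper name are illustrative (in a real implementation each locale's map would live in its own JSON file):

```javascript
// Hypothetical sketch: per-locale dictionaries mapping localized unit
// names back to English, so the same word in two languages cannot collide.
const dictionaries = {
  es: { 'Año': 'Year', 'Mes': 'Month', 'Día': 'Day', 'Semana': 'Week' },
  fr: { 'Ans': 'Year', 'Mois': 'Month', 'Semaines': 'Week', 'Jour': 'Day' },
  pt: { 'Ano': 'Year', 'Mês': 'Month', 'Dia': 'Day' },
};

// Look a localized word up in its locale's dictionary only.
function toEnglishUnit(locale, word) {
  return (dictionaries[locale] || {})[word];
}

console.log(toEnglishUnit('fr', 'Mois')); // 'Month'
console.log(toEnglishUnit('es', 'Semana')); // 'Week'
```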
Describe the solution you'd like
Describe the bug
Currently the example at https://github.com/axa-group/nlp.js/tree/master/examples/console-bot is not working properly: training is now asynchronous, so when the model is saved to the file model.nlp, the weights are still not calculated.
To Reproduce
Steps to reproduce the behavior:
Recommended fix
manager.train() must be awaited, so the surrounding function must be async.
In index.js, wrap the main code in an async main() function and call main().
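The fix can be sketched with a self-contained stand-in for the manager (the `manager` object here is a mock, not the real NlpManager, so the ordering guarantee is easy to see):

```javascript
// Stand-in for the NlpManager: train() is asynchronous, and save()
// must not run until training has finished.
const manager = {
  trained: false,
  train() {
    return new Promise((resolve) =>
      setTimeout(() => { this.trained = true; resolve(); }, 10));
  },
  save() {
    if (!this.trained) throw new Error('saving before training finished');
    return 'model.nlp';
  },
};

// Recommended pattern: wrap the main code in an async function so
// train() can be awaited before save().
async function main() {
  await manager.train(); // without this await, save() runs too early
  return manager.save();
}

main().then((file) => console.log(file)); // prints 'model.nlp'
```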
Is your feature request related to a problem? Please describe.
When a trim named entity collides with another entity, instead of removing one of the edges, the trim entity can be reduced in size to fit alongside the other entity.
Example: Suppose this utterance:
"I want to travel from Barcelona to London tomorrow".
With those entities:
fromEntity: between from and to or after last from
toEntity: between to and from (skip travel) or after last to
date entity
It will result into three edges:
When the edges are reduced, since "London tomorrow" and "tomorrow" collide, the one with higher accuracy (or greater length when accuracy is equal) survives and the other is removed.
Describe the solution you'd like
The reduceEdges algorithm must take into account a first loop detecting collisions of Trim Named Entities with another ones, trying to split the Trim Named Entity. After that first loop, the normal edge collision is passed, resulting, for the provided example, in three edges:
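The splitting step proposed above can be sketched on plain character spans. This is an illustrative helper, not the actual reduceEdges implementation; spans are inclusive `{ start, end }` character indices, and the trim span is shrunk rather than discarded when it overlaps a higher-priority entity:

```javascript
// Hypothetical sketch of the splitting idea: shrink a trim entity's
// span so it no longer overlaps another entity's span.
function shrinkTrimSpan(trimSpan, otherSpan) {
  // No overlap: keep the trim span as-is.
  if (trimSpan.end < otherSpan.start || trimSpan.start > otherSpan.end) {
    return trimSpan;
  }
  // Overlap on the right: cut the trim span before the other span.
  if (trimSpan.start < otherSpan.start) {
    return { start: trimSpan.start, end: otherSpan.start - 1 };
  }
  // Overlap on the left: start the trim span after the other span.
  if (trimSpan.end > otherSpan.end) {
    return { start: otherSpan.end + 1, end: trimSpan.end };
  }
  // Fully contained: nothing of the trim span survives.
  return null;
}

// In "I want to travel from Barcelona to London tomorrow",
// "London tomorrow" spans 35..49 and "tomorrow" spans 42..49,
// so the trim span is shrunk to cover only "London ".
console.log(shrinkTrimSpan({ start: 35, end: 49 }, { start: 42, end: 49 }));
// { start: 35, end: 41 }
```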
Is your feature request related to a problem? Please describe.
Currently the JSON format returned is fine. But there are already NLUs on the market, like LUIS, DialogFlow, Wit, RASA, or Snips. The idea is to provide transformations to those market JSON formats, so users who already have implementations built on them can use NLP.js without having to change their code's behaviour.
Describe the solution you'd like
Implement a base class for a transformer, and derived classes at least for LUIS and DialogFlow.
There should be a way to attach a transformer to an NlpManager so that when we call process, the answer is piped through the transformer.
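A base class plus one derived transformer could look like the sketch below. The class names and the exact target fields are illustrative assumptions (loosely modeled on LUIS's `query`/`topScoringIntent` shape), not the real nlp.js or LUIS contract:

```javascript
// Hypothetical sketch: a base transformer with an identity transform,
// and a LUIS-flavoured subclass that reshapes the default nlp.js result.
class OutputTransformer {
  transform(result) {
    return result; // identity by default
  }
}

class LuisLikeTransformer extends OutputTransformer {
  transform(result) {
    return {
      query: result.utterance,
      topScoringIntent: { intent: result.intent, score: result.score },
      entities: result.entities || [],
    };
  }
}

const transformer = new LuisLikeTransformer();
console.log(transformer.transform({
  utterance: 'I have a red car',
  intent: 'car.getcolor',
  score: 0.85,
  entities: [],
}));
```

With this shape, process could simply call `transformer.transform(answer)` as its final step when a transformer is attached.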
Is your feature request related to a problem? Please describe.
In the documentation of NlpManager there is no information about how load, save, import, or export work.
Is your feature request related to a problem? Please describe.
It's a typical feature that would be very useful to have built-in inside this great lib. As @sys.color in DialogFlow https://dialogflow.com/docs/reference/system-entities
Describe the solution you'd like
Extract colors from an input.
e.g.:
"I have a red car"
Output:
{
  "locale": "en",
  "localeIso2": "en",
  "language": "English",
  "utterance": "I have a red car",
  "classification": [{
    "label": "color",
    "value": 0.8567240019144264
  }],
  "intent": "car.getcolor",
  "domain": "default",
  "score": 0.8567240019144264,
  "entities": [{
    "start": 9,
    "end": 11,
    "len": 3,
    "levenshtein": 0,
    "accuracy": 1,
    "option": "color",
    "sourceText": "red",
    "entity": "general",
    "utteranceText": "red"
  }],
  ...,
  "srcAnswer": "ok, we took note of your color",
  "answer": "ok, we took note of your color"
}
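A minimal sketch of the extraction itself, using a tiny hard-coded colour list (a real implementation would ship per-language lists, as DialogFlow's @sys.color does). The helper name and entity shape mirror the example output above but are otherwise illustrative:

```javascript
// Hypothetical sketch: find known colour words in an utterance and
// emit entity records shaped like the example output above.
const COLORS = ['red', 'green', 'blue', 'black', 'white', 'yellow'];

function extractColors(utterance) {
  const lower = utterance.toLowerCase();
  const entities = [];
  for (const color of COLORS) {
    const start = lower.indexOf(color);
    if (start !== -1) {
      entities.push({
        start,
        end: start + color.length - 1, // inclusive end index
        len: color.length,
        entity: 'color',
        sourceText: utterance.slice(start, start + color.length),
      });
    }
  }
  return entities;
}

console.log(extractColors('I have a red car'));
// [{ start: 9, end: 11, len: 3, entity: 'color', sourceText: 'red' }]
```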
Describe the bug
Currently the save and load process of the NLP Manager takes into account the EnumNamedEntity and RegexNamedEntity classes, but not the TrimNamedEntity class.
To Reproduce
Create a new NLP Manager, add some TrimNamedEntity instances to it, save, and load: the model.nlp file does not contain info about the trim named entities.
Hello,
After multiple tests, I'm stuck on regex in XLS :(
I searched the issues and tried using the NER Manager example before filing this issue.
Adding an intent and response to the regex entity didn't produce any intent or response.
Added code (to answer
):
"Sorry, I don't understand, " + JSON.stringify(result) + ", ";
Input:
my mail is [email protected]
Response:
Sorry, I don't understand, {"locale":"en","localeIso2":"en","language":"English","utterance":"my mail is [email protected]","intent":"None","domain":"default","score":1,"entities":[{"start":11,"end":25,"accuracy":1,"sourceText":"[email protected]","utteranceText":"[email protected]","entity":"mail"}],"sentiment":{"score":0,"comparative":0,"vote":"neutral","numWords":6,"numHits":0,"type":"senticon","language":"en"}}, .
I may write to you
(of course, in the future I'd like I may write to you at %mail%
)
| Software | Version |
| --- | --- |
| nlp.js | 2.1.0 |
| node | 8.9.3 |
| npm | 5.5.1 |
| Operating System | Win10 Pro |
Thanks for your help!
Is your feature request related to a problem? Please describe.
Currently the NLG returns answers based on locale and context conditions, and that is fine. But what if, given an intent and conditions, we were able to return a script of predefined actions to be executed?
Examples:
Describe the solution you'd like