Hi Mike,
Your question is not trivial. I did some research on this in the past. Now,
Vasily (CCed) is continuing the research. Here is what I remember.
A. Regarding spelling mistakes, it is possible to use a speller, such
as this one: https://github.com/mrmarbles/wordsworth
To connect it to limdu, you can use a normalizer:
https://github.com/erelsgl/limdu#input-normalization
The normalizer is a function that takes a word and returns the same word
with spelling mistakes corrected, according to the training set.
In my research, using a speller had little effect on performance, so I
dropped it. The reason is probably that you need to collect a lot of
training data anyway, and once you have collected, e.g., 1000 training
sentences, the most common spelling mistakes are already in the training set.
But you can try it and see whether it works for you.
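If you do want to try it, Mike's own idea of string distance can be turned into a normalizer of the kind described above. This is a minimal sketch under stated assumptions: it does not use wordsworth (whose API is not shown in this thread) but swaps in plain Levenshtein distance against the words of the training sentences. The names `makeSpellingNormalizer` and `tokenize`, and the `maxDistance`/`minLength` thresholds, are hypothetical. Hooking it into limdu would presumably mean passing the returned function as the normalizer from the linked README section; check that section for the exact expected signature.

```javascript
// Levenshtein edit distance: the minimum number of single-character
// insertions, deletions or substitutions that turn a into b.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // distance of the previous row at column j - 1
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // distance of the previous row at column j
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Lower-cases the text and splits it into word tokens, dropping punctuation.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Hypothetical helper: returns a function that replaces each word with the
// closest word seen in the training sentences, if it is close enough.
function makeSpellingNormalizer(trainingSentences, maxDistance = 1, minLength = 4) {
  const vocab = new Set(trainingSentences.flatMap(tokenize));

  function correctWord(word) {
    // Known words and short words are left alone: short words are too
    // easy to "correct" into a different, valid word.
    if (vocab.has(word) || word.length < minLength) return word;
    let best = word;
    let bestDistance = maxDistance + 1;
    for (const candidate of vocab) {
      const d = levenshtein(word, candidate);
      if (d < bestDistance) {
        best = candidate;
        bestDistance = d;
      }
    }
    return best;
  }

  return (text) => tokenize(text).map(correctWord).join(' ');
}

// Usage, with a vocabulary built from two example training sentences.
const normalize = makeSpellingNormalizer([
  'Where could i get a hot drink in Manchester?',
  'best coffee shop in Manchester',
]);
console.log(normalize('whats the best coffe shop in manchestr'));
// -> whats the best coffee shop in manchester
```

The `minLength` guard is a design choice: with a threshold of one edit, a word such as "at" would otherwise be rewritten to "a" whenever "a" is in the vocabulary.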
B. Regarding named locations, this is more complicated. The two main
approaches that I know of are:
- Rule-based: use manually written rules to detect the locations. Replace
them with a common term, such as "LOCATION". Then, use the classifier to
detect the intent. Then, put the location back into the intent. This may
work for small problems, but it is not very scalable.
- The more common approach is "sequence classification". Instead of
classifying each sentence, you classify each word or sequence of words. It
works well in larger applications, but requires a lot of training effort.
- You can also read about "information retrieval" or "information
extraction". There are several methods and one of them may fit your needs.
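The rule-based approach can be sketched using Mike's list of locations and aliases plus a string-distance check for typos. This is a minimal sketch, not a tested recipe: the alias table, the thresholds (one edit, phrases of at least five characters) and the names `buildAliasIndex`, `lookupLocation` and `extractLocations` are hypothetical. The intent classifier itself is left out; `text` is what you would feed to it. Because the whole sentence is scanned, several locations in one sentence are all found.

```javascript
// Levenshtein edit distance (single-row dynamic programming).
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Hypothetical alias table: canonical location name -> aliases.
// In practice this would come from your own list of locations and aliases.
const LOCATIONS = {
  manchester: ['manc'],
  london: ['ldn'],
  'new york': ['nyc'],
};

// Maps every alias (and the canonical name itself) to the canonical name,
// and records the longest alias in words so multi-word names can match.
function buildAliasIndex(locations) {
  const index = new Map();
  let maxWords = 1;
  for (const [name, aliases] of Object.entries(locations)) {
    for (const alias of [name, ...aliases]) {
      const key = alias.toLowerCase();
      index.set(key, name);
      maxWords = Math.max(maxWords, key.split(' ').length);
    }
  }
  return { index, maxWords };
}

// Exact alias lookup first; then a fuzzy match within maxDistance edits,
// only for phrases long enough that one edit is unlikely to change meaning.
function lookupLocation(phrase, index, maxDistance) {
  if (index.has(phrase)) return index.get(phrase);
  if (phrase.length < 5) return null;
  for (const [alias, name] of index) {
    if (levenshtein(phrase, alias) <= maxDistance) return name;
  }
  return null;
}

// Replaces every detected location with the placeholder "LOCATION" and
// returns the rewritten text plus the canonical locations, in order.
function extractLocations(sentence, locations, maxDistance = 1) {
  const { index, maxWords } = buildAliasIndex(locations);
  const words = sentence.toLowerCase().match(/[a-z0-9']+/g) || [];
  const output = [];
  const found = [];
  let i = 0;
  while (i < words.length) {
    let matched = false;
    // Try the longest phrase first, so "new york" is matched as one unit.
    for (let n = Math.min(maxWords, words.length - i); n >= 1 && !matched; n--) {
      const name = lookupLocation(words.slice(i, i + n).join(' '), index, maxDistance);
      if (name !== null) {
        output.push('LOCATION');
        found.push(name);
        i += n;
        matched = true;
      }
    }
    if (!matched) {
      output.push(words[i]);
      i += 1;
    }
  }
  return { text: output.join(' '), locations: found };
}

const result = extractLocations('best cafe in manchestr or ldn?', LOCATIONS);
console.log(result.text); // -> best cafe in LOCATION or LOCATION
console.log(result.locations); // -> [ 'manchester', 'london' ]
```

After classifying `result.text`, attaching the entries of `result.locations` to the predicted intent is the "put the location back" step. As noted above, every new alias or phrasing means another rule, which is why this does not scale well.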
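For the sequence-classification approach, the main change is in the training data: each word becomes its own sample, labelled as part of a location or not, with features taken from the word and its neighbours. This sketch only prepares such samples and groups predicted tags back into location names; it trains nothing. The `{ words, tags }` annotation format, the feature names and the `'LOC'`/`'O'` labels are assumptions for illustration. The samples use the `{ input, output }` shape with feature objects as input that limdu's README examples use, but whether a particular limdu classifier suits this task is an assumption to verify. Dedicated sequence models (e.g. CRFs) also use the neighbouring labels, which this per-word sketch ignores.

```javascript
// Features for the i-th word: the word itself, its neighbours, and whether
// it is capitalised (a weak cue, since many users type in lower case).
function wordFeatures(words, i) {
  const lower = words.map((w) => w.toLowerCase());
  const features = {};
  features['word=' + lower[i]] = 1;
  features['prev=' + (i > 0 ? lower[i - 1] : '<start>')] = 1;
  features['next=' + (i < words.length - 1 ? lower[i + 1] : '<end>')] = 1;
  if (/^[A-Z]/.test(words[i])) features.capitalized = 1;
  return features;
}

// Turns annotated sentences into one { input, output } sample per word.
function toWordSamples(annotatedSentences) {
  const samples = [];
  for (const { words, tags } of annotatedSentences) {
    words.forEach((_, i) => {
      samples.push({ input: wordFeatures(words, i), output: tags[i] });
    });
  }
  return samples;
}

// Groups consecutive 'LOC' tags back into location strings, so that a
// multi-word name such as "New York" comes out as one location.
function groupLocations(words, tags) {
  const spans = [];
  let current = [];
  tags.forEach((tag, i) => {
    if (tag === 'LOC') {
      current.push(words[i]);
    } else if (current.length > 0) {
      spans.push(current.join(' '));
      current = [];
    }
  });
  if (current.length > 0) spans.push(current.join(' '));
  return spans;
}

// Hypothetical annotated training data based on the question's examples.
const training = [
  {
    words: ['whats', 'the', 'best', 'coffee', 'shop', 'in', 'manchestr'],
    tags: ['O', 'O', 'O', 'O', 'O', 'O', 'LOC'],
  },
  { words: ['coolest', 'cafe', 'in', 'Manc'], tags: ['O', 'O', 'O', 'LOC'] },
];
const samples = toWordSamples(training);
console.log(samples.length); // -> 11

// Decoding the (hypothetical) per-word predictions for a new sentence.
console.log(groupLocations(
  ['cafe', 'in', 'New', 'York', 'or', 'Manc'],
  ['O', 'O', 'LOC', 'LOC', 'O', 'LOC'],
)); // -> [ 'New York', 'Manc' ]
```

The extra effort mentioned above is mostly in producing the `tags` for every word of every training sentence.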
-- Erel
On Sun, Nov 15, 2015 at 12:40 AM, Michael Diarmid [email protected]
wrote:
Firstly, this library looks great!
Secondly, I'm experimenting with some ML for a little side project of mine
to help me learn, and I'm wondering what I could achieve with this library.
Could you point me in the right direction, as you seem extremely
knowledgeable on the subject =]
[
'Where could i get a hot drink in Manchester?',
'whats the best coffee shop in manchestr',
'coolest cafe in Manc?'
]
In my examples you can see the various ways (spelling mistakes
intentional) of asking about a cafe in a named location. As you can see,
there are lots of ways a given activity can be asked about. At most I'd
have, say, 6 different types of activities (e.g. going to a cafe, a park,
visiting a gym), so the options there are limited, but there are lots of
ways of describing them.
The second part of the query is a location, sometimes an abbreviated one.
I have a list of all the possible locations and their
abbreviations/aliases. How could I extract these, while also handling
spelling mistakes/typos, given that multiple locations (at most 3-4) could
be mentioned in the same sentence?
The intent classifier seemed like a good start and basic tests seemed to
work, but I'm unsure how to handle the issues above, i.e. spelling (string
distance perhaps?) and named locations. Any pointers, examples, etc. would
be appreciated.
Thanks,
Mike
(limdu issue #33 on GitHub)