Question about limdu (closed)

erelsgl commented on July 24, 2024

Question


Comments (1)

erelsgl commented on July 24, 2024

Hi Mike,

Your question is not trivial. I did some research on this in the past. Now,
Vasily (CCed) is continuing the research. Here is what I remember.

A. Regarding the spelling mistakes, it is possible to use a speller, such
as this: https://github.com/mrmarbles/wordsworth
To connect it to limdu, you can use a normalizer:
https://github.com/erelsgl/limdu#input-normalization
The normalizer is a function that takes a word and returns the same word
with its spelling mistakes corrected, according to the training set.
In my research, using a speller had little effect on performance, so I
dropped it. The reason is probably that you need to collect a lot of
training data anyway, and once you have collected e.g. 1000 training
sentences, the most common spelling mistakes are already in the training
set. But you can try it and see whether it works for you.
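As a rough illustration of what such a normalizer could look like, here is a minimal sketch. It does not use wordsworth; it swaps in a plain Levenshtein edit distance against a list of known words, which you would build from your own training sentences. The names `editDistance` and `makeSpellingNormalizer`, the distance threshold of 2, and the rule of leaving words shorter than 4 letters alone are all assumptions made for this example, not part of limdu. Wiring the resulting function into limdu is described in the README section linked above and is not shown here.

```javascript
// Levenshtein (edit) distance between two strings, computed row by row.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: builds a sentence normalizer from a list of known words.
// Unknown words are replaced by the closest known word within maxDistance edits.
function makeSpellingNormalizer(vocabulary, maxDistance = 2) {
  const known = new Set(vocabulary.map((w) => w.toLowerCase()));
  const correctWord = (word) => {
    // Very short words are left alone: they match too many candidates.
    if (known.has(word) || word.length < 4) return word;
    let best = word;
    let bestDistance = maxDistance + 1;
    for (const candidate of known) {
      const d = editDistance(word, candidate);
      if (d < bestDistance) {
        best = candidate;
        bestDistance = d;
      }
    }
    return best;
  };
  return (sentence) =>
    sentence.toLowerCase().split(/\s+/).filter(Boolean).map(correctWord).join(" ");
}

// The vocabulary here is made up; in practice, collect it from the training set.
const normalize = makeSpellingNormalizer(["coffee", "shop", "cafe", "manchester", "best"]);
console.log(normalize("whats the best coffee shop in manchestr"));
// prints "whats the best coffee shop in manchester"
```

Note that an abbreviation such as "Manc" is too far from "manchester" (6 edits) to be corrected this way, so aliases need separate handling.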

B. Regarding named locations, this is more complicated. The two main
approaches that I know of are:

1. Rule-based: use manually written rules to detect the locations. Replace
them with a common term, such as "LOCATION". Then, use the classifier to
detect the intent. Then, put the location back into the intent. This may
work for small problems, but it is not very scalable.

2. The more common approach is "sequence classification". Instead of
classifying each sentence, you classify each word or sequence of words. It
works well in larger applications, but requires a lot of training effort.

You can also read about "information retrieval" or "information
extraction". There are several methods, and one of them may fit your needs.
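The rule-based approach can be sketched in a few lines. This is only an illustration under stated assumptions: the alias table is invented for the example (you said you already have such a list), the function names `extractLocations` and `attachLocations` are hypothetical, and the intent is passed in as a plain string rather than coming from a real limdu classifier. It handles only single-word locations and exact alias matches.

```javascript
// Hypothetical alias table: canonical name -> known spellings and abbreviations.
const LOCATION_ALIASES = {
  Manchester: ["manchester", "manc"],
  London: ["london", "ldn"],
};

// Replace every known location word with the placeholder "LOCATION"
// and remember which canonical locations were found, in order.
function extractLocations(sentence, aliases = LOCATION_ALIASES) {
  const lookup = new Map();
  for (const [canonical, names] of Object.entries(aliases)) {
    for (const name of names) lookup.set(name, canonical);
  }
  const locations = [];
  const words = sentence.split(/\s+/).filter(Boolean).map((word) => {
    // Compare case-insensitively and ignore punctuation such as "?".
    const bare = word.toLowerCase().replace(/[^a-z]/g, "");
    if (!lookup.has(bare)) return word;
    locations.push(lookup.get(bare));
    return "LOCATION";
  });
  return { text: words.join(" "), locations };
}

// Put the locations back next to the intent returned by the classifier.
function attachLocations(intent, locations) {
  return { intent, locations };
}

const result = extractLocations("coolest cafe in Manc?");
console.log(result.text); // prints "coolest cafe in LOCATION"
// "cafe" stands in for whatever label the intent classifier returns for result.text.
console.log(attachLocations("cafe", result.locations));
```

Because the classifier only ever sees "LOCATION", it does not need training examples for every city, and a sentence with several locations simply yields several entries in `locations`.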

-- Erel

On Sun, Nov 15, 2015 at 12:40 AM, Michael Diarmid [email protected]
wrote:

Firstly, this library looks great!

Secondly, I'm experimenting with some ML for a little side project of mine
to help me learn, and I'm wondering what I could achieve with this library.
Could you point me in the right direction, as you seem extremely
knowledgeable on the subject =]

[
'Where could i get a hot drink in Manchester?',
'whats the best coffee shop in manchestr',
'coolest cafe in Manc?'
]

In my examples you can see the various ways (spelling mistakes
intentional) of asking about a cafe in a named location.

As you can see, there are lots of ways an activity can be asked about. At
most I'd have, say, 6 different types of activities (e.g. going to a cafe,
going to a park, visiting a gym), so the options there are limited, but
there are lots of ways of describing them.

The second part of the query is a location, sometimes an abbreviated one.
I have a list of all the possible locations and their
abbreviations/aliases. How could I extract these while also handling
spelling mistakes/typos, given that multiple locations (at most 3 - 4)
could be mentioned in the same sentence?

The intent classifier seemed like a good start, and basic tests seemed to
work, but I'm unsure how to handle the issues above, i.e. spelling (string
distance perhaps?) and named locations.

Any pointers, examples etc. would be appreciated.

Thanks,
Mike

—
Reply to this email directly or view it on GitHub
#33.
