
Question about limdu (closed)

erelsgl commented on July 24, 2024
Question

from limdu.

Comments (1)

erelsgl commented on July 24, 2024

Hi Mike,

Your question is not trivial. I did some research on this in the past. Now,
Vasily (CCed) is continuing the research. Here is what I remember.

A. Regarding the spelling mistakes, it is possible to use a speller, such
as this one: https://github.com/mrmarbles/wordsworth
To connect it to limdu, you can use a normalizer:
https://github.com/erelsgl/limdu#input-normalization
The normalizer is a function that takes a word and returns the same word
with its spelling mistakes corrected, according to the training set.
In my research, using a speller had little effect on the performance, so I
dropped it. The reason is probably that you need to collect a lot of
training data anyway, and after you collect, e.g., 1000 training sentences,
most common spelling mistakes already appear in the training set.
But you can try it and see whether it works for you.
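A minimal sketch of the string-distance idea, assuming the vocabulary is simply every word seen in the training sentences. It does not use wordsworth or any limdu API: `levenshtein`, `buildVocabulary`, `makeWordCorrector`, and `makeSentenceNormalizer` are hypothetical helpers, and the threshold of 2 edits is an arbitrary assumption. Check the input-normalization section of the limdu README for how a normalizer is actually attached to a classifier and whether it receives a single word or a whole sentence; both variants are shown here.

```javascript
// Hypothetical spelling normalizer (NOT part of limdu): corrects each word
// to the closest word seen in the training sentences.

// Levenshtein edit distance, computed with a single rolling row.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // distance for a[0..i-1) vs b[0..j-1) from the previous row
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Lower-case word tokens (letters, digits, apostrophes).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Vocabulary = every distinct word in the training sentences.
function buildVocabulary(sentences) {
  return new Set(sentences.flatMap(tokenize));
}

// Per-word corrector: known words are kept; unknown words are replaced by the
// nearest vocabulary word if it is at most maxDistance edits away (assumed threshold).
function makeWordCorrector(vocab, maxDistance = 2) {
  return function correctWord(word) {
    const w = word.toLowerCase();
    if (vocab.has(w)) return w;
    let best = w;
    let bestDist = maxDistance + 1;
    for (const candidate of vocab) {
      const d = levenshtein(w, candidate);
      if (d < bestDist) {
        best = candidate;
        bestDist = d;
      }
    }
    return best;
  };
}

// Sentence-level wrapper, in case the normalizer is called with whole sentences.
function makeSentenceNormalizer(vocab, maxDistance = 2) {
  const correctWord = makeWordCorrector(vocab, maxDistance);
  return (text) => tokenize(text).map(correctWord).join(" ");
}

// Usage with two of the example sentences as "training data".
const training = [
  "Where could i get a hot drink in Manchester?",
  "whats the best coffee shop in manchester",
];
const normalize = makeSentenceNormalizer(buildVocabulary(training));
console.log(normalize("coolest cafe in manchestr")); // coolest cafe in manchester
```

A fixed threshold of 2 edits is loose for very short words (an unseen two-letter word can be mapped onto an unrelated short word such as "a"), so in practice you might scale the threshold with word length.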

B. Regarding named locations, this is more complicated. The main
approaches that I know of are:

  1. Rule-based: use manually written rules to detect the locations. Replace
    them with a common term, such as "LOCATION". Then, use the classifier to
    detect the intent. Then, put the location back into the intent. This may
    work for small problems, but it does not scale well.
  2. The more common approach is "sequence classification" (also called
    sequence labeling). Instead of classifying each sentence, you classify each
    word or sequence of words. It works well in larger applications, but
    requires a lot of training effort.
  3. You can also read about "information retrieval" or "information
    extraction". There are several methods, and one of them may fit your needs.
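A minimal sketch of the rule-based approach (item 1), assuming the "rules" are simply a table of known aliases like the list Mike describes. `LOCATION_ALIASES`, `replaceLocations`, `classifyIntent`, and `parseQuery` are hypothetical names; `classifyIntent` is a toy keyword rule standing in for a trained limdu classifier so that the example runs on its own.

```javascript
// Hypothetical alias table (lower-case alias -> canonical name); in practice
// it would be built from your own list of locations and abbreviations.
const LOCATION_ALIASES = {
  manchester: "Manchester",
  manc: "Manchester",
  london: "London",
  ldn: "London",
  "new york": "New York",
  nyc: "New York",
};

function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Step 1: replace each known alias with the placeholder "LOCATION" and
// remember the canonical names in order of appearance.
function replaceLocations(sentence, aliases) {
  // Try longer aliases first so that multi-word names win over shorter ones.
  const keys = Object.keys(aliases).sort((a, b) => b.length - a.length);
  const pattern = new RegExp("\\b(?:" + keys.map(escapeRegExp).join("|") + ")\\b", "gi");
  const locations = [];
  const text = sentence.replace(pattern, (match) => {
    locations.push(aliases[match.toLowerCase()]);
    return "LOCATION";
  });
  return { text, locations };
}

// Step 2: stand-in for a trained limdu intent classifier (hypothetical
// keyword rule, only here so the sketch is self-contained).
function classifyIntent(text) {
  return /\b(cafe|coffee|hot drink)\b/i.test(text) ? "find_cafe" : "unknown";
}

// Step 3: put the extracted locations back next to the predicted intent.
function parseQuery(sentence) {
  const { text, locations } = replaceLocations(sentence, LOCATION_ALIASES);
  return { intent: classifyIntent(text), locations, normalizedText: text };
}

console.log(JSON.stringify(parseQuery("coolest cafe in Manc?")));
// {"intent":"find_cafe","locations":["Manchester"],"normalizedText":"coolest cafe in LOCATION?"}
```

Because every alias match is recorded, several locations in one sentence come out in order. Exact alias matching does not handle typos such as "manchestr"; one option is to run a spelling normalizer (part A) over the sentence first.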

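For the sequence-classification approach (item 2), training data is labeled per word rather than per sentence. The sketch below shows one common labeling scheme, BIO (B-LOC for the first word of a location, I-LOC for the following words, O otherwise), plus a few per-word features. It is independent of limdu: `bioLabels` and `tokenFeatures` are hypothetical helpers, the feature set is an illustrative assumption, and training the actual sequence labeler (for example a CRF) is not shown.

```javascript
// Hypothetical helpers (NOT limdu API) for preparing sequence-labeling data.

// Word tokens, keeping the original casing (letters, digits, apostrophes).
function tokenize(sentence) {
  return sentence.match(/[A-Za-z0-9']+/g) || [];
}

// Label each token with BIO tags for the given location names.
function bioLabels(sentence, locationNames) {
  const tokens = tokenize(sentence);
  const lower = tokens.map((t) => t.toLowerCase());
  const labels = tokens.map(() => "O");
  for (const name of locationNames) {
    const parts = tokenize(name).map((t) => t.toLowerCase());
    if (parts.length === 0) continue;
    for (let i = 0; i + parts.length <= lower.length; i++) {
      // Match the whole name at position i without overwriting earlier labels.
      const fits = parts.every((p, k) => lower[i + k] === p && labels[i + k] === "O");
      if (fits) {
        labels[i] = "B-LOC";
        for (let k = 1; k < parts.length; k++) labels[i + k] = "I-LOC";
      }
    }
  }
  return tokens.map((token, i) => ({ token, label: labels[i] }));
}

// A few per-token features a sequence labeler might use (illustrative choice).
function tokenFeatures(tokens, i) {
  return {
    word: tokens[i].toLowerCase(),
    isCapitalized: /^[A-Z]/.test(tokens[i]),
    prevWord: i > 0 ? tokens[i - 1].toLowerCase() : "<s>",
    nextWord: i + 1 < tokens.length ? tokens[i + 1].toLowerCase() : "</s>",
  };
}

// Usage: one labeled training sentence with two locations.
const labeled = bioLabels("Coffee in New York or Manc?", ["New York", "Manc"]);
console.log(labeled.map((t) => `${t.token}/${t.label}`).join(" "));
// Coffee/O in/O New/B-LOC York/I-LOC or/O Manc/B-LOC
```

A trained labeler sees the context around each word, so unlike a fixed alias table it can, in principle, recognize location mentions it was never explicitly given, at the cost of needing many labeled sentences.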
-- Erel

On Sun, Nov 15, 2015 at 12:40 AM, Michael Diarmid
wrote:

Firstly, this library looks great!

Secondly, I'm experimenting with some ML for a little side project of mine
to help me learn, and I'm wondering what I could achieve with this library,
or whether you could point me in the right direction, as you seem extremely
knowledgeable on the subject =]

[
'Where could i get a hot drink in Manchester?',
'whats the best coffee shop in manchestr',
'coolest cafe in Manc?'
]

In my examples you can see various ways (spelling mistakes intentional)
of asking about a cafe in a named location.

As you can see, there are lots of ways an activity X can be asked about. At
most I'd have, say, 6 different types of activities (e.g. going to a cafe or a
park, or visiting a gym), so the options there are limited, but there are lots
of ways of describing them.

The second part of the query is a location, sometimes abbreviated. I have a
list of all the possible locations and their abbreviations/aliases. How could
I extract these while also handling spelling mistakes/typos, given that
multiple locations (at most 3-4) could be mentioned in the same sentence?

The intent classifier seemed like a good start and basic tests seemed to
work, but I'm unsure how to handle the issues above, i.e. spelling (string
distance, perhaps?) and named locations.

Any pointers, examples, etc. would be appreciated.

Thanks,
Mike


View it on GitHub: #33.
