Languages supported? about limdu HOT 5 CLOSED

kafechew commented on June 26, 2024

Languages supported?

from limdu.

Comments (5)

erelsgl commented on June 26, 2024

Hi Kai,

limdu should work for any language. If you encounter any specific problem
in working with limdu in your language, please report it and we will check.

Erel

On Sun, Apr 12, 2015 at 1:19 PM, Kai Chew wrote:

Which languages does limdu support for classification?
I assume all languages that use the a-z alphabet.

What about Hebrew, Chinese, Hindi, Korean... which do not use the a-z
alphabet?

Thanks :-)


kafechew commented on June 26, 2024

Cool~ Noted with thanks~


kafechew commented on June 26, 2024

I'm trying to analyse Chinese text.
For English, word 1-grams are simple and straightforward:
"I am Max" becomes "I", "am", "Max".

The problem is that Chinese, unlike English, has no spaces between words.
我是马氏 (something like "IamMax")
So it stays as the single token "我是马氏" ("IamMax") instead of being split into "我", "是", "马", "氏" ("I", "am", "Max").

My temporary solution:
If the content is in Chinese or Japanese, I use letter n-grams:
[limdu.features.NGramsOfLetters(1), limdu.features.NGramsOfLetters(2)]
PS: a little English mixed into mostly Chinese text will be an issue... Max = "M", "a", "x"

For English (or Hebrew, Korean...), word n-grams work as usual:
[limdu.features.NGramsOfWords(1), limdu.features.NGramsOfWords(2)]

Do you have a better solution in limdu for this kind of tokenisation issue?
Thanks in advance!
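
For the mixed Chinese/English case, one possible direction is a custom extractor that treats each CJK character as its own token but keeps runs of other letters (such as "Max") whole. The sketch below is only an illustration under assumptions: it assumes limdu accepts a custom feature extractor written as a function of the form (input, features) that adds keys to features, which should be checked against the limdu README. The names tokenizeMixed and mixedScriptExtractor are hypothetical, and the code does not call limdu itself.

```javascript
// Hypothetical helpers, not part of limdu.
// Scripts whose characters are emitted one at a time.
const CJK = "\\p{Script=Han}\\p{Script=Hiragana}\\p{Script=Katakana}";

// Either a single CJK character, or a run of non-CJK letters/digits.
// The lookahead stops a Latin run from swallowing the CJK characters after it.
const TOKEN_PATTERN = new RegExp(
  `[${CJK}]|(?:(?![${CJK}])[\\p{L}\\p{N}])+`,
  "gu"
);

function tokenizeMixed(text) {
  return text.match(TOKEN_PATTERN) || [];
}

// Assumed extractor shape: (input, features). Adds token 1-grams and 2-grams.
function mixedScriptExtractor(input, features) {
  const tokens = tokenizeMixed(input);
  tokens.forEach((token, i) => {
    features[token] = 1;
    if (i + 1 < tokens.length) {
      features[token + " " + tokens[i + 1]] = 1;
    }
  });
  return features;
}

console.log(tokenizeMixed("我是Max"));   // [ '我', '是', 'Max' ]
console.log(tokenizeMixed("I am Max")); // [ 'I', 'am', 'Max' ]
```

Splitting per character is still not real word segmentation (马氏 stays two tokens), so the 2-gram features are what carry multi-character words here; a dictionary-based segmenter would be the next step if that proves too coarse.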


erelsgl commented on June 26, 2024

Hi Kai,

Currently limdu contains only a small number of feature extractors, which
are used mainly as examples. There is a feature extractor that extracts
words, "limdu.features.NGramsOfWords", and one that extracts letters,
"limdu.features.NGramsOfLetters". You can try the second one and
see if it works.
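
To make the difference between the two kinds of extractor concrete, here is a small standalone sketch. The functions wordNGrams and letterNGrams are illustrative stand-ins written for this example, not limdu's implementations; the real extractors may differ in details such as padding or case handling.

```javascript
// Illustrative stand-ins, NOT limdu's own code.

// n-grams over whitespace-separated words.
function wordNGrams(text, n) {
  const words = text.split(/\s+/).filter(Boolean);
  const grams = [];
  for (let i = 0; i + n <= words.length; i++) {
    grams.push(words.slice(i, i + n).join(" "));
  }
  return grams;
}

// n-grams over characters (code points, so each character stays whole).
function letterNGrams(text, n) {
  const letters = Array.from(text);
  const grams = [];
  for (let i = 0; i + n <= letters.length; i++) {
    grams.push(letters.slice(i, i + n).join(""));
  }
  return grams;
}

console.log(wordNGrams("I am Max", 1));    // [ 'I', 'am', 'Max' ]
console.log(wordNGrams("我是马氏", 1));      // [ '我是马氏' ]
console.log(letterNGrams("我是马氏", 1));    // [ '我', '是', '马', '氏' ]
console.log(letterNGrams("我是马氏", 2));    // [ '我是', '是马', '马氏' ]
```

With no spaces to split on, the word-based version sees the whole Chinese sentence as one token, while the letter-based version yields per-character features that can be shared across sentences.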

(In reply to Kai Chew's message of Thu, Apr 16, 2015 at 6:40 PM, which repeats the comment above.)


kafechew commented on June 26, 2024

Hi erelsgl,

Yep. As mentioned in my previous comment, I'm using both at the moment.
Just looking for better recommendations :-)
Thanks~
