
Comments (8)

djui commented on July 20, 2024

Ironically, short English terms seem to be the hardest to guess:

  • elastinen vain elämää: fin(nish), no problem
  • cae el sol airbag: cat(alan), no problem
  • tennis court flume: quite off

from franc.

wooorm commented on July 20, 2024

Great question. The answer, however, might not be what you’d like: because of the large number of supported languages, short passages are often way off.

Also, “tennis”, “flume”, and “court” are all words which originate from French!

As for why the other languages seem to work well on short passages: I’m not sure. It may be a coincidence, or not. I’ll investigate 😄
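
To illustrate why short passages give a detector little to work with, here is a minimal Python sketch. It assumes a detector that compares overlapping character trigrams against per-language profiles; that is an assumption made for illustration, since the thread does not describe franc's internals, and `trigrams` is a hypothetical helper, not part of franc.

```python
def trigrams(text: str) -> list[str]:
    """Return the overlapping character trigrams of a lowercased, space-padded text."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


short = "tennis court flume"
longer = "we booked the tennis court for an hour and then went home for dinner"

# A short passage yields far fewer trigrams, so a handful of French-looking
# sequences can dominate the comparison against the language profiles.
print(len(trigrams(short)), len(trigrams(longer)))
```

With more candidate languages, more profiles sit close to any small set of trigrams, which is consistent with short inputs being misclassified more often.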


djui commented on July 20, 2024

Thanks. Anyhow, I think it's a great project! Keep up the good work. Maybe you can draw inspiration from language-detection, which seems to use a naive Bayesian filter. As far as I understand the source code, yours tries to detect which Unicode code page the characters come from, and code pages should correlate with languages (or be shared between them). Is that roughly correct? If so, can your algorithm handle decomposed Unicode characters (NFD), or "only" NFC?
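
The NFC/NFD difference can be seen with Python's standard `unicodedata` module. This is a minimal sketch, not franc's code: `rough_script` is a hypothetical helper that takes the first word of each character's Unicode name as a crude stand-in for its script.

```python
import unicodedata

# "elämää" written with precomposed characters (U+00E4 is "ä").
word = "el\u00e4m\u00e4\u00e4"

nfc = unicodedata.normalize("NFC", word)  # composed: one code point per "ä"
nfd = unicodedata.normalize("NFD", word)  # decomposed: "a" + combining diaeresis


def rough_script(text: str) -> list[str]:
    """Label each code point with the first word of its Unicode name."""
    return [unicodedata.name(ch).split()[0] for ch in text]


print(len(nfc), rough_script(nfc))
print(len(nfd), rough_script(nfd))

# Normalizing input to a single form before detection removes the difference.
print(unicodedata.normalize("NFC", nfd) == nfc)
```

In the decomposed form, the combining marks are separate code points that a per-character script check would see as their own category, so normalizing to one form up front is a simple way to make the two inputs equivalent.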


odalet commented on July 20, 2024

@wooorm To elaborate on these etymological issues, you'll note that we have three different cases here:

  • Flume does not exist in modern French (I had never heard this word before, and it appears to be very old French).
  • Tennis comes from 'tenez' (hold) but is used in French with the English meanings (sport and shoes).
  • Court is also used in French but usually means 'short'. Btw, 'short' means short trousers in French.

I suppose the fact that English is leaking into every other language does not help. And this is especially true of French, since French had previously influenced English...


wooorm commented on July 20, 2024

@djui Yeah, I’ve seen it. It’s interesting, but it also states “the more languages, the more difficult”, which holds true for 49, 168, and 300+ languages.

See unicode-7.0.0 for more information on the scripts used.


wooorm commented on July 20, 2024

@odalet Thanks for the extra information. Yeah, although they’re not literally French words, my limited understanding of the language made me sympathise with franc detecting “tennis court flume” as French 😛


odalet commented on July 20, 2024

Any reason why your lib seems to be named after the barbarian people who gave their name to my country? ;)
Anyway, very interesting project. Mixing languages and computing; I love this. Keeping an eye on it!


wooorm commented on July 20, 2024

@odalet Hahaha, I wanted a short name, was thinking about “lingua franca”, and came up with “franc”, which is short, human-like, and awesome. The only disadvantage is that it’s hard to Google: you have to add “language” or my name!

Thanks for the kind words. It’s really interesting, and I’m looking forward to seeing where it’s all heading!

