Comments (8)
Ironically, short English terms seem to be the hardest to guess:
elastinen vain elämää
fin(nish), no problem
cae el sol airbag
cat(alan), no problem
tennis court flume
quite off
from franc.
Great question. The answer might not be what you’d like, however. Because so many languages are supported, short passages are often way off.
Also, “tennis”, “flume”, and “court” are all words that originate from French!
As for the other languages seeming to work well on short passages: I’m not sure, it may be coincidence. Or not. I’ll investigate 😄
Thanks. Anyhow, I think it's a great project! Keep up the good work. Maybe you can draw inspiration from language-detection, which seems to use a naive Bayesian filter. As far as I understand the source code, yours tries to detect which Unicode code page the characters come from, and code pages should correlate with languages (or are shared between them). Is that roughly correct? If so, can your algorithm handle decomposed Unicode characters (NFD), or "only" NFC?
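For readers unfamiliar with the NFC/NFD distinction raised here, a minimal JavaScript sketch follows. It only illustrates the two normalization forms using the built-in `String.prototype.normalize`; it makes no claim about how franc treats either form.

```javascript
// "é" written two ways: one precomposed code point (NFC form) versus
// "e" followed by a combining acute accent (NFD form).
const composed = '\u00E9';
const decomposed = 'e\u0301';

// Both render identically but are different code unit sequences.
const sameBeforeNormalizing = composed === decomposed; // false

// Normalizing to a common form makes them comparable.
const sameAfterNfc = composed === decomposed.normalize('NFC'); // true

// In NFD, each "ä" in the Finnish sample becomes "a" + U+0308,
// so the string grows from 6 to 9 code units.
const nfcLength = 'elämää'.normalize('NFC').length; // 6
const nfdLength = 'elämää'.normalize('NFD').length; // 9

console.log({sameBeforeNormalizing, sameAfterNfc, nfcLength, nfdLength});
```

A detector that looks at individual code points could see different input for the same visible text, which is why normalizing to one form before analysis is a common precaution.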
@Worm. To elaborate on these etymological issues, note that we have three different cases here:
Flume does not exist in modern French (I had never heard this word before, and it appears to be very old French).
Tennis comes from 'tenez' (hold) but is used in French with the English meanings (the sport and the shoes).
Court is also used in French but usually means 'short'. Btw, 'short' means short trousers in French.
I suppose the fact that English is leaking into every other language does not help. This is especially true of French, which had itself previously influenced English...
@djui Yeah, I’ve seen it. It’s interesting, but it also states, “the more languages, the more difficult”, which holds true for 49, 168, and 300+ languages.
See unicode-7.0.0 for more information on the used scripts.
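To illustrate the idea of mapping characters to Unicode scripts, here is a rough JavaScript sketch using Unicode property escapes (`\p{Script=...}`, available with the `u` flag since ES2018). The `dominantScript` helper and the short `SCRIPTS` list are hypothetical and chosen for illustration; this is not franc's implementation.

```javascript
// Hypothetical helper, not part of franc: find which of a few
// Unicode scripts accounts for the most characters in a text.
const SCRIPTS = ['Latin', 'Cyrillic', 'Greek', 'Arabic', 'Han'];

function dominantScript(text) {
  let best = null;
  let bestCount = 0;
  for (const script of SCRIPTS) {
    // Unicode property escapes require the `u` flag.
    const pattern = new RegExp(`\\p{Script=${script}}`, 'gu');
    const count = (text.match(pattern) || []).length;
    if (count > bestCount) {
      best = script;
      bestCount = count;
    }
  }
  return best; // null when no listed script matches
}

console.log(dominantScript('Привет, мир')); // Cyrillic
console.log(dominantScript('tennis court flume')); // Latin
console.log(dominantScript('elastinen vain elämää')); // Latin
```

Script alone narrows the candidates but cannot separate English, French, and Finnish, which all use Latin. Any further distinction has to come from the text itself, and a three-word passage offers very little of it.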
@odalet Thanks for the additional information. Yeah, although they are not literally French words, my limited understanding of the language made me sympathise with franc detecting “tennis court flume” as French 😛
Any reason why your lib seems to be named after the barbarian people who gave their name to my country? ;)
Anyway, very interesting project. Mixing languages and computing; I love this. Keeping an eye on it!
@odalet Hahaha, I wanted a short name, was thinking about “lingua franca”, and came up with “franc”. Which is short, human-like, and awesome. Only disadvantage is that it’s hard to Google: you have to add “language” or my name!
Thanks for the kind words. It’s really interesting, and I’m looking forward to seeing where it’s all heading!