cfinke / typo.js

A client-side JavaScript spellchecker that uses Hunspell-style dictionaries.

License: Other

JavaScript 72.95% CSS 3.14% HTML 3.95% Shell 0.08% TypeScript 19.89%

typo.js's Introduction

Typo.js is a JavaScript/TypeScript spellchecker that uses Hunspell-style dictionaries.

Usage

To use Typo in a Chrome extension, simply include the typo.js file in your extension's background page, and then initialize the dictionary like so:

var dictionary = new Typo("en_US");

To use Typo in a standard web application you need to pass a settings object that provides a path to the folder containing the desired dictionary.

var dictionary = new Typo("en_US", false, false, { dictionaryPath: "typo/dictionaries" });

If using in node.js, load it like so:

var Typo = require("typo-js");
var dictionary = new Typo([...]);

To check if a word is spelled correctly, do this:

var is_spelled_correctly = dictionary.check("mispelled");

To get suggested corrections for a misspelled word, do this:

var array_of_suggestions = dictionary.suggest("mispeling");

// array_of_suggestions == ["misspelling", "dispelling", "misdealing", "misfiling", "misruling"]

Typo.js has full support for the following Hunspell affix flags:

  • PFX
  • SFX
  • REP
  • FLAG
  • COMPOUNDMIN
  • COMPOUNDRULE
  • ONLYINCOMPOUND
  • KEEPCASE
  • NOSUGGEST
  • NEEDAFFIX

Note: The manifest.json file in the root directory of the project is there to simplify testing, as it allows you to load all of the files in the Typo project as a Chrome extension. It doesn't have any purpose if you're using Typo.js in your own project.

Demo

There's a live demo of Typo.js at http://www.chrisfinke.com/files/typo-demo/ and a complete Node.js example file at examples/node/index.js.

Development

The full TypeScript source code and unit test suites are available in the official Typo.js repository at https://github.com/cfinke/Typo.js

To modify Typo.js, make your changes to ts/typo.ts and then run build.sh to generate the JavaScript file typo/typo.js.

IRL

Typo.js has been used all over in real-world projects, but here are a few examples:

Licensing

Typo.js is free software, licensed under the Modified BSD License.

typo.js's People

Contributors

cfinke, timdream, slknutson, bmehanni, debdasgupta, jimmywarting, pies, balderdash, thomas101

typo.js's Issues

remove Buffer

When I bundle the package with browserify I get a whole lot more than just this package.
I also get the whole Buffer.js dependency, which seems unnecessary because it's barely being used:

var buffer = new Buffer(stats.size);

Could you avoid using Buffer?
btw, the Buffer constructor was deprecated a long time ago
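
One possible way to avoid the deprecated constructor is sketched below, assuming the file only needs to be read as text. Passing an encoding to `fs.readFileSync` makes Node decode the file itself, so no `Buffer` is allocated by hand. `readTextFile` is a hypothetical helper name, not part of Typo.js.

```javascript
const fs = require("fs");

// Hypothetical helper: read a whole file as text without calling new Buffer().
function readTextFile(filePath, charset) {
  // With an encoding argument, readFileSync returns a string instead of a Buffer.
  return fs.readFileSync(filePath, charset || "utf8");
}
```

Where a buffer really is needed, `Buffer.alloc(size)` is the non-deprecated replacement for `new Buffer(size)`. Whether either change keeps browserify from bundling its Buffer shim is not something this sketch verifies.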

Contractions marked as misspellings

var dictionary = new Typo("en_US", affData, dicData);
var is_spelled_correctly = dictionary.check("aren't") //should be true, but returns false

All contractions appear to fail, e.g. "I'm", "we're", "didn't", etc.

Latin affixes not read correctly

var dictionary = new Typo("la", affData, dicData);
var is_spelled_correctly = dictionary.check("fideles") //should be true, but returns false
// (The word "fideles" is the plural form of "fidelis")

I used the Latin dictionary and affix files provided by OpenOffice.org
http://ftp.services.openoffice.org/pub/OpenOffice.org/contrib/dictionaries/la-pack.zip

In case it is helpful, below is the fragment of Latin that I was proofing. All the words pass spell check in Thunderbird using https://addons.mozilla.org/en-US/thunderbird/addon/latin-dictionary/

"1. Adeste, fideles, Laeti triumphantes

  1. Cantet nunc hymnos Chorus angelorum;
  2. Ergo qui natus die hodierna"

Chrome extension not working

How do I get the Chrome extension working?
I tried

var dictionary = new Typo("en_US", false, false, { dictionaryPath: "typo/dictionaries" })
var dictionary = new Typo("en_US");

with typo.js included and dictionaries at the given path.

Both times, script breaks with the console output,

typo.js:255 GET chrome-extension://invalid/ net::ERR_FAILED
Uncaught DOMException: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 

English "-ed" affix not read correctly

var dictionary = new Typo("en_US", affData, dicData);
var is_spelled_correctly = dictionary.check("ordained") //should be true, but returns false

I used the default en_US dictionary and affix files.

The dictionary line reads: ordain/SGLDR

support other language

In the Git repository French is supported, but the npm package contains only English. And as you can see, I really need a speell checker :D

Case sensitivity for proper nouns

var dictionary = new Typo("en_US", affData, dicData);
var is_spelled_correctly = dictionary.check("Alex") //returns true (as expected)
var is_spelled_correctly = dictionary.check("alex") //should be false but returns true

When the words in the dictionary have uppercase characters, they should be case sensitive when checking against the user text.
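
The rule requested above can be sketched as follows. This is one reading of the expected behaviour, not Typo.js's implementation: a plain `Set` stands in for the parsed dictionary, and `checkCaseSensitive` is a hypothetical name.

```javascript
// Sketch of the expected rule, using a plain Set in place of a parsed dictionary.
function checkCaseSensitive(words, input) {
  if (words.has(input)) return true; // exact match always passes

  const lower = input.toLowerCase();
  const isAllCaps = input === input.toUpperCase();
  const isCapitalized =
    input === input.charAt(0).toUpperCase() + input.slice(1).toLowerCase();

  // "Apple" or "APPLE" may match the lowercase entry "apple".
  if ((isCapitalized || isAllCaps) && words.has(lower)) return true;

  // "ALEX" may match the capitalized entry "Alex".
  if (isAllCaps) {
    return words.has(lower.charAt(0).toUpperCase() + lower.slice(1));
  }

  // "alex" never matches "Alex".
  return false;
}

const words = new Set(["Alex", "apple"]);
console.log(checkCaseSensitive(words, "Alex")); // true
console.log(checkCaseSensitive(words, "alex")); // false
```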

French affixes not read correctly

var dictionary = new Typo("fr", affData, dicData);
var is_spelled_correctly = dictionary.check("marchons") //should be true, but returns false
// (The word "marchons" is a form of the verb "marcher" as explained here: http://www.wordreference.com/fren/marchons )

I used the French dictionary and affix files installed by default with OpenOffice.org (en). You can obtain the files directly from: http://hg.services.openoffice.org/OOO330/file/b70298db35e1/dictionaries/fr_FR

The dictionary line reads: marcher/a0()

Relative paths for dictionary directories [node]

I found that if I try to give Typo a relative path to my dictionaries in node, it throws an error like

Path ../../dictionaries/en_US/en_US.dic does not exist.

I think that's happening because of this check, which will be relative to node_modules/typo-js/typo.js

if (fs.existsSync(path)) {

I think it can be fixed with something like the following, which works for me locally.

var _path = require('path')
...
path = _path.join(__dirname, path);
if (fs.existsSync(path)) {

I can make a PR to update this

French dictionary marks "espérance" as a misspelling

var dictionary = new Typo("fr", affData, dicData);
var is_spelled_correctly = dictionary.check("espérance") //should be true, but returns false
//See http://www.wordreference.com/fren/esp%C3%A9rance

I used the French dictionary and affix files installed by default with OpenOffice.org (en). You can obtain the files directly from: http://hg.services.openoffice.org/OOO330/file/b70298db35e1/dictionaries/fr_FR

The dictionary line reads: espérance/S*()

Found a possible security concern

Hey there!

I belong to an open source security research community, and a member (@yetingli) has found an issue, but doesn’t know the best way to disclose it.

If not a hassle, might you kindly add a SECURITY.md file with an email, or another contact method? GitHub recommends this best practice to ensure security issues are responsibly disclosed, and it would serve as a simple instruction for security researchers in the future.

Thank you for your consideration, and I look forward to hearing from you!

(cc @huntr-helper)

High CPU and memory usage with long misspelled words

Hello, and thank you for developing this module. We need more pure JavaScript solutions like this!

I noticed that when trying to look up suggestions for long words, the library seems to chug resources and lag pretty hard. The word djfhjfhdjfhskdfhskhdfksjdfhksdjfhksdf takes over 7 seconds to process using typo-js, and the Node process ends up eating over 800 MB of RAM.

Example code:

var Typo = require("typo-js");
var dictionary = new Typo( "en_US" );

var time_start = (new Date()).getTime();
var word = 'djfhjfhdjfhskdfhskhdfksjdfhksdjfhksdf';

var correct = dictionary.check(word);

if (!correct) {
	console.log( word + " is NOT spelled correctly." );
	var suggestions = dictionary.suggest(word);
	console.log( "Suggestions: " + JSON.stringify(suggestions) );
}
else {
	console.log( word + " is spelled correctly." );
}

var elapsed = Math.floor( (new Date()).getTime() - time_start );
var mem = process.memoryUsage();

console.log( elapsed + "ms elapsed, " + mem.rss + " bytes used" );

Output:

djfhjfhdjfhskdfhskhdfksjdfhksdjfhksdf is NOT spelled correctly.
Suggestions: []
7743ms elapsed, 857202688 bytes used

This is on a late 2016 MacBook Pro (2.9 GHz Intel Core i7) with OS X 10.12.3 and Node.js v6.9.1.
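
Until the suggestion search itself is faster, a caller can guard against this case. The sketch below is a workaround, not a fix in the library. `safeSuggest` is a hypothetical wrapper, and the default limit of 20 characters is an arbitrary assumption.

```javascript
// Hypothetical wrapper: skip the expensive suggestion search for very long words.
// `dictionary` is any object with check() and suggest() methods, such as a Typo instance.
function safeSuggest(dictionary, word, maxLength = 20) {
  if (dictionary.check(word)) return [];      // correctly spelled, nothing to suggest
  if (word.length > maxLength) return [];     // too long, give up early
  return dictionary.suggest(word);
}
```

With this guard, the 37-character word from the report returns an empty list without `suggest()` ever being called.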

All the words are marked false

In my React App, I use it like this:

var Typo = require("typo-js");
var dictionary = new Typo("en_US", false, false, { dictionaryPath: "typo-js/dictionaries" });

The browser loads the en_US files correctly, but when I use dictionary.check(), every word returns false.

French contractions not always recognized

var dictionary = new Typo("fr", affData, dicData);
var is_spelled_correctly = dictionary.check("j'espère") //should be true, but returns false
// (The word "j'espère" is a form of the verb "espérer" as explained here: http://www.wordreference.com/fren/j%E2%80%99esp%C3%A8re )
var is_spelled_correctly = dictionary.check("C'est") //should be true, but returns false
// (The word "C'est" is a contraction meaning "It is", see http://www.wordreference.com/fren/c%27est )

I used the French dictionary and affix files installed by default with OpenOffice.org (en). You can obtain the files directly from: http://hg.services.openoffice.org/OOO330/file/b70298db35e1/dictionaries/fr_FR

The dictionary lines read: espérer/c2a+()
est/L'D'Q'

Asynchronous file load using the file protocol (file://)

Thank you for taking the time to develop this library.

A small issue I found:
In the _readFile method, the code is checking for status 200 (line 238) in order to resolve the promise.
Some environments, like Cordova iOS, return status 0 as success. See the following post.
Adding (req.status === 200 || req.status === 0) can solve the problem.

It would also be great if you could return the Promise all the way up to the initial load of the library. I think it is the Typo constructor that calls readDataFile. This would give the user the option to handle Promise rejections, which are not currently handled by the library.
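
The two suggestions above can be sketched together. This is not the library's `_readFile`. `isSuccessStatus` and `readFileAsync` are hypothetical names, and the XMLHttpRequest constructor is passed in as a parameter so the sketch does not depend on a browser global.

```javascript
// Treat both HTTP 200 and status 0 as success; status 0 is what some
// file:// environments report for a completed local read.
function isSuccessStatus(status) {
  return status === 200 || status === 0;
}

// Hypothetical promise wrapper around an XMLHttpRequest-like constructor.
function readFileAsync(url, XhrClass) {
  return new Promise(function (resolve, reject) {
    const req = new XhrClass();
    req.open("GET", url, true);
    req.onload = function () {
      if (isSuccessStatus(req.status)) {
        resolve(req.responseText);
      } else {
        reject(new Error("Failed to load " + url + " (status " + req.status + ")"));
      }
    };
    req.onerror = function () {
      reject(new Error("Network error while loading " + url));
    };
    req.send(null);
  });
}
```

In a browser this would be called as `readFileAsync(path, XMLHttpRequest)`, and the caller can attach `.catch()` to handle a failed load.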

anyone ported this to an "on the fly correction as you type" mode?

Is there any implementation so that as you type your text, you get suggestions closest to the word you are aiming to complete?
Or any implementation that marks the ones that are not found in the dictionary such as this

This sentance contains some mistokes --> This *sentance contains some *mistokes
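
The marking variant can be sketched in a few lines on top of `check()`. `markMisspellings` is a hypothetical helper. It accepts any object with a `check(word)` method, such as a Typo instance, and the usage example uses a small stub dictionary in its place.

```javascript
// Prefix every word the dictionary rejects with "*".
function markMisspellings(text, dictionary) {
  // Match runs of letters, allowing internal apostrophes (e.g. "don't").
  return text.replace(/\p{L}+(?:'\p{L}+)*/gu, function (word) {
    return dictionary.check(word) ? word : "*" + word;
  });
}

// Stub dictionary for illustration only.
const knownWords = new Set(["this", "contains", "some"]);
const stubDictionary = {
  check: function (word) { return knownWords.has(word.toLowerCase()); },
};

console.log(markMisspellings("This sentance contains some mistokes", stubDictionary));
// This *sentance contains some *mistokes
```

For as-you-type use, the same function could be called from an input event handler, ideally debounced so the check does not run on every keystroke.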

en_GB dictionaries not working

I'm having problems with typo.js when using the en_GB dictionaries downloaded from here:
https://cgit.freedesktop.org/libreoffice/dictionaries/tree/en

It would seem that check() is returning TRUE for every word I pass, even random characters like 'wefwef'.

Looking at the code it would appear that the compound rule check (lines 675-684 in typo.js) passes due to two blank compoundRules existing after the en_GB definition files have been parsed.

if (typeof ruleCodes === 'undefined') {
	// Check if this might be a compound word.
	if ("COMPOUNDMIN" in this.flags && word.length >= this.flags.COMPOUNDMIN) {
		for (i = 0, _len = this.compoundRules.length; i < _len; i++) {
			if (word.match(this.compoundRules[i])) {
				return true;
			}
		}
	}
}

Is this an easy fix?

Thank you for a great product by the way!
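
The sketch below illustrates the symptom and one possible guard. It assumes the blank rules end up as empty regular expressions, which is an assumption and not a confirmed diagnosis. `matchesCompoundRule` is a hypothetical helper.

```javascript
// An empty pattern matches every string, which would make every word pass.
console.log(new RegExp("").test("wefwef")); // true

// Hypothetical guard: ignore empty patterns during the compound-word check.
function matchesCompoundRule(word, compoundRules) {
  return compoundRules
    .filter(function (rule) { return rule.source !== "(?:)"; }) // "(?:)" is the source of an empty RegExp
    .some(function (rule) { return rule.test(word); });
}

console.log(matchesCompoundRule("wefwef", [new RegExp(""), new RegExp("")])); // false
```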

Can't use Romance languages

Hello, I managed to integrate typo.js into my nw.js + Codemirror project just fine, but can't use it with Romance languages (Italian, Spanish, French ...). When trying to load them, there is a lengthy pause and then my app crashes.

Is there a way to make those dictionaries to work?

Thanks for the great library,

i.

Zulu dictionary hangs browser

var dictionary = new Typo("zu_ZA", affData, dicData);
var is_spelled_correctly = dictionary.check("hamba")

The browser hangs when loading the dictionary (sometimes it shows an alert asking whether you want to stop the script).

I used the Zulu dictionary and affix files provided by OpenOffice.org and Mozilla
https://addons.mozilla.org/thunderbird/downloads/latest/46490/addon-46490-latest.xpi?src=addondetail
(if you change the .xpi file extension to .zip you can decompress the files)
(here's the source page: https://addons.mozilla.org/en-US/thunderbird/addon/zulu-spell-checker/ )

The suggest function does not seem to suggest words with hyphen correctly

E.g:

.dic file contains:
wi-fi
spell-check
line-break

.aff file is empty.

Expected behaviour:
wifi -> Suggest [wi-fi]
spellcheck -> Suggest [spell-check]
linebreak -> Suggest [line-break]

Actual behaviour:
wifi -> Suggest []
spellcheck -> Suggest []
linebreak -> Suggest []

The suggestions work as intended when using the hunspell 1.7.0 cmd with the same .dic and .aff setup.
Am I missing anything?
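
One possible explanation, offered as an assumption and not a confirmed diagnosis: if suggestions are generated from single-character edits over a fixed alphabet, a hyphenated word can only be reached when the hyphen is part of that alphabet. The sketch below shows the effect with a hypothetical `insertions` helper.

```javascript
// Generate every word formed by inserting one character from `alphabet` into `word`.
function insertions(word, alphabet) {
  const results = new Set();
  for (let i = 0; i <= word.length; i++) {
    for (const ch of alphabet) {
      results.add(word.slice(0, i) + ch + word.slice(i));
    }
  }
  return results;
}

const lettersOnly = "abcdefghijklmnopqrstuvwxyz";
console.log(insertions("wifi", lettersOnly).has("wi-fi"));       // false
console.log(insertions("wifi", lettersOnly + "-").has("wi-fi")); // true
```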

Reconcile new Typo.js version?

Hi. We worked on an updated Typo.js version, but its scope goes a bit beyond a simple fork; it's more of a wholesale port. General changelog:

  • It's now in TypeScript
  • Added support for WebWorkers in browsers
  • Uses the Faroo spell check for much faster results (<1ms even for long words!) and arguably higher quality ones
  • It's currently quite messy as I was playing around with ideas and benchmarks, but we can get it cleaned up...

I wanted to ask if you're interested in merging this back in to make "typo.js 2.0" or if you're happy as is 😄

Cheers,
Connor

Spanish affixes not read correctly for "Heme"

var dictionary = new Typo("es_MX", affData, dicData);
var is_spelled_correctly = dictionary.check("Heme") //should be true, but returns false
// (The word "Heme" is a form of the word haber: see http://www.wordreference.com/es/en/translation.asp?spen=he "~me aquí here I am"  )

I used the Mexican Spanish dictionary and affix files listed on the OpenOffice.org website. You can obtain the files directly from: http://ftp.services.openoffice.org/pub/OpenOffice.org/contrib/dictionaries/es_MX.zip

The dictionary line reads: haber/ÃÇDSÀÂÁÆÄÅ

problematic replacementTable suggest behaviour

In the current implementation, when a suggestion match is found with the replacement table, the search stops, see https://github.com/cfinke/Typo.js/blob/master/typo/typo.js#L767

This is problematic as more matches are possible.

An example is the word 'bere', the current implementation finds a match in
replacementTable[18] -> replacementEntry = ["ere", "ear"]
adds the match 'bear' and exits

Many other matches (e.g. 'bare' or 'beer') are not returned; the correct behavior would be to continue searching.

This is fixed in #45
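
The "continue searching" behaviour can be sketched as below. This is not the code from the referenced fix. `replacementCandidates` is a hypothetical helper, and the second table entry is made up for illustration.

```javascript
// Collect every known word reachable by one replacement, instead of
// stopping at the first hit.
function replacementCandidates(word, replacementTable, isKnownWord) {
  const found = [];
  for (const [from, to] of replacementTable) {
    if (from === "") continue; // an empty pattern would never advance
    let index = word.indexOf(from);
    while (index !== -1) {
      const candidate = word.slice(0, index) + to + word.slice(index + from.length);
      if (isKnownWord(candidate) && !found.includes(candidate)) {
        found.push(candidate);
      }
      index = word.indexOf(from, index + 1); // try the next occurrence
    }
  }
  return found;
}

const table = [["ere", "ear"], ["e", "a"]]; // second entry is hypothetical
const known = new Set(["bear", "bare", "beer"]);
console.log(replacementCandidates("bere", table, (w) => known.has(w)));
// [ 'bear', 'bare' ]
```

'beer' is not reachable with these two entries; it would have to come from another table entry or from the edit-distance search.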

German affixes not read correctly for "Stärke"

var dictionary = new Typo("de_DE", affData, dicData);
var is_spelled_correctly = dictionary.check("Stärke") //should be true, but returns false
// (The word "Stärke" is actually the root word found in the dictionary file )

I used the German dictionary and affix files listed on the OpenOffice.org website. You can obtain the files directly from: http://extensions.services.openoffice.org/project/dict-de_DE_frami

The dictionary line reads: Stärke/m

What about Node.js?

In Node.js, APIs are exported from library files through module.exports. Variables defined in a loaded JS file are otherwise inaccessible. So I wonder, will there be any support for Node.js and/or Require.js?

I can edit it myself of course, but that's not what I meant.

Unable to load affData or wordsData or debug why

I've tried so many different combinations of dictionaryPath value and the relative directory containing the .aff and .dic files, and I still cannot get Typo to (apparently) load anything off disk. Furthermore, the library does not indicate that anything bad has happened. Inspecting the dictionary object reveals a perfectly normal looking dictionary object, except that there are no rules and everything typed gets marked as mis-spelled.

I've looked through the source, and it is very straightforward how dictionaryPath gets used and I know I'm referring to the on-disk locations correctly (as far as I can tell). If I put a location I know does not exist, the dictionary object looks no different than a 'correct' dictionaryPath value. Something is definitely up here.

Two issues I deal with a lot on tablets + one requested addition

  1. Typo crashes the application on an iPad whenever you use Typo.js to get suggestions. It sometimes works the first time, but iOS will close the application if you spike memory usage - such as loading the dictionary and doing suggestion searches. I think this really needs to be optimized.

    As a temporary work-around, I went through the entire .dic file and removed all of the words that my application doesn't care about, getting the .dic file down to 37k.

    For example, if you know that a word has no rules, then dictionaryTable does not need to contain so many empty arrays, or arrays that contain empty arrays. These objects take up memory. Just null everything out. The biggest culprits seem to be the generated words based on the rules in the dictionaryTable. This one modification alone would dramatically reduce memory footprint.

  2. Also, the API needs to be changed so that it's asynchronous and uses promises by default. For the longest time, I had no idea it could be asynchronous and so it was blocking the UI. Upgrading to promises though is just going to be more and more mandatory for libraries. Modern apps in Angular or whatever are all promise-based now, as are most browser databases. So it's actually a bit awkward to see a library that is synchronous by default. I think at this point in time, the people who want to be synchronous should have to jump through hoops, instead of the other way around.

  3. A nice function to have would be stems() that simply accepts a word and returns all of the replacement words for the rules associated with it. For example, if you pass in 'vibrate', it returns something like ["vibrations", "vibration", "vibrating", "vibrates", "vibrated"]. This is very useful for building full text searches. Since you already have the rules and the dictionary, why not make this function public? I see most of the code buried in _parseDIC, but it could be refactored out as a prototype method.
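
The requested stems() function might look roughly like the sketch below. It does not use Typo.js's internal rule tables. The rule objects are hypothetical and only loosely follow the shape of Hunspell SFX entries (strip, add, condition).

```javascript
// Hypothetical suffix rules: strip `strip` from the end of the word and
// append `add`, but only if `condition` matches the word.
const exampleRules = [
  { strip: "",  add: "s",   condition: /[^sxz]$/ },
  { strip: "",  add: "d",   condition: /e$/ },
  { strip: "e", add: "ing", condition: /e$/ },
  { strip: "e", add: "ion", condition: /e$/ },
];

function stems(word, rules) {
  const forms = [];
  for (const rule of rules) {
    if (!rule.condition.test(word)) continue;
    if (!word.endsWith(rule.strip)) continue;
    const base = word.slice(0, word.length - rule.strip.length);
    forms.push(base + rule.add);
  }
  return forms;
}

console.log(stems("vibrate", exampleRules));
// [ 'vibrates', 'vibrated', 'vibrating', 'vibration' ]
```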

Allow to specify custom url or support version query

I'd like to be able to specify ?v= query when fetching the aff/dic files, IMO should be either via

  • settings.affUrl & settings.dicUrl
  • or settings.version, which would append ?v=${version} to the query (what the query looks like doesn't really matter as long as the version is in there). That said, for versioning to be granular we'd need separate dicVersion and affVersion settings.
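
The second option could be sketched like this. `buildDictionaryUrl` and its `version` parameter are hypothetical and do not exist in Typo.js. The `<dictionaryPath>/<lang>/<lang>.<ext>` layout is the one the library's error messages show.

```javascript
// Hypothetical helper: build the URL for a dictionary file, optionally versioned.
function buildDictionaryUrl(dictionaryPath, lang, extension, version) {
  const url = dictionaryPath + "/" + lang + "/" + lang + "." + extension;
  // Append the cache-busting query only when a version was supplied.
  return version ? url + "?v=" + encodeURIComponent(version) : url;
}

console.log(buildDictionaryUrl("typo/dictionaries", "en_US", "aff", "1.2.0"));
// typo/dictionaries/en_US/en_US.aff?v=1.2.0
```

Calling it once per file with different version strings would give the granular per-file versioning mentioned above.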

Issue with German spell check

I use a German dictionary file and encounter issues with some common words, like those shown below. They are part of the dictionary file but are marked as misspelled, which is wrong in this case.

It would be great if there is a fix available.

[Three screenshots attached: 11-03-2013 19-55-15, 11-03-2013 19-55-56, 11-03-2013 19-52-50]

API for adding custom words

It would be much appreciated if there were a way to add new custom words to the working dictionary. Right now I'm reloading the whole thing, though I saw on another PR that there is a hackaround to add single words using internal members. It seems like this is an obvious candidate for API extension.
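
Until such an API exists, custom words can be layered on from the outside without touching internal members. `withCustomWords` in the sketch below is a hypothetical wrapper. It accepts any object with `check()` and `suggest()` methods, such as a Typo instance.

```javascript
// Hypothetical wrapper: keep custom words in a Set next to the dictionary.
function withCustomWords(dictionary) {
  const customWords = new Set();
  return {
    add: function (word) { customWords.add(word); },
    check: function (word) {
      // A word is accepted if the user added it or the dictionary knows it.
      return customWords.has(word) || dictionary.check(word);
    },
    suggest: function (word) { return dictionary.suggest(word); },
  };
}
```

The custom words are matched exactly, so affix rules do not apply to them, and they do not appear in suggestions.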

Add a demo page

Please add a demo page to the repository and also a link to a live version. It will save people the time of downloading and testing it themselves; maybe it's not exactly what they were looking for.

Cannot resolve module 'fs'

ERROR in ./~/typo-js/typo.js
Module not found: Error: Cannot resolve module 'fs' in C:\git\client\node_modules\typo-js
 @ ./~/typo-js/typo.js 164:12-25

Node v5.12.0
NPM 3.8.6
Win10

Using Webpack
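
A common workaround for this class of error is to tell webpack to stub out the Node-only `fs` module. The fragment below is a sketch of that configuration. Which form applies depends on the webpack version, and whether Typo.js then works in the browser depends on it not reaching its Node-only code path at runtime.

```javascript
// webpack.config.js (fragment)
module.exports = {
  // webpack 1 to 4: replace the Node-only "fs" module with an empty object.
  node: { fs: "empty" },

  // webpack 5 removed that option; the equivalent setting is:
  // resolve: { fallback: { fs: false } },
};
```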

Multiple dictionaries?

I see in the docs that Typo needs to be initialized with a dictionary code. Would it be feasible to enhance this so that it can load multiple dictionaries? If that's not impossible and sounds useful, I would be willing to help :)
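
One way to approximate this today is to create one Typo instance per language and combine the results. `checkAny` and `suggestAll` below are hypothetical helpers. They accept any objects with `check()` and `suggest()` methods.

```javascript
// A word is accepted if at least one dictionary accepts it.
function checkAny(dictionaries, word) {
  return dictionaries.some(function (dictionary) {
    return dictionary.check(word);
  });
}

// Merge suggestions from all dictionaries, dropping duplicates
// while keeping first-seen order.
function suggestAll(dictionaries, word) {
  const merged = new Set();
  for (const dictionary of dictionaries) {
    for (const suggestion of dictionary.suggest(word)) {
      merged.add(suggestion);
    }
  }
  return Array.from(merged);
}
```

Each dictionary is still loaded separately, so memory use grows with the number of languages.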

Does not suggest accented words and doesn't use `REP` statements (in French)

Hi,

When I ask for suggestions for a word that is missing an accent, I don't get the right word:

dico.suggest('completement')
// ["comportement"] // instead of "complètement"
dico.suggest('buche')
// ["bruche", "bouche", "muche", "huche", "boche"] // instead of "bûche" 

Also, even though these statements are present in my fr_FR.aff file, they are not used by the suggestion engine:

REP faisez$ faites
REP puit puits

when tried, those words output this:

dico.suggest('faisez')
// ["faisiez", "fraisez", "taisez", "frisez", "baisez"]
dico.suggest('puit')
// ["punit", "puait", "put", "putt", "duit"]

Did I miss a configuration step? Are my .aff + .dic files wrong?

Thank you for your help.

Best,
GJ.
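
The accent case can be handled outside the library by comparing words with their diacritics removed. The sketch below is independent of Typo.js's suggestion engine. `stripAccents` and `accentSuggestions` are hypothetical helpers, and the word list stands in for the dictionary's word forms.

```javascript
// Remove combining diacritical marks after Unicode decomposition.
function stripAccents(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Return the listed words that differ from `word` only by accents.
function accentSuggestions(word, wordList) {
  const target = stripAccents(word);
  return wordList.filter(function (candidate) {
    return candidate !== word && stripAccents(candidate) === target;
  });
}

console.log(accentSuggestions("completement", ["complètement", "comportement"]));
// [ 'complètement' ]
console.log(accentSuggestions("buche", ["bûche", "bouche"]));
// [ 'bûche' ]
```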
