inaka / sheldon

Very Simple Erlang Spell Checker

License: Apache License 2.0

Erlang 100.00%

sheldon's Introduction

Sheldon

Very Simple Erlang Spell Checker.

Note: Sheldon also suggests corrections for misspelled words. That functionality was heavily inspired by the Elixir project spell_check.

Contact Us

If you find any bugs or have a problem while using this library, please open an issue in this repo (or a pull request :)).

And you can check all of our open-source projects at inaka.github.io.

Installation

NOTE: sheldon only works with Erlang/OTP 21 or later.

  1. Clone the repo
  2. rebar3 compile

Usage

Erlang Shell

First of all, Sheldon is an application and it needs to be started. You can use rebar3 shell to set the necessary paths, then call sheldon:start/0 or application:ensure_all_started(sheldon) to start it. If you are using Sheldon as a dependency, you can instead let OTP start it from your your_app.app file.

Sheldon has only two main functions, sheldon:check/1 and sheldon:check/2. As a user, those are all you need.

1> sheldon:check("I want to check this correct text").
ok
2> sheldon:check("I want to check this misspeled text").
#{bazinga => <<"That's no reason to cry. One cries because one is sad. For example, I cry because others are stupid, and that ma"...>>,
  misspelled_words => [#{candidates => ["misspeed","misspelled"],
     line_number => 1,
     word => "misspeled"}]}

Configuration

sheldon:check/2 works like sheldon:check/1 but accepts a configuration parameter. With this parameter we can apply rules to the text we want to check: ignore words, ignore patterns and ignore blocks. It is also where adapters are declared.

This is the format (see sheldon_config.erl). No key is required:

#{ ignore_words    => [string()]
 , ignore_patterns => [regex()]
 , ignore_blocks   => [ignore_block()]
 , adapters        => [adapter()]
 }.

Then, if we repeat the previous check but pass a configuration to sheldon:check/2, we can skip the error:

3> sheldon:check("I want to check this misspeled text", #{ignore_words => ["misspeled"]}).
ok
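
As a small illustration of working with the configuration map, the sketch below adds words to ignore_words while keeping any that are already configured. The module name spell_config and the function ignore_words/2 are hypothetical helpers made up for this example, not part of sheldon; only the ignore_words key comes from the format above.

```erlang
-module(spell_config).
-export([ignore_words/2]).

%% Hypothetical helper, not part of sheldon.
%% Adds Words to the ignore_words list of Config. Every key in the
%% configuration is optional, so a missing list defaults to [].
%% lists:usort/1 removes duplicates and sorts the result.
-spec ignore_words([string()], map()) -> map().
ignore_words(Words, Config) ->
  Existing = maps:get(ignore_words, Config, []),
  Config#{ignore_words => lists:usort(Existing ++ Words)}.
```

The resulting map can then be passed as the second argument of sheldon:check/2, for example sheldon:check(Text, spell_config:ignore_words(["misspeled"], #{})).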

Adapters

Sometimes we have to check the spelling of formatted text, but sheldon treats its input as plain text, which causes problems. Markdown files are one example: if we try to check them, sheldon will complain about things like '##' or '*something*'. For these cases sheldon provides adapters. An adapter is an Erlang module with an adapt/1 function that receives a line in binary() format and returns that line transformed. For example, sheldon provides markdown_adapter, which converts markdown to plain text.

In order to use them we only have to declare them in the config file:

#{adapters => [markdown_adapter]}.

You can create your own adapter to fit your requirements. You only need to implement the sheldon_adapter behaviour and provide the code for the adapt/1 function.

-spec adapt(binary()) -> iodata().
adapt(Line) ->
  ...

You can add all the adapters you want and they will be executed in order.
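
Here is a minimal sketch of what a custom adapter could look like. It strips leading Erlang comment markers from a line so that only the comment text is checked. The module name comment_adapter and the stripping rule are assumptions chosen for illustration; only the adapt/1 callback and its spec come from the description above.

```erlang
-module(comment_adapter).
%% The behaviour module ships with sheldon. Without sheldon on the code
%% path the compiler only warns that the behaviour is undefined.
-behaviour(sheldon_adapter).
-export([adapt/1]).

%% Removes optional leading whitespace, one or more '%' characters and
%% at most one following whitespace character from the start of the line.
%% Lines without a comment marker are returned unchanged.
-spec adapt(binary()) -> iodata().
adapt(Line) ->
  re:replace(Line, <<"^\\s*%+\\s?">>, <<>>, [{return, binary}]).
```

With this module compiled, it would be declared like any other adapter: #{adapters => [comment_adapter]}.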

Examples

Check this out.

Results

sheldon:check/1 and sheldon:check/2 have the same result type (see sheldon_result.erl). Sheldon returns the atom ok if the check found no problems; otherwise it returns

    #{ misspelled_words := [misspelled_word()]
     , bazinga          := string()
     }.

The misspelled_word() list is returned ordered by line number. If a line contains more than one misspelled word, they are ordered by appearance within that line.
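
To show how a caller might consume this result, here is a small sketch that turns it into printable report lines. The module spell_report is hypothetical. The keys line_number, word and candidates are taken from the shell output in the Usage section, and candidates is treated as optional here as a precaution.

```erlang
-module(spell_report).
-export([format/1]).

%% Hypothetical helper, not part of sheldon.
%% Returns one report line per misspelled word, in the order sheldon
%% returned them. A clean check (ok) yields an empty report.
-spec format(ok | map()) -> [string()].
format(ok) ->
  [];
format(#{misspelled_words := Words}) ->
  [format_word(W) || W <- Words].

format_word(#{line_number := Line, word := Word} = Misspelled) ->
  Candidates = maps:get(candidates, Misspelled, []),
  lists:flatten(
    io_lib:format("line ~b: ~ts (candidates: ~ts)",
                  [Line, Word, lists:join(", ", Candidates)])).
```

For the misspelled example from the Usage section, this produces the single line "line 1: misspeled (candidates: misspeed, misspelled)".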

Dependencies

Requires OTP version 23 or higher. We only guarantee that the system runs on OTP 23+, since that is what we test against, but minimum_otp_vsn is "21" because some systems where sheldon is integrated require it.

sheldon's People

Contributors

amilkr, cabol, elbrujohalcon, ferigis, harenson, vkatsuba, x4lldux

sheldon's Issues

increase the worker pool timeout

Testing with spellingCI, I saw that 5 seconds is not enough. We can increase the default timeout and/or make it configurable from the config file.
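
One way the configurable variant could look is sketched below. The module pool_timeout and the environment key worker_timeout are hypothetical names invented for this example; sheldon may name or read such a setting differently. The 5000 ms default mirrors the 5 seconds mentioned above.

```erlang
-module(pool_timeout).
-export([timeout/0]).

%% Current default mentioned in the issue: 5 seconds, in milliseconds.
-define(DEFAULT_TIMEOUT, 5000).

%% Reads the timeout from the application environment. The key
%% worker_timeout is an assumption for illustration only.
%% Falls back to the default when the key is not set.
-spec timeout() -> timeout().
timeout() ->
  application:get_env(sheldon, worker_timeout, ?DEFAULT_TIMEOUT).
```

Under that assumption, a user could raise the limit with an entry such as {sheldon, [{worker_timeout, 15000}]} in sys.config.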

Handle hyphenated words

sheldon version

0.4.1

OS version

MacOS 11.6

Description

  • Motivation
    I want to be able to write hyphenated words.
  • Proposal
    Analyze stuff like non-breaking as two words: non and breaking.

Current behavior

For instance…

src/spillway_srv.erl:51: The word "non-incrementing" in comment is unknown.

Expected behavior

No warnings.
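
The proposal above could be sketched roughly as follows: split a hyphenated word into its parts and accept it only if every part is known. The module hyphen_split and the IsKnown callback are hypothetical; the callback stands in for whatever dictionary lookup sheldon uses internally.

```erlang
-module(hyphen_split).
-export([split/1, all_parts_known/2]).

%% Splits a word on hyphens: "non-breaking" becomes ["non", "breaking"].
%% Leading, trailing and repeated hyphens produce no empty parts.
-spec split(string()) -> [string()].
split(Word) ->
  string:lexemes(Word, "-").

%% A hyphenated word counts as correct when every part is known.
%% IsKnown is a stand-in for the real dictionary lookup.
-spec all_parts_known(string(), fun((string()) -> boolean())) -> boolean().
all_parts_known(Word, IsKnown) ->
  lists:all(IsKnown, split(Word)).
```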

A word with hyphen is not being suggested by spell checker.

Hi Team,

I have tested the sheldon spell checker utility and noticed that hyphenated words are not suggested by the spell checker.
For example:
self-ownership
self-paid
self-painter
self-pampered
self-pampering
self-panegyric
self-parasitism
self-parricide
self-partiality
self-paying
self-peace

Is there something I am missing to validate this?

Make `bazinga` as optional field by config

sheldon version

0.3.0

OS version

Ubuntu

Description

  • Motivation
    Not all users want to see bazinga in the response from sheldon. It would be great to have a configuration option for this behavior.
  • Proposal
    Add a new config option, show_bazinga (true/false), to omit the bazinga message from sheldon's response.

Current behavior

The bazinga field is always included in the response.

Expected behavior

The bazinga field in response should be configurable.
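
A minimal sketch of the proposed option, applied as a post-processing step on the result. The module bazinga_filter is hypothetical, and show_bazinga is the option name suggested in this issue, not necessarily one that sheldon supports. Note that dropping the key changes the documented result type, where bazinga is a required field.

```erlang
-module(bazinga_filter).
-export([apply_config/2]).

%% Removes the bazinga field from a result when the configuration sets
%% show_bazinga to false. The option defaults to true, which keeps the
%% current behavior. A clean check (ok) is passed through untouched.
-spec apply_config(ok | map(), map()) -> ok | map().
apply_config(ok, _Config) ->
  ok;
apply_config(Result, Config) ->
  case maps:get(show_bazinga, Config, true) of
    true  -> Result;
    false -> maps:remove(bazinga, Result)
  end.
```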

improve performance

sheldon takes almost 40 seconds to start. The problem is the conversion of the keys list to a set when starting the dictionary.

How to add new Language? Namely Portuguese

I see that the only existing dictionary is English, but I'd like to use it for Portuguese-BR. Seems like it would only need a bazinga.txt and a dictionary.txt, correct?

If I can provide these, would you merge the PR?

Thanks!

Removing try/catch in sheldon_suggestions_server: suggestions/2

In this code:

suggestions(MisspelledWords, Lang) ->
  try rpc:pmap({?MODULE, add_suggestions}, [Lang], MisspelledWords) of
    Result -> Result
  catch
    exit:badrpc:Stacktrace ->
      error_logger:error_msg( "~p:~p >> Error: badrpc~n\tStack: ~p"
                            , [?MODULE, ?LINE, Stacktrace]
                            ),
      MisspelledWords
  end.

we can remove the catch, since rpc:pmap/3 doesn't throw an exception.

Windows compile issue

sheldon version

0.3.0

OS version

Windows

Steps to reproduce

Compile sheldon on Windows.

Current behavior

C:/ProgramData/Chocolatey/lib/mingw/tools/install/mingw64/bin/mingw32-make -C d:/a/rebar3_sheldon/rebar3_sheldon/_build/default/lib/hoedown libhoedown.a
mingw32-make[1]: Entering directory 'd:/a/rebar3_sheldon/rebar3_sheldon/_build/default/lib/hoedown'
cc -g -O3 -ansi -pedantic -Wall -Wextra -Wno-unused-parameter -Isrc -c -o src/autolink.o src/autolink.c
process_begin: CreateProcess(NULL, cc -g -O3 -ansi -pedantic -Wall -Wextra -Wno-unused-parameter -Isrc -c -o src/autolink.o src/autolink.c, ...) failed.
make (e=2): The system cannot find the file specified.
mingw32-make[1]: *** [Makefile:92: src/autolink.o] Error 2
mingw32-make[1]: Leaving directory 'd:/a/rebar3_sheldon/rebar3_sheldon/_build/default/lib/hoedown'
mingw32-make: *** [Makefile:33: ../priv/emarkdown.so] Error 2
mingw32-make: Leaving directory 'd:/a/rebar3_sheldon/rebar3_sheldon/_build/default/lib/emarkdown/c_src'
===> Hook for compile failed!

Expected behavior

Sheldon should compile on Windows.

Typo generator

sheldon version

0.2.0

OS version

Ubuntu

Description

  • Motivation
    Speeding up the library.
  • Proposal
    • Create a typo generator that produces typo_dictionary.txt from dictionary.txt and puts it into priv/lang/eng/*, or move it into a separate repo, because the file may be too large for e.g. hex.pm. All possible typos for each word in dictionary.txt would then be loaded into e.g. ETS. With every option generated ahead of time, sheldon would not need to spend time generating candidates at run time: a word is either there or it is not.
    • Also, similar words must be linked to each other, so that one misspelled word can have several possible variants, but no more than roughly 5 or 10.

Current behavior

sheldon tries to generate and provide a lot of candidates for each word in real time.

Expected behavior

All possible candidates should be inserted before sheldon is loaded/started.

Config

It might be worth adding an option to control whether the new behavior is turned on or off.
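
To make the idea concrete, here is a deliberately reduced sketch of a precomputed typo index. It is an illustration under several assumptions: it only generates single-character deletions rather than all possible typos, it keeps the index in a map instead of ETS or a typo_dictionary.txt file, and the module name typo_index is hypothetical.

```erlang
-module(typo_index).
-export([build/1, candidates/2]).

%% Builds a map from each single-deletion typo to the sorted list of
%% dictionary words that produce it. Built once, ahead of time.
-spec build([string()]) -> #{string() => [string()]}.
build(Words) ->
  lists:foldl(fun index_word/2, #{}, Words).

index_word(Word, Index) ->
  lists:foldl(
    fun(Typo, Acc) ->
      maps:update_with(Typo,
                       fun(Ws) -> lists:usort([Word | Ws]) end,
                       [Word],
                       Acc)
    end,
    Index,
    deletions(Word)).

%% All distinct strings obtained by dropping exactly one character.
deletions(Word) ->
  lists:usort([lists:sublist(Word, N - 1) ++ lists:nthtail(N, Word)
               || N <- lists:seq(1, length(Word))]).

%% Lookup at check time is a single map access, with no generation.
-spec candidates(string(), #{string() => [string()]}) -> [string()].
candidates(Typo, Index) ->
  maps:get(Typo, Index, []).
```

A fuller version would also cover insertions, substitutions and transpositions, and would cap the number of candidates per typo as the proposal suggests.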

Cache of suggested suggestions

sheldon version

0.2.0

OS version

Ubuntu

Description

  • Motivation
    A misspelled 4-character word can produce roughly 250 possible suggestions. In the current implementation, suggestions are generated again every time a word is repeated, which ultimately increases the load on the system.
  • Proposal
    Add a separate table for already generated suggestions, storing both successful and empty results for each word that has already been checked.

Current behavior

There is no cache of suggestions.

Expected behavior

A cache of suggestions is implemented.

Config

Optional: it might be worth considering an additional option to enable and disable the cache of suggested suggestions.
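
A minimal sketch of such a cache using an ETS table is shown below. The module suggestion_cache and the Generate callback are hypothetical; the callback stands in for sheldon's real suggestion generator. Empty results are cached as well, as the proposal asks.

```erlang
-module(suggestion_cache).
-export([new/0, lookup_or_compute/3]).

%% Creates the cache table. It is owned by the calling process and is
%% deleted when that process exits.
-spec new() -> ets:tid().
new() ->
  ets:new(?MODULE, [set, public]).

%% Returns the cached suggestions for Word. On a miss, it calls
%% Generate(Word) once and stores the result, including an empty list.
-spec lookup_or_compute(ets:tid(), string(),
                        fun((string()) -> [string()])) -> [string()].
lookup_or_compute(Table, Word, Generate) ->
  case ets:lookup(Table, Word) of
    [{Word, Cached}] ->
      Cached;
    [] ->
      Suggestions = Generate(Word),
      true = ets:insert(Table, {Word, Suggestions}),
      Suggestions
  end.
```

This sketch has no eviction or size limit, so a long-running system would likely need one, along with the enable/disable option mentioned under Config.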

The sheldon crashes when put unicode chars

sheldon version

0.2.1

OS version

Ubuntu

Steps to reproduce

1> sheldon:check("this comment: ’").
** exception error: bad argument
     in function  re:split/2
        called as re:split([116,104,105,115,32,99,111,109,109,101,110,116,58,
                            32,8217],
                           "\n")
        *** argument 1: not an iodata term
     in call from sheldon:do_check/2 (/.../sheldon/src/sheldon.erl, line 58)

Current behavior

sheldon crashes when the input contains Unicode characters.

Expected behavior

sheldon works with Unicode characters.
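
The error message points at the cause: the string contains the code point 8217, which is above 255, so the list is not valid iodata for re:split/2. One possible fix, sketched below, is to encode the text as a UTF-8 binary first and pass the unicode option. The module unicode_lines is hypothetical, and this is not necessarily how sheldon resolved the issue.

```erlang
-module(unicode_lines).
-export([split_lines/1]).

%% Converts any Unicode chardata (for example a string containing code
%% points above 255) to a UTF-8 binary, then splits it on newlines.
%% The unicode option makes re treat subject and pattern as UTF-8.
-spec split_lines(unicode:chardata()) -> [binary()].
split_lines(Text) ->
  case unicode:characters_to_binary(Text) of
    Bin when is_binary(Bin) ->
      re:split(Bin, <<"\n">>, [unicode]);
    _InvalidOrIncomplete ->
      erlang:error(badarg, [Text])
  end.
```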
