crate-ci / typos

Source code spell checker

License: Apache License 2.0

Rust 99.92% Shell 0.08% Dockerfile 0.01% Python 0.01% Julia 0.01% PHP 0.01%

cli code-quality rust spell-checker

typos's Issues

Allow choosing acceptable dialects

Don't try to correct hexadecimal values

Some hex numbers might appear as words, and those words might look like typos. We should do as scspell does and ignore them.
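As a rough illustration of what "ignore hex values" could mean, here is a minimal sketch. The function name `looks_like_hex` and the heuristic itself are assumptions made for this example, not scspell's or typos's actual rule: a `0x` prefix always counts as hex, and a bare token counts only if it is all hex digits and contains at least one decimal digit.

```rust
/// Hypothetical helper: returns true when `token` looks like a hexadecimal
/// value rather than a word that should be spell checked.
pub fn looks_like_hex(token: &str) -> bool {
    // An explicit `0x` / `0X` prefix is the strongest signal.
    if let Some(digits) = token
        .strip_prefix("0x")
        .or_else(|| token.strip_prefix("0X"))
    {
        return !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_hexdigit());
    }
    // Without a prefix, require only hex digits and at least one decimal
    // digit, so ordinary words such as "facade" are still checked.
    !token.is_empty()
        && token.bytes().all(|b| b.is_ascii_hexdigit())
        && token.bytes().any(|b| b.is_ascii_digit())
}
```

With this heuristic `0xdeadbeef` and `ba5eba11` would be skipped, while `facade` would still be checked. Bare all-letter hex strings remain ambiguous and would need a separate decision.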

Layered config

For large projects, it can be helpful to support layered configs.
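One way to picture layering is a set of partially filled configs where a more specific layer overrides a more general one. The sketch below assumes this "nearest layer wins" rule; the `Layer` struct and its field names are hypothetical and chosen only for illustration.

```rust
/// Hypothetical config layer: `None` means "not set at this level".
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Layer {
    pub check_filenames: Option<bool>,
    pub locale: Option<String>,
}

impl Layer {
    /// Overlays `self` on top of `base`: any field set in `self` wins,
    /// otherwise the value from `base` is kept.
    pub fn over(self, base: Layer) -> Layer {
        Layer {
            check_filenames: self.check_filenames.or(base.check_filenames),
            locale: self.locale.or(base.locale),
        }
    }
}
```

A subdirectory layer could then be resolved as `subdir.over(repo_root).over(user_defaults)`, with unset fields falling through to the outer layers.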

Token and word output modes

These would exist for helping to debug configurations

Calculate line number / line offset only when typo is found?

Right now we proactively parse out lines and then parse within a line. What if instead we found our line number by counting the newlines afterwards? This puts the cost on the typo case, which should be rare, rather than on every line we parse.
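The idea of deferring the line computation can be sketched as follows. This assumes the parser reports typos as byte offsets into the whole buffer; the function name `line_and_offset` is hypothetical.

```rust
/// Computes the 1-based line number and the 0-based byte offset within that
/// line for `byte_offset` in `buffer`. Only called once a typo is found.
///
/// Panics if `byte_offset` is greater than `buffer.len()`.
pub fn line_and_offset(buffer: &[u8], byte_offset: usize) -> (usize, usize) {
    let before = &buffer[..byte_offset];
    // Every newline before the typo starts a new line.
    let line_num = before.iter().filter(|&&b| b == b'\n').count() + 1;
    // The current line starts just after the last newline, or at byte 0.
    let line_start = before
        .iter()
        .rposition(|&b| b == b'\n')
        .map_or(0, |i| i + 1);
    (line_num, byte_offset - line_start)
}
```

The scan over the prefix is linear in the offset, so a file with many typos would pay that cost repeatedly. Whether that is acceptable depends on how rare typos really are.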

--threads support

Support more than UTF-8 files

Interactive mode?

See https://github.com/myint/scspell/blob/master/scspell/__init__.py#L363 for how scspell does it.

.ignore support

Support reading from stdin

Resolve clippy error about regex

See https://travis-ci.org/epage/defenestrate/jobs/550805412

Provide a message about skipped binary files

I'm thinking this should be one of the programmatic messages sent to stdout rather than using a logging crate.

Speed up detection of binary files

Currently we load the entire file in memory and search for a null byte through all of it.

See #29 for other implementations for how to speed it up
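One common shortcut is to inspect only a bounded prefix of the file instead of all of it. The sketch below assumes that approach; it is not necessarily one of the implementations referenced in #29, and the 8 KiB cap (`SNIFF_LEN`) and function names are arbitrary choices made for this example.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical cap on how many bytes are inspected.
const SNIFF_LEN: usize = 8 * 1024;

/// Returns true if the given bytes contain a NUL byte.
pub fn has_nul_byte(prefix: &[u8]) -> bool {
    prefix.contains(&0)
}

/// Reads at most `SNIFF_LEN` bytes from `path` and reports whether they
/// contain a NUL byte, without loading the rest of the file.
pub fn is_probably_binary(path: &Path) -> io::Result<bool> {
    let mut buf = Vec::with_capacity(SNIFF_LEN);
    File::open(path)?
        .take(SNIFF_LEN as u64)
        .read_to_end(&mut buf)?;
    Ok(has_nul_byte(&buf))
}
```

The trade-off is that a file whose first NUL byte appears after the cap would be treated as text.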

Optional support "professional" language by blocking "naughty" words?

Seems like some people might like this as an optional feature

See https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words

Support defining new file types in config

Speed up identifier splitting

split_ident allocates into a Vec rather than lazily returning values as an iterator.
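A lazy version could look roughly like the sketch below. The splitting rules here are deliberately simplified (split on `_`, `-`, and lowercase-to-uppercase boundaries) and are an assumption for illustration, not the crate's actual rules; only the shape of the return type is the point.

```rust
fn is_separator(c: char) -> bool {
    c == '_' || c == '-'
}

/// Lazily yields the words of an identifier as borrowed slices, without
/// collecting them into a `Vec` first.
pub fn split_ident(ident: &str) -> impl Iterator<Item = &str> {
    let mut rest = ident;
    std::iter::from_fn(move || {
        // Skip any separators left over from the previous word.
        rest = rest.trim_start_matches(is_separator);
        if rest.is_empty() {
            return None;
        }
        // Find where the current word ends.
        let mut prev_lower = false;
        let mut end = rest.len();
        for (i, c) in rest.char_indices() {
            if is_separator(c) || (prev_lower && c.is_uppercase()) {
                end = i;
                break;
            }
            prev_lower = c.is_lowercase();
        }
        let (word, tail) = rest.split_at(end);
        rest = tail;
        Some(word)
    })
}
```

Callers that stop at the first typo, or that only need to stream words into a dictionary lookup, never pay for a heap allocation.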

Detect invalid file types

Logging support

Define an exit code policy

See https://docs.rs/exit-code/1.0.0/exit_code/

Including broken pipe

Codes of interest

USAGE_ERROR
CONFIG_ERROR
FAILURE for runtime errors?
DATA_ERROR for typos?
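To make the candidate policy concrete, here is a sketch that maps outcomes to numeric codes. The numbers follow the BSD sysexits convention (64 for usage, 65 for data, 78 for config) and are assumed to correspond to the constants listed above; the `Outcome` enum is hypothetical, and the question marks in the list remain open.

```rust
/// Hypothetical set of outcomes the CLI could report.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Outcome {
    Success,
    TyposFound,
    UsageError,
    ConfigError,
    RuntimeFailure,
}

impl Outcome {
    /// Maps each outcome to a process exit code.
    pub fn code(self) -> i32 {
        match self {
            Outcome::Success => 0,
            Outcome::RuntimeFailure => 1, // generic failure
            Outcome::UsageError => 64,    // sysexits EX_USAGE
            Outcome::TyposFound => 65,    // sysexits EX_DATAERR
            Outcome::ConfigError => 78,   // sysexits EX_CONFIG
        }
    }
}
```

A `main` function could then end with `std::process::exit(outcome.code())`, which lets scripts distinguish "typos found" from "the tool itself failed".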

Settle on a name

Create pre-built binaries

Custom dictionaries

Source

passed in on cli
found on disk

Include

file type definitions
per file type corrections

Don't get confused by c-escape characters

Scspell will parse the file assuming escape characters exist but let you opt out.
See https://github.com/myint/scspell/blob/master/scspell/__init__.py#L78

Escape character support could instead be a file type setting, like dictionary values

Add gitignore support to comparison

Perf: remove allocation when case correcting by switching to KString

KStringCow has the following states:

Box<str>
'static str
's str
inlined string

If we add a From to it, we can possibly detect being able to use the inline string and write straight to it, avoiding the allocation when case correcting.

In addition, we'd be dropping from 4 machine words to 3 machine words iirc.

Recommended fix should match the users' case

Colored output support

This would be relatively easy while improving the scannability of the results.

Set exit code on typos being found

`--no-binary` should not ignore files explicitly passed in

At least this is how ripgrep does it.

Per-file type identifier rules

We'll need to define file types and what traits those file types should have (specialized dictionaries, _ / - as identifier characters, and whether escape sequences are supported (#3)).

This can then be extended into a config file that works with custom dictionaries (#9) to allow the user to override existing file type definitions or add their own.

Document comparison with related tools

Config file support

We're developing a lot of flags. It'd be good if we added a config file so people can easily get a consistent experience

Ignore binary files

Copy the flags from ripgrep: https://github.com/BurntSushi/ripgrep/blob/master/src/app.rs#L745

Exposed as https://github.com/BurntSushi/ripgrep/blob/master/src/args.rs#L902

Don't panic on broken pipe

Spell check filenames

Idea comes from codespell
https://github.com/codespell-project/codespell

Custom ignores?

Sometimes files should just be ignored for spelling but still be picked up by everything else.

Write/diff as we go rather than at the very end of the program

Right now --diff and --write-changes only take effect after everything is done. Not showing results as we progress can confuse the user.

Handle UK vs US English

Audit API

The API has gone through some churn. We should audit it before 1.0 to make sure it's something we want.

Support correcting the file

Fill in misspell-go's comparison

Ignores binary / scm, see https://github.com/client9/misspell/blob/master/mime.go#L157
Ignores emails, hostnames, paths, and c-escapes, see https://github.com/client9/misspell/blob/master/notwords.go and https://github.com/client9/misspell/blob/c0b55c8239520f6b5aa15a0207ca8b28027ba49e/replace.go#L103
- And URLs, see https://github.com/client9/misspell/blob/master/url.go
For go, only checks comments, see https://github.com/client9/misspell/blob/master/replace.go#L142

Add benchmarks

Possibly steal ripgrep's cases

Compare to scspell, the go one that we took the list from, and some kind of baseline search, like ripgrep

Add codespell to comparison

https://github.com/codespell-project/codespell

Calculate line number / line offset on-demand?

Render corrections in non-ASCII code correctly

We list the byte offset and not the column
We point to the byte offset, and not the column, in the line
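The gap between byte offset and column shows up as soon as a line contains multi-byte UTF-8 characters. A minimal sketch of the conversion follows, assuming columns are counted in `char`s; the function name is hypothetical, and terminal display width (wide or combining characters) is a further refinement this does not cover.

```rust
/// Converts a byte offset within `line` into a 0-based column counted in
/// Unicode scalar values. Returns `None` if the offset is out of range or
/// falls inside a multi-byte character.
pub fn byte_offset_to_column(line: &str, byte_offset: usize) -> Option<usize> {
    if !line.is_char_boundary(byte_offset) {
        return None;
    }
    Some(line[..byte_offset].chars().count())
}
```

For the line `héllo wrold`, the word `wrold` starts at byte 7 but at column 6, so a caret placed by byte offset would sit one cell too far to the right.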

Correctly detect words when c-escapes are used

foo\nbar will look like foo and nbar without special handling.
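The special handling could be sketched as a tokenizer that treats a backslash and the character after it as a separator. This is an assumption about one possible approach, with a hypothetical function name; it only covers single-character escapes such as `\n` or `\t`, not longer forms like `\x41`.

```rust
/// Splits `text` into alphabetic words, treating a backslash plus the
/// character that follows it as a separator rather than as word content.
pub fn words_skipping_escapes(text: &str) -> Vec<&str> {
    let mut words = Vec::new();
    let mut start: Option<usize> = None;
    let mut chars = text.char_indices();
    while let Some((i, c)) = chars.next() {
        if c == '\\' {
            // Close the current word, then skip the escaped character.
            if let Some(s) = start.take() {
                words.push(&text[s..i]);
            }
            chars.next();
        } else if c.is_alphabetic() {
            if start.is_none() {
                start = Some(i);
            }
        } else if let Some(s) = start.take() {
            words.push(&text[s..i]);
        }
    }
    // Flush a word that runs to the end of the input.
    if let Some(s) = start {
        words.push(&text[s..]);
    }
    words
}
```

Given the literal source text `foo\nbar`, this yields `foo` and `bar` instead of `foo` and `nbar`.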

Support file types embedded in file types?

With #14, we're going to have special handling of different file types, but one file isn't always a single type

markdown files that have code fences
- treat markdown as non-code (no identifier support), `` as generic code, and code-fences as the specified language
rust comments have markdown which have code fences
mako files are a mixture of python and whatever the generated type will be.
