spellcheck-vietnamese's Introduction

spellcheck-vn

Spell checker for Vietnamese text, or for any custom dictionary.

Install dependencies

pip3 install -r requirements.txt

Usage

Init dictionary from vocab file

python3 --infile="a_vocab_file" \
	--update-dict=True \
	--odfile="your_dict_file.json" \
	--ovfile="o_vocab_file"

Update dictionary

python3 --infile="a_vocab_file" \
	--dict-file="i_dict_file.json" \
	--update-dict=True \
	--odfile="o_dict_file.json" \
	--ovfile="o_vocab_file"

Example log output:

INFO:__main__:Adding `bênh` to dict
INFO:__main__:`bệnh` not in dictionary
INFO:__main__:Adding `bệnh` to dict
INFO:__main__:`béo` not in dictionary
INFO:__main__:Adding `béo` to dict
INFO:__main__:`bèo` not in dictionary
INFO:__main__:Adding `bèo` to dict
INFO:__main__:`bếp` not in dictionary
INFO:__main__:Adding `bếp` to dict
INFO:__main__:`bẹp` not in dictionary
INFO:__main__:Adding `bẹp` to dict
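
The log above suggests a simple loop: each word from the vocab file is checked against the dictionary and added when it is missing. A minimal sketch of that loop follows. It assumes the dictionary is a flat set of words (the real JSON layout is not shown here), and `update_dictionary` is a hypothetical helper name, not the project's own function.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def update_dictionary(words, dictionary):
    """Add every word missing from `dictionary`; return the words that were added.

    Hypothetical helper: `dictionary` is assumed to be a plain set of words.
    """
    added = []
    for word in words:
        # Normalize the same way the log output suggests (lowercase words).
        word = word.strip().lower()
        if not word or word in dictionary:
            continue
        logger.info("`%s` not in dictionary", word)
        logger.info("Adding `%s` to dict", word)
        dictionary.add(word)
        added.append(word)
    return added


known = {"bênh"}
new_words = update_dictionary(["bệnh", "béo", "bênh"], known)
```

When run as a script, the default `logging.basicConfig` format produces the `INFO:__main__:` prefix seen in the log.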

Spell checking

python3 --infile="a_vocab_file" \
	--dict-file="i_dict_file.json" \
	--odfile="o_dict_file.json" \
	--ovfile="o_vocab_file"

Example results:

INFO:__main__:xu_dong_duong.txt:41 -- Tá c giả: Paul Doumer
INFO:__main__:`c` not found, but maybe cằn|cúi|cúng|cúc|cút|cựu|cực|cài|cày|cà|cào|cành|càng|cãi|còi|còn|còng|cò|còm|cợt|cõi|cõng|cổng|cổ|cuộn|cuộc|cuối|cuốn|cuống|cuốc|cung|cuồn|cuồng|cua|cùng|cù|cùm|cũi|cũng|cũ|cội|cộng|cộ|cộc|cột|cải|cảnh|cảng|cả|cảo|cảm|cớ|câu|cây|cân|câm|cậu|cận|cập|cậy|cật|cứu|cứng|cứ|cồn|cồng|chằng|chúa|chúng|chú|chút|chúi|chúc|chen|che|chìa|chì|chìm|chỉnh|chỉn|chỉ|chãi|chã|chão|chòi|chòng|chợn|chợ|chợt|chõng|chổng|chua|chuồn|chuồng|chung|chu|chuẩn|chuỗi|chuộng|chuộc|chuột|chuối|chuốc|chuốt|chuyên|chuyển|chuyến|chuyện|chuông|chính|chín|chí|chích|chùa|chùng|chùm|chiêu|chiêng|chiêm|chia|chinh|chi|chiểu|chiều|chiền|chiếu|chiến|chiếc|chiếm|chim|chế|chếch|chết|chực|chội|chộp|chẳng|chới|chớp|chớ|chớm|chề|châu|chân|châm|chệch|chậu|chập|chậm|chật|chứa|chứng|chứ|chức|chồng|chồm|chênh|chê|chặn|chặng|chặt|chấp|chấn|chấm|chất|chạ|chạp|chạy|chạm|chạnh|chối|chốn|chống|chốc|chốt|chở|chởm|chửa|chửi|chừa|chừng|chẩy|chéo|chép|chén|chém|chọi|chọn|chọc|chảo|chảy|chải|chèo|chè|chào|chày|chàng|chài|chàm|chủng|chủ|choáng|chong|cho|choàng|chĩa|chụp|chục|chụm|cháu|cháy|chánh|chán|chác|chẻ|chểnh|chắp|chắn|chắc|chịu|chị|chịt|chau|chao|chan|cha|chai|chay|chóng|chó|chóp|chói|chót|chóc|chữa|chững|chữ|chỗ|chôn|chông|chơi|chơ|chầu|chầy|chầm|chẽn|chẽ|chưởng|chướng|chước|chư|chương|chưa|chỏm|chăn|chăng|chăm|chờn|chờ|cạo|cạnh|cạn|cạm|cối|cống|cố|cốc|cốt|cởi|cừ|cẩu|cẩn|cẩm|cặp|cặn|cửu|cửa|của|củng|coi|con|cong|co|com|cụ|cục|cụm|cụt|cáu|cáo|cánh|cá|cám|cáp|cái|cát|cách|các|cấu|cấp|cấy|cấm|cất|cao|canh|can|ca|cai|cay|cam|cỡ|cắp|cắn|cắm|cắt|cữ|cỗi|côn|công|cô|cơn|cơ|cơm|cói|cóng|có|cầu|cầy|cần|cầm|cọng|cọp|cọ|cọc|cọt|cưa|cưỡi|cưỡng|cưng|cưới|cướp|cước|cưu|cược|cương|cười|cường|cỏi|cỏn|cỏ|căn|căng|căm|cờ !
INFO:__main__:`paul` not in dictionary
INFO:__main__:`doumer` not in dictionary
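
Every suggestion offered for `c` starts with that letter, which is consistent with a prefix lookup. A minimal sketch of that idea follows. It assumes a character trie stored as nested dicts, with a hypothetical `"$"` end-of-word marker; the project's actual dictionary layout and matching rules may differ.

```python
END = "$"  # hypothetical end-of-word marker, not taken from the project


def build_trie(words):
    """Build a nested-dict trie with one level per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def suggest(trie, prefix):
    """Return all stored words that start with `prefix`, sorted."""
    node = trie
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    results = []
    stack = [(node, prefix)]
    # Depth-first walk over everything below the prefix node.
    while stack:
        current, text = stack.pop()
        for key, child in current.items():
            if key == END:
                results.append(text)
            else:
                stack.append((child, text + key))
    return sorted(results)


trie = build_trie(["cá", "cách", "của", "bếp"])
matches = suggest(trie, "c")
message = "`c` not found, but maybe " + "|".join(matches)
```

A trie keeps the lookup cost proportional to the prefix length plus the number of matches, rather than to the size of the whole dictionary.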

spellcheck-vietnamese's Issues

Replace dpath.util.search and dpath.util.get by custom function

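def search(*path, dict, validpath=None):
    """Walk `path` through the nested `dict`; return the full path if every key exists."""
    validpath = [] if validpath is None else validpath
    if len(path) == 0:
        return validpath
    try:
        # Unpack the remaining keys so they are passed as separate arguments.
        return search(*path[1:], dict=dict[path[0]], validpath=validpath + [path[0]])
    except (KeyError, IndexError, TypeError):
        # On an invalid key, return the validated keys without the last one.
        print("Invalid key")
        return validpath[0:-1]

def get(*path, dict):
    """Return the value stored at `path` in the nested `dict`, or False if a key is missing."""
    if len(path) == 0:
        return dict
    try:
        return get(*path[1:], dict=dict[path[0]])
    except (KeyError, IndexError, TypeError):
        print("Invalid key '{}'".format(path[0]))
        return False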
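
For comparison, the same two lookups can also be written iteratively. This is only a sketch with hypothetical names (`valid_prefix`, `get_path`), not the project's code. It assumes the dictionary is a nested `dict` and that a path is given as a list of keys.

```python
def valid_prefix(data, path):
    """Return the longest leading part of `path` that exists in nested dict `data`."""
    found = []
    node = data
    for key in path:
        if not isinstance(node, dict) or key not in node:
            break
        node = node[key]
        found.append(key)
    return found


def get_path(data, path, default=None):
    """Return the value stored at `path`, or `default` if any key is missing."""
    node = data
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Hypothetical nested dictionary used only for illustration.
nested = {"c": {"h": {"o": {"word": True}}}}
prefix = valid_prefix(nested, ["c", "h", "x"])
value = get_path(nested, ["c", "h", "o", "word"])
missing = get_path(nested, ["c", "z"])
```

Returning a caller-chosen `default` instead of `False` avoids confusing a missing key with a stored `False` value, and the loop form does not depend on recursion depth.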
