vocabulary's Introduction

Learning Vocabulary

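I recently realized how important English is: more and more of the material I deal with at work is in English, so I decided to start learning it.

Given my dismal English level (a vocabulary of about 3,000 words), the first step has to be building up my vocabulary, so I wrote a tool to help me memorize words.

The goal of this tool is to let me take a shortcut (memorize as few words as possible). Its main features are:

1. Parse English PDFs and extract the words (used to pick out domain-specific terms)

2. Tokenize with Stanford CoreNLP and lemmatize each word (reduce it to its base form)

3. Filter out the words I already know

The resulting word list is imported into Shanbay (扇贝单词), where it becomes a word book for memorization.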
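The word-extraction step can be sketched roughly as follows. This is only an illustration, not the repository's code: it assumes the PDF text has already been extracted into a plain string by some other means, and `countWords` is a hypothetical helper name.

```javascript
// Hypothetical helper (not part of the repository): split already-extracted
// PDF text into lowercase word candidates and count how often each occurs.
function countWords(text) {
  const counts = new Map();
  // Match runs of ASCII letters, allowing internal apostrophes or hyphens.
  const matches = text.toLowerCase().match(/[a-z]+(?:['-][a-z]+)*/g) || [];
  for (const word of matches) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

const sampleText = 'The parser tokenizes text. The tokenizer feeds the parser.';
const wordCounts = countWords(sampleText);
console.log([...wordCounts.entries()]);
```

Counting occurrences makes it possible to review the most frequent unfamiliar terms in a document first.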

Filtering out known words only

Use coca/filter.js.

1. Set the paths of these two files:

// Raw file to process
const rawFile = dir + '/../data/vocabulary/COCA20000.txt';
// File where the results are stored
const filteredFile = dir + '/data/20000_filtered.txt';

2. Run coca/filter.js
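For reference, the core of such a filter might look like the sketch below. It is not the actual content of coca/filter.js: it assumes both the raw list and the known-word list hold one word per line, and `filterKnownWords` is a hypothetical name.

```javascript
// Hypothetical helper (assumption: one word per line in both inputs).
// Returns the raw words that are not in the known-word list, without duplicates.
function filterKnownWords(rawText, knownText) {
  const toWords = (text) =>
    text.split(/\r?\n/).map((line) => line.trim()).filter(Boolean);
  const known = new Set(toWords(knownText).map((w) => w.toLowerCase()));
  const seen = new Set();
  const result = [];
  for (const word of toWords(rawText)) {
    const key = word.toLowerCase();
    // Skip known words and case-insensitive repeats.
    if (known.has(key) || seen.has(key)) continue;
    seen.add(key);
    result.push(word);
  }
  return result;
}

const rawList = 'the\nabandon\nlemma\nAbandon\n';
const easyList = 'the\nof\n';
const unknownWords = filterKnownWords(rawList, easyList);
console.log(unknownWords.join('\n'));
```

In a real run the two strings would come from reading rawFile and the easy word list from disk, and the joined result would be written to filteredFile.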

Tokenizing and lemmatizing with Stanford CoreNLP, then filtering out known words

Use statFrequency/index.js.

1. Write the preliminarily filtered words to statFrequency/data/unknownWords.txt

2. Run the Stanford CoreNLP command:

cd /d/software/stanford-corenlp-4.5.5 && \
java -mx50g -cp '*' edu.stanford.nlp.pipeline.StanfordCoreNLP \
-annotators "tokenize,pos,lemma" \
-outputFormat json   \
-outputDirectory /d/git/vocabulary/statFrequency/data \
-file /d/git/vocabulary/statFrequency/data/unknownWords.txt && \
cd /d/git/vocabulary

3. Run statFrequency/index.js

The final word list is then available in statFrequency/data/final.txt.
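With `-outputFormat json`, CoreNLP writes a JSON document whose `sentences` array holds `tokens`, each carrying fields such as `word`, `lemma`, and `pos`; for the command above the file should be named unknownWords.txt.json. The sketch below shows one way to collect unique lemmas from that structure. It is an assumption about what a script like statFrequency/index.js needs to do, not a copy of it, and `collectLemmas` is a hypothetical name.

```javascript
// Hypothetical helper: gather unique, purely alphabetic lemmas from a parsed
// CoreNLP JSON result (the object obtained by JSON.parse on the output file).
function collectLemmas(coreNlpOutput) {
  const lemmas = new Set();
  for (const sentence of coreNlpOutput.sentences || []) {
    for (const token of sentence.tokens || []) {
      // Ignore punctuation, numbers, and tokens without a lemma.
      if (typeof token.lemma === 'string' && /^[a-z]+$/i.test(token.lemma)) {
        lemmas.add(token.lemma.toLowerCase());
      }
    }
  }
  return [...lemmas];
}

// Hand-written sample in the shape of CoreNLP's JSON output.
const sampleOutput = {
  sentences: [
    {
      index: 0,
      tokens: [
        { index: 1, word: 'running', lemma: 'run', pos: 'VBG' },
        { index: 2, word: 'mice', lemma: 'mouse', pos: 'NNS' },
        { index: 3, word: 'runs', lemma: 'run', pos: 'VBZ' },
        { index: 4, word: '.', lemma: '.', pos: '.' },
      ],
    },
  ],
};
const lemmaList = collectLemmas(sampleOutput);
console.log(lemmaList);
```

Reducing inflected forms to a single lemma means that "running" and "runs" count as one word to learn, after which the known-word filter can be applied to the lemma list.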

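Manually selecting known words

1. Write the preliminarily filtered words to data/process/toBeFiltered.txt

2. Run the script classify/generateWords.js

3. Open classify/index.html

4. Tick the words you know, then click "导出简单词并复制到剪贴板" (Export easy words and copy to clipboard)

5. Paste the clipboard contents into classify/easy.js and data/easy.txt

6. Run the steps in "Filtering out known words only" above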
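Step 5 is done by hand, so repeated exports can leave duplicates in the easy list. A minimal sketch of merging a new export into an existing list is shown below; it assumes both are plain text with one word per line (the real format of the clipboard export and of classify/easy.js is not documented here), and `mergeEasyWords` is a hypothetical name.

```javascript
// Hypothetical helper (assumption: one word per line in both inputs).
// Merges newly exported easy words into the existing list, lowercased,
// de-duplicated, and sorted alphabetically.
function mergeEasyWords(existingText, exportedText) {
  const words = new Set();
  for (const line of (existingText + '\n' + exportedText).split(/\r?\n/)) {
    const word = line.trim().toLowerCase();
    if (word) words.add(word);
  }
  return [...words].sort();
}

const mergedEasy = mergeEasyWords('the\nof\n', 'Of\nand\n');
console.log(mergedEasy.join('\n'));
```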

TODO LIST

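  • Keep only one copy of the easy word list
  • Turn easy-word selection into a drag-and-drop interface, where words can be dragged in and out
  • Save changes in real time to minimize manual file operations
  • One-click word explanations and example sentences via GPT
  • A lemma lookup tool based on Stanford CoreNLP
  • Restructure and rename the files
  • Optimize the code
  • Article generator (generates an article from unknown words and a topic)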

vocabulary's People

Contributors

zhouchangju
