
mirfani340 / word2vec-id-wikipedia


Word embeddings for Indonesian, produced by a word2vec model trained on Indonesian Wikipedia (wikipedia-id).



word2vec-id-wikipedia

What is this

word2vec trained with a custom window size on a custom dataset built from Indonesian Wikipedia (wikipedia-id). This project focuses on the Indonesian (ID) language only.

Dataset used:

Source : https://dumps.wikimedia.org/idwiki/latest/

File : idwiki-latest-pages-articles.xml.bz2

Pre-trained word2vec models

If you only want to use a pre-trained model in your project, you can download mine instead of training your own.

Always Verify What You Download

Compare the MD5 checksum of each file you download with the md5sum I posted alongside the download link.
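On Linux, running md5sum on a file prints its checksum directly. For a cross-platform check, a minimal Python sketch (standard library only; the function names md5sum and verify are hypothetical) could look like this:

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare a file against a published checksum (case-insensitive, whitespace stripped)."""
    return md5sum(path) == expected.strip().lower()
```

For example, verify(path_to_downloaded_file, published_md5) returns True only when the two digests match. Reading in chunks matters here because the model files can be large.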

Link: https://mega.nz/folder/6y5XXY6K#mjaEBjGBETEWrYuL13hS_Q

How to use

Place the dataset in the datasets folder, then follow the code in main.ipynb.

Example of the project folder

.
├── datasets
│   ├── idwiki-latest-pages-articles.xml.bz2
│   └── idwiki_new_lower.txt
├── main.ipynb
├── model
│   ├── idwiki_word2vec_200_skip-gram_window_2_new_lower.model
│   ├── idwiki_word2vec_200_skip-gram_window_2_new_lower.model.syn1neg.npy
│   └── idwiki_word2vec_200_skip-gram_window_2_new_lower.model.wv.vectors.npy
└── README.md

2 directories, 7 files

Possible Error Found:

TypeError: __randomstate_ctor() takes from 0 to 1 positional arguments but 2 were given

Fix

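This error usually indicates that the model was pickled under a different NumPy version than the one installed where it is being loaded; installing matching versions typically resolves it. Details:

https://stackoverflow.com/questions/75490275/gensim-pickle-error-enable-to-load-the-saved-topic-model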
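Because unpickling errors like this are commonly tied to mismatched package versions, it can help to print the installed versions in both the environment that saved the model and the one loading it, then compare. A minimal sketch using only the standard library (Python 3.8+); checking numpy and gensim specifically is an assumption, and env_versions is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def env_versions(packages=("numpy", "gensim")) -> dict:
    """Map each package name to its installed version string, or None if not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in env_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```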

Contact Me

Telegram: @shadow1graves
