bkirev / stop-words-list

This project forked from ddhira123/stop-words-list

The stop words list for all languages around the world made by the contributors around the world! Start your contributions now!

License: MIT License

Python 100.00%

stop-words-list's Introduction

Stop-Words-List

A beginner-friendly project to help you make open source contributions. An attempt to collect stop words lists for languages from all around the world.

What is a stop word?

In computing, stop words are words which are filtered out before or after processing of natural language data.

- Wikipedia -

In SEO terminology, stop words are the most common words, which most search engines skip in order to save database space and processing time when crawling or indexing large amounts of data. For example, "at", "which", "is", and "the" are words categorized as stop words.
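To make the idea concrete, here is a minimal sketch of stop word filtering in Python. The `STOP_WORDS` set and the `remove_stop_words` function are hypothetical names used only for illustration; they are not part of this repo, and a real program would load the words from one of the list files instead.

```python
# Hypothetical in-memory stop word set (an assumption for this example);
# in practice the words would be read from a stop words list file.
STOP_WORDS = {"at", "which", "is", "the"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the whitespace-separated tokens of `text` that are not stop words."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in stop_words]


print(remove_stop_words("The cat which is at the door"))
# ['cat', 'door']
```

Splitting on whitespace is a simplification: it works for this sentence but does not handle punctuation or languages that do not separate words with spaces.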

How to Contribute?

There are three ways to contribute to this repo:

  • Add a new stop words list file.
  • Edit and improve an existing stop words list.
  • Enhance the Python script parser.py so it can sort the words for every language according to that language's dictionary order.

Here are the steps to contribute to this repo:

  1. Fork this repository

  2. Clone the repository to your local machine

    git clone https://github.com/<YOUR-USERNAME>/Stop-Words-List.git

  3. Create a .txt file in the list/ directory and name it using the following format: [YOUR_LANGUAGE_IN_ENGLISH].txt. For example:

    • english.txt
    • chinese.txt
    • arabic.txt

    Skip this step if a stop words list for your language already exists in this repo.

  4. Put the stop words in the file you created in step 3, or in the existing stop words list file for your language. Place only one word per line! If you are editing an existing stop words list file, please DO NOT DELETE/EDIT anything that already exists. Please make sure the words you want to add are not already in the txt file.

  5. Don't forget to put your name in CONTRIBUTORS.md and follow the format there.

  6. Save the files.

  7. Install the dependencies for parser.py and run the script.

    pip install -r requirements.txt
    python parser.py
    

    This script rearranges the list and sanitizes the words for you.

  8. Commit and push to your forked repository.

  9. Create a pull request.

  10. Congratulations! You have made a priceless contribution.
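The contents of parser.py are not shown here, so the following is only a rough sketch of what "rearrange and sanitize" in step 7 could mean, not the actual script. The function name `sanitize_words` is hypothetical, and the sketch assumes that sanitizing means trimming whitespace, dropping blank lines, lowercasing, and removing duplicates.

```python
def sanitize_words(lines):
    """Strip whitespace, drop blank lines, lowercase, deduplicate, and sort.

    Assumption: this mirrors the repo's rules (one word per line, lowercase,
    no duplicates, sorted); the real parser.py may behave differently.
    """
    cleaned = {line.strip().lower() for line in lines if line.strip()}
    return sorted(cleaned)


print(sanitize_words(["The\n", "and \n", "\n", "the\n", "at\n"]))
# ['and', 'at', 'the']
```

Note that `sorted` orders strings by Unicode code point, which is not the same as dictionary order in every language. That gap is what the third way to contribute (improving the sorting in parser.py) is about.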

Contributing Rules

  • Place only one word per line in the stop words list txt file.

  • To be counted as a contribution, you need to add at least 10 lines to your respective language file.

  • Please double-check the whole list and ensure it satisfies these requirements:

    • No duplicate words.
    • Words written in the Latin alphabet must be lowercase.
    • The word list is sorted in dictionary order.

  • DO NOT DELETE the previous contributors' names in CONTRIBUTORS.md.

  • When filling in CONTRIBUTORS.md, please make sure the list is arranged in dictionary order based on the language name.

  • A PR will be merged if and only if it satisfies all the rules.
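Before opening a pull request, the list requirements above can be checked locally. The sketch below is one possible way to do that; `check_word_list` is a hypothetical helper, not something shipped with this repo, and it takes the words as a list of strings with line endings already removed.

```python
def check_word_list(words):
    """Return a list of rule violations found in `words` (empty if none)."""
    problems = []
    if len(set(words)) != len(words):
        problems.append("duplicate words")
    # Scripts without letter case are unaffected by lower().
    if any(word != word.lower() for word in words):
        problems.append("uppercase letters")
    # Assumption: code point order stands in for dictionary order here.
    if words != sorted(words):
        problems.append("not sorted")
    if any(len(word.split()) != 1 for word in words):
        problems.append("line without exactly one word")
    return problems


print(check_word_list(["and", "at", "the"]))
# []
print(check_word_list(["the", "And", "the"]))
# ['duplicate words', 'uppercase letters', 'not sorted']
```

The sort check uses Python's default string ordering, so it may flag lists that are correctly sorted by a language's own dictionary rules.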
