
foscraft / beatrice-project


BeatriceVec is a Python package for generating 600-dimensional word embeddings without relying on any third-party packages.

License: Apache License 2.0

Python 99.02% Shell 0.98%


BeatriceVec

BeatriceVec is a Python package for generating 600-dimensional word embeddings without relying on any third-party packages. Word embeddings are vector representations of words that capture semantic relationships in numerical form, which supports natural language processing (NLP) tasks such as word similarity, text classification, and information retrieval.
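As a rough illustration of how word similarity is usually measured on embeddings, the sketch below computes cosine similarity in plain Python. It does not use BeatriceVec itself: the function name `cosine_similarity` and the toy 3-dimensional vectors are assumptions made up for this example, and real BeatriceVec vectors would have 600 entries.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)

# Toy vectors invented for illustration only (not produced by BeatriceVec).
king = [0.9, 0.1, 0.4]
queen = [0.8, 0.2, 0.5]
banana = [-0.3, 0.9, -0.1]

# Related words should score higher than unrelated ones.
print(cosine_similarity(king, queen) > cosine_similarity(king, banana))  # prints True
```

Cosine similarity ignores vector length and compares direction only, which is why it is a common default for comparing embeddings.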

With BeatriceVec, users can transform textual data into meaningful vector representations. These embeddings can capture semantic relationships between words, enabling algorithms and models to understand context and similarities between different words. This capability proves particularly useful in tasks such as sentiment analysis, language translation, and recommendation systems.

Create your embeddings with BeatriceVec and use them to query your model locally without using the internet.
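One way to query embeddings locally is a nearest-neighbour lookup over a word-to-vector table held in memory. The sketch below is an assumption about how such a query could look, not part of the BeatriceVec API: `normalize`, `nearest_words`, and the `table` dictionary are hypothetical names, and the 2-dimensional values are made up. In practice the table could be filled by calling `get_embedding(word)` for each word of interest.

```python
import math

def normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]

def nearest_words(query, table, top_n=3):
    """Rank the words in `table` by cosine similarity to `query`, best first."""
    # After normalisation, a plain dot product equals cosine similarity.
    unit = {word: normalize(vec) for word, vec in table.items()}
    q = unit[query]
    scores = [
        (word, sum(a * b for a, b in zip(q, vec)))
        for word, vec in unit.items()
        if word != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]

# Hypothetical word -> vector table with made-up values.
table = {
    "cat": [1.0, 0.1],
    "dog": [0.9, 0.2],
    "car": [0.1, 1.0],
}
print([word for word, _ in nearest_words("cat", table, top_n=2)])  # prints ['dog', 'car']
```

Everything here runs on the local machine with the standard library only, so no network access is needed once the vectors exist.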

BeatriceVec uses 600 dimensions, a relatively large representation space. A higher dimensionality can potentially encode more complex relationships and finer-grained distinctions between words, which may improve performance in downstream NLP tasks.

The package offers a straightforward API that is accessible to both beginners and experienced practitioners. It provides functions to train custom word embeddings on user-specific text corpora, so users can tailor the embeddings to their own domain or application.

It lets developers and researchers explore word embeddings and use learned word representations in their NLP projects. Its self-contained implementation, high-dimensional embeddings, and ease of use suit tasks such as text analysis, information retrieval, and language understanding.

Overall, BeatriceVec is a self-contained Python package for generating word embeddings that aims to offer flexibility and ease of use across NLP applications.

Installation

Install either the wheel or the source package; both are in the dist folder.

# Wheel
pip install beatricevec-1.0.1-py3-none-any.whl
# Source package
pip install beatricevec-1.0.1.tar.gz

Download the wheel or package here

Usage

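from beatricevec import BeatriceVec

# A small example corpus: one string per sentence.
corpus = ["I am learning", "Natural language processing", "with BeatriceVec"]
embedder = BeatriceVec(corpus)
embedder.build_vocab()              # build the vocabulary from the corpus
embedder.initialize_word_vectors()  # start from random word vectors
embedder.train()                    # train the embedding model

# Retrieve the embeddings for all words in the vocabulary.
embeddings = embedder.get_embeddings()

for embedding in embeddings:
    print(embedding)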

Documentation

Methods
  • build_vocab(): Builds the vocabulary from the corpus.
  • initialize_word_vectors(): Initializes the word vectors with random values.
  • train(): Trains the embedding model using the Word2Vec algorithm.
  • update_vector(vector: list, context_vector: list): Updates the target word vector using gradient descent.
  • get_embeddings() -> list: Retrieves the embeddings for all words in the vocabulary.
  • get_embedding(word: str) -> list: Retrieves the embedding vector for a given word.
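To make the first steps in the list above more concrete, here is a minimal standalone sketch of vocabulary building, random initialisation, and a single gradient-descent update. It is not BeatriceVec's source code: the functions only borrow the method names, and the tokenisation (lowercasing and whitespace splitting), the `dim`, `seed`, and `learning_rate` parameters, and the squared-distance objective are all assumptions chosen for illustration. The real Word2Vec objective is more involved than the stand-in used here.

```python
import random

def build_vocab(corpus):
    """Map each distinct lowercased, whitespace-separated token to an integer index."""
    vocab = {}
    for sentence in corpus:
        for token in sentence.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def initialize_word_vectors(vocab, dim=600, seed=0):
    """Give every word a small random vector of length `dim`."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return {
        word: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]
        for word in vocab
    }

def update_vector(vector, context_vector, learning_rate=0.01):
    """One gradient-descent step on 0.5 * ||vector - context_vector||^2.

    The gradient with respect to `vector` is (vector - context_vector), so the
    step moves `vector` a fraction `learning_rate` of the way toward the context.
    This simplified objective is an assumption, not the Word2Vec loss.
    """
    return [v - learning_rate * (v - c) for v, c in zip(vector, context_vector)]

corpus = ["I am learning", "Natural language processing", "with BeatriceVec"]
vocab = build_vocab(corpus)
vectors = initialize_word_vectors(vocab, dim=600)
print(len(vocab), len(vectors["learning"]))  # prints 8 600
```

Returning a new list from `update_vector` instead of mutating in place keeps the sketch easy to test; an in-place update would be a reasonable choice for larger vocabularies.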

License

BeatriceVec is released under the Apache License 2.0.

How to CONTRIBUTE

