Wordlit.net

Wordlit.net aims to demystify the decisions and behaviors of Natural Language Processing (NLP) algorithms. It visualizes the entities and relationships extracted from text, offering insight into how NLP algorithms interpret and process language. The application is useful for NLP researchers, data scientists, and enthusiasts who want to understand how computational linguistics works. For file uploads, Wordlit.net currently supports PDF, Word, and TXT formats.
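The file-type support described above could be organized as a dispatch on the file extension. The sketch below is a minimal illustration, not Wordlit.net's actual code. The names `extract_text` and `READERS` are hypothetical, the function takes a filesystem path (a Streamlit upload would arrive as a file-like object instead), and only `.docx` is handled for Word because python-docx does not read legacy `.doc` files. The PDF and Word readers import pdfplumber and python-docx lazily, so the TXT path runs without them.

```python
from pathlib import Path


def read_txt(path: Path) -> str:
    # Decode as UTF-8, replacing any bytes that are not valid UTF-8.
    return path.read_text(encoding="utf-8", errors="replace")


def read_pdf(path: Path) -> str:
    # Lazy import: only needed when a PDF is actually read.
    import pdfplumber

    with pdfplumber.open(str(path)) as pdf:
        # extract_text() returns None for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def read_docx(path: Path) -> str:
    # The python-docx package is imported under the name `docx`.
    import docx

    document = docx.Document(str(path))
    return "\n".join(paragraph.text for paragraph in document.paragraphs)


# Hypothetical mapping from lowercase file extension to reader function.
READERS = {".txt": read_txt, ".pdf": read_pdf, ".docx": read_docx}


def extract_text(path) -> str:
    """Return the text of a supported file, chosen by its extension."""
    path = Path(path)
    reader = READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"Unsupported file type: {path.suffix or '(none)'}")
    return reader(path)
```

Lowercasing the suffix means `REPORT.PDF` and `report.pdf` are treated the same, and an unsupported extension fails early with a clear message instead of an obscure parser error.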

Key Features

  • Entity Extraction: Leverages spaCy's NLP capabilities to identify entities in the text.
  • Knowledge Graph Construction: Builds a graph using NetworkX, linking entities based on their relationships.
  • Interactive Visualization: Utilizes Plotly and Streamlit for dynamic graph visualization.
  • Customizable Graph Parameters: Offers options to adjust layout spacing, color scheme, node size, and more.
  • Graph Analytics: Provides statistics such as node and edge counts, graph density, and centrality measures.
  • Text Analytics: Calculates various text statistics such as token counts, sentence lengths, and unique tokens.
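As a rough illustration of how entity extraction, graph construction, and graph analytics fit together, the sketch below groups spaCy entities by sentence, links entities that appear in the same sentence, and computes node count, edge count, density, and degree centrality. Sentence co-occurrence is an assumed relation rule, since the exact rule Wordlit.net uses is not described here, and all helper names are hypothetical. The graph is kept as a plain adjacency dictionary so the arithmetic is visible; the formulas are the ones NetworkX's `density` and `degree_centrality` use for an undirected graph.

```python
from itertools import combinations


def entities_by_sentence(text: str) -> list[list[str]]:
    """Return the entity strings found in each sentence (needs en_core_web_sm)."""
    import spacy  # imported lazily; only this helper depends on spaCy

    nlp = spacy.load("en_core_web_sm")
    return [[ent.text for ent in sent.ents] for sent in nlp(text).sents]


def build_graph(sentences: list[list[str]]) -> dict[str, set[str]]:
    """Assumed rule: connect every pair of distinct entities sharing a sentence."""
    adjacency: dict[str, set[str]] = {}
    for entities in sentences:
        unique = sorted(set(entities))
        for entity in unique:
            adjacency.setdefault(entity, set())
        for a, b in combinations(unique, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def graph_stats(adjacency: dict[str, set[str]]) -> dict:
    n = len(adjacency)
    # Each undirected edge appears in two neighbor sets.
    m = sum(len(neighbors) for neighbors in adjacency.values()) // 2
    # Undirected density: 2m / (n(n - 1)), defined as 0 when n < 2.
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    # Degree centrality: degree / (n - 1); a lone node gets 1.0, as in NetworkX.
    centrality = {
        node: (len(neighbors) / (n - 1) if n > 1 else 1.0)
        for node, neighbors in adjacency.items()
    }
    return {"nodes": n, "edges": m, "density": density, "degree_centrality": centrality}
```

For example, the input `[["Alice", "Acme Corp", "Paris"], ["Alice", "Bob"]]` yields 4 nodes and 4 edges, a density of 2/3, and a degree centrality of 1.0 for "Alice", which is linked to every other node.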

Installation

To use this tool, you need to install the following dependencies:

pip install spacy networkx transformers streamlit plotly matplotlib pandas python-docx pdfplumber requests beautifulsoup4

Don't forget to download the spaCy language model:

python -m spacy download en_core_web_sm

Usage

1. Start the Streamlit App: Launch the app from the command line:

streamlit run wordlit.py

2. Input Text: Provide text by uploading a file, entering a website URL, or pasting it directly into the text area.

3. Customize Graph: Adjust the graph parameters like layout spacing, node size, and color scheme using the sidebar options.

4. Generate Graph: Select 'Generate Graph' to visualize the knowledge graph based on your text.

5. Explore Graph Analytics: View various statistics and metrics related to the generated graph and the input text.
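The text statistics mentioned in the last step can be approximated with the standard library alone. This sketch is a stand-in under stated assumptions: a regular-expression tokenizer and a naive punctuation-based sentence splitter replace spaCy's tokenizer and sentence segmentation, so its counts will not always match what the app reports. `text_stats` is a hypothetical name.

```python
import re
from statistics import mean

# Words (optionally with one apostrophe part, e.g. "don't") or single punctuation marks.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")
# Naive sentence boundary: whitespace that follows ".", "!" or "?".
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")


def text_stats(text: str) -> dict:
    sentences = [s for s in SENTENCE_RE.split(text.strip()) if s]
    sentence_tokens = [TOKEN_RE.findall(s) for s in sentences]
    tokens = [tok for toks in sentence_tokens for tok in toks]
    lengths = [len(toks) for toks in sentence_tokens]
    return {
        "token_count": len(tokens),
        # Case-insensitive count of distinct tokens.
        "unique_tokens": len({tok.lower() for tok in tokens}),
        "sentence_count": len(sentences),
        "sentence_lengths": lengths,
        "mean_sentence_length": mean(lengths) if lengths else 0.0,
    }
```

For `"Alice met Bob. Bob met Alice again!"` this finds two sentences of 4 and 5 tokens (punctuation counts as a token), 9 tokens in total, and 6 unique lowercase tokens.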

Examples

Below is an example of a knowledge graph generated from an uploaded file. Nodes represent entities, and edges represent the relationships between them. Each node's size reflects its degree (its number of connections), and node colors follow the selected color scheme.
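Sizing nodes by degree usually means rescaling the degrees into a fixed size range. A minimal sketch of a linear rescaling, where `node_sizes` and the 10 to 40 range are hypothetical choices rather than the app's actual settings:

```python
def node_sizes(
    degrees: dict[str, int], min_size: float = 10.0, max_size: float = 40.0
) -> dict[str, float]:
    """Linearly map each node's degree onto [min_size, max_size]."""
    if not degrees:
        return {}
    lo, hi = min(degrees.values()), max(degrees.values())
    if hi == lo:
        # Every node is equally connected, so give them all the midpoint size.
        middle = (min_size + max_size) / 2
        return {node: middle for node in degrees}
    scale = (max_size - min_size) / (hi - lo)
    return {node: min_size + (d - lo) * scale for node, d in degrees.items()}
```

With degrees `{"Alice": 3, "Bob": 1, "Paris": 2}`, Bob gets 10.0, Paris 25.0, and Alice 40.0. The values could then be passed, in node order, as the `size` list of the `marker` argument of a Plotly `go.Scatter` node trace.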

[Video: Upload a File]

An example of a knowledge graph generated from text.

[Video: Enter Text Manually]

An example of a knowledge graph generated from a website URL.

[Video: Enter Website URL]

Tech Stack

Python: The entire codebase is written in Python.

spaCy: An open-source library for advanced Natural Language Processing (NLP) in Python. It is used for tokenization, named entity recognition (NER), part-of-speech tagging, and dependency parsing.

NetworkX: A Python library used for building and analyzing network graphs.

Streamlit: An open-source Python library used to build and run the web application.

Plotly: A graphing library used to create the interactive knowledge graph visualizations.

Pandas: An open-source data analysis and manipulation tool built on top of the Python programming language.

Time Module: Python's built-in time module, used here to track processing time.

Python-Docx: A Python library for reading, creating, and updating Microsoft Word (.docx) files.

Pdfplumber: Used for extracting text from PDF files. It allows detailed access to text, tables, and metadata in PDFs.

Requests: A simple HTTP library for Python, used to send HTTP requests easily.

Beautiful Soup (bs4): A Python library used here to parse HTML content.
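For URL input, the app combines Requests, Beautiful Soup, and the time module. The sketch below swaps in Python's built-in `html.parser` for Beautiful Soup so it runs without extra packages. `VisibleTextParser` and `html_to_text` are hypothetical names, and skipping `<script>` and `<style>` content is an assumed cleanup rule. The parsing step is timed with `time.perf_counter()`.

```python
import time
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text outside <script>/<style>; a stdlib stand-in for Beautiful Soup."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> tuple[str, float]:
    """Return (visible text, seconds spent parsing)."""
    start = time.perf_counter()
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    elapsed = time.perf_counter() - start
    return " ".join(parser.parts), elapsed
```

In the app itself, the HTML would come from a call such as `requests.get(url, timeout=10).text`, and Beautiful Soup's `BeautifulSoup(html, "html.parser").get_text(" ", strip=True)` performs a broadly similar extraction.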

Contributing

Contributions to enhance Wordlit.net are welcome. Feel free to fork the repository, make changes, and create a pull request.

License

All code contributed to Wordlit.net © 2024 by Sahir Maharaj is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

When using the code from Wordlit.net, please credit as follows:

Code sourced from Wordlit.net, authored by Sahir Maharaj, 2024.

Contact

Report a bug or request a feature: [email protected]

LinkedIn: Sahir Maharaj
