takapy0210 / nlplot

Visualization Module for Natural Language Processing

License: MIT License

nlplot's Introduction

📝 nlplot

nlplot: Analysis and visualization module for Natural Language Processing 📈

Description

nlplot makes it easier to visualize natural language data and speeds up exploratory analysis.

It can draw the following charts:

  1. N-gram bar chart
  2. N-gram tree map
  3. Histogram of the word count
  4. Word cloud
  5. Co-occurrence network
  6. Sunburst chart

(Tested in English and Japanese)
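
As a rough illustration of what the N-gram charts count, the sketch below tallies n-grams from space-delimited strings using only the standard library. The function name `count_ngrams` is hypothetical, and this is an assumption about the general technique, not nlplot's own implementation.

```python
from collections import Counter


def count_ngrams(texts, n=1):
    """Count n-grams over an iterable of space-delimited strings."""
    counts = Counter()
    for text in texts:
        tokens = text.split()
        # Slide a window of length n over the tokens of one document.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


texts = ["think rich look poor", "look rich think poor"]
unigrams = count_ngrams(texts, n=1)
bigrams = count_ngrams(texts, n=2)
print(unigrams.most_common(2))
print(bigrams.most_common(3))
```

A bar chart or tree map of these counts would then show the most frequent n-grams first.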

Requirement

Installation

pip install nlplot

A detailed usage guide is available on the author's blog. (Japanese)

Sample code is also available in a Kaggle kernel. (English)

Quick start - Data Preparation

The column to be analyzed must be a space-delimited string

import pandas as pd

# sample data
target_col = "text"
texts = [
    "Think rich look poor",
    "When you come to a roadblock, take a detour",
    "When it is dark enough, you can see the stars",
    "Never let your memories be greater than your dreams",
    "Victory is sweetest when you’ve known defeat",
]
df = pd.DataFrame({target_col: texts})
df.head()
   text
0  Think rich look poor
1  When you come to a roadblock, take a detour
2  When it is dark enough, you can see the stars
3  Never let your memories be greater than your dreams
4  Victory is sweetest when you’ve known defeat
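
If the raw text still contains punctuation or mixed case, it may help to normalize it into a space-delimited string first. The sketch below is one minimal way to do that with the standard library. The helper name `to_space_delimited` and the choice to lowercase and drop punctuation are assumptions for illustration, not requirements of nlplot. Languages without spaces between words, such as Japanese, would need a tokenizer instead.

```python
import re


def to_space_delimited(text):
    """Lowercase, drop punctuation, and join tokens with single spaces."""
    # Keep runs of word characters and apostrophes (straight or curly);
    # anything else acts as a token separator.
    tokens = re.findall(r"[\w'’]+", text.lower())
    return " ".join(tokens)


raw = "When you come to a roadblock, take a detour"
print(to_space_delimited(raw))
```

The cleaned strings can then be stored in the DataFrame column that is passed as target_col.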

Quick start - Python API

import nlplot
import pandas as pd
import plotly
from plotly.subplots import make_subplots
from plotly.offline import iplot
import matplotlib.pyplot as plt

%matplotlib inline

# target_col must hold either lists of tokens or space-delimited strings.
npt = nlplot.NLPlot(df, target_col='text')

# Stopwords can be computed from the data.
stopwords = npt.get_stopword(top_n=30, min_freq=0)

# 1. N-gram bar chart
fig_unigram = npt.bar_ngram(
    title='uni-gram',
    xaxis_label='word_count',
    yaxis_label='word',
    ngram=1,
    top_n=50,
    width=800,
    height=1100,
    color=None,
    horizon=True,
    stopwords=stopwords,
    verbose=False,
    save=False,
)
fig_unigram.show()

fig_bigram = npt.bar_ngram(
    title='bi-gram',
    xaxis_label='word_count',
    yaxis_label='word',
    ngram=2,
    top_n=50,
    width=800,
    height=1100,
    color=None,
    horizon=True,
    stopwords=stopwords,
    verbose=False,
    save=False,
)
fig_bigram.show()


# 2. N-gram tree Map
fig_treemap = npt.treemap(
    title='Tree map',
    ngram=1,
    top_n=50,
    width=1300,
    height=600,
    stopwords=stopwords,
    verbose=False,
    save=False
)
fig_treemap.show()


# 3. Histogram of the word count
fig_histgram = npt.word_distribution(
    title='word distribution',
    xaxis_label='count',
    yaxis_label='',
    width=1000,
    height=500,
    color=None,
    template='plotly',
    bins=None,
    save=False,
)
fig_histgram.show()


# 4. wordcloud
fig_wc = npt.wordcloud(
    width=1000,
    height=600,
    max_words=100,
    max_font_size=100,
    colormap='tab20_r',
    stopwords=stopwords,
    mask_file=None,
    save=False
)
plt.figure(figsize=(15, 25))
plt.imshow(fig_wc, interpolation="bilinear")
plt.axis("off")
plt.show()


# 5. co-occurrence networks
npt.build_graph(stopwords=stopwords, min_edge_frequency=10)
# The output below is the number of nodes and edges that will be plotted.
# If these numbers are too large, plotting takes a long time, so tune min_edge_frequency accordingly.
# >> node_size:70, edge_size:166
fig_co_network = npt.co_network(
    title='Co-occurrence network',
    sizing=100,
    node_size='adjacency_frequency',
    color_palette='hls',
    width=1100,
    height=700,
    save=False
)
iplot(fig_co_network)


# 6. sunburst chart
fig_sunburst = npt.sunburst(
    title='sunburst chart',
    colorscale=True,
    color_continuous_scale='Oryel',
    width=1000,
    height=800,
    save=False
)
fig_sunburst.show()


# Other
# The DataFrames behind the co-occurrence network can also be accessed directly.
display(
    npt.node_df.head(), npt.node_df.shape,
    npt.edge_df.head(), npt.edge_df.shape
)
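
To make the top_n and min_freq arguments of the stopword step more concrete, here is a minimal standard-library sketch of frequency-based stopword selection. It assumes that top_n picks the most frequent words and that min_freq additionally drops words occurring at most that many times. These semantics and the name `frequency_stopwords` are assumptions for illustration and may differ from what get_stopword actually does.

```python
from collections import Counter


def frequency_stopwords(texts, top_n=30, min_freq=0):
    """Pick stopwords by frequency from space-delimited strings."""
    counts = Counter(token for text in texts for token in text.split())
    # Assumption: the top_n most frequent words are treated as stopwords.
    common = {word for word, _ in counts.most_common(top_n)}
    # Assumption: words seen at most min_freq times are also dropped.
    rare = {word for word, freq in counts.items() if freq <= min_freq}
    return sorted(common | rare)


texts = ["a b c", "a b d", "a e f"]
print(frequency_stopwords(texts, top_n=1, min_freq=0))
print(frequency_stopwords(texts, top_n=1, min_freq=1))
```

On a very small corpus, a large top_n can mark most of the vocabulary as stopwords, so it is worth checking the returned list before plotting.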

Document

TBD

Test

cd tests
pytest

Other

nlplot's People

Contributors

ams-ony, chottokun, nyk510, snrsw, takapy0210, thatch, upura, wakame1367


nlplot's Issues

Error when running the build

Running the build produces the error below. The other visualizations work normally, but because the build fails, I could not draw the co-occurrence network or the sunburst chart.

npt.build_graph(
    stopwords=stopwords,
    min_edge_frequency=10
)
     83     if (n_communities < 1) or (n_communities > N):
     84         raise ValueError(
---> 85             f"n_communities must be between 1 and {N}. Got {n_communities}"
     86         )
     87 

ValueError: n_communities must be between 1 and 0. Got 1
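
The message reports a valid range of 1 to 0, which suggests the graph had no nodes when community detection ran. One way this could happen, offered as an assumption and not a confirmed diagnosis, is that min_edge_frequency is higher than any word-pair count in a small corpus. The sketch below counts co-occurring word pairs per document with the standard library and shows how a high threshold leaves nothing to plot. `cooccurrence_edges` is a hypothetical helper, not part of nlplot, and the "greater than or equal" comparison is also an assumption.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(texts, min_edge_frequency=1):
    """Return word pairs that co-occur in at least min_edge_frequency documents."""
    counts = Counter()
    for text in texts:
        # Count each unordered pair of distinct words once per document.
        words = sorted(set(text.split()))
        counts.update(combinations(words, 2))
    return {pair: n for pair, n in counts.items() if n >= min_edge_frequency}


texts = ["think rich look poor", "think rich", "look poor think"]
kept = cooccurrence_edges(texts, min_edge_frequency=2)
none_left = cooccurrence_edges(texts, min_edge_frequency=10)
print(len(kept), len(none_left))
```

If the threshold filters out every edge, lowering min_edge_frequency for small datasets may avoid the error.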

Missing files in sdist

It appears that the manifest is missing at least one file necessary to build
from the sdist for version 1.0.5. You're in good company, about 5% of other
projects updated in the last year are also missing files.

+ /tmp/venv/bin/pip3 wheel --no-binary nlplot -w /tmp/ext nlplot==1.0.5
Looking in indexes: http://10.10.0.139:9191/root/pypi/+simple/
Collecting nlplot==1.0.5
  Downloading http://10.10.0.139:9191/root/pypi/%2Bf/268/60eab173095a1/nlplot-1.0.5.tar.gz (966 kB)
    ERROR: Command errored out with exit status 1:
     command: /tmp/venv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-wheel-52cua8_n/nlplot/setup.py'"'"'; __file__='"'"'/tmp/pip-wheel-52cua8_n/nlplot/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-wheel-52cua8_n/nlplot/pip-egg-info
         cwd: /tmp/pip-wheel-52cua8_n/nlplot/
    Complete output (7 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-wheel-52cua8_n/nlplot/setup.py", line 26, in <module>
        install_requires=read_requirements(),
      File "/tmp/pip-wheel-52cua8_n/nlplot/setup.py", line 11, in read_requirements
        with open(reqs_path, 'r') as f:
    FileNotFoundError: [Errno 2] No such file or directory: './requirements.txt'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
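
The traceback shows setup.py opening ./requirements.txt, which is absent from the sdist. A common remedy is to ship the file by adding the line `include requirements.txt` to MANIFEST.in. As a complementary sketch, the function below resolves the path relative to setup.py instead of the working directory and returns an empty list when the file is missing. The body is an assumption, since the project's original read_requirements is not shown in the report.

```python
from pathlib import Path


def read_requirements(path=None):
    """Return requirement lines, or an empty list if the file is absent."""
    if path is None:
        # Resolve relative to this file, not the current working directory.
        path = Path(__file__).resolve().parent / "requirements.txt"
    path = Path(path)
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    # Skip blank lines and comment lines.
    return [ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")]
```

Returning an empty list keeps the build from crashing but would install the package without its dependencies, so including the file in the sdist remains the more complete fix.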

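Drop pyLDAvis support

Overview

The latest version of pyLDAvis is currently 3.3.1, but the installation requirements of its dependencies conflict with other machine learning libraries, and I personally judged that libraries such as tensorflow should take priority.

Requirements as of 3.3.1 (cf. https://github.com/bmabey/pyLDAvis/blob/master/requirements.txt)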

numpy>=1.20.0
scipy
pandas>=1.2.0
joblib
jinja2
numexpr
future
funcy
sklearn
scikit-learn
gensim
setuptools

Installing the latest version produces errors like the following:

tensorflow 2.4.1 requires numpy~=1.19.2, but you have numpy 1.21.0 which is incompatible.
pdpbox 0.2.1 requires matplotlib==3.1.1, but you have matplotlib 3.4.0 which is incompatible.
optuna 2.6.0 requires numpy<1.20.0, but you have numpy 1.21.0 which is incompatible.
matrixprofile 1.1.10 requires protobuf==3.11.2, but you have protobuf 3.15.6 which is incompatible.
bokeh 2.3.0 requires tornado>=5.1, but you have tornado 5.0.2 which is incompatible.
autogluon-core 0.1.0 requires numpy==1.19.5, but you have numpy 1.21.0 which is incompatible.

Related issue

[Question]Does the library only work with notebook?

The following line of code will give you an error if you run it in a normal Python execution environment.

pyLDAvis.enable_notebook()

tests/test_nlplot.py:None (tests/test_nlplot.py)
test_nlplot.py:4: in <module>
    from nlplot import NLPlot
..\nlplot\__init__.py:1: in <module>
    from nlplot.nlplot import *
..\nlplot\nlplot.py:21: in <module>
    pyLDAvis.enable_notebook()
..\..\..\.virtualenvs\nlplot\lib\site-packages\pyLDAvis\_display.py:311: in enable_notebook
    formatter = ip.display_formatter.formatters['text/html']
E   AttributeError: 'NoneType' object has no attribute 'display_formatter'
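
The AttributeError arises because no IPython shell is active outside a notebook, so there is no display formatter to configure. One possible guard, sketched below, checks for an active IPython shell before calling pyLDAvis.enable_notebook(). The helper names `running_in_ipython` and `enable_pyldavis_notebook` are hypothetical, and this is not necessarily how the project resolved the issue.

```python
def running_in_ipython():
    """Return True when an IPython shell (e.g. a Jupyter kernel) is active."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False
    # get_ipython() returns None in a plain Python process.
    return get_ipython() is not None


def enable_pyldavis_notebook():
    """Call pyLDAvis.enable_notebook() only when a notebook shell exists."""
    if not running_in_ipython():
        return False
    import pyLDAvis  # imported lazily so plain scripts do not need it

    pyLDAvis.enable_notebook()
    return True


print(running_in_ipython())
```

With a guard like this at import time, the module could be imported from scripts and test runners as well as from notebooks.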

An error occurs on import nlplot

cf. takapy0210/geek_blog#1 (comment)


pip install nlplot itself completed without errors, but running import nlplot raises the following error.

The environment is Windows 10, Python 3.6.5 |Anaconda.

import nlplot
Traceback (most recent call last):
  File "", line 1, in <module>
  File "C:\Users\******\myenv\lib\site-packages\nlplot\__init__.py", line 1, in <module>
    from nlplot.nlplot import *
  File "C:\Users\******\myenv\lib\site-packages\nlplot\nlplot.py", line 15, in <module>
    import seaborn as sns
  File "C:\Users\******\myenv\lib\site-packages\seaborn\__init__.py", line 2, in <module>
    import matplotlib as mpl
  File "C:\Users\******\myenv\lib\site-packages\matplotlib\__init__.py", line 174, in <module>
    _check_versions()
  File "C:\Users\******\myenv\lib\site-packages\matplotlib\__init__.py", line 159, in _check_versions
    from . import ft2font
ImportError: DLL load failed: The specified module could not be found.

Do you know what is causing this error?
