kensk8er / chicksexer


A Python package for gender classification.

License: MIT License

Python 99.89% Shell 0.11%
natural-language-processing python gender-classification machine-learning tensorflow deep-learning recurrent-neural-networks lstm nlp neural-network

chicksexer's Introduction

chicksexer - Python package for gender classification

chicksexer is a Python package that performs gender classification. It takes a person's name as a string and returns a probability estimate of the name's gender:

>>> from chicksexer import predict_gender
>>> predict_gender('John Smith')
{'female': 0.0027230381965637207, 'male': 0.9972769618034363}

Using the classifier instead of simply looking up known male/female names has several advantages:

  • Simple name lookup sometimes fails. For instance, "Miki" is a Japanese female name, but also a Croatian male name.
  • It can predict the gender of a name that does not exist in the list of male/female names.
  • It can deal with a typo in a name relatively easily.

You can also get an estimate as a simple string as follows:

>>> predict_gender('Oliver Butterfield', return_proba=False)
'male'
>>> predict_gender('Naila Ata', return_proba=False)
'female'
>>> predict_gender('Saldivar Anderson', return_proba=False)
'neutral'
>>> predict_gender('Ponyo', return_proba=False)  # name of a character from the film
'male'
>>> predict_gender('Ponya', return_proba=False)  # modify the name such that it sounds like a female name
'female'
>>> predict_gender('Miki Suzuki', return_proba=True)  # Suzuki here is a Japanese surname so Miki is a female name
{'female': 0.9997618066990981, 'male': 0.00023819330090191215}
>>> predict_gender('Miki Adamić', return_proba=True)  # Adamić is a Croatian surname so Miki is a male name
{'female': 0.16958969831466675, 'male': 0.8304103016853333}
>>> predict_gender('Jessica')
{'female': 0.999996105068476, 'male': 3.894931523973355e-06}
>>> predict_gender('Jesssica')  # typo in Jessica
{'female': 0.9999851534785194, 'male': 1.4846521480649244e-05}

If you want to predict the gender of multiple names, use the predict_genders (plural) function instead:

>>> from chicksexer import predict_genders
>>> predict_genders(['Ichiro Suzuki', 'Haruki Murakami'])
[{'female': 3.039836883544922e-05, 'male': 0.9999696016311646},
 {'female': 1.2040138244628906e-05, 'male': 0.9999879598617554}]
>>> predict_genders(['Ichiro Suzuki', 'Haruki Murakami'], return_proba=False)
['male', 'male']
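
If the built-in string labels are too coarse for your use case, the probability dictionaries can be post-processed by hand. The following is a minimal sketch and not part of chicksexer: label_from_proba and its margin parameter are hypothetical names, and the cutoff that chicksexer itself uses to return 'neutral' is not documented here.

```python
def label_from_proba(proba, margin=0.2):
    """Map a {'female': p, 'male': q} dict to 'female', 'male' or 'neutral'.

    `margin` is a hypothetical parameter: the two probabilities must differ
    by more than this amount for a gendered label to be returned.
    """
    diff = proba['male'] - proba['female']
    if diff > margin:
        return 'male'
    if diff < -margin:
        return 'female'
    return 'neutral'


# Probabilities copied from the examples above.
john = {'female': 0.0027230381965637207, 'male': 0.9972769618034363}
miki = {'female': 0.16958969831466675, 'male': 0.8304103016853333}

print(label_from_proba(john))              # male
print(label_from_proba(miki))              # male
print(label_from_proba(miki, margin=0.7))  # neutral (difference is about 0.66)
```

A wider margin trades coverage for precision: more names fall into 'neutral', but the gendered labels that remain are backed by a larger probability gap.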

Installation

  • This repository can run on Ubuntu 14.04 LTS and Mac OS X 10.x (not tested on other OSs)
  • Tested only on Python 3.5

chicksexer depends on NumPy and SciPy, Python packages for scientific computing. You might need to install them before installing chicksexer.

You can install chicksexer by:

pip install chicksexer

chicksexer also depends on the tensorflow package. By default, it tries to install the CPU-only version of tensorflow. If you want to use a GPU, you need to install tensorflow with GPU support yourself. (cf. Installing Tensorflow)

Model Architecture

The gender classifier is implemented as a character-level multilayer LSTM. The architecture is roughly as follows:

  1. Character Embedding Layer
  2. 1st LSTM Layer
  3. 2nd LSTM Layer
  4. Pooling Layer
  5. Fully Connected Layer

The fully connected layer outputs the probability of a name being a male name. For details, see the _build_graph() method in chicksexer/_classifier.py, which implements the computational graph of the architecture in tensorflow.
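
To make the two ends of this pipeline concrete, here is a minimal sketch of what goes into a character embedding layer and what comes out of the final layer. It is an illustration under assumptions, not the code in chicksexer/_classifier.py: the names PAD_ID, UNK_ID, build_vocab, encode and to_proba_dict are hypothetical, and the real package may build its vocabulary and handle padding differently.

```python
PAD_ID = 0  # assumed id used to pad short names to a fixed length
UNK_ID = 1  # assumed id for characters not seen when building the vocabulary


def build_vocab(names):
    """Assign an integer id (starting at 2) to every distinct character."""
    chars = sorted({ch for name in names for ch in name})
    return {ch: idx for idx, ch in enumerate(chars, start=2)}


def encode(name, vocab, max_len):
    """Turn a name into a fixed-length list of character ids.

    An embedding layer would look up one vector per id in this list.
    """
    ids = [vocab.get(ch, UNK_ID) for ch in name[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def to_proba_dict(p_male):
    """Expand the single 'male' probability into the two-class dict."""
    return {'female': 1.0 - p_male, 'male': p_male}


vocab = build_vocab(['Miki', 'John'])
print(encode('Miki', vocab, 6))  # [3, 5, 6, 5, 0, 0]
print(encode('Mika', vocab, 6))  # [3, 5, 6, 1, 0, 0] ('a' is unknown)
print(to_proba_dict(0.75))       # {'female': 0.25, 'male': 0.75}
```

Because the model sees individual characters rather than whole names, a name missing from the training data, or one containing a typo, still maps to a mostly familiar id sequence.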

Training Data

Names with gender annotation are obtained from the sources as follows:

chicksexer's Issues

problem with installation

Message from pip:

Could not find a version that satisfies the requirement tensorflow==1.0.1 (from chicksexer) (from versions: 0.12.1, 1.0.0, 1.1.0rc0, 1.1.0rc1, 1.1.0rc2, 1.1.0, 1.2.0rc0, 1.2.0rc1, 1.2.0rc2, 1.2.0, 1.2.1, 1.3.0rc0, 1.3.0rc1, 1.3.0rc2, 1.3.0, 1.4.0rc0, 1.4.0rc1, 1.4.0, 1.4.1, 1.5.0rc0, 1.5.0rc1, 1.5.0)
No matching distribution found for tensorflow==1.0.1 (from chicksexer)

tensorflow.contrib removed

Hi.
tensorflow.contrib has been removed, which makes chicksexer incompatible with current tensorflow.

"from tensorflow.contrib.tensorboard.plugins import projector" >>> Error: "No module named 'tensorflow.contrib'"

I was wondering if you had time to update.

:)

Cheers~
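
For anyone hitting the same error, one way to confirm that the installed tensorflow lacks the contrib package before importing chicksexer is to probe for the module. This is only a sketch: module_available is a made-up helper name, and it relies on nothing beyond importlib from the standard library.

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be located, False otherwise.

    For a dotted name such as 'tensorflow.contrib', find_spec imports the
    parent package first, so this call can be slow if tensorflow is installed.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when the parent package is missing or is not a package.
        return False


if not module_available('tensorflow.contrib'):
    print('tensorflow.contrib not found: chicksexer will fail to import')
```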

Solution for package compatibility errors

If you run into package compatibility errors, try the following:

  1. Create a virtual environment with conda or Python, using Python 3.5:
     conda create --name py35 python=3.5
     conda activate py35
  2. Install chicksexer:
     pip install chicksexer
  3. Conda/pip will likely install the newest package versions because of the requirements.txt, so you have to reinstall the necessary packages at the required versions:
     pip install regex==2017.4.29
     pip install docopt==0.6.2
     pip install scipy==0.18.1
  4. Use pip list to make sure all necessary packages are installed with the following version numbers:
     numpy==1.12.1
     tensorflow==1.0.1
     scikit-learn==0.18.1
     scipy==0.18.1
     docopt==0.6.2
     regex==2017.4.29
  5. Start Python with the python command from the shell. You should now be able to import chicksexer:
     from chicksexer import predict_gender
     predict_gender('John Smith')
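
The pip list check above can also be scripted. The sketch below compares the output of pip list --format=freeze against the version numbers listed in this post. The helper names parse_freeze and find_mismatches are hypothetical, and the sample text is made up for illustration rather than taken from a real environment.

```python
# Version numbers copied from the list above.
PINNED = {
    'numpy': '1.12.1',
    'tensorflow': '1.0.1',
    'scikit-learn': '0.18.1',
    'scipy': '0.18.1',
    'docopt': '0.6.2',
    'regex': '2017.4.29',
}


def parse_freeze(text):
    """Parse 'name==version' lines into a {name: version} dict."""
    installed = {}
    for line in text.splitlines():
        name, sep, version = line.strip().partition('==')
        if sep:  # skip lines that are not in name==version form
            installed[name.lower()] = version
    return installed


def find_mismatches(pinned, installed):
    """Return {name: (wanted, found)} for missing or differing packages."""
    return {name: (wanted, installed.get(name))
            for name, wanted in pinned.items()
            if installed.get(name) != wanted}


# Made-up sample; in practice, paste the output of `pip list --format=freeze`.
sample = "numpy==1.12.1\nscipy==1.6.1\ndocopt==0.6.2\n"
for name, (wanted, found) in sorted(find_mismatches(PINNED, parse_freeze(sample)).items()):
    print(name, 'wanted', wanted, 'found', found)
```

A package that is not installed at all is reported with found set to None, so an empty result means every listed version matches.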

Version Incompatibility and Installation Issue

Currently there is no way to use chicksexer in either Python 3.5.0 or Python 3.8.2.

For the latter case, I mentioned it in this comment. To recap, you can successfully install chicksexer in Python 3.8.2, but importing it fails with this error: ModuleNotFoundError: No module named 'tensorflow.contrib'. I think this is due to a naming change in Tensorflow 2. The obvious workaround was to use Python 3.5.0.

However, here I am noting that even with Python 3.5.0, the solution given in this issue still doesn't work. I am receiving the following error:

Collecting scipy>=0.18.1 (from chicksexer)
  Downloading https://files.pythonhosted.org/packages/26/68/84dbe18583e79e56e4cee8d00232a8dd7d4ae33bc3acf3be1c347991848f/scipy-1.6.1.tar.gz (27.3MB)
    100% |################################| 27.3MB 24kB/s
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 20, in <module>
      File "C:\Users\szinu\AppData\Local\Temp\pip-build-12wmumib\scipy\setup.py", line 31, in <module>
        raise RuntimeError("Python version >= 3.7 required.")
    RuntimeError: Python version >= 3.7 required.

I suppose this has something to do with this pull request: #11

In other words, currently, there is no way to run chicksexer at all!
