
Pek Yun Ning's Projects

amazon-textract-enhancer

This workshop demonstrates how to build a document parser and query engine with Amazon Textract and other services, such as Elasticsearch and DynamoDB.

autofit-excel-cell-widths-using-xlwings

Autofit Excel cell widths (using xlwings). The xlwings Python library is used to manage, edit, and manipulate Excel files and data. Its documentation can be viewed at https://readthedocs.org/projects/xlwings/downloads/pdf/stable.

corex_topic

Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx

corex_topic_modelling

Correlation Explanation (CorEx) is a topic modelling technique that is very good at identifying 'hidden' topics, that is, topics that are representative but made up of low-frequency words. It was originally created by Greg Ver Steeg.

csv-blank-removal

Removes blank cells in CSV files using Python. When the data is held in a Python list, a blank cell shows up as 'nan'.
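
A minimal sketch of the idea, not the repository's actual code. It assumes a blank cell is either an empty string or the literal text 'nan', and it uses only the standard csv module. The names drop_blank_cells and BLANK_MARKERS are hypothetical.

```python
import csv
import io

# Assumption: these values are what we treat as a "blank" cell.
BLANK_MARKERS = {"", "nan"}


def drop_blank_cells(csv_text):
    """Return the CSV rows with blank cells removed (rows may get shorter)."""
    cleaned = []
    for row in csv.reader(io.StringIO(csv_text)):
        kept = [cell for cell in row if cell.strip().lower() not in BLANK_MARKERS]
        if kept:  # skip rows that were entirely blank
            cleaned.append(kept)
    return cleaned


sample = "a,,b\n,,\nnan,c,\n"
print(drop_blank_cells(sample))  # [['a', 'b'], ['c']]
```

Note that removing cells shifts the remaining values to the left, so this only suits data where column position does not matter.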

doc2vec_topic_modelling

The Doc2Vec method of topic modelling. It is the document-level counterpart of Word2Vec and builds on the concept of word vector representations.

doc2x_topic_modelling

Doc2X is a novel topic modelling technique created in June 2019, by yours truly, Pek Yun Ning. It hybridises the older Doc2Vec and Corex topic modelling algorithms to form this all-new algorithm.

docs

TensorFlow documentation

enumerate_using_python

A simple implementation of 'enumeration' in Python. In this case, we number webchats from one whole chunk of text filled with tons of webchat entries.
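
A rough sketch of how such numbering could look with Python's built-in enumerate. It assumes the webchat entries are separated by blank lines, which may not match the real data. The function name number_webchats is hypothetical.

```python
def number_webchats(text, separator="\n\n"):
    """Split one block of text into webchat entries and number them from 1."""
    # Assumption: entries are separated by blank lines; empty chunks are dropped.
    entries = [chunk.strip() for chunk in text.split(separator) if chunk.strip()]
    return [f"{i}. {entry}" for i, entry in enumerate(entries, start=1)]


raw = "Hi, I need help.\n\nMy order is late.\n\n\n\nThanks!"
for line in number_webchats(raw):
    print(line)
# 1. Hi, I need help.
# 2. My order is late.
# 3. Thanks!
```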

fasttext_topic_modelling

FastText is a topic modelling technique originally created by the Facebook AI Research team. Its first stable version was released in December 2018. FastText is now also available in Python through gensim.

lda2char_topic_modelling

LDA2Char is a novel topic modelling technique created in June 2019, by yours truly, Pek Yun Ning. It hybridises the older LDA and FastText topic modelling algorithms to form this all-new algorithm.

lda2word_topic_modelling

LDA2Word is a novel topic modelling technique created in June 2019, by yours truly, Pek Yun Ning. It hybridises the older LDA and Word2Vec topic modelling algorithms to form this all-new algorithm.

lda2x_topic_modelling

LDA2X is a novel topic modelling technique created in June 2019, by yours truly, Pek Yun Ning. It hybridises the older LDA and Corex topic modelling algorithms to form this all-new algorithm.

lda2xpand_topic_modelling

LDA2XPand is a novel topic modelling technique created in June 2019, by yours truly, Pek Yun Ning. It hybridises the older LDA, Corex, and Word2Vec topic modelling algorithms to form this all-new algorithm.

lda_topic_modelling

Latent Dirichlet Allocation (LDA) is a topic modelling technique that takes a three-level probabilistic approach, modelling text at the word, document, and corpus levels. It comes with its own powerful data visualisation tool, LDAvis (used in this code in its Python version, pyLDAvis).

logistic_regression_-_confusion_matrix

An application of logistic regression, with its accuracy and precision presented using a confusion matrix. This was applied to a consumer complaints dataset.
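
To illustrate how accuracy and precision are read off a confusion matrix, here is a small sketch for the binary case. It does not train a logistic regression model. The labels below are made up for illustration and do not come from the consumer complaints dataset, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Share of positive predictions that were actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Made-up labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)            # 2 1 1 2
print(accuracy(tp, fp, fn, tn))  # 4 of 6 predictions correct
print(precision(tp, fp))         # 2 of 3 positive predictions correct
```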

model_augmentation

Model augmentation: adding more sample data points to an existing (small) dataset in a mathematically principled way, using concepts such as Euclidean distance and the uniform criterion (max-of-min concept). This adds an exploration factor to the model, reduces prediction error, and improves global accuracy (in optimisation).
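
One possible reading of the max-of-min idea, sketched under assumptions: new points are picked from a pool of candidates, and each pick is the candidate whose nearest existing point (by Euclidean distance) is farthest away. The candidate pool and the function names maximin_point and augment are hypothetical, and the repository may select points differently.

```python
import math


def maximin_point(existing, candidates):
    """Return the candidate whose nearest existing point is farthest away."""
    return max(candidates, key=lambda c: min(math.dist(c, x) for x in existing))


def augment(existing, candidates, n_new):
    """Greedily pick up to n_new candidates, each maximising the min distance."""
    points = list(existing)
    pool = list(candidates)
    added = []
    for _ in range(min(n_new, len(pool))):
        best = maximin_point(points, pool)
        pool.remove(best)
        points.append(best)  # later picks also keep away from this point
        added.append(best)
    return added


existing = [(0.0, 0.0), (1.0, 1.0)]
candidates = [(0.5, 0.5), (0.0, 1.0), (0.9, 0.9), (1.0, 0.0)]
print(augment(existing, candidates, 2))
```

With these inputs the two selected points are the unexplored corners (0.0, 1.0) and (1.0, 0.0), since every other candidate sits closer to a point that is already in the dataset.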

mutual-information

In probability theory and information theory, the mutual information of two random variables is a quantity that measures the mutual dependence of the two random variables. This script computes mutual information (MI) over discrete random variables.
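
As an illustration, mutual information between two discrete variables can be estimated from paired samples by summing p(x, y) * log2(p(x, y) / (p(x) * p(y))) over the observed pairs. The sketch below uses empirical frequencies as probabilities and reports the result in bits. The function name mutual_information is hypothetical and this is not the repository's script.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate MI (in bits) between two equal-length sequences of discrete values."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)         # marginal counts of x
    count_y = Counter(ys)         # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (count_x * count_y).
        ratio = count * n / (count_x[x] * count_y[y])
        mi += (count / n) * math.log2(ratio)
    return mi


print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0 (fully dependent)
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0 (independent)
```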

nlp-ner

Natural Language Processing and Named Entity Recognition to automatically extract specific structured entities from unstructured text or data input.

nmf_topic_modelling

Non-negative Matrix Factorisation (NMF) is a topic modelling technique that uses matrix factorisation concepts to identify topics and top words that describe each topic.
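
A minimal sketch of the factorisation idea in pure Python, assuming the standard multiplicative update rules for the squared-error objective. A document-term matrix V is approximated by W times H, where each row of H scores the words of one topic. The tiny matrix and vocabulary are made up, all function names are hypothetical, and the repository itself may rely on a library implementation instead.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error between V and W @ H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factorise V (docs x words) into non-negative W (docs x topics), H (topics x words)."""
    rng = random.Random(seed)
    n_docs, n_words = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_words)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_words)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return w, h


def top_words(h, vocab, n=3):
    """For each topic (row of H), list the n highest-weighted words."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in h]


# Made-up document-term counts, for illustration only.
vocab = ["cat", "dog", "pet", "stock", "bond", "market"]
v = [[3, 2, 4, 0, 0, 0],
     [2, 3, 3, 0, 0, 0],
     [0, 0, 0, 4, 3, 2],
     [0, 0, 1, 3, 2, 4]]
w, h = nmf(v, n_topics=2)
print(top_words(h, vocab))
print(frobenius_error(v, w, h))
```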
