hillaryemail by guoguocai


An example of topic modeling with LDA, using Hillary Clinton's emails as the data.


hillaryemail's Introduction

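Applying the LDA Model

Seeing Through Hillary's Emails at a Glance

LDA (Latent Dirichlet Allocation) is an unsupervised machine learning technique for uncovering the latent topics in a large document collection or corpus. Each document is represented as a probability distribution over topics, and each topic in turn is a probability distribution over words.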
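The two layers of distributions described above can be illustrated with a minimal sketch. The topic names, words, and probabilities below are invented for illustration and do not come from this project's data or code. Under the LDA view, the probability of a word appearing in a document is the sum, over topics, of the topic's weight in the document times the word's probability in that topic.

```python
# Toy numbers, invented for illustration only (not output of this project).
# Each topic is a probability distribution over words.
topic_word = {
    "diplomacy": {"state": 0.5, "meeting": 0.3, "call": 0.2},
    "scheduling": {"state": 0.1, "meeting": 0.4, "call": 0.5},
}

# Each document is a probability distribution over topics.
doc_topic = {"diplomacy": 0.7, "scheduling": 0.3}


def word_probability(word, doc_topic, topic_word):
    """P(word | doc) = sum over topics of P(topic | doc) * P(word | topic)."""
    return sum(
        weight * topic_word[topic].get(word, 0.0)
        for topic, weight in doc_topic.items()
    )


# Probability of each vocabulary word under this document's topic mixture.
probs = {
    w: word_probability(w, doc_topic, topic_word)
    for w in ("state", "meeting", "call")
}
print(probs)
```

Because both layers are proper probability distributions, the resulting word probabilities for the document also sum to 1.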

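This LDA example uses Hillary's email correspondence as its data source. After a series of processing steps, we can easily see what she was talking about in each email.

All of the email data is in the HillaryEmails.csv file under the resources folder.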

How to use

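After downloading and importing the project, run HillaryEmail.py.

Results

A successful run produces the topics, along with the probability distribution of the high-frequency words in each topic.

After this training, when we pass in a new piece of text or a word, we can tell which topic it belongs to.

The test code for this is at lines 75 - 83 of HillaryEmail.py.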
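As a rough illustration of assigning new text to a topic, here is a simplified sketch. It is not the code from HillaryEmail.py, and it is not full LDA inference: it treats the text as coming from a single topic and scores each topic by the likelihood of the words, whereas a full LDA implementation would infer a mixture over topics. The `TOPIC_WORD` table, the `UNSEEN` floor value, and the `topic_scores` helper are all hypothetical names and values chosen for this example.

```python
import math
import re

# Hypothetical topic-word probabilities standing in for a trained model's output.
TOPIC_WORD = {
    "topic_0": {"state": 0.5, "meeting": 0.3, "call": 0.2},
    "topic_1": {"state": 0.1, "meeting": 0.4, "call": 0.5},
}
UNSEEN = 1e-6  # assumed floor probability for words a topic has never seen


def topic_scores(text, topic_word=TOPIC_WORD):
    """Return normalized per-topic scores for a new piece of text."""
    words = re.findall(r"[a-z]+", text.lower())
    # Sum log-probabilities so long texts do not underflow to zero.
    log_scores = {
        topic: sum(math.log(dist.get(w, UNSEEN)) for w in words)
        for topic, dist in topic_word.items()
    }
    # Normalize the log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    exp = {t: math.exp(s - top) for t, s in log_scores.items()}
    total = sum(exp.values())
    return {t: v / total for t, v in exp.items()}


scores = topic_scores("Call me after the call")
best = max(scores, key=scores.get)
print(best, scores)
```

Words that appear in no topic receive the same floor probability everywhere, so they do not change the ranking. Here the repeated word "call" is what pulls the text toward `topic_1`.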
