The ML experiment of Maxson

Maxson-ML

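📖 Introduction

This repository implements the JSONPath predictor component of the Maxson system. The code comments are written in Chinese. For a description of each entry in the JSON configuration files, see configuration.md.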

Overview diagram of the JSONPath Predictor:

Flowchart of the Maxson-ML run process:

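Directory layout
- data : dataset folder. full_set_group_json.txt is the dataset used for training. predict.txt and train.txt are the prediction and training datasets. The 14_feature_xx.txt files are feature files built from the datasets and serve as model input. The prediction folder holds the prediction data, in the format: JSONPath^^count^^time.
- model : folder for the models produced by training.
- bconfig.json : configuration file required when building features.
- buildFeature.py : program that builds the feature data fed to the model.
- config.json : configuration file required by the lstm_crf model, for both training and prediction.
- lstmcrf_pred.py : program that obtains the prediction results of the lstm_crf model.
- lstmcrf_train.py : program that trains the lstm_crf model.
- preprocess.py : preprocessing program. It mainly checks whether any days are missing from the data, fills in a count of 0 for data that did not appear, splits the dataset, and trains the Word2Vec model.
- wordModel_50_4.model : the Word2Vec model produced by training.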
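
The JSONPath^^count^^time record format described for the prediction folder can be read with a few lines of Python. The following is a minimal sketch, not code from this repository: the names PathRecord and parse_record are hypothetical, the sample line is invented for illustration, and the time field is kept as raw text because its exact format is not specified here.

```python
from typing import NamedTuple


class PathRecord(NamedTuple):
    """One prediction record (hypothetical container, not from the repository)."""
    jsonpath: str
    count: int
    time: str  # kept as raw text; the time format is an unknown here


def parse_record(line: str, sep: str = "^^") -> PathRecord:
    """Split one 'JSONPath^^count^^time' line into its three fields."""
    # Split from the right so that only the last two separators are consumed,
    # which leaves any '^^' inside the JSONPath itself untouched.
    jsonpath, count, time = line.rstrip("\n").rsplit(sep, 2)
    return PathRecord(jsonpath, int(count), time)


# Invented sample line, for illustration only.
record = parse_record("$.store.book[0].title^^12^^2020-01-05")
print(record.jsonpath, record.count, record.time)
```

Splitting from the right is a defensive choice: the count and time fields are simple, while the JSONPath is the only field that could plausibly contain unusual characters.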

🔧 Environment

Python 3.7 or later
- torch 1.0.1
- numpy 1.16.2
- gensim 3.8.1
- sklearn-crfsuite 0.3.6

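🎬 Running

Prepare the collected data to use as input. The format can be customized, but it must match the definition in bconfig.json.
Run Preprocess.py to check data integrity, split the data into training and prediction datasets (no split is needed in actual use), and train the Word2Vec model on the input data (the model is used to vectorize features such as the JSONPath).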
```
python Preprocess.py
```
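
One of the preprocessing steps, filling in a count of 0 for days on which a JSONPath did not appear, can be illustrated as follows. This is a sketch under stated assumptions rather than the logic of the actual preprocessing script: the function name fill_missing_days is hypothetical, and the counts are assumed to be already grouped per calendar day for a single JSONPath.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def fill_missing_days(
    counts: Dict[date, int], start: date, end: date
) -> List[Tuple[date, int]]:
    """Return one (day, count) pair per day in [start, end], using 0 for absent days."""
    filled = []
    day = start
    while day <= end:
        # A day with no observation gets an explicit zero count.
        filled.append((day, counts.get(day, 0)))
        day += timedelta(days=1)
    return filled


# Assumed sample data: the path was seen on 1 and 3 January only.
observed = {date(2020, 1, 1): 4, date(2020, 1, 3): 2}
series = fill_missing_days(observed, date(2020, 1, 1), date(2020, 1, 4))
print([count for _, count in series])
```

Making the zeros explicit gives every JSONPath a series of equal length, which is convenient for a sequence model.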
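Run buildFeature.py to build the features used as input to the lstm_crf model.

./xx filepath configpath

(bconfig.json defines the attributes of the data and the parameters of some features, such as the window size and the location of the Word2Vec model.)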
```
python buildFeature.py data/predict.txt bconfig.json
```
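
The window size mentioned among the feature parameters suggests that features are built over a sliding window of past observations. The sketch below shows one plausible reading of that idea on a series of daily counts. It is an assumption for illustration, not the feature layout that buildFeature.py actually produces, and the name sliding_windows is hypothetical.

```python
from typing import List


def sliding_windows(series: List[int], window: int) -> List[List[int]]:
    """Return every contiguous run of `window` values from `series`, in order."""
    if window <= 0:
        raise ValueError("window must be positive")
    # A series shorter than the window yields no windows at all.
    return [series[i:i + window] for i in range(len(series) - window + 1)]


# Assumed sample series of daily counts for one JSONPath.
daily_counts = [4, 0, 2, 0, 5]
windows = sliding_windows(daily_counts, 3)
print(windows)
```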
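Run lstm_crf_pred.py to produce the prediction results. If the model still needs to be trained, run lstm_crf_train.py first and then run the prediction. The output of the whole predictor is written to a prediction text file whose location is defined in config.json.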
```
python lstm_crf_pred.py config.json
```
