
paddlepaddle / epep


Easy & Effective Application Framework for PaddlePaddle

License: Apache License 2.0

Python 78.94% Shell 21.06%
deep-learning paddle-frame paddle-quickstart paddle-nccl-multi-gpu paddle-distribute-ps paddle-fleet


Easy Paddle, Effective Paddle


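EPEP is an application framework for PaddlePaddle, designed so that everyone can easily learn and use it. It is already widely used in Baidu's internal businesses, where it has significantly improved model iteration efficiency on single-machine CPU, single-machine GPU, and multi-machine multi-GPU setups.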


Environment setup

  1. Linux CentOS 6.3, Python 2.7, and PaddlePaddle v1.6.1 or later; see the installation guide for how to install it.

  2. Modify the configuration in conf/var_sys.conf:

fluid_bin=/home/epep/tools/paddle_release_home/python/bin/python

#GPU training configuration
cuda_lib_path=/home/epep/tools/cuda-9.0/lib64:/home/epep/tools/cudnn/cudnn_v7.3/cuda/lib64:/home/epep/tools/nccl-2.2_cuda-8.0/lib:$LD_LIBRARY_PATH
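The file above is a list of shell-style key=value assignments, where cuda_lib_path prepends the CUDA, cuDNN and NCCL directories to the existing $LD_LIBRARY_PATH. The README does not show how run.sh loads it, so the following is only a sketch under the assumption that the file is treated like shell variable definitions; load_var_sys is a hypothetical helper, not part of EPEP.

```python
import os
from string import Template


def load_var_sys(text, environ=None):
    """Parse key=value lines such as those in conf/var_sys.conf (hypothetical helper).

    Blank lines and '#' comments are skipped. $VAR references are expanded
    against `environ` (os.environ by default); unknown variables are left as-is.
    """
    env = dict(os.environ if environ is None else environ)
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = Template(value.strip()).safe_substitute(env)
    return settings


sample = """
fluid_bin=/home/epep/tools/paddle_release_home/python/bin/python
#GPU training configuration
cuda_lib_path=/home/epep/tools/cuda-9.0/lib64:$LD_LIBRARY_PATH
"""
conf = load_var_sys(sample, environ={"LD_LIBRARY_PATH": "/usr/lib64"})
print(conf["cuda_lib_path"])  # /home/epep/tools/cuda-9.0/lib64:/usr/lib64
```

Expanding against an explicit mapping keeps the example deterministic; a real launcher would simply export the resulting value as LD_LIBRARY_PATH before starting fluid_bin.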

Framework description

Overall architecture

[Figure: EPEP Frame Overview]

[Figure: EPEP Train Overview]

[Figure: EPEP Pred Overview]

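Usage

The framework provides several NLP examples, mainly covering classification, regression, matching, sequence labeling, translation, and generation.

Here linear regression (LR) serves as the example. The user writes only about 20 lines of code, all of it specific to the business model, and EPEP switches from CPU to GPU, to multiple GPUs, and to multi-machine multi-GPU (TODO with Easy-DL) with a single setting.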

1. Define the inputs

import paddle.fluid as fluid

# BaseDataset is provided by the EPEP framework (import path not shown here)
class LinearRegression(BaseDataset):
    def __init__(self, flags):
        super(LinearRegression, self).__init__(flags)

    # Input definition
    def parse_context(self, inputs):
        """
        Set inputs_kv: use the same key as layer.data.name.
        Notes:
        (1)
        If a user-defined "inputs key" differs from layer.data.name,
        the framework overwrites the "inputs key" with layer.data.name.
        (2)
        The param "inputs" is passed to the user-defined nets class through
        the nets class interface function: net(self, inputs).
        """
        inputs['x'] = fluid.layers.data(name="x", shape=[self._flags.input_size], dtype="float32")
        inputs['y'] = fluid.layers.data(name="y", shape=[1], dtype="float32")

        context = {"inputs": inputs}
        # Set the debug list to print info during training
        #debug_list = [key for key in inputs]
        return context

    # Parse one line into the format required by parse_context; the framework assembles batches
    def parse_oneline(self, line):
        cols = line.strip("\t\n").split("\t")
        # The first input_size columns are vector X; the next column is the label.
        label = [0]
        if len(cols) >= self._flags.input_size + 1:
            label = [float(cols[self._flags.input_size])]
        input_list = [float(x) for x in cols[:self._flags.input_size]]
        yield ("x", input_list), \
              ("y", label)

    # You can also assemble batches yourself by setting reader_batch to True in the config
    def parse_batch(self, data_gen):
        for d in data_gen():
            d = self.parse_oneline(d)
            # ... (batch assembly elided in the original)
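When reader_batch is not enabled, the framework groups the samples yielded by parse_oneline into batches. EPEP's batching code is not shown in this README, so the following is only a plain-Python sketch of the idea: parse_oneline is copied as a standalone function that takes input_size as an argument, and batch_samples is a hypothetical helper that collects each slot name (x, y) into one list per batch.

```python
def parse_oneline(line, input_size):
    """Standalone copy of LinearRegression.parse_oneline with input_size passed in."""
    cols = line.strip("\t\n").split("\t")
    label = [0]  # default label when the line carries no label column
    if len(cols) >= input_size + 1:
        label = [float(cols[input_size])]
    input_list = [float(x) for x in cols[:input_size]]
    yield ("x", input_list), ("y", label)


def batch_samples(lines, input_size, batch_size):
    """Hypothetical batching: group parsed samples into {slot_name: [values, ...]}."""
    batch, count = {}, 0
    for line in lines:
        for sample in parse_oneline(line, input_size):
            for name, values in sample:
                batch.setdefault(name, []).append(values)
            count += 1
            if count == batch_size:
                yield batch
                batch, count = {}, 0
    if count:
        yield batch  # last, possibly smaller batch


lines = ["1\t2\t3", "4\t5\t6", "7\t8"]  # the last line has no label column
batches = list(batch_samples(lines, input_size=2, batch_size=2))
print(len(batches))  # 2
print(batches[1])    # {'x': [[7.0, 8.0]], 'y': [[0]]}
```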

2. Build the network

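import collections

import paddle.fluid as fluid

# BaseNet is provided by the EPEP framework (import path not shown here)
class LinearRegression(BaseNet):
    def __init__(self, FLAGS):
        super(LinearRegression, self).__init__(FLAGS)

    def net(self, inputs):
        """
        linear regression interface
        """
        y_predict = fluid.layers.fc(input=inputs["x"], size=1, act=None)
        cost = fluid.layers.square_error_cost(input=y_predict, label=inputs["y"])
        avg_cost = fluid.layers.mean(cost)

        # Debug output printed during training
        debug_output = collections.OrderedDict()
        debug_output['y'] = inputs["y"]
        debug_output['y_predict'] = y_predict

        # The following fields are required; the framework depends on them
        model_output = {}
        net_output = {"debug_output": debug_output,
                      "model_output": model_output}

        if self.is_training:
            # Adam is the default optimizer; to use a custom one:
            #optimizer = fluid.optimizer.SGD(learning_rate=self._flags.base_lr)
            #net_output['optimizer'] = optimizer

            net_output["loss"] = avg_cost

        # Used for prediction
        model_output['feeded_var_names'] = ["x"]
        model_output['fetch_targets'] = [y_predict]

        return net_output

    # The following 3 functions can also be overridden as needed;
    # here they delegate to BaseNet, so replace the bodies with custom logic.

    # Format of the log printed during training
    def train_format(self, result, global_step, epoch_id, batch_id):
        return super(LinearRegression, self).train_format(result, global_step, epoch_id, batch_id)

    # Initialize some variables from third-party model parameters
    def init_params(self, place):
        return super(LinearRegression, self).init_params(place)

    # Output format for prediction
    def pred_format(self, result, **kwargs):
        return super(LinearRegression, self).pred_format(result, **kwargs)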
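To make the computation in net concrete, here is a plain-Python sketch (no Paddle required) of what the graph computes for one batch: fc with size=1 and act=None is a dot product plus bias, square_error_cost is the per-sample squared error, and mean averages it into avg_cost. The weights and bias are passed in by hand for illustration; in the real network they are learned parameters.

```python
def lr_forward(xs, ys, weights, bias):
    """Plain-Python mirror of the net above: fc(size=1) -> square_error_cost -> mean."""
    # fc with size=1 and no activation: y_predict = x . w + b for each sample
    y_predict = [sum(w * xi for w, xi in zip(weights, x)) + bias for x in xs]
    # square_error_cost: (y_predict - y)^2 per sample
    cost = [(p - y) ** 2 for p, y in zip(y_predict, ys)]
    # mean: average over the batch
    avg_cost = sum(cost) / len(cost)
    return y_predict, avg_cost


preds, avg_cost = lr_forward([[1.0, 2.0], [3.0, 4.0]], [5.0, 11.0],
                             weights=[1.0, 2.0], bias=1.0)
print(preds, avg_cost)  # [6.0, 12.0] 1.0
```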

3. Configure and run

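#Basic configuration
#If the model needs a custom parameter, just add xxx directly to the config file; there is no need to define xxx in the code beforehand, and it can then be referenced as self._flags.xxx
[DEFAULT]
#Class name of the user-implemented dataset
dataset_name: LinearRegression
#file_list takes priority over dataset_dir
file_list: ./test/linear_regression.data
dataset_dir: ../tmp/data/lr
#Only read files in dataset_dir that match file_pattern
file_pattern: part-
#Model settings: class name of the user-implemented net
model_name: LinearRegression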
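The [DEFAULT] section and the "just add xxx" convention resemble standard INI files. Assuming EPEP reads them in a similar way (its actual parser is not shown here), this sketch with Python 3's configparser shows how [DEFAULT] keys are inherited by other sections and how an extra key such as input_size (used by the dataset example but absent from this config excerpt) could become an attribute of a flags object. load_flags and _convert are hypothetical names, and the value 13 is purely illustrative.

```python
import configparser
from types import SimpleNamespace

CONF_TEXT = """
[DEFAULT]
dataset_name: LinearRegression
model_name: LinearRegression
# custom key, not declared anywhere in code (illustrative value)
input_size: 13

[Train]
base_lr: 0.01
max_number_of_steps: None
num_epochs_input: 100
CUDA_VISIBLE_DEVICES: 0,1
"""


def _convert(value):
    """Best-effort conversion of a config string to None, int, float or str."""
    if value == "None":
        return None
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def load_flags(conf_text, section):
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case, e.g. CUDA_VISIBLE_DEVICES
    parser.read_string(conf_text)
    # items() merges the [DEFAULT] keys into the requested section
    return SimpleNamespace(**{k: _convert(v) for k, v in parser.items(section)})


flags = load_flags(CONF_TEXT, "Train")
print(flags.input_size, flags.base_lr, flags.max_number_of_steps)  # 13 0.01 None
```

Because every value arrives as a string, some conversion step like _convert is needed before a key such as base_lr can be used as a number.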

3.1 Training

[Train]
base_lr: 0.01
max_number_of_steps: None
#Number of epochs from dataset source
num_epochs_input: 100
#The frequency with which logs are printed, in steps.
log_every_n_steps: 10
#The frequency with which the model is saved, in steps.
save_model_steps: 100

#CPU is the default; for GPU, make sure cuda_lib_path is set in conf/var_sys.conf
platform: local-gpu

#Single GPU or multiple GPUs
CUDA_VISIBLE_DEVICES: 0,1

sh run.sh -c conf/linear_regression/linear_regression.local.conf [-m train]
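How log_every_n_steps, save_model_steps and max_number_of_steps interact can be pictured with a small step counter. This is a hypothetical model of a training loop, not EPEP's code; it assumes that None means "no step limit" and that logging and saving happen whenever the global step is a multiple of the configured frequency.

```python
def training_schedule(num_epochs, batches_per_epoch, log_every_n_steps,
                      save_model_steps, max_number_of_steps=None):
    """Return the global steps at which a log line / model save would occur.

    Hypothetical model of the [Train] settings; EPEP's real loop may differ.
    """
    log_steps, save_steps = [], []
    global_step = 0
    for _epoch_id in range(num_epochs):
        for _batch_id in range(batches_per_epoch):
            if max_number_of_steps is not None and global_step >= max_number_of_steps:
                return log_steps, save_steps  # step limit reached
            global_step += 1
            if global_step % log_every_n_steps == 0:
                log_steps.append(global_step)
            if global_step % save_model_steps == 0:
                save_steps.append(global_step)
    return log_steps, save_steps


logs, saves = training_schedule(num_epochs=2, batches_per_epoch=60,
                                log_every_n_steps=10, save_model_steps=100)
print(len(logs), saves)  # 12 [100]
```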

3.2 Prediction

[Evaluate]

#For prediction, init_pretrain_model takes priority over eval_dir; it can point to a model saved during training
#init_pretrain_model: ../tmp/model/lr/save_model/checkpoint_final
#CPU is the default; for GPU, make sure cuda_lib_path is set in conf/var_sys.conf
platform: local-gpu

#A single GPU is sufficient
CUDA_VISIBLE_DEVICES: 0

sh run.sh -c conf/linear_regression/linear_regression.local.conf -m predict
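The comment in [Evaluate] states a priority rule: init_pretrain_model wins over eval_dir. A tiny sketch of that rule follows. The function name is hypothetical, the eval_dir value in the demo is a placeholder, and raising an error when neither key is set is an assumption rather than documented behavior.

```python
def resolve_model_dir(init_pretrain_model=None, eval_dir=None):
    """Pick the model directory for prediction (hypothetical helper).

    init_pretrain_model takes priority over eval_dir, as the config comment states.
    """
    if init_pretrain_model:
        return init_pretrain_model
    if eval_dir:
        return eval_dir
    raise ValueError("set init_pretrain_model or eval_dir in [Evaluate]")


print(resolve_model_dir(
    init_pretrain_model="../tmp/model/lr/save_model/checkpoint_final",
    eval_dir="path/to/eval_dir"))  # ../tmp/model/lr/save_model/checkpoint_final
```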

3.3 Evaluation during training

TODO

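4. Summary

Users only need to care about three files: conf/xxx/xxx.local.conf, datasets/xxx.py, and nets/xxx.py; just make sure they sit at these paths.

The quickest way to develop a new model is to copy these three LR files to the corresponding locations, rename them, and then modify the configuration and code.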
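The copy-and-rename step can be scripted. This sketch assumes the LR example's files are conf/linear_regression/linear_regression.local.conf (the path used by the run commands above), datasets/linear_regression.py and nets/linear_regression.py; the last two names are guesses based on the datasets/xxx.py and nets/xxx.py pattern. scaffold_model is a hypothetical helper written for Python 3, and it only copies files, so class names and dataset_name/model_name in the new config still have to be edited by hand.

```python
import os
import shutil
import tempfile


def scaffold_model(root, src_name, dst_name):
    """Copy the conf/dataset/net files of `src_name` to new files named `dst_name`.

    Hypothetical helper; the datasets/ and nets/ file names are assumed.
    Returns the created paths relative to `root`.
    """
    pairs = [
        (os.path.join("conf", src_name, src_name + ".local.conf"),
         os.path.join("conf", dst_name, dst_name + ".local.conf")),
        (os.path.join("datasets", src_name + ".py"),
         os.path.join("datasets", dst_name + ".py")),
        (os.path.join("nets", src_name + ".py"),
         os.path.join("nets", dst_name + ".py")),
    ]
    # Refuse to overwrite an existing model before copying anything.
    for _, dst in pairs:
        if os.path.exists(os.path.join(root, dst)):
            raise FileExistsError(os.path.join(root, dst))
    created = []
    for src, dst in pairs:
        dst_path = os.path.join(root, dst)
        os.makedirs(os.path.dirname(dst_path), exist_ok=True)
        shutil.copyfile(os.path.join(root, src), dst_path)
        created.append(dst)
    return created


# Demo in a throwaway directory that mimics the assumed layout.
demo_root = tempfile.mkdtemp()
for rel in ("conf/linear_regression/linear_regression.local.conf",
            "datasets/linear_regression.py",
            "nets/linear_regression.py"):
    path = os.path.join(demo_root, rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write("# placeholder\n")

created = scaffold_model(demo_root, "linear_regression", "my_model")
print(created)
```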

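Contributing

The goal of this project is to make strategy development efficient and to let you enjoy being an "alchemist" (a Chinese nickname for people who tune deep-learning models). Contributions are welcome!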

TODO

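  1. Rank, NMT, ERNIE, classification, matching, sequence labeling, Attention, vision, and other state-of-the-art AI models

  2. Automatic hyperparameter search

  3. Prediction server

  4. Distributed Easy-DL, Hadoop/Spark prediction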

  5. ...


Contributors

anpark, joyo2016


Issues

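Abnormal batch_norm behavior, help needed

I implemented a RankNet whose network is fc-bn-fc-bn-fc-bn-fc-bn-fc-bn-fc. Every fc layer has 32 neurons except the last one, which has 1. The fc layers have no activation; the bn layers use relu. For the input, the sample label is always 1, the left side is pos and the right side is neg, and the left should always score higher than the right. The design is for pos and neg to pass through the same network above and compute the output of the last layer, which then goes into margin_rank_loss, so the difference is always pos-neg; then sigmoid and binary cross-entropy (logloss) are applied, and the mean is taken as the loss.
The idea is that the network computes a relevance score from the features, and the larger the pos-neg difference, the better.

The problem is that during training the loss quickly drops to 0.001, but prediction quality is very poor. When I load the trained model and continue training, the loss is 0.001, but as soon as bn uses the global statistics (use_global_stats set to True, so the bn parameters are not updated, to simulate prediction), the loss immediately shoots up. Could you please help check whether something is wrong with the bn layer?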
One of the FC-bn layers:
[Screenshot: BaiduHi_2019-11-26_10-41-4]

The last fc layer:
[Screenshot]

How the loss is composed:
[Screenshot]
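For reference, the loss described in this issue (label fixed to 1, score difference pos - neg, then sigmoid and binary cross-entropy, averaged over pairs) reduces to mean(log(1 + exp(-(s_pos - s_neg)))). The plain-Python sketch below computes that quantity in a numerically stable way. It reflects one reading of the description and does not model Paddle's margin_rank_loss itself; pairwise_logloss is a hypothetical name.

```python
import math


def _neg_log_sigmoid(d):
    """-log(sigmoid(d)) == log(1 + exp(-d)), computed without overflow."""
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


def pairwise_logloss(s_pos, s_neg):
    """Mean binary cross-entropy of sigmoid(s_pos - s_neg) against label 1."""
    losses = [_neg_log_sigmoid(p - n) for p, n in zip(s_pos, s_neg)]
    return sum(losses) / len(losses)


print(pairwise_logloss([0.0], [0.0]))  # equals log(2), about 0.693
```

With the label always 1, this loss keeps shrinking as long as s_pos - s_neg grows, which matches the "larger difference is better" goal stated in the issue.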
