jeremyczhj / fashionai_tianchi_2018

FashionAI Global Challenge: Attributes Recognition of Apparel. 21st-place solution.

License: MIT License

Python 100.00%

fashionai_tianchi_2018's Introduction

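Tianchi Big Data Competition: FashionAI Global Challenge, Apparel Attribute Recognition
This solution uses a multi-task learning strategy and finished 21st in the finals. These notes record the pitfalls I ran into during the competition.
Official FashionAI Global Challenge link (the dataset can be downloaded there)
My CSDN link
Keywords: transfer learning, multi-task learning, Keras, FashionAI, Tianchi big data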

Environment

Ubuntu 16.04 / Windows 10
python 3.6.2
keras 2.1.6
tensorflow 1.8.0
opencv-python 3.4
imgaug 0.2.5

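Usage

Out of laziness, I ran the code as soon as it was written and left it alone once it ran, so it has had little optimization or encapsulation.
config.py-------------configures the sample directories and other settings
cal_std_mean.py------computes the std and mean of the dataset
Multitask_train.py----training script
Multitask_predict.py--prediction script
dataset.py------------data preprocessing
inceptionv4.py--------Inceptionv4 model API
Download the dataset here, unzip it into the datasets folder, and then run python Multitask_train.py to start training.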
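
As an illustration of the kind of computation cal_std_mean.py is described as doing, here is a minimal sketch of a per-channel mean and std over a set of images. It is not the repository's code: the function name `channel_mean_std`, the scaling of pixel values to [0, 1], and the image-by-image accumulation are all assumptions made for this example.

```python
import numpy as np

def channel_mean_std(images):
    """Return per-channel mean and std over an iterable of HxWxC uint8 images.

    Hypothetical helper: sums are accumulated image by image, so the whole
    dataset never has to be held in memory and image sizes may differ.
    """
    pixel_count = 0
    channel_sum = None
    channel_sq_sum = None
    for img in images:
        # Assumption: pixel values are scaled to [0, 1] before the statistics.
        arr = np.asarray(img, dtype=np.float64) / 255.0
        pixels = arr.reshape(-1, arr.shape[-1])  # (num_pixels, channels)
        if channel_sum is None:
            channel_sum = np.zeros(pixels.shape[1])
            channel_sq_sum = np.zeros(pixels.shape[1])
        channel_sum += pixels.sum(axis=0)
        channel_sq_sum += (pixels ** 2).sum(axis=0)
        pixel_count += pixels.shape[0]
    if pixel_count == 0:
        raise ValueError("no images given")
    mean = channel_sum / pixel_count
    # Var[x] = E[x^2] - E[x]^2; clip tiny negatives caused by rounding.
    std = np.sqrt(np.maximum(channel_sq_sum / pixel_count - mean ** 2, 0.0))
    return mean, std
```

The resulting values would then be used as `(image / 255.0 - mean) / std` in preprocessing, in place of the usual ImageNet constants.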

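Approach

Transfer learning + multi-task learning + model ensembling
Two multi-task model architectures were designed:
- Multi-output model for the length categories:
- Multi-output model for the collar categories:
Ensemble Inceptionv4 and Inceptionresnetv2: predict with each separately, then average the results
Apply augmentation to the test-set samples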
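
The averaging step of the ensemble can be sketched as follows. This is an illustrative example only: the function name `average_ensemble` is hypothetical, and the two arrays are made-up stand-ins for the class probabilities that the two networks would output for one attribute on the same samples.

```python
import numpy as np

def average_ensemble(prob_list):
    """Average class probabilities predicted by several models.

    prob_list: list of arrays, each of shape (num_samples, num_classes).
    """
    stacked = np.stack([np.asarray(p, dtype=np.float64) for p in prob_list])
    return stacked.mean(axis=0)

# Made-up stand-ins for the outputs of two models on the same two samples.
probs_inceptionv4 = np.array([[0.6, 0.4], [0.2, 0.8]])
probs_inception_resnet_v2 = np.array([[0.8, 0.2], [0.4, 0.6]])

avg = average_ensemble([probs_inceptionv4, probs_inception_resnet_v2])
labels = avg.argmax(axis=1)  # predicted class index per sample
```

For a multi-output model, the same averaging would be applied to each output head separately.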

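Tips for improving the score

Shuffling the data matters a lot
Multi-task learning scores 1-2% higher than single-task learning
Use suitable image augmentation. imgaug is recommended and is very capable; dataset.py has the detailed code. This improves the score by 2-3%
Image standardization: compute the std and mean of this dataset instead of using the ImageNet std and mean directly. This improves the score by 0.5-1%
Increasing the input image size improves classification accuracy by 1-2%
Fine-tuning: if compute allows, fine-tune the whole model. Compared with training only the last layer, this improves the score by 3-5%
Use Adam first to converge quickly, then switch to SGD to tune slowly; this is more efficient
Model ensembling improves the score by 1-2%
Test-time augmentation: this solution predicts on mirrored images plus rotations of 5, 10 and 15 degrees, then averages the results. This improves the score by 0.5-1%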
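
The test-time augmentation tip can be sketched roughly as below. This is not the repository's prediction script: `predict_with_tta`, `mirror`, and the `predict_fn` callable are hypothetical names, the model is assumed to have a single output, and including the untransformed image in the average is also an assumption. Only mirroring is implemented here to keep the example NumPy-only; the 5, 10 and 15 degree rotations could be added as further callables, for example built on OpenCV's `cv2.getRotationMatrix2D` and `cv2.warpAffine`.

```python
import numpy as np

def mirror(image):
    """Horizontal flip of an HxWxC image."""
    return image[:, ::-1, :]

def predict_with_tta(predict_fn, image, transforms):
    """Average predict_fn over the original image and its transformed copies.

    predict_fn: hypothetical callable mapping a batch (N, H, W, C) to class
                probabilities (N, num_classes), e.g. a model's predict method.
    transforms: list of callables, each mapping an HxWxC image to an HxWxC image.
    """
    views = [image] + [t(image) for t in transforms]
    batch = np.stack(views)                  # (num_views, H, W, C)
    probs = np.asarray(predict_fn(batch))    # (num_views, num_classes)
    return probs.mean(axis=0)
```

A call would look like `predict_with_tta(model.predict, img, [mirror])`, with rotation callables appended to the list as needed.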

fashionai_tianchi_2018's Issues

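the dataset

The competition is over, but the dataset is no longer available. I would like to use it for apparel detection. Could you provide a link?

Hello, with a single 12 GB GPU, even batch_size=4 runs out of GPU memory. Do you have any advice?

This seems strange. What dual-GPU setup did you use to run batch_size=40? I can't figure it out.
Also, your code already runs garbage collection in both places, so why does it still run out of GPU memory?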

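Question about model training and prediction

Hi Jeremyczhj! I have recently been studying this Alibaba competition. First of all, thank you for open-sourcing your code! I have two questions:

1. In the main function of Multitask_train.py, I noticed that the arguments passed to train('') are all length. I changed the later call to train('design'), and after training finished only one model, inceptionv4.h5, was saved. Is training meant to be run twice, first entirely with train('length') and then entirely with train('design'), so that two models are saved?

2. When predicting with Multitask_predict.py, do you load the two models, for length and design, predict with each separately, and then average the results?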

about model

Hi there,
thanks for your great work.
I've recently been taking deep learning courses,
but since my GPU is underpowered, I'd like to fine-tune based on your models.
Would you be willing to share the model files you trained?
Thanks!

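Question about submitting results

Hi Jeremyczhj! I have recently been practicing on the FashionAI apparel attribute dataset, and I have run into a small but very frustrating problem that I hope you can help with.
My submission (a csv compressed into a zip) is always scored as zero. I have read the discussions on the competition's technical forum and looked at how many open-source projects write the csv, and I have changed the format several times, but nothing has solved the problem. It is really frustrating.
Could you take a look and tell me what is wrong with my format?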

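Hi Jeremyczhj! First of all, thank you for open-sourcing your code. I am an AI beginner and have recently been studying this Alibaba competition

Hi Jeremyczhj! First of all, thank you for open-sourcing your code. I am an AI beginner and have recently been studying this Alibaba competition. When I run Multitask_train.py, I get an error saying there is no tqdm file. Could you provide this file?

In your train.py file, I see that you use multi-GPU training

If I only have one GPU, how should I change it? Is it enough to set the gpus parameter to 1?
multi_gpu_model(model, gpus=1)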
