
unijoy / data-pipeline-for-sina-weibo-interaction-prediction


This project forked from jeremyli28/data-pipeline-for-sina-weibo-interaction-prediction


Data pipeline for Sina Weibo Interaction-prediction

Jupyter Notebook 99.94% Python 0.06%


Sina Weibo Interaction-prediction

Introduction

The competition's details can be found here.
Basically, the competition is about analyzing users' behavior and the messages they post on the Chinese micro-blog platform, and predicting the number of forwards, comments, and likes each message receives.

This project mainly uses Python and pandas.

Stage 2 of this competition is still ongoing. Here is the data pipeline I built for Stage 1.

Design

This is a self-designed data pipeline. The main idea is to modularize the process of a data project.

  • The user writes methods to generate features, which are stored as pandas DataFrames in the features folder; feature.log automatically records all existing features and their parameters.
  • The user can combine different features and select different models in the Train method; the model is stored in the models folder, and its information is recorded in train.log.
  • The user chooses different combinations of features and parameters for testing; the results are stored in the results folder, and the test information is recorded in test.log.
  • The IPython notebooks in the notebooks folder are for playing around with the data and inspecting the logs interactively.
  • The code is located in the weiboPredict package.
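The feature step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the weiboPredict package: the helper names (`save_feature`, `list_features`, `load_features`), the JSON-lines format for feature.log, the pickle storage, and the `mid`/`uid`/`content` column names are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def save_feature(df, name, params, features_dir):
    """Store a feature DataFrame and append a record to feature.log.

    Hypothetical helper: the log format (one JSON object per line) is an
    assumption, not the format used by the original project.
    """
    features_dir = Path(features_dir)
    features_dir.mkdir(parents=True, exist_ok=True)
    path = features_dir / f"{name}.pkl"
    df.to_pickle(path)
    record = {
        "name": name,
        "params": params,
        "file": path.name,
        "columns": list(df.columns),
    }
    with open(features_dir / "feature.log", "a", encoding="utf-8") as log:
        log.write(json.dumps(record, sort_keys=True) + "\n")
    return path


def list_features(features_dir):
    """Return every record in feature.log, oldest first."""
    log_path = Path(features_dir) / "feature.log"
    if not log_path.exists():
        return []
    with open(log_path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]


def load_features(names, features_dir, key="mid"):
    """Load the named features and join them on the message id column."""
    frames = [pd.read_pickle(Path(features_dir) / f"{n}.pkl") for n in names]
    combined = frames[0]
    for frame in frames[1:]:
        combined = combined.merge(frame, on=key, how="inner")
    return combined


# Toy data standing in for the competition's message table (made up).
posts = pd.DataFrame({
    "mid": ["m1", "m2", "m3"],
    "uid": ["u1", "u1", "u2"],
    "content": ["hello", "hi there", "weibo"],
})
workdir = Path(tempfile.mkdtemp())

# Feature 1: length of each message in characters.
length = pd.DataFrame({
    "mid": posts["mid"],
    "content_length": posts["content"].str.len(),
})
save_feature(length, "content_length", {"unit": "chars"}, workdir)

# Feature 2: how many messages the author has posted in this table.
user_posts = pd.DataFrame({
    "mid": posts["mid"],
    "user_post_count": posts.groupby("uid")["mid"].transform("count"),
})
save_feature(user_posts, "user_post_count", {}, workdir)

# Combine stored features by name, as a training step might.
combined = load_features(["content_length", "user_post_count"], workdir)
print(combined)
print([record["name"] for record in list_features(workdir)])
```

Keeping each feature in its own file and looking it up by name through the log means a training or testing run only has to record a list of feature names and parameters, which is the kind of bookkeeping that train.log and test.log would then carry forward.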

When I worked on data projects before, managing different versions of features, models, and results, and naming them all differently, was killing me. This simple data pipeline solves my problem, cheers!

Future Work

Sadly, the second stage of the competition runs on Alibaba's ODPS platform with SQL and Java, so I don't have the chance to develop this framework further right now. The pipeline is still a little problem-specific, and I want to make it more general-purpose in the future.
