konhay / self-service-modeler

A self-service modeling and analysis tool built on the R language and big-data technology. It integrates SparkR, Rserve, and the MLlib machine learning library.

self-service-modeler's Introduction

Self-service modeling

Description

Most data modeling tools today require strong programming skills for data processing and algorithm selection. The technical threshold is high and the modeling process cannot be fully automated, which poses a considerable challenge for front-line business personnel. Meanwhile, as data volumes grow, the traditional R-based modeling process becomes time-consuming and cannot keep modeling results synchronized with client requests in real time. This tool addresses both problems by providing self-service, real-time data analysis and modeling.

The project has been integrated into the customer relationship management (CRM) systems of financial institutions, where it has played an important role in precision marketing.

Processing flow

Step 1. Define modeling goals

  • Built-in scenario: VIP customer loss warning
  • Custom modeling: Define the modeling target by query criteria

Step 2. Select customer group

  • Filter by indicator: for example, credit card customer segmentation identifier = 2
  • Filter by label: for example, whether the customer is one of our credit card customers (0 or 1)
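
The two filter types above can be illustrated with a small base R sketch. The table and its column names (`cc_segment_id`, `is_cc_holder`) are hypothetical and chosen only for illustration; the tool itself works on SparkR distributed data frames, where the analogous operation would be SparkR's `filter`.

```r
# Hypothetical customer table; column names are illustrative only.
customers <- data.frame(
  customer_id   = c("C001", "C002", "C003", "C004"),
  cc_segment_id = c(2, 1, 2, 3),  # credit card customer segmentation identifier
  is_cc_holder  = c(1, 0, 0, 1),  # label: 1 = our credit card customer, 0 = not
  stringsAsFactors = FALSE
)

# Filter by indicator: segmentation identifier equals 2.
by_indicator <- customers[customers$cc_segment_id == 2, ]

# Filter by label: keep credit card customers only.
by_label <- customers[customers$is_cc_holder == 1, ]

# Both criteria combined.
selected <- subset(customers, cc_segment_id == 2 & is_cc_holder == 1)
```

Combining the criteria in one `subset` call keeps the customer group definition in a single expression, which is convenient when the criteria come from a query form.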

Step 3. Generate statistical description report

  • Includes maximum and minimum, mean and median, standard deviation, missing values, correlation, histograms, etc.
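
As a rough illustration of what such a report contains, the following base R sketch computes the listed statistics for one numeric column. The helper `describe_column` and the sample vectors are hypothetical and are not taken from the project's code.

```r
# Hypothetical helper: descriptive statistics for one numeric column.
describe_column <- function(x) {
  c(
    min     = min(x, na.rm = TRUE),
    max     = max(x, na.rm = TRUE),
    mean    = mean(x, na.rm = TRUE),
    median  = median(x, na.rm = TRUE),
    sd      = sd(x, na.rm = TRUE),
    missing = sum(is.na(x))  # number of missing values
  )
}

# Illustrative data with one missing value.
assets   <- c(10, 20, NA, 40, 30)
holdings <- c(1, 2, 9, 4, 3)

report <- describe_column(assets)

# Correlation between two columns, using only rows where both are present.
corr <- cor(assets, holdings, use = "complete.obs")

# Histogram bin counts, computed without drawing a plot.
bins <- hist(assets, plot = FALSE)$counts
```

Passing `na.rm = TRUE` keeps the summary usable when a column has gaps, while the separate `missing` count still reports how many values were skipped.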

Step 4. Select model algorithm

  • Logistic regression (spark.logit)
  • Random forest (spark.randomForest)
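
The SparkR functions named above need a running Spark session, so the sketch below uses base R's `glm` as a local stand-in to show the same fit-then-score pattern for logistic regression. It is not the project's code: the data is synthetic, and the column names (`balance`, `tenure`, `churn`) are assumptions made for illustration.

```r
set.seed(42)

# Synthetic training data; all column names are hypothetical.
n <- 200
train <- data.frame(
  balance = rnorm(n, mean = 50, sd = 10),
  tenure  = rnorm(n, mean = 5, sd = 2)
)

# Synthetic target: customers with a lower balance are more likely to churn.
churn_prob  <- plogis(-0.2 * (train$balance - 50))
train$churn <- rbinom(n, size = 1, prob = churn_prob)

# Local stand-in for spark.logit: fit a logistic regression with glm.
# With SparkR the analogous call would take a SparkDataFrame and the same formula.
model <- glm(churn ~ balance + tenure, data = train, family = binomial())

# Score the customers and turn probabilities into 0/1 predictions.
scores    <- predict(model, newdata = train, type = "response")
predicted <- as.integer(scores >= 0.5)
```

The 0.5 cut-off is an arbitrary choice for the sketch; in a marketing setting the threshold is usually tuned to trade hit rate against coverage.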

Step 5. Review model execution results

  • Output hit rate, coverage, and the result set
  • Data statistics (summary by institution, e.g. number of clients, total assets, average holdings)
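
The README does not define hit rate and coverage. The sketch below assumes the usual reading in marketing models: hit rate is the share of predicted positives that are actual positives (precision), and coverage is the share of actual positives that the model found (recall). The helper `evaluate_predictions` and the sample data are hypothetical.

```r
# Assumed definitions (not confirmed by the source):
#   hit rate = true positives / predicted positives
#   coverage = true positives / actual positives
evaluate_predictions <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  true_pos <- sum(actual == 1 & predicted == 1)
  list(
    hit_rate = true_pos / sum(predicted == 1),
    coverage = true_pos / sum(actual == 1)
  )
}

actual    <- c(1, 0, 1, 1, 0, 0, 1, 0)
predicted <- c(1, 0, 0, 1, 1, 1, 1, 0)
metrics   <- evaluate_predictions(actual, predicted)

# Summary by institution over an illustrative result set.
result_set <- data.frame(
  institution  = c("A", "A", "B", "B", "B"),
  total_assets = c(100, 300, 50, 150, 100)
)
clients    <- tapply(result_set$total_assets, result_set$institution, length)
asset_sum  <- tapply(result_set$total_assets, result_set$institution, sum)
asset_mean <- tapply(result_set$total_assets, result_set$institution, mean)
```

Reporting both metrics together matters because a model can reach a high hit rate by flagging very few customers, at the cost of low coverage.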

Technical architecture

[Figure: technical architecture diagram]

The tool requires an R language environment, the SparkR distributed computing environment, and the Rserve service.

First, R's strengths in statistical analysis and predictive modeling are used for data storage and processing, array and matrix operations, statistical description, and plotting.

Second, SparkR provides a lightweight front end for calling Apache Spark from R. Its operations on distributed data frames, such as selection, filtering, and aggregation, make it possible to process massive data sets. The MLlib distributed machine learning library built into Spark makes it easy to build the back-end algorithm engine.

Third, Rserve lets the interactive side call the R language server remotely. Because Rserve works in client/server (C/S) mode, the interactive side does not need to link against the R libraries, which keeps the interactive-side Java program loosely coupled from the back-end R program.

The tool is currently in use at large commercial banks.

Patent Information

SanShan Sun, Bing Han, et al., “Data modeling methods, devices, storage media, and processors”, invention patent, CN112988119A, published.
