GithubHelp home page GithubHelp logo

kevwan / mapreduce Goto Github PK

View Code? Open in Web Editor NEW
161.0 3.0 23.0 46 KB

A in-process MapReduce library to help you optimizing service response time or concurrent task processing.

Home Page: https://go-zero.dev

License: MIT License

Go 100.00%
go golang mapreduce mapreduce-go concurrent-programming concurrent

mapreduce's Introduction

mapreduce

English | 简体中文

Go codecov Go Report Card Release License: MIT

Why we have this repo

mapreduce is part of go-zero, but a few people asked if mapreduce can be used separately. But I recommend you to use go-zero for many more features.

Why MapReduce is needed

In practical business scenarios we often need to get the corresponding properties from different rpc services to assemble complex objects.

For example, to query product details.

  1. product service - query product attributes
  2. inventory service - query inventory properties
  3. price service - query price attributes
  4. marketing service - query marketing properties

If it is a serial call, the response time will increase linearly with the number of rpc calls, so we will generally change serial to parallel to optimize response time.

Simple scenarios using WaitGroup can also meet the needs, but what if we need to check the data returned by the rpc call, data processing, data aggregation? The official go library does not have such a tool (CompleteFuture is provided in java), so we implemented an in-process data batching MapReduce concurrent tool based on the MapReduce architecture.

Design ideas

Let's sort out the possible business scenarios for the concurrency tool:

  1. querying product details: supporting concurrent calls to multiple services to combine product attributes, and supporting call errors that can be ended immediately.
  2. automatic recommendation of user card coupons on product details page: support concurrently verifying card coupons, automatically rejecting them if they fail, and returning all of them.

The above is actually processing the input data and finally outputting the cleaned data. There is a very classic asynchronous pattern for data processing: the producer-consumer pattern. So we can abstract the life cycle of data batch processing, which can be roughly divided into three phases.

  1. data production generate
  2. data processing mapper
  3. data aggregation reducer

Data producing is an indispensable stage, data processing and data aggregation are optional stages, data producing and processing support concurrent calls, data aggregation is basically a pure memory operation, so a single concurrent process can do it.

Since different stages of data processing are performed by different goroutines, it is natural to consider the use of channel to achieve communication between goroutines.

How can I terminate the process at any time?

It's simple, just receive from a channel or the given context in the goroutine.

Choose the right version

  • v1 (default) - non-generic version
  • v2 (generics) - generic version, needs Go version >= 1.18

A simple example

Calculate the sum of squares, simulating the concurrency.

package main

import (
    "fmt"
    "log"

    "github.com/kevwan/mapreduce/v2"
)

func main() {
    val, err := mapreduce.MapReduce(func(source chan<- int) {
        // generator
        for i := 0; i < 10; i++ {
            source <- i
        }
    }, func(i int, writer mapreduce.Writer[int], cancel func(error)) {
        // mapper
        writer.Write(i * i)
    }, func(pipe <-chan int, writer mapreduce.Writer[int], cancel func(error)) {
        // reducer
        var sum int
        for i := range pipe {
            sum += i
        }
        writer.Write(sum)
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("result:", val)
}

More examples: https://github.com/zeromicro/zero-examples/tree/main/mapreduce

References

go-zero: https://github.com/zeromicro/go-zero

Give a Star! ⭐

If you like or are using this project to learn or start your solution, please give it a star. Thanks!

mapreduce's People

Contributors

kevwan avatar zcong1993 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

mapreduce's Issues

`mapreduce.Finish`中嵌套使用`mapreduce.MapReduce`会导致`Finish`变成非阻塞操作

嵌套的MapReducereducer中如果不调用writer.Write方法,会产生一个ErrReduceNoOutput错误,Finish 中 worker 返回异常会直接结束 Finish 调用,但是Finish中调用的MapReduceVoid会吞掉ErrReduceNoOutput错误返回一个 nil,从最后结果看是没有异常的成功调用,实际其他的 worker 都还在异步运行

例如下面这样的调用:

func main(){
        err := mapreduce.Finish(func() error {
		return worker1()
	}, func() error {
		val, err := mapreduce.MapReduce(func(source chan<- interface{}) {
			for i := 0;i<10;i++{
				source <- i
			}
		}, func(item interface{}, writer mapreduce.Writer, cancel func(error)) {
			i := item.(int)
			writer.Write(i * i)
		}, func(pipe <-chan interface{}, writer mapreduce.Writer, cancel func(error)) {
			var cnt int
			for i := range pipe{
				cnt += i.(int)
			}
                         // 这里不调用Write 会导致当前这个 worker 任务返回一个异常
			// writer.Write(cnt) 
		})
                 // 收到一个异常 `ErrReduceNoOutput`
		if err != nil {
			return err
		}
		fmt.Println("result:", val)
	})
       // 这里的 err 是 nil
       if err != nil {
           fmt.Println(err)
      }
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.