
Increase HDFS block size 1GB (kylin, CLOSED)

kylinolap commented on July 19, 2024
Increase HDFS block size 1GB

from kylin.

Comments (4)

abansal commented on July 19, 2024

Are we increasing the mapper's input size to 1GB, or the HDFS block size?
I assume the task is to increase only the mapper's input size; please correct me if I am wrong.


liyang-gmt8 commented on July 19, 2024

The intention is to increase the HDFS block size only and keep the current mapper input size. It was suggested that a bigger HDFS block size would reduce Hadoop/HBase overhead. The mapper input size should not be affected; that is where "mapred.max.split.size" comes into play.

We need some in-house testing after the change to confirm the benefit.
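
As a rough illustration of why the mapper input can stay the same when the block size grows, here is a minimal sketch of the split-size rule used by Hadoop's new-API FileInputFormat (split size = max(min size, min(max size, block size))). It assumes the build jobs read their input through a FileInputFormat-style input format, which this thread does not confirm. The 128 MB "current" block size and the 64 MB max split size are hypothetical values chosen only for the example.

```java
public class SplitSizeSketch {

    // Same rule as FileInputFormat.computeSplitSize in the new MapReduce API:
    // the max split size caps the split even when the block is larger.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;

        long currentBlock = 128 * MB;   // hypothetical current HDFS block size
        long proposedBlock = 1024 * MB; // 1GB, as proposed in this issue
        long maxSplit = 64 * MB;        // hypothetical mapred.max.split.size
        long minSplit = 1L;             // assumed minimum split size

        // Both lines print the same value: the split is bounded by maxSplit,
        // so the larger block size does not enlarge the mapper input.
        System.out.println(computeSplitSize(currentBlock, minSplit, maxSplit));
        System.out.println(computeSplitSize(proposedBlock, minSplit, maxSplit));
    }
}
```

Under these assumptions the split size is 64 MB in both cases. If the max split size were left unset, the split would instead follow the block size.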


abansal commented on July 19, 2024

Is this where you intend to change the block size? https://github.com/abansal/Kylin/blob/master/job/src/main/java/com/kylinolap/job/hadoop/cardinality/HiveColumnCardinalityJob.java#L141-141


liyang-gmt8 commented on July 19, 2024

Hi Ankur

The goal is to control the cube data files, which are the output of the cube build jobs.

Check out this file: https://github.com/KylinOLAP/Kylin/blob/master/examples/test_case_data/hadoop_job_conf.xml

It is applied to all cube build jobs via com.kylinolap.job.JobInstanceBuilder.appendMapReduceParameters(). Try searching for where this method is called; you will see how a chain of jobs is created to complete a cube build.

Setting "dfs.block.size" and "mapred.max.split.size" in hadoop_job_conf.xml should do the trick.
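
For reference, the two settings might look roughly like this in Hadoop's standard property format. This is only a sketch: the 1GB block size comes from the issue title, while the 64 MB split size is an illustrative placeholder, not a value taken from the actual file. If hadoop_job_conf.xml already has a `<configuration>` root, only the `<property>` elements would be added.

```xml
<configuration>
    <!-- 1GB = 1024 * 1024 * 1024 bytes; applies to files the job writes -->
    <property>
        <name>dfs.block.size</name>
        <value>1073741824</value>
    </property>
    <!-- Placeholder value (64 MB): use whatever split size the build jobs
         rely on today so the mapper input size stays unchanged -->
    <property>
        <name>mapred.max.split.size</name>
        <value>67108864</value>
    </property>
</configuration>
```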

We can do testing in eBay's hadoop cluster once you are familiar with the code around the job engine.
