
Comments (5)

ankurd28 commented on August 27, 2024

So, after some digging, we found out why it was slow.
Distributed XGBoost with MPI copies the data back and forth across the two machines, and that slows down the whole computation.

Does anybody have any ideas on how to fix this data-copying issue?

Thanks,
Ankur

from wormhole.

tqchen commented on August 27, 2024

The data is indeed loaded from the distributed data store, but only at startup time. So you can tell the difference by running a larger number of rounds: the loading cost stays fixed while the training time grows.
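One way to check this is to time two runs that differ only in the number of boosting rounds and fit a straight line through them. The snippet below is a minimal sketch, assuming total wall-clock time is roughly `startup + rounds * per_round`; the function name and the timings are hypothetical, made up for illustration, and are not measurements from xgboost or wormhole.

```python
def split_startup_and_round_cost(rounds_a, total_a, rounds_b, total_b):
    """Fit total = startup + rounds * per_round through two timed runs.

    Assumption: startup (data loading) cost is the same in both runs and
    every boosting round costs about the same amount of time.
    """
    if rounds_a == rounds_b:
        raise ValueError("the two runs must use different round counts")
    per_round = (total_b - total_a) / (rounds_b - rounds_a)
    startup = total_a - rounds_a * per_round
    return startup, per_round


# Hypothetical wall-clock timings in seconds (illustrative only).
startup, per_round = split_startup_and_round_cost(
    rounds_a=10, total_a=70.0,
    rounds_b=100, total_b=160.0,
)

# Share of the total time spent on startup in each run.
share_short = startup / 70.0
share_long = startup / 160.0

print(f"startup ~ {startup:.1f}s, per round ~ {per_round:.2f}s")
print(f"startup share: {share_short:.0%} at 10 rounds, {share_long:.0%} at 100 rounds")
```

If the estimated startup cost dominates the short run but shrinks as a share of the long run, the slowdown is coming from the one-time data load rather than from repeated copying during training.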

The major goal of distributed xgboost is to scale up to problems that cannot be handled by the single-machine version. So it is entirely possible for the distributed version to run slower than the single-node version if the data fits on a single node.


ankurd28 commented on August 27, 2024

Hi Tianqi,

Thank you for your response!

So, if I understand you correctly, speed is a secondary concern as long as distributed xgboost can scale up across machines. It is good to understand the design goal, since that makes clear the trade-offs that were made during development.

Having said that, do you have any ideas on how the distributed implementation of xgboost could be sped up? In your opinion, would moving from the MPI framework to the Hadoop framework help here? In other words, does the xgboost implementation on top of Hadoop also load data from a distributed data store over the network?

Thanks,
Ankur


tqchen commented on August 27, 2024

Hi @ankurd28, speed is definitely important for us.

In our experience, as the data scales up, the cost of loading data over the network is minor compared to the running cost of training. (This is different from data processing problems like mapreduce, where little computation is done on each example and data locality is crucial.)

This is because more computation kicks in as we get more data, so loading is unlikely to be a problem for larger datasets. For small datasets, however, the running cost is already low, so the data loading bottleneck surfaces.


ankurd28 commented on August 27, 2024

Hi Tianqi,

Thanks a lot for your response!
I completely understand your point!

Best,
Ankur

