Comments (8)

Zorlin commented on May 19, 2024

+1. I originally read this and went "no, that's crazy, it'd wreck performance" until I realized you were talking about only caching directory and file listings instead of whole files.

from lizardfs.

onlyjob commented on May 19, 2024

Thanks. Cache displacement may be a primary reason for significant performance degradation unless you are running the chunkserver on a dedicated machine. Chunkservers can be quite active -- I observe over 10000 cached chunk files, which creates enough pressure to cause a noticeable slowdown in everything else running on the same server...

Zorlin commented on May 19, 2024

I'm particularly interested in this patch as we have some machines serving as much as 60-80TB from a single box (via Supermicro JBOD with consumer drives). What would be the (ballpark) performance impact on that sort of dedicated machine?

onlyjob commented on May 19, 2024

I'm not qualified to prepare this patch -- I'm simply incompetent in C/C++ these days as the last time I did C coding was back in 1995...

I'm starting chunkservers using nocache wrapper as follows:

/usr/bin/nocache -n 2 /usr/sbin/mfschunkserver -d start

and although I've been doing it only for a limited time, subjectively it feels like everything runs smoother, the cache no longer seems over-utilised, etc. I think we're not talking about any performance "impact" whatsoever, even on dedicated machines. Indeed, the cache would be better used for directories, executables and whatnot rather than wasted on chunks, because if chunks are cached, everything else will eventually be displaced from the cache.
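
For readers unfamiliar with the mechanism: as far as I understand it, nocache preloads a library that advises the kernel, via posix_fadvise() with POSIX_FADV_DONTNEED, to drop a file's pages from the page cache once the program is done with them. Below is a minimal Python sketch of the same idea. The function name read_without_caching is made up for illustration, and this is not how nocache or the chunkserver is actually implemented.

```python
import os


def read_without_caching(path, block_size=1 << 20):
    """Read a whole file, then advise the kernel to drop its cached pages.

    Hypothetical helper for illustration only. Returns the number of
    bytes read.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            block = os.read(fd, block_size)
            if not block:
                break
            total += len(block)
        # offset=0 and length=0 cover the whole file. The call is advisory:
        # the kernel may keep some pages (for example dirty ones) anyway.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

Because the call is only a hint, some pages can stay cached, so a wrapper built on it reduces cache use rather than eliminating it.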

onlyjob commented on May 19, 2024

I meant to say that on dedicated machines you will not see a performance improvement (it will just run as usual or slightly better), while it will be most beneficial on shared servers where other services are running as well.

Zorlin commented on May 19, 2024

Hi @onlyjob,

I ask because we have something like 100 million chunks per server and I suspect at that scale this could have more impact than you think.

Zorlin commented on May 19, 2024

I would also be interested in objective measurements. I would suggest pulling data from the CGI or probe under each of the following conditions:

  • Normal behavior, just after a cold start
  • Nocache run, just after a cold start
  • Normal behavior after one hour of activity
  • Nocache run after one hour of activity.
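
Alongside the CGI or probe data, it may help to record the size of the kernel page cache at each of the four points above. A rough Python sketch, assuming a Linux host where /proc/meminfo reports "Cached" and "Buffers" in kB; the function names and labels are made up for illustration:

```python
def page_cache_kib(meminfo_text):
    """Extract the Cached and Buffers values (in kB) from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            # The first token after the colon is the numeric value.
            fields[name.strip()] = int(parts[0])
    return {"Cached": fields.get("Cached"), "Buffers": fields.get("Buffers")}


def snapshot(label, path="/proc/meminfo"):
    """Return (label, cache figures) for one measurement point."""
    with open(path) as f:
        return label, page_cache_kib(f.read())
```

Calling snapshot("normal, cold start"), snapshot("nocache, 1 hour") and so on gives four comparable figures. They describe the whole machine rather than chunk files specifically, so they complement the CGI or probe numbers and do not replace them.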

onlyjob commented on May 19, 2024

@Zorlin:

I ask because we have something like 100 million chunks per server and I suspect at that scale this could have more impact than you think.

Yes, if we're talking about the impact of overusing the cache when the cache hit ratio is extremely low on a data set as large as yours... :)

You should be able to get some data yourself easily enough, although please remember that the nocache method is not 100% effective, so some files will still be partially cached, just not as much as without it...

Also, I'd suggest running for at least several hours before comparing stats.

At the moment I'm not in a position to do an objective test, as I would have to generate a similar load, for which I would have to stop all clients. I reached my conclusion about the benefits of not caching large data sets on nodes of distributed file systems earlier. I've just started the chunkserver with nocache this morning, and some hours later I'm going to check how many chunk files are cached compared to the normal situation without nocache... In any case, I expect performance benefits for other applications/services, not for the chunkservers themselves...
