
Comments (3)

jti-lanl commented on September 4, 2024

[@Brofessional is interested in re-examining this issue, so here's some more info.]

Detecting systematic incast delays (or other problems) would be facilitated by the new-ish options that support logging of detailed timing-data.

  1. Configure syslog[-ng] to forward from the servers (and from the clients too, assuming we're not sure what issues might be involved). Consider your configuration of NFS vs RDMA DAL. You probably want syslog[-ng] over TCP (rather than UDP) so you don't miss any lines.

  2. Add timing flags to your favorite namespace and/or repo, in MARFSCONFIGRC. These flags are OR'ed together, so conceivably you could have some on the repo and some on the NS, but I prefer to leave the repo alone, and simply have a duplicate NS (same repo as the one I would normally use), but with timing flags installed. That way, I can choose to use this NS when I want to collect timing info. For example:

# everything the same, but with timing_flags (and mnt_path)
<namespace>
  <name>mc.timing</name>
  <mnt_path>/mc.timing</mnt_path>
  <alias>...</alias>
  <bperms>RM,WM,RD,WD,TD,UD</bperms>
  <iperms>RM,WM,RD,WD,TD,UD</iperms>
  <iwrite_repo_name>...</iwrite_repo_name>
  <range>
    <min_size>0</min_size>
    <max_size>-1</max_size>
    <repo_name>mc3+1</repo_name>
  </range>
  <md_path>.../mdfs</md_path>
  <trash_md_path>.../mc-trash</trash_md_path>
  <fsinfo_path>.../fsinfo</fsinfo_path>
  <quota_space>-1</quota_space>
  <quota_names>-1</quota_names>
  <timing_flags> OPEN,RW,CLOSE,RENAME,ERASURE,THREAD,HANDLE </timing_flags> 
</namespace>
  3. Invoke pftool with '-l'.

  4. Dig through syslog. It may be useful to look for the '*_h' histogram lines in the TIMING_INFO diagnostics. If you want to separate out threads, use the "separate_log_threads" utility to extract per-thread logs from syslog.
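
For the syslog step above, here is a minimal sketch of one way to pull out the histogram lines. It is not part of MarFS, and the log format is an assumption: I'm assuming each timing line carries the literal tag "TIMING_INFO" and that histogram fields have names ending in "_h" (the field name "open_h" in the comment is hypothetical). Adjust the pattern to whatever your syslog actually contains.

```python
import re
import sys
from typing import Iterable, List

# Assumption: histogram fields are words ending in "_h" (e.g. a hypothetical
# "open_h"). The trailing \b keeps words like "foo_handle" from matching.
HIST_FIELD = re.compile(r"\b\w+_h\b")


def histogram_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that contain TIMING_INFO and a '*_h' field."""
    return [
        ln.rstrip("\n")
        for ln in lines
        if "TIMING_INFO" in ln and HIST_FIELD.search(ln)
    ]


# Optional command-line use: python this_script.py /path/to/syslog
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        for line in histogram_lines(fh):
            print(line)
```

Running separate_log_threads first and then filtering each per-thread log this way keeps the histograms for different threads from interleaving.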

from marfs.

jti-lanl commented on September 4, 2024

We dug into this some time ago, which led to the discovery of what looked like "incast" problems. For the cctest and ODSU test-beds, we got big improvements in performance and reliability by tuning the network parameters. tcp_sack=1 seemed to be key to resolving the incast behavior as observed at the client, but I'm still concerned there could be incast at some or all of the servers, since they make many concurrent requests of other servers, for example. This is tricky to measure, but one approach is to do brief tcpdump captures during activity, and then analyze them for systematic delays.
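
As a rough illustration of the "analyze for systematic delays" part, here is a sketch that looks for long silent gaps between packets. It assumes the capture has already been dumped to text with the timestamp as the first field in epoch seconds (which is what `tcpdump -tt -r <file>` prints), and it assumes a 0.2 s threshold as a starting point, because incast stalls often show up as retransmission timeouts and 200 ms is the usual Linux minimum RTO. Both function names are made up for this example.

```python
from typing import Iterable, List, Tuple


def parse_timestamps(lines: Iterable[str]) -> List[float]:
    """Take the first whitespace-separated field of each line as epoch seconds."""
    stamps = []
    for ln in lines:
        fields = ln.split()
        if not fields:
            continue  # blank line
        try:
            stamps.append(float(fields[0]))
        except ValueError:
            continue  # line does not start with a timestamp
    return stamps


def find_stalls(
    stamps: List[float], threshold: float = 0.2
) -> List[Tuple[float, float]]:
    """Return (start_time, gap_seconds) for every inter-packet gap >= threshold."""
    ordered = sorted(stamps)
    return [
        (a, b - a)
        for a, b in zip(ordered, ordered[1:])
        if b - a >= threshold
    ]
```

This is only meaningful if the text is filtered down to a single flow (one host/port pair) first, since traffic from other connections would hide a stall on any one of them. If the same gap size keeps showing up across flows, that would suggest a systematic delay rather than noise.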

Meanwhile, there has been a lot of new development that might affect throughput (e.g. writing recovery-info). One way to take measurements is to instrument pftool for gprof/valgrind, and then collect and analyze the profile output-files after a run. I'll do this again, when I get back to this issue.


shanegoff commented on September 4, 2024

This has always been

