Comments (19)

opcm commented on August 19, 2024

Are you running pcm in a restricted cgroup?

from pcm.

torshepherd commented on August 19, 2024

Hijacking this thread: I'm trying to add pcm to an online monitoring system, but in a restricted cgroup. Can pcm not be run restricted to just some CPUs?

rdementi commented on August 19, 2024

pcm-pcie needs all CPUs (as in the example above). Other pcm utilities typically don't need all CPUs, but pcm won't be able to show per-CPU stats for excluded CPUs.

torshepherd commented on August 19, 2024

Thanks for the response.

My use case is using pcm as a library, intermittently getting all the counter states (for all cores, not just the ones pcm's cgroup is limited to) to compute QPI metrics. I can set the cgroup's cpuset to all cores to initialize the PCM instance, but ideally I would restrict it again before calling getCounterStates. Does getCounterStates need all CPUs if the initialization had all CPUs?

rdementi commented on August 19, 2024

It might or might not work. This scenario is not tested.

torshepherd commented on August 19, 2024

Ok, I've tested moving it to a restricted cgroup after initialization. This gives errors in some metrics, but the QPI utilization seems to work fine in a restricted cgroup.

When I restrict it before initialization, however, an exception is thrown in discoverSystemTopology (line 1082).

Do we need the topologyDomainMap to get QPI metrics across all sockets and links?

torshepherd commented on August 19, 2024

Seems like there are a couple of places where we pin to core 0:

TemporalThreadAffinity aff0(0);

Could this instead pin to an available core within the cgroup?
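One way to find "an available core within the cgroup" on Linux is to read the calling thread's affinity mask, which the kernel keeps within the cgroup's cpuset. The sketch below is only an illustration and not PCM code: `firstAllowedCore` is a hypothetical helper name, and it assumes glibc's `sched_getaffinity` and a machine with fewer than `CPU_SETSIZE` (1024) logical CPUs.

```cpp
#include <sched.h>   // sched_getaffinity, cpu_set_t, CPU_* macros (Linux/glibc)
#include <cassert>

// Hypothetical helper, not part of PCM: returns the lowest-numbered CPU the
// calling thread may run on, or -1 if the affinity mask cannot be read.
int firstAllowedCore()
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;  // e.g. EINVAL when the machine has more CPUs than CPU_SETSIZE

    // A running thread always has at least one CPU in its mask.
    assert(CPU_COUNT(&mask) > 0);

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &mask))
            return cpu;
    return -1;
}
```

If such a helper were adopted, the call site might read `TemporalThreadAffinity aff0(firstAllowedCore());`. Whether the rest of PCM's initialization tolerates a reference core other than 0 is an open question that this sketch does not answer.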

rdementi commented on August 19, 2024

> Seems like there are a couple of places where we pin to core 0:
>
> TemporalThreadAffinity aff0(0);
>
> Could this instead pin to an available core within the cgroup?

Let me see...
I see just one place. Could you please point to the other?

torshepherd commented on August 19, 2024

The other one is within "readCPUMicrocodeLevel".

rdementi commented on August 19, 2024

Could you try changing 0 to socketRefCore[0]?
Does it work then?

torshepherd commented on August 19, 2024

I tried hardcoding it to 2, which I know is one of the cgroup's cores. I added some try/catch blocks around the rest, but unfortunately I can't get any QPI measurements :/ At least it doesn't crash.

torshepherd commented on August 19, 2024

Basically getting output similar to the following:

ERROR: pthread_set_affinity_np for core 0 failed with code 22
Marking core 0 offline
... repeat for all cores except 2-9
PCM warning: total_sockets_ 1 does not match socket2M2Mbus.size() 2 // I think this is because the cgroup is only on one of the two NUMA nodes
Socket 0: ... 0 UPI ports detected.

But I expect 3 links and 2 sockets, so I guess the topology marking cores offline affects the number of QPI ports?

rdementi commented on August 19, 2024

> But I expect 3 links and 2 sockets, so I guess the topology marking cores offline affects the number of QPI ports?

Yes. PCM assumes that UPI links don't need to be detected on single-socket systems, because UPI links exist only to connect two or more sockets...

torshepherd commented on August 19, 2024

Ah of course, thanks for pointing that out.

Is the setting of thread affinity necessary to detect that there are multiple sockets?

rdementi commented on August 19, 2024

Yes. You need at least one core on the other socket to be in the cgroup.

torshepherd commented on August 19, 2024

Ok, I fully see the problem. When pcm is initialized in a restricted cgroup, the try block that populates Entry and fills the socketIdMap cannot set affinity on cores outside the cgroup, so it never adds the cores from the other socket. Instead it falls into the catch block ("Marking core offline"), which makes the system topology inaccurate.
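The failure mode described here can be reproduced outside PCM with a small probe. The following is a sketch under assumptions, not PCM's actual code: `reachableCores` is a hypothetical function, and it uses `sched_setaffinity` on the calling thread in place of PCM's pthread-based pinning. For a core outside the cgroup's cpuset the call is expected to fail with EINVAL, which is error code 22 as seen in the log earlier in the thread.

```cpp
#include <sched.h>   // sched_getaffinity, sched_setaffinity (Linux/glibc)
#include <cassert>
#include <vector>

// Hypothetical probe, not PCM code: tries to pin the calling thread to each
// core in [0, coreCount) and returns the cores where pinning succeeded.
std::vector<int> reachableCores(int coreCount)
{
    std::vector<int> reachable;

    cpu_set_t original;
    CPU_ZERO(&original);
    if (sched_getaffinity(0, sizeof(original), &original) != 0)
        return reachable;

    for (int core = 0; core < coreCount && core < CPU_SETSIZE; ++core) {
        cpu_set_t one;
        CPU_ZERO(&one);
        CPU_SET(core, &one);
        if (sched_setaffinity(0, sizeof(one), &one) == 0)
            reachable.push_back(core);
        // On failure (typically errno == EINVAL) a topology scan that relies
        // on pinning has no way to inspect this core and must skip it.
    }

    // Put the thread back on its original mask before returning.
    const int rc = sched_setaffinity(0, sizeof(original), &original);
    assert(rc == 0);
    (void)rc;
    return reachable;
}
```

In a cgroup limited to cores 2-9 on one socket, a probe like this would be expected to return only those cores, so any socket map built from the result would contain a single socket.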

torshepherd commented on August 19, 2024

Aha, and the reason it needs the thread affinity RAII there is so that getting the apic_id will work? That uses pcm_cpuid, which executes the cpuid instruction, which returns the APIC ID of the current logical processor?
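To make the dependency concrete: `cpuid` reports on whichever logical processor executes it, so a per-core query has to pin the thread first. The sketch below assumes GCC or Clang on Linux; `apicIdOfCurrentCpu` and `apicIdOfCore` are hypothetical names, and the code reads the 8-bit initial APIC ID from CPUID leaf 1 (EBX bits 31..24). PCM's own `pcm_cpuid` wrapper and the leaf it queries may differ, and systems with more than 255 logical processors need the x2APIC ID from leaf 0xB instead.

```cpp
#include <sched.h>   // sched_getaffinity, sched_setaffinity (Linux/glibc)
#include <cassert>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   // __get_cpuid (GCC/Clang)
#endif

// Initial APIC ID of the logical processor that happens to execute this
// function, or -1 if CPUID leaf 1 is unavailable (for example on non-x86).
int apicIdOfCurrentCpu()
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    return static_cast<int>(ebx >> 24);
#else
    return -1;
#endif
}

// Pins the calling thread to `core`, queries the APIC ID there, then restores
// the previous mask. Returns -1 if pinning fails, e.g. the core is outside
// the cgroup's cpuset.
int apicIdOfCore(int core)
{
    if (core < 0 || core >= CPU_SETSIZE)
        return -1;

    cpu_set_t original;
    CPU_ZERO(&original);
    if (sched_getaffinity(0, sizeof(original), &original) != 0)
        return -1;

    cpu_set_t one;
    CPU_ZERO(&one);
    CPU_SET(core, &one);
    if (sched_setaffinity(0, sizeof(one), &one) != 0)
        return -1;

    const int id = apicIdOfCurrentCpu();

    const int rc = sched_setaffinity(0, sizeof(original), &original);
    assert(rc == 0);
    (void)rc;
    return id;
}
```

Without the pinning step, two consecutive calls to `apicIdOfCurrentCpu` could run on different cores and return different IDs, which is why an unreachable core cannot be identified this way.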

rdementi commented on August 19, 2024

Correct.

torshepherd commented on August 19, 2024

Thanks for the help with this. In cases where cores are inactive, I wonder if you could just read /proc/cpuinfo to get the topology instead of running cpuid on each core?
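A minimal sketch of that idea, assuming the x86 Linux layout of /proc/cpuinfo where each entry has a `processor` line followed later by a `physical id` line. `parseSocketMap` and `readSocketMap` are hypothetical names, not PCM functions, and the sketch does not check whether a container runtime filters the file.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Parses text in /proc/cpuinfo format and returns a map from logical
// processor number to physical package (socket) id.
std::map<int, int> parseSocketMap(std::istream& in)
{
    std::map<int, int> coreToSocket;
    std::string line;
    int currentCpu = -1;

    while (std::getline(in, line)) {
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos)
            continue;  // blank separator lines between entries

        std::string key = line.substr(0, colon);
        key.erase(key.find_last_not_of(" \t") + 1);  // trim trailing padding

        int value = 0;
        std::istringstream valueStream(line.substr(colon + 1));
        if (!(valueStream >> value))
            continue;  // non-numeric field such as "model name"

        if (key == "processor")
            currentCpu = value;
        else if (key == "physical id" && currentCpu >= 0)
            coreToSocket[currentCpu] = value;
    }
    return coreToSocket;
}

// Convenience wrapper that reads the real file; returns an empty map if the
// file cannot be opened.
std::map<int, int> readSocketMap(const std::string& path = "/proc/cpuinfo")
{
    std::ifstream file(path);
    return parseSocketMap(file);
}
```

Counting the distinct values in the returned map gives a socket count that does not depend on thread affinity. Per-CPU files such as /sys/devices/system/cpu/cpu0/topology/physical_package_id carry the same package id and could be read instead.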
