
Comments (26)

opcm commented on August 19, 2024

memoptest is single threaded. A single thread can not consume the whole bandwidth capacity. Please try to run many memoptest processes in parallel to get close to 100% utilization.
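For example, a rough sketch of launching several copies from a shell (assuming memoptest can be started in the background without arguments; if it prompts for the test type, start each copy in its own terminal window instead):

./memoptest &
./memoptest &
./memoptest &
./memoptest &
wait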


WisQTuser commented on August 19, 2024

Hi Developers,
I tried running many memoptest processes in parallel, but the result is the same as before: the total UPI bandwidth percentage still does not get close to 100%. The streaming-to-memory bandwidth is simply divided equally among the processes. Please refer to the screenshots below.

(screenshots: four windows; six windows)


opcm commented on August 19, 2024

Hi, could you please try to use many memoptest instances but with read-only traffic to drive the utilization up? This will be option "0" instead of "2".

It is also important to disable compiler optimizations with the "-O0" option:

g++ -O0 -std=c++11 memoptest.cpp -o memoptest
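For illustration only, here is a minimal multi-threaded sequential-read sketch (a hypothetical streams.cpp, not pcm's memoptest). It only generates local read traffic on its own; to push the reads across the UPI links you would still have to place the threads and the buffers on different sockets, e.g. by running it under numactl --cpunodebind=0 --membind=1.

// streams.cpp -- hypothetical sketch, not part of pcm
// build: g++ -O0 -std=c++11 -pthread streams.cpp -o streams
// run:   ./streams 8        (8 = number of concurrent read streams)
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <vector>

static void stream_read(std::size_t bytes, volatile long long *sink)
{
    std::vector<long long> buf(bytes / sizeof(long long), 1);
    long long sum = 0;
    for (int pass = 0; pass < 100; ++pass)      // walk the buffer sequentially, many times
        for (std::size_t i = 0; i < buf.size(); ++i)
            sum += buf[i];
    *sink = sum;                                // keep the reads observable
}

int main(int argc, char **argv)
{
    const int nthreads = (argc > 1) ? std::atoi(argv[1]) : 4;   // number of concurrent read streams
    const std::size_t bytes = 300ull * 1024 * 1024;             // ~300 MB per thread, well beyond the caches
    std::vector<std::thread> pool;
    std::vector<long long> sinks(nthreads);
    for (int t = 0; t < nthreads; ++t)
        pool.emplace_back(stream_read, bytes, (volatile long long *)&sinks[t]);
    for (auto &th : pool)
        th.join();
    std::printf("finished %d read streams\n", nthreads);
    return 0;
}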


WisQTuser commented on August 19, 2024

Hi Developers,
I followed your suggestions and ran many memoptest instances with read-only traffic, compiled with optimizations disabled. The total UPI bandwidth percentage gets up to about 85% but still does not reach 100%. If I keep opening new windows to run memoptest with option "0", the memory read bandwidth drops to 17XX MB/s, as shown in the picture below. Is there anything else I can do? Please refer to the attached test result. Thanks.
(screenshot: Intel UPI stress reaching about 85%)


opcm commented on August 19, 2024

Can you try a mix of read and write traffic? There is also a specialized tool that can generate different traffic patterns: https://software.intel.com/en-us/articles/intelr-memory-latency-checker
The relevant options are --bandwidth_matrix and -Wn, where n selects the type of traffic.
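For example (the meaning of each -Wn value is listed in the MLC readme; -W2 is simply the value used later in this thread):

./mlc --bandwidth_matrix -W2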


WisQTuser commented on August 19, 2024

Hi Developers,
I tried using the Intel Memory Latency Checker to generate different traffic patterns, with the options --bandwidth_matrix -Wn where n is 2. But the total UPI bandwidth percentage does not stay high: once the NUMA node numbers appear, the UPI bandwidth percentage rises to 5X%, and after about two seconds it drops to a lower value (0%). What should I do to keep the UPI bandwidth at the highest percentage, as with memoptest?

(screenshots attached)


opcm commented on August 19, 2024

Could you please increase the test phase duration? I believe this is the -t option.


WisQTuser commented on August 19, 2024

Hi Developers,
I increased the test phase duration with -t, but the result is the same: I still cannot keep the UPI bandwidth at the highest percentage.
(screenshot attached)


WisQTuser commented on August 19, 2024

Hi Developers,
Should I try other methods or commands that can drive the UPI traffic to 100%?


opcm commented on August 19, 2024

The parameter value you have chosen is in seconds. According to the screenshot it is still running the local memory test. Did it ever finish? Please choose a smaller value, e.g. 16 seconds (per matrix element).
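For example (assuming -t is indeed the per-measurement duration, as suggested above):

./mlc --bandwidth_matrix -W2 -t16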


WisQTuser commented on August 19, 2024

Hi Developers,
I used a smaller value to run the test, but the situation is the same as before: I still cannot drive the UPI traffic to 100%. Only specific matrix elements of the stress test push the bandwidth higher, maybe to about 90%, and the whole test finishes in about five minutes. I want the stress test to keep the UPI bandwidth at 100% overnight or even longer.
(screenshots: mlc_1, mlc_2, mlc_3)


opcm commented on August 19, 2024

I could get 96% utilization with these parameters:

--loaded_latency -omlc_2s_10c_ro-remote.cfg -d0 -t1000

with this config file (mlc_2s_10c_ro-remote.cfg)

0-9 R seq 300000 dram 1
28-37 R seq 300000 dram 0

I guess on your 10-core CPU you need to change it to:

0-9 R seq 300000 dram 1
10-19 R seq 300000 dram 0

You can increase the -t parameter if you want to run it longer.
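Reading the config lines (my understanding of the fields, based on the MLC readme):

0-9 R seq 300000 dram 1
cores 0-9, read-only traffic, sequential access, a 300000 unit buffer (KiB as I understand it, i.e. roughly 300 MB), allocated in DRAM on NUMA node 1

Since each group of cores reads from the other socket's memory, all of the generated traffic has to cross the UPI links.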


WisQTuser commented on August 19, 2024

Hi Developers,
I followed your instructions to run the test but only get about 60% utilization.
Is there anything wrong with my command or config file?
(screenshot: mlc_4)


opcm commented on August 19, 2024

Could you please run mlc without parameters and post the mlc output here as text? (I just want to see whether your platform configuration is healthy.)


opcm commented on August 19, 2024

/proc/cpuinfo is also interesting to check


WisQTuser commented on August 19, 2024

Hi Developers,
I have posted the mlc output and cpuinfo as text.
Please refer to the attached files.
Thanks.
output.txt
cpuinfo.txt


opcm commented on August 19, 2024

You have a very weird OS processor -> socket topology (round robin). I did not expect that. Here is a fixed configuration file:

0,2,4,6,8,10,12,14,16,18 R seq 300000 dram 1
1,3,5,7,9,11,13,15,17,19 R seq 300000 dram 0
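To check the processor-to-socket mapping yourself before writing the core lists, something like the following works (standard Linux tools, not part of pcm or mlc):

lscpu -e=CPU,NODE,SOCKET,CORE
grep -E "processor|physical id" /proc/cpuinfo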


WisQTuser commented on August 19, 2024

Hi Developers,
After using the fixed configuration file, I can get about 90% utilization. May I ask what the configuration file and the command "--loaded_latency -omlc_2s_10c_ro-remote.cfg -d0 -t1000" mean? Could I change some values to get more utilization? Thanks again for your support.
(screenshot: about 90% utilization)


opcm commented on August 19, 2024

The options and configuration file format are described in the readme file in the mlc package. You might try other traffic types and/or random traversal patterns.
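For example, switching the fixed configuration to a random traversal pattern would (in my understanding of the seq/rand field) look like:

0,2,4,6,8,10,12,14,16,18 R rand 300000 dram 1
1,3,5,7,9,11,13,15,17,19 R rand 300000 dram 0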


WisQTuser commented on August 19, 2024

OK, I will check the readme file, thanks. Another question about pcm.x: the bandwidth utilization shows 90%, and UPI0 and UPI2 show 21 GB/s. There are two Intel Skylake CPUs in our server, and the peak bandwidth is 10.4 GT/s * 2 channels * 2 bytes/channel = 41.6 GB/s (peak), i.e. 20.8 GB/s per channel. How does pcm.x arrive at the 21 GB/s and 90% numbers?


opcm commented on August 19, 2024

The formula you are using gives a somewhat pessimistic estimate of the Intel UPI maximum throughput. Intel UPI may achieve better packing of data into packets, and PCM uses a more optimistic estimate of the maximum throughput that assumes good data packing.
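As a rough cross-check of the two estimates (arithmetic only, using the numbers quoted above):

10.4 GT/s * 2 bytes/transfer = 20.8 GB/s per link (pessimistic estimate)
21 GB/s / 0.90 ≈ 23.3 GB/s per link (maximum implied by PCM, if the 21 GB/s reading corresponds to the 90% utilization)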


WisQTuser commented on August 19, 2024

So if I now have to test the maximum throughput of Intel UPI, which standard can I refer to in order to judge whether the result passes or fails? I can now use the command to drive the total UPI bandwidth to 93%; I just want to know whether 93% UPI bandwidth meets the standard or not.


opcm commented on August 19, 2024

As far as I know there is no standard that defines the theoretical maximum. It is workload dependent.


WisQTuser commented on August 19, 2024

Thanks for your answer. So you mean the UPI bandwidth percentage can reach 100% if the workload is increased. How can I increase the workload? Should I change the hardware configuration or use other traffic types?


WisQTuser commented on August 19, 2024

Hi Developers,
You mean the UPI bandwidth percentage can reach 100% if the workload is increased. How can I increase the workload? Should I change the hardware configuration or use other traffic types?


opcm commented on August 19, 2024

I never managed to drive the utilization to 100%. I think a specific synthetic test is required but I don't know how to implement it.

