
Comments (26)

opcm commented on August 19, 2024

memoptest is single threaded. A single thread can not consume the whole bandwidth capacity. Please try to run many memoptest processes in parallel to get close to 100% utilization.
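For example, a rough sketch of launching several copies from a shell (assuming memoptest can be started in the background without arguments; if it prompts for the test type, start each copy in its own terminal window instead):

./memoptest &
./memoptest &
./memoptest &
./memoptest &
wait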


WisQTuser commented on August 19, 2024

Hi Developers,
I tried running many memoptest processes in parallel, but the result is the same as before: the total UPI bandwidth percentage still does not get close to 100%. The streaming-to-memory bandwidth is simply divided equally among the processes. Please refer to the screenshots below.

(screenshots: four windows; six windows)


opcm commented on August 19, 2024

Hi, could you please try to use many memoptest instances but with read-only traffic to drive the utilization up? This will be option "0" instead of "2".

It is also important to disable compiler optimizations with the "-O0" option:

g++ -O0 -std=c++11 memoptest.cpp -o memoptest
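For illustration only, here is a minimal multi-threaded sequential-read sketch (a hypothetical streams.cpp, not pcm's memoptest). It only generates local read traffic on its own; to push the reads across the UPI links you would still have to place the threads and the buffers on different sockets, e.g. by running it under numactl --cpunodebind=0 --membind=1.

// streams.cpp -- hypothetical sketch, not part of pcm
// build: g++ -O0 -std=c++11 -pthread streams.cpp -o streams
// run:   ./streams 8        (8 = number of concurrent read streams)
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <vector>

static void stream_read(std::size_t bytes, volatile long long *sink)
{
    std::vector<long long> buf(bytes / sizeof(long long), 1);
    long long sum = 0;
    for (int pass = 0; pass < 100; ++pass)      // walk the buffer sequentially, many times
        for (std::size_t i = 0; i < buf.size(); ++i)
            sum += buf[i];
    *sink = sum;                                // keep the reads observable
}

int main(int argc, char **argv)
{
    const int nthreads = (argc > 1) ? std::atoi(argv[1]) : 4;   // number of concurrent read streams
    const std::size_t bytes = 300ull * 1024 * 1024;             // ~300 MB per thread, well beyond the caches
    std::vector<std::thread> pool;
    std::vector<long long> sinks(nthreads);
    for (int t = 0; t < nthreads; ++t)
        pool.emplace_back(stream_read, bytes, (volatile long long *)&sinks[t]);
    for (auto &th : pool)
        th.join();
    std::printf("finished %d read streams\n", nthreads);
    return 0;
}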


WisQTuser commented on August 19, 2024

Hi Developers,
I followed your suggestions and ran many memoptest instances with read-only traffic, compiled with optimizations disabled. The total UPI bandwidth percentage gets up to about 85% but still does not reach 100%. If I keep opening new windows to run memoptest with option "0", the memory read bandwidth drops to 17XX MB/s, as shown in the picture below. Is there anything else I can do? Please refer to the attached test result. Thanks.
(screenshot: Intel UPI stress reaching about 85%)


opcm commented on August 19, 2024

Can you try a mix of read and write traffic? There is also a specialized tool that can generate different traffic patterns: https://software.intel.com/en-us/articles/intelr-memory-latency-checker
The relevant options are --bandwidth_matrix and -Wn, where n selects the type of traffic.
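For example (the meaning of each -Wn value is listed in the MLC readme; -W2 is simply the value used later in this thread):

./mlc --bandwidth_matrix -W2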


WisQTuser commented on August 19, 2024

Hi Developers,
I tried using the Intel Memory Latency Checker to generate different traffic patterns, with the options --bandwidth_matrix -Wn where n is 2. But the total UPI bandwidth percentage does not stay high: once the NUMA node numbers appear, the UPI bandwidth percentage rises to 5X%, and after about two seconds it drops to a lower value (0%). What should I do to keep the UPI bandwidth at the highest percentage, as with memoptest?

(screenshots attached)


opcm commented on August 19, 2024

Could you please increase the test phase duration? I believe this is the -t option.


WisQTuser commented on August 19, 2024

Hi Developers,
I increased the test phase duration with -t, but the result is the same: I still cannot keep the UPI bandwidth at the highest percentage.
(screenshot attached)


WisQTuser commented on August 19, 2024

Hi Developers,
Should I try other methods or commands that can drive the UPI traffic to 100%?


opcm commented on August 19, 2024

The parameter value you have chosen is in seconds. According to the screenshot it is still running the local memory test. Did it ever finish? Please choose a smaller value, e.g. 16 seconds (per matrix element).
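For example (assuming -t is indeed the per-measurement duration, as suggested above):

./mlc --bandwidth_matrix -W2 -t16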


WisQTuser commented on August 19, 2024

Hi Developers,
I used a smaller value to run the test, but the situation is the same as before: I still cannot drive the UPI traffic to 100%. Only specific matrix elements of the stress test push the bandwidth higher, maybe to about 90%, and the whole test finishes in about five minutes. I want the stress test to keep the UPI bandwidth at 100% overnight or even longer.
(screenshots: mlc_1, mlc_2, mlc_3)


opcm commented on August 19, 2024

I could get 96% utilization with these parameters:

--loaded_latency -omlc_2s_10c_ro-remote.cfg -d0 -t1000

with this config file (mlc_2s_10c_ro-remote.cfg)

0-9 R seq 300000 dram 1
28-37 R seq 300000 dram 0

I guess on your 10-core CPU you need to change it to:

0-9 R seq 300000 dram 1
10-19 R seq 300000 dram 0

You can increase the -t parameter if you want to run it longer.
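Reading the config lines (my understanding of the fields, based on the MLC readme):

0-9 R seq 300000 dram 1
cores 0-9, read-only traffic, sequential access, a 300000 unit buffer (KiB as I understand it, i.e. roughly 300 MB), allocated in DRAM on NUMA node 1

Since each group of cores reads from the other socket's memory, all of the generated traffic has to cross the UPI links.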


WisQTuser commented on August 19, 2024

Hi Developers,
I followed your instructions to run the test but only get about 60% utilization.
Is there anything wrong with my command or config file?
(screenshot: mlc_4)


opcm commented on August 19, 2024

Could you please run mlc without parameters and post the mlc output here as text? (I just want to see whether your platform configuration is healthy.)


opcm commented on August 19, 2024

/proc/cpuinfo is also interesting to check


WisQTuser commented on August 19, 2024

Hi Developers,
I have posted the mlc output and cpuinfo as text.
Please refer to the attached files.
Thanks.
output.txt
cpuinfo.txt


opcm commented on August 19, 2024

You have a very weird OS processor -> socket topology (round robin). I did not expect that. Here is a fixed configuration file:

0,2,4,6,8,10,12,14,16,18 R seq 300000 dram 1
1,3,5,7,9,11,13,15,17,19 R seq 300000 dram 0
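To check the processor-to-socket mapping yourself before writing the core lists, something like the following works (standard Linux tools, not part of pcm or mlc):

lscpu -e=CPU,NODE,SOCKET,CORE
grep -E "processor|physical id" /proc/cpuinfo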


WisQTuser commented on August 19, 2024

Hi Developers,
After using the fixed configuration file, I can get about 90% utilization. May I ask what the configuration file and the command "--loaded_latency -omlc_2s_10c_ro-remote.cfg -d0 -t1000" mean? Could I change some values to get more utilization? Thanks again for your support.
(screenshot: about 90% utilization)


opcm commented on August 19, 2024

The options and configuration file format are described in the readme file in the mlc package. You might try other traffic types and/or random traversal patterns.
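For example, switching the fixed configuration to a random traversal pattern would (in my understanding of the seq/rand field) look like:

0,2,4,6,8,10,12,14,16,18 R rand 300000 dram 1
1,3,5,7,9,11,13,15,17,19 R rand 300000 dram 0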


WisQTuser commented on August 19, 2024

OK, I will check the readme file, thanks. Another question about pcm.x: the bandwidth utilization shows 90%, and UPI0 and UPI2 show 21 GB/s. There are two Intel Skylake CPUs in our server, and the peak bandwidth is 10.4 GT/s * 2 channels * 2 bytes/channel = 41.6 GB/s (peak), i.e. 20.8 GB/s per channel. How does pcm.x arrive at the 21 GB/s and 90% numbers?


opcm commented on August 19, 2024

The formula you are using gives a somewhat pessimistic estimate of the Intel UPI maximum throughput. Intel UPI may achieve better packing of data into packets, and PCM uses a more optimistic estimate of the maximum throughput that assumes good data packing.
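As a rough cross-check of the two estimates (arithmetic only, using the numbers quoted above):

10.4 GT/s * 2 bytes/transfer = 20.8 GB/s per link (pessimistic estimate)
21 GB/s / 0.90 ≈ 23.3 GB/s per link (maximum implied by PCM, if the 21 GB/s reading corresponds to the 90% utilization)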


WisQTuser commented on August 19, 2024

So if I now have to test the maximum throughput of Intel UPI, which standard can I refer to in order to judge whether the result passes or fails? I can now use the command to drive the total UPI bandwidth to 93%; I just want to know whether 93% UPI bandwidth meets the standard or not.


opcm commented on August 19, 2024

As far as I know there is no standard that defines the theoretical maximum. It is workload dependent.


WisQTuser commented on August 19, 2024

Thanks for your answer. So you mean the UPI bandwidth percentage can reach 100% if the workload is increased. How can I increase the workload? Should I change the hardware configuration or use other traffic types?


WisQTuser commented on August 19, 2024

Hi Developers,
You mean the UPI bandwidth percentage can reach 100% if the workload is increased. How can I increase the workload? Should I change the hardware configuration or use other traffic types?


opcm commented on August 19, 2024

I never managed to drive the utilization to 100%. I think a specific synthetic test is required but I don't know how to implement it.

