Comments (16)

mgoin commented on July 17, 2024

I would like to second the importance of removing unnecessary runs and time required for power submissions. This steep increase (at least 2x time for each benchmark) has deterred Neural Magic from contributing power results on several hardware platforms. This change would absolutely help lower the barrier to entry while simultaneously encouraging more holistic and thorough submissions to MLPerf.

arjunsuresh commented on July 17, 2024

@rakshithvasudev I hope you are also interested in this proposal, as anyone running 3d-unet will be 😄

arjunsuresh commented on July 17, 2024

@psyhtest Yesterday you asked for manual range setting as an experimental option - this is already in the master branch and should be working as expected.
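
For anyone unfamiliar with the idea, here is a hypothetical sketch of what manual range setting amounts to: fixing the analyzer's current range from a known peak-power estimate instead of deriving it from a separate ranging run. The function name and the 10% headroom are illustrative assumptions, not the actual power-dev API.

```python
# Hypothetical illustration (not the actual power-dev API): choose a fixed
# current range for the analyzer from an estimated peak power, instead of
# deriving it from a separate ranging run.
def pick_current_range(peak_watts: float, mains_volts: float = 230.0,
                       headroom: float = 1.10) -> float:
    peak_amps = peak_watts / mains_volts
    return peak_amps * headroom  # fixed current range, with 10% headroom

print(pick_current_range(peak_watts=350.0))  # ~1.67 A
```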

araghun commented on July 17, 2024

Documenting some of my thoughts on this.

  • We have also spent significant cycles discussing this issue in the past (#270, and also earlier in 2021) and arrived at a conclusion that there is a reason for what we are doing currently and we should look to continue this approach. Please refer to the MLPerf PowerWG discussion notes.

  • This short term proposal calls for a practice that is deviating from the industry standard methodology. While many of the power measurement standardization approaches are much more rigorous (e.g., Olympic scoring), at MLC we adopted a somewhat lenient yet realistic approach to power measurement that requires 2 steps - Ranging and Testing. These were well-thought-out and documented approaches, and I do not think the problem statement calls for this solution.

  • The problem statement seems to be the time taken for doing power measurements. While this is a true statement, we have heard complaints about this approach consistently from only 1-2 organizations. That does not seem to represent the vast majority of submitting organizations. As seen in the above point, power measurements across the industry adopt a more time-consuming approach, which is deemed the industry standard for power measurement methodologies.

  • One rationale being given for this proposal is that we do it for systems below 75W and hence there is precedent. It is to be noted that the methodology for systems below 75W is likely to change completely for other technical reasons which are not applicable to the broader category, and hence we should be careful about changing a well-working methodology in favor of something that is bound to change.

  • The other rationale - that the number of power submissions is going down version over version - is inconsistent with the PowerWG messaging. Please see the PowerWG notes for what is being reported.

Summarizing, I see this proposal as a drastic change from the current methodology (and, importantly, one not backed by data) for what is labelled a short-term fix (as indicated in the PR); as a good practice, we should avoid doing this as much as possible.

What I would like to propose is to close out this PR with these comments. I will wait to discuss in the PowerWG on 5/23.

arjunsuresh commented on July 17, 2024

"and arrived at a conclusion that there is a reason for what we are doing currently and we should look to continue this approach"

When did we arrive at that conclusion? I believe that was only for low-power devices, which indeed had an issue due to PTDaemon. Now that we are moving to DC power measurement for low-power devices, I don't see that argument holding here.

"This short term proposal calls for a practice that is deviating from the industry standard methodology."

Can you please say how? We already confirmed with SPEC that using the uncertainty command without ranging mode is indeed an industry standard accepted by SPEC.

"The problem statement seems to be time taken for doing power measurements. While this is a true statement, we have heard complaints about this approach from only 1-2 organization consistently. It does not seem to represent the vast majority of submitting organizations. As seen in the above point, power measurements across the industry adopt a more time consuming approach which is deemed industry standard for power measurement methodologies."

As I said in an earlier comment, there is no deviation from the industry-standard methodology here. If there is, please prove it and I'm happy to close this issue. Moreover, don't just 2 organizations contribute more than 95% of all the power results?

"The other rationale that the number of power submissions are going down version over version is inconsistent with the PowerWG messaging. Please see PowerWG notes for what is being reported."

I took the data directly from the submission results.

arjunsuresh commented on July 17, 2024

Section 3.16.2 in this SPEC document clearly describes how power results are validated there. This is exactly what my proposal does. @dmiskovic-NV please correct me if I'm wrong here.
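
For context, here is a minimal sketch of what uncertainty-based validation looks like, assuming the analyzer reports a per-sample uncertainty and a 1% acceptance threshold; both the data layout and the threshold are illustrative assumptions, not the exact SPEC procedure.

```python
# Minimal sketch of uncertainty-based validation (illustrative assumptions:
# per-sample (watts, uncertainty_pct) pairs and a 1% acceptance threshold).
def run_is_valid(samples, max_uncertainty_pct=1.0):
    return all(u <= max_uncertainty_pct for _, u in samples)

readings = [(305.2, 0.4), (311.8, 0.5), (298.9, 0.6)]
print(run_is_valid(readings))  # True: every sample is within the threshold
```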

Regarding data - we have already shared many data points here, which actually unearthed the problem with low-power devices. For high-power devices there are no issues, and we can share 10 more data points on different benchmarks if needed, or any submitter can try them on their own system -- if there is even 1 wrong measurement we can close this issue.

arjunsuresh commented on July 17, 2024

Krai

[2023-05-23 20:51:58,326 submission_checker.py:2856 INFO] Results=3994, NoResults=0, Power Results=1735
[2023-05-23 20:51:58,326 submission_checker.py:2859 INFO] ---
[2023-05-23 20:51:58,326 submission_checker.py:2860 INFO] Closed Results=67, Closed Power Results=40
[2023-05-23 20:51:58,326 submission_checker.py:2861 INFO] Open Results=3927, Open Power Results=1695
[2023-05-23 20:51:58,326 submission_checker.py:2862 INFO] Network Results=0, Network Power Results=0
[2023-05-23 20:51:58,326 submission_checker.py:2863 INFO] ---
[2023-05-23 20:51:58,326 submission_checker.py:2865 INFO] Systems=20, Power Systems=10
[2023-05-23 20:51:58,326 submission_checker.py:2866 INFO] Closed Systems=18, Closed Power Systems=10
[2023-05-23 20:51:58,326 submission_checker.py:2867 INFO] Open Systems=16, Open Power Systems=7
[2023-05-23 20:51:58,326 submission_checker.py:2868 INFO] Network Systems=0, Network Power Systems=0
[2023-05-23 20:51:58,326 submission_checker.py:2869 INFO] ---
[2023-05-23 20:51:58,326 submission_checker.py:2874 INFO] SUMMARY: submission looks OK

cTuning

arjun@hp-envy:~/inference/tools/submission$ python3 submission_checker.py --input ~/inference_results_v3.0 --submitter cTuning --skip-meaningful-fields-empty-check --skip-empty-files-check

[2023-05-23 20:41:05,744 submission_checker.py:2856 INFO] Results=1949, NoResults=0, Power Results=529
[2023-05-23 20:41:05,745 submission_checker.py:2859 INFO] ---
[2023-05-23 20:41:05,745 submission_checker.py:2860 INFO] Closed Results=26, Closed Power Results=19
[2023-05-23 20:41:05,745 submission_checker.py:2861 INFO] Open Results=1923, Open Power Results=510
[2023-05-23 20:41:05,745 submission_checker.py:2862 INFO] Network Results=0, Network Power Results=0
[2023-05-23 20:41:05,745 submission_checker.py:2863 INFO] ---
[2023-05-23 20:41:05,745 submission_checker.py:2865 INFO] Systems=47, Power Systems=10
[2023-05-23 20:41:05,745 submission_checker.py:2866 INFO] Closed Systems=5, Closed Power Systems=3
[2023-05-23 20:41:05,745 submission_checker.py:2867 INFO] Open Systems=47, Open Power Systems=9
[2023-05-23 20:41:05,745 submission_checker.py:2868 INFO] Network Systems=0, Network Power Systems=0
[2023-05-23 20:41:05,745 submission_checker.py:2869 INFO] ---
[2023-05-23 20:41:05,745 submission_checker.py:2874 INFO] SUMMARY: submission looks OK

Qualcomm

[2023-05-23 21:00:03,729 submission_checker.py:2856 INFO] Results=107, NoResults=0, Power Results=65
[2023-05-23 21:00:03,729 submission_checker.py:2859 INFO] ---
[2023-05-23 21:00:03,729 submission_checker.py:2860 INFO] Closed Results=88, Closed Power Results=56
[2023-05-23 21:00:03,729 submission_checker.py:2861 INFO] Open Results=15, Open Power Results=9
[2023-05-23 21:00:03,729 submission_checker.py:2862 INFO] Network Results=4, Network Power Results=0
[2023-05-23 21:00:03,729 submission_checker.py:2863 INFO] ---
[2023-05-23 21:00:03,729 submission_checker.py:2865 INFO] Systems=11, Power Systems=7
[2023-05-23 21:00:03,729 submission_checker.py:2866 INFO] Closed Systems=11, Closed Power Systems=7
[2023-05-23 21:00:03,729 submission_checker.py:2867 INFO] Open Systems=3, Open Power Systems=2
[2023-05-23 21:00:03,729 submission_checker.py:2868 INFO] Network Systems=1, Network Power Systems=0
[2023-05-23 21:00:03,729 submission_checker.py:2869 INFO] ---
[2023-05-23 21:00:03,729 submission_checker.py:2874 INFO] SUMMARY: submission looks OK

NVIDIA

[2023-05-23 20:58:59,886 submission_checker.py:2856 INFO] Results=268, NoResults=0, Power Results=46
[2023-05-23 20:58:59,886 submission_checker.py:2859 INFO] ---
[2023-05-23 20:58:59,886 submission_checker.py:2860 INFO] Closed Results=262, Closed Power Results=46
[2023-05-23 20:58:59,886 submission_checker.py:2861 INFO] Open Results=0, Open Power Results=0
[2023-05-23 20:58:59,886 submission_checker.py:2862 INFO] Network Results=6, Network Power Results=0
[2023-05-23 20:58:59,886 submission_checker.py:2863 INFO] ---
[2023-05-23 20:58:59,886 submission_checker.py:2865 INFO] Systems=19, Power Systems=3
[2023-05-23 20:58:59,886 submission_checker.py:2866 INFO] Closed Systems=18, Closed Power Systems=3
[2023-05-23 20:58:59,886 submission_checker.py:2867 INFO] Open Systems=0, Open Power Systems=0
[2023-05-23 20:58:59,886 submission_checker.py:2868 INFO] Network Systems=1, Network Power Systems=0
[2023-05-23 20:58:59,886 submission_checker.py:2869 INFO] ---
[2023-05-23 20:58:59,886 submission_checker.py:2874 INFO] SUMMARY: submission looks OK

Dell

[2023-05-23 21:09:36,333 submission_checker.py:2856 INFO] Results=211, NoResults=0, Power Results=40
[2023-05-23 21:09:36,333 submission_checker.py:2859 INFO] ---
[2023-05-23 21:09:36,333 submission_checker.py:2860 INFO] Closed Results=211, Closed Power Results=40
[2023-05-23 21:09:36,333 submission_checker.py:2861 INFO] Open Results=0, Open Power Results=0
[2023-05-23 21:09:36,333 submission_checker.py:2862 INFO] Network Results=0, Network Power Results=0
[2023-05-23 21:09:36,333 submission_checker.py:2863 INFO] ---
[2023-05-23 21:09:36,334 submission_checker.py:2865 INFO] Systems=21, Power Systems=4
[2023-05-23 21:09:36,334 submission_checker.py:2866 INFO] Closed Systems=21, Closed Power Systems=4
[2023-05-23 21:09:36,334 submission_checker.py:2867 INFO] Open Systems=0, Open Power Systems=0
[2023-05-23 21:09:36,334 submission_checker.py:2868 INFO] Network Systems=0, Network Power Systems=0
[2023-05-23 21:09:36,334 submission_checker.py:2869 INFO] ---
[2023-05-23 21:09:36,334 submission_checker.py:2874 INFO] SUMMARY: submission looks OK

Total

[2023-05-23 21:05:51,774 submission_checker.py:2856 INFO] Results=7283, NoResults=0, Power Results=2449
[2023-05-23 21:05:51,774 submission_checker.py:2859 INFO] ---
[2023-05-23 21:05:51,774 submission_checker.py:2860 INFO] Closed Results=1333, Closed Power Results=232
[2023-05-23 21:05:51,774 submission_checker.py:2861 INFO] Open Results=5936, Open Power Results=2217
[2023-05-23 21:05:51,774 submission_checker.py:2862 INFO] Network Results=14, Network Power Results=0
[2023-05-23 21:05:51,774 submission_checker.py:2863 INFO] ---
[2023-05-23 21:05:51,774 submission_checker.py:2865 INFO] Systems=200, Power Systems=40
[2023-05-23 21:05:51,774 submission_checker.py:2866 INFO] Closed Systems=134, Closed Power Systems=32
[2023-05-23 21:05:51,774 submission_checker.py:2867 INFO] Open Systems=88, Open Power Systems=19
[2023-05-23 21:05:51,774 submission_checker.py:2868 INFO] Network Systems=3, Network Power Systems=0
[2023-05-23 21:05:51,774 submission_checker.py:2869 INFO] ---
[2023-05-23 21:05:51,774 submission_checker.py:2874 INFO] SUMMARY: submission looks OK

Contribution by Krai + cTuning = 1735 + 529 = 2264 power results out of 2449 in total, i.e., > 92%. So we have 2 submitters contributing > 92% of all power results, and 5 submitters contributing > 98.6% of all power results. Of these, I believe Krai is doing mostly low-power devices, so this may not be important for them. Unless we have a valid justification to reject this proposal, on our part it makes more sense to do 2X non-power submissions instead of unnecessarily doing a ranging run to showcase power. For those submitters doing power on just 1-2 systems, waiting 1 hour more is not a big deal.
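
As a quick sanity check, these percentages follow directly from the power-result counts in the checker output above:

```python
# Power results per submitter, taken from the submission_checker output above.
power_results = {"Krai": 1735, "cTuning": 529, "Qualcomm": 65,
                 "NVIDIA": 46, "Dell": 40}
total = 2449  # total power results across all submitters

top2 = power_results["Krai"] + power_results["cTuning"]
top5 = sum(power_results.values())
print(f"Top 2: {top2}/{total} = {100 * top2 / total:.1f}%")  # 2264/2449 = 92.4%
print(f"Top 5: {top5}/{total} = {100 * top5 / total:.1f}%")  # 2415/2449 = 98.6%
```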

arjunsuresh commented on July 17, 2024

@dmiskovic-NV can you please confirm if this proposal is fine from the SPEC power side? I believe you had seen the replies from Greg.

From my side, I'm going to do at least 1000 power submissions in the 3.1 round with manual range setting, whether or not they are officially approved.

TheKanter commented on July 17, 2024

Hi @arjunsuresh, all communications to/from SPEC regarding PTD need to go through the official channels. That means me or the WG chairs, and the official SPEC email address.

It's not fair or reasonable to ask dmiskovic-NV to provide an official answer on behalf of SPEC as that is outside of his job and role.

Please note: this means @arjunsuresh should not be making inquiries to SPEC on behalf of MLCommons. That is the role of the WG chair or the executive director. SPEC has specifically asked that all inquiries be handled in a particular manner to avoid confusion or problems.

TheKanter commented on July 17, 2024

"I would like to second the importance of removing unnecessary runs and time required for power submissions. This steep increase (at least 2x time for each benchmark) has deterred Neural Magic from contributing power results on several hardware platforms. This change would absolutely help lower the barrier to entry while simultaneously encouraging more holistic and thorough submissions to MLPerf."

@mgoin - Thanks for speaking up, appreciate the perspective! What is the total run time for the benchmarks with and without power (don't need exact, ballpark is good enough)?

arjunsuresh commented on July 17, 2024

Thank you for your reply @TheKanter. Actually, I communicated with SPEC only once, as directed by the power WG, with the WG chairs in CC. Their reply to this issue is captured in this comment: #270 (comment)

I only asked @dmiskovic-NV for his interpretation, as some people interpret this reply as a "no" whereas for me it's clearly a "yes".

arjunsuresh commented on July 17, 2024

" What is the total run time for the benchmarks with and without power (don't need exact, ballpark is good enough)?"

@TheKanter Actually, for the last inference round we helped get the power results for Neural Magic, as their DeepSparse implementation is integrated in CK/CM. The power runs take 2X the time compared to non-power runs. For optimized runs like the one for Neural Magic, it is a change from 10 minutes to 20 minutes to do power. But for the baseline power comparison we also had to do the native run (onnxruntime on CPU), which took close to 2 hours just for the offline scenario of a single benchmark.

Also, the problem with ranging mode is not just the doubling of the runtime. Say we have 3 submission systems and just 1 power analyzer. If we have, say, 6 hours of non-power runs on each system, the submission times are as follows (a short sketch after the list works through the arithmetic):

  • Non-power: 6 hours as all runs can happen at the same time
  • Power with ranging: 2 * 6 * 3 = 36 hours (runs have to be sequential since there is only 1 power analyzer, and each run takes twice the time due to ranging mode)
  • Power with manual range setting: 6 * 3 = 18 hours
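
Here is a small sketch of that arithmetic (the 6-hour and 3-system figures are the example values above):

```python
# Wall-clock time for a submission: n_systems systems sharing one power
# analyzer, each needing `hours` of benchmark runs.
def submission_hours(hours: float, n_systems: int, mode: str) -> float:
    if mode == "non-power":
        return hours                  # all systems can run in parallel
    if mode == "ranging":
        return 2 * hours * n_systems  # sequential, ranging doubles each run
    if mode == "manual-range":
        return hours * n_systems      # sequential, single run per test
    raise ValueError(f"unknown mode: {mode}")

for mode in ("non-power", "ranging", "manual-range"):
    print(mode, submission_hours(6, 3, mode), "hours")  # 6, 36, 18
```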

s-idgunji commented on July 17, 2024

@araghun - I'm leaving my comments as per discussions with David Kanter and you:

  • Manual ranging is not an option we'd like, since it is not consistent with approaches where all submitters use the same flow without any "manual" settings.
  • Performance also has an accuracy phase, where we get two sets of runs: the perf during accuracy and the perf during measurement, which are compared to be within some % of each other. This is consistent for edge and data center.
  • We want consistent flows. Checking ranging vs testing (measured/submitted) runs for edge devices that are mostly under 75W while not doing this for data center is not a consistent approach.
  • A shorter ranging run that could make submissions more productive has not been investigated (see the sketch below).
  • Saying that data center is impacted but edge is not is inconsistent.
  • SPEC originally recommended the ranging for eliminating manual intervention (Klaus during v1.0 - as captured in the MLPerf Power notes) and they also use a limited ranging mode for their server power benchmarks called SERT.

We need to make sure all these aspects are addressed.
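
On the uninvestigated "shorter ranging" point in the list above, here is a hypothetical sketch of what it could look like: run the ranging phase for only a fraction of the full benchmark duration (just long enough to observe peak power), then do the full measured run. The 0.1 fraction is an illustrative assumption, not an approved value.

```python
# Hypothetical "shorter ranging": ranging runs for a fraction of the full
# duration, followed by the full testing run, sequential across systems
# because they share one power analyzer.
def total_hours(run_hours: float, n_systems: int, ranging_fraction: float) -> float:
    per_system = run_hours * ranging_fraction + run_hours  # ranging + testing
    return per_system * n_systems

print(total_hours(6, 3, 1.0))  # 36.0 hours: full-length ranging, as today
print(total_hours(6, 3, 0.1))  # 19.8 hours: shortened ranging run
```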

arjunsuresh commented on July 17, 2024

"SPEC originally recommended the ranging for eliminating manual intervention (Klaus during v1.0 - as captured in the MLPerf Power notes) and they also use a limited ranging mode for their server power benchmarks called SERT"

@s-idgunji Can you please point to the exact recommendation from Klaus? 'Ranging mode' is always good to have for first-time users; there is no doubt about that. I would like to know if Klaus or anyone from SPEC has disallowed "manual range" setting, as that is what SPEC Power allows in their documentation.

arjunsuresh commented on July 17, 2024

Here is the shorter ranging run proposal.

#315

Those who are opposed and have never done any power submissions can bring in new arguments 🙂

arjunsuresh commented on July 17, 2024

Since this mechanism is now in place, this issue is no longer relevant. Hence, closing.
