Comments (8)

r30shah commented on May 30, 2024

On the code-gen side, we would see the performance impact when going from a shift of 1 to higher shifts (not for a shift of 0): we lose the ability to embed the shift into the load/store instruction itself, which forces us to generate an extra instruction for each load/store.

Perf-wise, I am pasting the old numbers I collected comparing a shift of 1 vs 3. (A few weeks ago I refreshed those numbers, but they are on a machine that is offline now, so I will be able to extract them later this week.) For now I am pasting the old numbers to keep the conversation going, and I will update this comment with the latest results (though I think the performance delta was similar).
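
For readers less familiar with the term, the shift here is the compressed-references shift: a 32-bit reference is scaled by a left shift to recover the full address. A minimal sketch of that arithmetic follows. The helper names are hypothetical and this is not OpenJ9 code; it only illustrates why each extra shift bit doubles the reachable address range.

```python
GIB = 1 << 30


def max_address(shift: int) -> int:
    """Exclusive upper bound of addresses reachable with a 32-bit reference."""
    return (1 << 32) << shift


def compress(address: int, shift: int) -> int:
    """Turn a full address into a 32-bit reference (hypothetical helper)."""
    if address % (1 << shift) != 0:
        raise ValueError("address is not aligned for this shift")
    ref32 = address >> shift
    if ref32 >= 1 << 32:
        raise ValueError("address is out of range for this shift")
    return ref32


def decompress(ref32: int, shift: int) -> int:
    """Recover the full address from a 32-bit reference (hypothetical helper)."""
    return ref32 << shift
```

With shift 0 the reference is the address itself, so no scaling is needed on a load or store; with a non-zero shift the scaling has to happen somewhere, which is the cost discussed above.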

  1. ILOG-ODM :
    Options : -Xcompressedrefs -Xms1024m -Xmx1024m

    Configuration : 4 application threads, using 4 Logical CPs

    Benchmark      -XXgc:forcedShiftingCompressionAmount=1   -XXgc:forcedShiftingCompressionAmount=3
    300 RuleSet    100                                       98.50
    5 FastPath     100                                       98.78

from openj9.

vijaysun-omr commented on May 30, 2024

@dmitripivkine just so I am clear on the proposal, it is composed of two steps that are different from current default scheme, both of which only apply to the case when heap size could have been allocated below 8gb following a "bottom-up" approach in effect today by default on zLinux.

  1. Change to "top-down" approach and see if it succeeds first in allocating below 8gb
  2. If "top-down" approach did not succeed, then instead of trying "bottom-up" approach, we will try a scheme where we set "estimated start address" to 4gb and thereby avoid allocating in the bottom 4gb completely, thereby incurring a higher risk of shift greater than 1.

Is this understanding correct ?

dmitripivkine commented on May 30, 2024

No, not exactly. Sorry, I was not clear.

My base suggestion is to change the allocation direction from bottom-up to top-down for all zLinux cases except Concurrent Scavenger with HW support (which can be addressed later if we need to; it just requires more work).

Changing the allocation direction will reduce usage of memory below the 4G bar: it is better than today, or the same in the worst case. Currently, with bottom-up, all free memory below the 4G bar is consumed for sure; with the top-down approach it is consumed only if there is not enough memory between the 4G bar and the maximum address supported by the selected minimum shift.
For example, for a 5G heap the minimum shift is 1, and the maximum address for shift 1 is 8G. So if the heap is larger than the [4G,8G] interval (which it is in this example), the deficit (1G) is taken from below the 4G bar. If that allocation attempt fails, the next higher shift (2 in this example) is selected and the attempt is repeated with the new maximum address (16G). This is the same behaviour as we have now. Changing the allocation direction to top-down improves the average case without paying the performance cost of switching to a higher shift. This is the scenario I described in case 1.

Once that is in place, our allocation policy can optionally be improved further if we like. We can reduce (or eliminate) memory usage below the 4G bar by tuning parameters (at the price of sometimes going to a higher shift, of course). I tried to explain this with the example in case 2.

I am open to ideas on how heap allocation can be improved (and we do have the tools to do it on zLinux). However, it would be good to keep the allocation logic aligned with other platforms.
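
The arithmetic in the 5G example can be sketched as follows. This is an illustration only: `min_shift_for` and `below_bar_deficit` are hypothetical names, not OMR functions, and the maximum address for shift s is assumed to be 4G << s, which is consistent with the 8G and 16G figures quoted above.

```python
GIB = 1 << 30
FOUR_G_BAR = 4 * GIB


def min_shift_for(heap_size: int) -> int:
    """Smallest shift whose addressable range (4G << shift) can hold the heap."""
    shift = 0
    while (FOUR_G_BAR << shift) < heap_size:
        shift += 1
    return shift


def below_bar_deficit(heap_size: int, shift: int) -> int:
    """Bytes that must come from below the 4G bar when the heap is placed
    top-down against the maximum address for the given shift."""
    max_address = FOUR_G_BAR << shift
    room_above_bar = max_address - FOUR_G_BAR
    return max(0, heap_size - room_above_bar)


heap = 5 * GIB
shift = min_shift_for(heap)                       # 1
deficit = below_bar_deficit(heap, shift)          # 1G taken from below the bar
retry_deficit = below_bar_deficit(heap, shift + 1)  # 0 once the ceiling is 16G
```

Under these assumptions a 5G heap at shift 1 needs 1G from below the bar, and nothing from below the bar if the attempt falls through to shift 2.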

vijaysun-omr commented on May 30, 2024

Thanks @dmitripivkine

My preference would be to go with the "top-down" scheme described under case 1. But, does this scheme not come with its own throughput risk ? Specifically, in a case where the heap size was such that the entire heap could have been contained in the lower 4gb, we may have been able to run without any shifting with the "bottom-up" approach, whereas with the "top-down" approach, we may do shift=1 (not as bad as shift=3 but also not as good as no shift). If so, maybe we need to compare shift=1 vs no shift (i.e. not what Rahil had collected before).

The optional enhancement described in case 2 could come later, if we find that the "top-down" scheme did not help in enough of the cases that you are targeting with this proposal. Is it support cases that are driving this proposal, and if so, do you feel that the "top-down" change alone would be worth trying as an initial step to address what you are seeing in those support cases? In my opinion we can discuss going further later if needed (but I am happy to hear more reasons to reconsider that position).

dmitripivkine commented on May 30, 2024

But, does this scheme not come with its own throughput risk ? Specifically, in a case where the heap size was such that the entire heap could have been contained in the lower 4gb, we may have been able to run without any shifting with the "bottom-up" approach, whereas with the "top-down" approach, we may do shift=1 (not as bad as shift=3 but also not as good as no shift). If so, maybe we need to compare shift=1 vs no shift (i.e. not what Rahil had collected before).

@vijaysun-omr No, there is no risk. There are details I had not described, because I focused on the zLinux specifics.
The full proposed allocation process is as follows (we go to the next step if the current one is skipped or does not succeed; the allocation direction is top-down for all attempts):

  1. If the requested heap size is <= 4G, attempt to allocate below 4G to get shift 0. This step is common to all platforms (except zOS). The only difference for zLinux is the allocation direction: top-down instead of bottom-up.
  2. If the requested heap size is <= 8G, attempt to allocate below 8G to get shift 1. This proposed step is specific to zLinux.
  3. If the requested heap size is <= 16G, attempt to allocate below 16G to get shift 2. This proposed step is specific to zLinux.
  4. If the requested heap size is <= 28G, attempt to allocate below 32G to get shift 3. The 28G value is used to protect the [0,4G] range explicitly. If the heap size is larger than 28G, force shift 4. This step is common to all platforms; however, it is the only new behaviour for zLinux. We can follow the common path or make an exception for zLinux. I prefer the common path.
  5. Allocate the heap below 64G top-down and get the smallest shift possible (expected to be 4). This step is common to all platforms.

So steps 1, 4 and 5 exist today for all platforms except Z. I am suggesting applying them to zLinux too, with the addition of steps 2 and 3, which are specific to zLinux.
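
The step list above can be sketched as a small table-driven plan. This is an illustration under stated assumptions, not the OMR implementation: `STEPS` and `allocation_attempts` are hypothetical names, and the sketch only lists which attempts would be made, in order, for a given heap size; whether each attempt succeeds depends on the address space at run time.

```python
GIB = 1 << 30

# (heap size limit, address ceiling, expected shift) for steps 1 to 4.
STEPS = [
    (4 * GIB, 4 * GIB, 0),    # step 1: common to all platforms except zOS
    (8 * GIB, 8 * GIB, 1),    # step 2: proposed for zLinux only
    (16 * GIB, 16 * GIB, 2),  # step 3: proposed for zLinux only
    (28 * GIB, 32 * GIB, 3),  # step 4: the 28G limit protects the [0,4G] range
]


def allocation_attempts(heap_size: int) -> list:
    """Ordered (ceiling, expected shift) attempts, all made top-down.

    A step is skipped when the heap exceeds its size limit; step 5
    (below 64G, shift 4) is always the final fallback.
    """
    attempts = [
        (ceiling, shift)
        for limit, ceiling, shift in STEPS
        if heap_size <= limit
    ]
    attempts.append((64 * GIB, 4))
    return attempts


for size_g in (3, 5, 29):
    plan = allocation_attempts(size_g * GIB)
    print(size_g, [(ceiling // GIB, shift) for ceiling, shift in plan])
```

In this sketch a 29G heap skips steps 1 to 4 entirely and goes straight to the 64G attempt, because it exceeds the 28G limit of step 4.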

vijaysun-omr commented on May 30, 2024

Thanks for those details. I am fine with the proposed "top-down" scheme since it carries no throughput risk.

dmitripivkine commented on May 30, 2024

Implementation: eclipse/omr#7344

dmitripivkine commented on May 30, 2024

Here are a few examples of heap location with the new implementation:

-- 512m, 0-shift

1STHEAPREGION  0x000003FFA4081E10 0x00000000E0000000 0x00000000E0600000 0x0000000000600000 Generational/Tenured Region
1STHEAPREGION  0x000003FFA4081870 0x00000000FFE00000 0x00000000FFF00000 0x0000000000100000 Generational/Nursery Region
1STHEAPREGION  0x000003FFA40812D0 0x00000000FFF00000 0x0000000100000000 0x0000000000100000 Generational/Nursery Region

-- 3G located [1G,4G], 0-shift

1STHEAPREGION  0x000003FF98084800 0x0000000040000000 0x0000000040600000 0x0000000000600000 Generational/Tenured Region
1STHEAPREGION  0x000003FF98084260 0x00000000FFE00000 0x00000000FFF00000 0x0000000000100000 Generational/Nursery Region
1STHEAPREGION  0x000003FF98083CC0 0x00000000FFF00000 0x0000000100000000 0x0000000000100000 Generational/Nursery Region

-- 4G, [4G,8G] 1-shift

1STHEAPREGION  0x000003FFA4084940 0x0000000100000000 0x0000000100600000 0x0000000000600000 Generational/Tenured Region
1STHEAPREGION  0x000003FFA40843A0 0x00000001FFE00000 0x00000001FFF00000 0x0000000000100000 Generational/Nursery Region
1STHEAPREGION  0x000003FFA4083E00 0x00000001FFF00000 0x0000000200000000 0x0000000000100000 Generational/Nursery Region

-- 5G, [3G,8G] 1-shift

1STHEAPREGION  0x000003FF7C084940 0x00000000C0000000 0x00000000C0600000 0x0000000000600000 Generational/Tenured Region
1STHEAPREGION  0x000003FF7C0843A0 0x00000001FFE00000 0x00000001FFF00000 0x0000000000100000 Generational/Nursery Region
1STHEAPREGION  0x000003FF7C083E00 0x00000001FFF00000 0x0000000200000000 0x0000000000100000 Generational/Nursery Region

-- 11G, [5G,16G] 2-shift

1STHEAPREGION  0x000003FFA8084D40 0x0000000140000000 0x0000000140600000 0x0000000000600000 Generational/Tenured Region
1STHEAPREGION  0x000003FFA80847A0 0x00000003FFE00000 0x00000003FFF00000 0x0000000000100000 Generational/Nursery Region
1STHEAPREGION  0x000003FFA8084200 0x00000003FFF00000 0x0000000400000000 0x0000000000100000 Generational/Nursery Region

-- 23G, [9G,32G] 3-shift

1STHEAPREGION  0x000003FFB4084FC0 0x0000000240000000 0x0000000240600000 0x0000000000600000 Generational/Tenured Region
1STHEAPREGION  0x000003FFB4084A20 0x00000007FFE00000 0x00000007FFF00000 0x0000000000100000 Generational/Nursery Region
1STHEAPREGION  0x000003FFB4084480 0x00000007FFF00000 0x0000000800000000 0x0000000000100000 Generational/Nursery Region

-- 27G, [5G,32G] 3-shift

1STHEAPREGION  0x000003FF84085350 0x0000000140000000 0x0000000140600000 0x0000000000600000 Generational/Tenured Region
1STHEAPREGION  0x000003FF84084DB0 0x00000007FFE00000 0x00000007FFF00000 0x0000000000100000 Generational/Nursery Region
1STHEAPREGION  0x000003FF84084810 0x00000007FFF00000 0x0000000800000000 0x0000000000100000 Generational/Nursery Region

-- 29G, [35G,64G] 4-shift <--- this is the only difference from current behaviour: pushed to 4-shift

1STHEAPREGION  0x000003FF88085350 0x00000008C0000000 0x00000008C0600000 0x0000000000600000 Generational/Tenured Region
1STHEAPREGION  0x000003FF88084DB0 0x0000000FFFE00000 0x0000000FFFF00000 0x0000000000100000 Generational/Nursery Region
1STHEAPREGION  0x000003FF88084810 0x0000000FFFF00000 0x0000001000000000 0x0000000000100000 Generational/Nursery Region

-- 35G, [29G,64G] 4-shift

1STHEAPREGION  0x000003FFB0085490 0x0000000740000000 0x0000000740600000 0x0000000000600000 Generational/Tenured Region
1STHEAPREGION  0x000003FFB0084EF0 0x0000000FFFE00000 0x0000000FFFF00000 0x0000000000100000 Generational/Nursery Region
1STHEAPREGION  0x000003FFB0084950 0x0000000FFFF00000 0x0000001000000000 0x0000000000100000 Generational/Nursery Region

-- 60G, [4G,64G] 4-shift

1STHEAPREGION  0x000003FFB80855D0 0x0000000100000000 0x0000000100600000 0x0000000000600000 Generational/Tenured Region
1STHEAPREGION  0x000003FFB8085030 0x0000000FFFE00000 0x0000000FFFF00000 0x0000000000100000 Generational/Nursery Region
1STHEAPREGION  0x000003FFB8084A90 0x0000000FFFF00000 0x0000001000000000 0x0000000000100000 Generational/Nursery Region
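
The [low,high] intervals in the labels above can be recomputed from the region lines. The sketch below is an illustration with hypothetical helper names; it assumes the columns after the `1STHEAPREGION` tag are region id, start address, end address, size and description, and it derives the smallest shift that could cover the span rather than reading the actual shift from the javacore.

```python
GIB = 1 << 30


def heap_span(javacore_lines):
    """Return (lowest start, highest end) over all 1STHEAPREGION lines."""
    starts, ends = [], []
    for line in javacore_lines:
        fields = line.split()
        if not fields or fields[0] != "1STHEAPREGION":
            continue
        # Assumed layout: tag, region id, start, end, size, description.
        starts.append(int(fields[2], 16))
        ends.append(int(fields[3], 16))
    if not starts:
        raise ValueError("no 1STHEAPREGION lines found")
    return min(starts), max(ends)


def min_shift(end_address):
    """Smallest shift for which end_address fits under 4G << shift."""
    shift = 0
    while ((4 * GIB) << shift) < end_address:
        shift += 1
    return shift


# The 5G example from above.
sample = [
    "1STHEAPREGION  0x000003FF7C084940 0x00000000C0000000 0x00000000C0600000 0x0000000000600000 Generational/Tenured Region",
    "1STHEAPREGION  0x000003FF7C0843A0 0x00000001FFE00000 0x00000001FFF00000 0x0000000000100000 Generational/Nursery Region",
    "1STHEAPREGION  0x000003FF7C083E00 0x00000001FFF00000 0x0000000200000000 0x0000000000100000 Generational/Nursery Region",
]

low, high = heap_span(sample)
print(low // GIB, high // GIB, min_shift(high))  # prints: 3 8 1
```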
