Comments (10)

lhecker commented on September 25, 2024

To explain how WPA is commonly used... When you open it, it'll look something like this:
image

Each tab can contain an arbitrary number of panes. When you click on the graph types on the left, new panes will be added to the current tab. As such, I usually first close all tabs and then open the graph that I want. In this case we want the "Virtual Allocations" graph which is in the "Others" section on the left. This will list all the processes that were recorded:

image

In the table view, everything to the left of the vertical yellow line is a column that groups data, and everything to the right of it is an aggregate. It's a little bit like working with a database: everything to the left "maps" / "groups" and everything to the right "reduces" / "sums".
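
To make the group/aggregate split concrete, here is a minimal sketch of the same idea outside WPA. The rows, stack names, and sizes are hypothetical sample data, not values from any real trace; the key plays the role of the columns left of the yellow line and the summed size plays the role of the column to its right.

```python
from collections import defaultdict

MiB = 1024 * 1024

# Hypothetical sample rows: (process, stack, allocated bytes).
rows = [
    ("WindowsTerminal.exe", "moduleA!alloc", 64 * MiB),
    ("WindowsTerminal.exe", "moduleA!alloc", 64 * MiB),
    ("WindowsTerminal.exe", "moduleB!render", 4 * MiB),
    ("explorer.exe", "moduleC!init", 8 * MiB),
]

def aggregate(rows):
    """Group by (process, stack) and sum the sizes within each group."""
    totals = defaultdict(int)
    for process, stack, size in rows:
        totals[(process, stack)] += size
    return dict(totals)

totals = aggregate(rows)
for (process, stack), size in sorted(totals.items()):
    print(f"{process:22} {stack:16} {size // MiB:4d} MiB")
```

Moving a column from the right of the line to the left corresponds to adding it to the key, which splits each group further.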

Columns in WPA are special, however: they can have complex rules and configurations to customize everything to your liking. If you're interested in this, click on the wheel icon at the top of the pane (next to the red "3" marker).

Here you can do a couple things, which I've marked with the red numbers:

  1. By right-clicking on the process(es) that you actually want, you get to choose "Filter to Selection", which will remove all the noise.
  2. By right-clicking on the headers of the table, you can choose which columns to see. Here you can choose "Stack". The "Stack" column is configured by default to be to the left of the yellow line. That way you get a cumulative amount of allocations per stack trace.
  3. The graph button allows you to switch to a "Flame" (-graph) view, which is IMO the most useful one. If you aren't familiar with flame graphs: each bar is a function call, and the stacked bars represent your call stack. The width of each bar represents its share of the total. For instance, if your app allocates 1 GB of memory and one function allocates 100 MB of it, then its bar will be 10% wide.
  4. Here you can change the view to hide the table and only show the graph. There's also a "maximize" button which you can click to resize the pane to fit the tab size.
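
The bar-width rule from item 3 is plain proportion arithmetic. A minimal sketch, assuming decimal units (1 GB = 1000 MB); the function name is made up for illustration and is not part of WPA:

```python
MB = 1000 * 1000  # decimal megabyte, an assumption for this example

def bar_width_percent(bytes_in_frame, total_bytes):
    """Width of a flame graph bar as a percentage of the full graph width."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * bytes_in_frame / total_bytes

# One function allocating 100 MB out of a 1 GB total.
width = bar_width_percent(100 * MB, 1000 * MB)
print(width)
```

With binary units (1 GiB = 1024 MiB) the same function would come out slightly under 10%.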

To get function names you have to load symbols. Unfortunately, even if you use a "Filter" it'll load symbols for all applications by default. This takes a long time. So what you can do is add a filter for symbol loading:

image

At the end it'll look something like this:
image

from terminal.

wdscxsj commented on September 25, 2024

I'm glad to report that this issue is solved by the latest Intel graphics driver (31.0.101.5382), updated from the OEM-provided driver 31.0.101.5008. It works in both the latest release (1.21.921.0) of Terminal and the Canary.

On the same laptop, Canary with 10 tabs uses about 380 MB (Automatic or Direct3D 11). Software rendering is much lower at about 100 MB, but 380 MB is quite acceptable. The numbers are roughly the same for the latest release with the AtlasEngine.

Frankly, I didn't expect the new driver to work so well, since its release notes don't say a word about this issue. There were multiple releases in between, so one of them must have introduced the fix.

Here is how a 10-tab Canary process now looks in VMMap. The 64 MB regions with 1 Read/Write are gone, replaced by roughly 32 MB for each tab that has seen some activity.

Screenshot338

Huge thanks to @lhecker again. Your help and guidance are truly invaluable!

lhecker commented on September 25, 2024

Can you please install our nightly ("canary") release? You can find it here: https://aka.ms/terminal-canary-installer

Afterwards please take the following steps:

  • Open 10 tabs as you did before
  • Open the settings tab (Ctrl+,), go to "Rendering", and change the Graphics API to Direct3D 11:
    image
    Click save and wait >10s.
  • Then enable the WARP setting
    image
    Click save again and wait another >10s. How did the memory usage change?

If the memory usage drops after the last step and only after the last step, we can already be extremely certain that it's due to your graphics driver.

However, we can debug it further if you'd like. There are two ways to do so:

  • Send us a full memory dump!
    You can do that in Task Manager: https://github.com/microsoft/terminal/wiki/Troubleshooting-Tips#capture-with-task-manager
    I'm not 100% sure whether Task Manager does a full memory dump, but I think it does. Afterwards, you can share it via email with us. You can find my email address in my GitHub profile. The dump file will be very large so you'll have to use some kind of file hoster.
  • Correlate the Thread Environment Block (TEB) with the thread!
    To do so open the TEB section and find one of the IDs:
    image
    Then use your favorite application to inspect threads. Since you're using VMMap, you may also be familiar with Process Explorer. In Process Explorer double click WindowsTerminal.exe, navigate to the Threads tab and find your ID in the TID column:
    image
    Then tell us the Start Address of that TID. A screenshot would be preferable. 🙂

wdscxsj commented on September 25, 2024

Thanks for your detailed response! I've tried again with Terminal Canary, and the result is roughly the same as before. With 10 tabs open:

  • Graphics API = Automatic: 1012 MB
  • Graphics API = Direct3D 11 (after 10s): 1028 MB
  • Use software rendering (WARP) (after 10s): 165 MB

I also suspect it's due to the graphics driver. This Intel Arc graphics card is not yet recognized by the latest GPU-Z...

A full memory dump would be around 100 GB. So I ran VMMap as admin, and this is a screenshot of the Total memory:

a

None of the Private Data regions shows a thread ID (otherwise I would have noticed that yesterday). After 1 hour of waiting and some activity in Terminal Canary, a refreshed view shows that each 65,536 KB Private Data region still has 1 Read/Write.

The top ASLR Image is igc64.dll (65.7 MB of file size) from the graphics driver, the Intel Graphics Shader Compiler for Intel(R) Graphics Accelerator. It's also loaded by dwm, IGCC, Chrome, VSCode, etc.

I guess it's better to stay with WARP until an updated driver brings good luck, right?

lhecker commented on September 25, 2024

> A full memory dump would be around 100 GB.

I only meant a dump of WindowsTerminal.exe. It should only be ~1028MB as you noted. However, I sort of realized that this is not needed anyways. There are better ways to investigate the issue...

If you have Windows Performance Recorder (WPR) installed, you can

  • select the checkbox for "VirtualAlloc usage"
  • unselect all other checkboxes
  • click "Start"
  • launch Windows Terminal
  • click "Save"

It can then be debugged in the Windows Performance Analyzer (WPA). You can find the latter in the app store, which I believe should also install the former. In any case, this will net us something like this:
image

It would tell us exactly where it's coming from. I probably don't have access to the symbols for Intel's drivers, but I know people who do, so I could send it to them. If you want to do such a WPR trace, I'd be happy to check it out!

> None of the Private Data regions shows a thread ID (otherwise I would have noticed that yesterday). After 1 hour of waiting and some activity in Terminal Canary, a refreshed view shows that each 65,536 KB Private Data region still has 1 Read/Write.

Any allocation via VirtualAlloc is labeled with "Thread Environment Block" for whatever reason. Only those with an ID next to them are actual TEBs and refer to stack memory. Since your allocations don't have an ID, they must be VirtualAlloc calls with a 64MiB size. If I had to take a guess, I suspect that Intel's driver is using arena/linear allocators and forgot that you're supposed to MEM_RESERVE the address space and only then gradually MEM_COMMIT it. 😅
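
To illustrate why the reserve-then-commit pattern matters, here is a toy model of a linear arena. It is only a sketch of the accounting: it does not call the OS or touch real memory, the class and its fields are hypothetical, and the 4 KiB page size is an assumption. In real Windows code the two stages would correspond to VirtualAlloc with MEM_RESERVE followed by MEM_COMMIT on the pages actually needed. Nothing here is known to reflect how Intel's driver is written.

```python
PAGE = 4096          # assumed page size
MiB = 1024 * 1024

class Arena:
    """Toy linear allocator that tracks reserved vs. committed bytes."""

    def __init__(self, reserve_bytes, commit_upfront=False):
        self.reserved = reserve_bytes                       # address space only
        self.committed = reserve_bytes if commit_upfront else 0
        self.used = 0

    def alloc(self, size):
        if self.used + size > self.reserved:
            raise MemoryError("arena exhausted")
        offset = self.used
        self.used += size
        # Commit just enough whole pages to cover everything handed out so far.
        needed = -(-self.used // PAGE) * PAGE
        if needed > self.committed:
            self.committed = needed
        return offset

eager = Arena(64 * MiB, commit_upfront=True)   # commits all 64 MiB at once
lazy = Arena(64 * MiB)                         # reserves, commits on demand
for arena in (eager, lazy):
    arena.alloc(100_000)

print(eager.committed // 1024, "KiB committed (eager)")
print(lazy.committed // 1024, "KiB committed (lazy)")
```

Both arenas reserve the same 64 MiB of address space, but only the eager one counts the full 64 MiB as committed memory after a single small allocation.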

However, it's very suspicious that only we're affected and no one else. One thing you could try instead of using WARP is to set the "Graphics API" to "Direct2D" (and with WARP disabled).

wdscxsj commented on September 25, 2024

Thank you very much! I've learned a lot again. The download link for a WPR trace file with a full memory dump has been sent to your email.

lhecker commented on September 25, 2024

For some reason my WinDbg can't search any heaps anymore, and I need that because otherwise I can't find the addresses of the AtlasEngine instances in memory. So, unfortunately, I'll have to respond later when it comes to the dump.

However, given the stack trace in the WPR trace, I believe it's likely that the driver allocates a 64MiB ring buffer for uploading Direct3D resources that have D3D11_CPU_ACCESS_WRITE.

In any case, I believe this may be another indication of why we need #15186 much more urgently than it may seem.

wdscxsj commented on September 25, 2024

Thanks a lot for your help and detailed explanation. Now I have my stack view and flame graph, too!

The laptop is using the latest graphics driver from the OEM (Lenovo), but there is a newer version from Intel released on March 27. After a 3-day public holiday, I can try my luck again with an update.

carlos-zamora commented on September 25, 2024

Thank you so much for following up! We'll close this and keep it around for anybody who asks this question. 😊
