fastmm4's People

Contributors

ahausladen, ajax16384, brianjford, gabr42, jeroenuw, jimmckeeth, pleriche

fastmm4's Issues

Can helper functions be added for MS IMalloc support?

https://msdn.microsoft.com/ru-ru/library/windows/desktop/ms678425.aspx

This interface can be consumed by MS libraries such as XmlLite for heap management.
It can potentially improve the application's memory layout by avoiding the use of several competing heap managers.

Three methods are already mapped:
IMalloc // FastMM4
Alloc -> GetMem
Realloc -> ReallocMem
Free -> FreeMem

However there are three more methods.

HeapMinimize : Minimizes the heap as much as possible by releasing unused memory to the operating system, coalescing adjacent free blocks, and committing free pages.
// might be implemented by a do-nothing stub, or might really enforce deferred memory freeing, if FastMM has such a concept.

DidAlloc : Determines whether this allocator was used to allocate the specified block of memory.
// might be "implemented" by stub, always returning -1 aka "I don't know" value

GetSize : Retrieves the size of a previously allocated block of memory.
// this part is tricky, it asks "how much memory was really allocated" including extra slack for potential future growth
// rather convoluted code of this kind sits inside the ReallocMem routine, but it is too convoluted to justify copy-paste. Could that code be extracted into a separate helper function? An inlined one, if the extra indirection would hurt performance badly (but would it? There are already calls to lock and unlock the small/medium/large segments anyway).

Optional sort by name in `LogMemoryManagerStateToFile`

Currently this method sorts by TotalMemoryUsage:

function LogMemoryManagerStateToFile(const AFileName: string; const AAdditionalDetails: string = ''): Boolean;

For monitoring live memory allocations in an application, and comparing before/after in a call stack, it can make sense to have a different sort order, for instance on the "Name" or InstanceCount of the TMemoryLogNode item.

As far as I can see now this would require a couple of changes:

  • introduce a mechanism to specify the sort order (could be as simple as an enumeration type, or a set of an enumeration type, with the default value meaning TotalMemoryUsage)
  • abstract the name generation part of LogMemoryManagerStateToFile into a separate method
  • abstract the comparison of two TMemoryLogNode items into a separate method, similar to System.Generics.Collections.TList<T>.Sort (IComparer<T>) but without using interfaces (and with implementations that avoid allocating heap memory)
  • adapt these methods to allow for a different sorting mechanism:
    • procedure QuickSortLogNodes(APLeftItem: PMemoryLogNodes; ARightIndex: Integer);
    • procedure InsertionSortLogNodes(APLeftItem: PMemoryLogNodes; ARightIndex: Integer);
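
A minimal sketch of the enumeration-plus-comparison idea from the list above. TLogSortOrder, TSketchLogNode, MakeNode and CompareLogNodes are hypothetical names used only for illustration; the record is a simplified stand-in and not FastMM4's actual TMemoryLogNode, and sorting the numeric fields largest-first is an assumption.

```pascal
program SortOrderSketch;
{$ifdef FPC}{$mode delphi}{$endif}
{$APPTYPE CONSOLE}

type
  // Hypothetical: which field the log should be ordered by.
  TLogSortOrder = (lsoTotalMemoryUsage, lsoInstanceCount, lsoName);

  // Simplified stand-in for a log node; the real FastMM4 record differs.
  TSketchLogNode = record
    Name: ShortString;            // ShortString lives inline, no heap allocation
    InstanceCount: Cardinal;
    TotalMemoryUsage: NativeUInt;
  end;

// Helper for building sample nodes.
function MakeNode(const AName: ShortString; ACount: Cardinal;
  AUsage: NativeUInt): TSketchLogNode;
begin
  Result.Name := AName;
  Result.InstanceCount := ACount;
  Result.TotalMemoryUsage := AUsage;
end;

// Returns < 0 when A sorts before B, > 0 when A sorts after B, 0 when equal.
// A plain function, so no interfaces and no heap allocations are involved.
function CompareLogNodes(const A, B: TSketchLogNode;
  AOrder: TLogSortOrder): Integer;
begin
  Result := 0;
  case AOrder of
    lsoTotalMemoryUsage: // largest usage first (assumed order)
      begin
        if A.TotalMemoryUsage > B.TotalMemoryUsage then Result := -1
        else if A.TotalMemoryUsage < B.TotalMemoryUsage then Result := 1;
      end;
    lsoInstanceCount: // largest count first (assumed order)
      begin
        if A.InstanceCount > B.InstanceCount then Result := -1
        else if A.InstanceCount < B.InstanceCount then Result := 1;
      end;
    lsoName: // ascending by name
      begin
        if A.Name < B.Name then Result := -1
        else if A.Name > B.Name then Result := 1;
      end;
  end;
end;

var
  A, B: TSketchLogNode;
begin
  A := MakeNode('TStringList', 3, 4096);
  B := MakeNode('TObject', 10, 160);
  Writeln(CompareLogNodes(A, B, lsoTotalMemoryUsage)); // negative: A uses more memory
  Writeln(CompareLogNodes(A, B, lsoInstanceCount));    // positive: B has more instances
  Writeln(CompareLogNodes(A, B, lsoName));             // positive: 'TObject' sorts first
end.
```

The two sort routines could then take the sort order (or the comparison function) as an extra parameter instead of comparing TotalMemoryUsage directly.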

Strange AccessViolation in GetRawStackTrace

I'm running FullDebugMode on a large application and it sometimes raises access violations inside the FullDebugMode DLL, specifically in GetRawStackTrace. The exception seems to be handled, since program execution isn't affected at all, but I'd rather it didn't happen while debugging our application because it would alarm other developers (we require everyone to check for memory leaks before posting code).

I debugged the DLL and the access violation is happening on IsValidCallSite at line 286:

if PByteArray(LCallAddress)[3] = $E8 then

This exception happens when AReturnAddress points to an address below 4MB ($3fffff). I noticed at the beginning of the function that it only tests addresses above 65KB and below 4GB, so I changed the first line to

if (AReturnAddress > $3fffff) and (AReturnAddress <= $ffffffff) then

And the access violations stop happening, but I'm not sure if this is a valid change, what kind of side effects it can introduce, if my application has a stack corruption, or if those access violations weren't supposed to happen at all just by reading an address below 4MB.

I'm looking for pointers on how to gather more info about this issue.

With regards,

Correlate NoMessageBoxes with SuppressMessageBoxes in documentation/comments

Edit

The global Boolean variable SuppressMessageBoxes already suppresses the effect of a missing NoMessageBoxes define, but it is not obvious these are correlated.

Adapt documentation/comments to make their relation more obvious.

Time permitting, I will submit a pull-request.

Original title

Have NoMessageBoxes depend on a global boolean

Original

Sometimes I want the NoMessageBoxes to depend on run-time circumstances, for instance:

  • force NoMessageBoxes when running as a standalone process
  • disable NoMessageBoxes when running under the debugger

For that, it would be nice if NoMessageBoxes was implemented through a global variable.

Time permitting, I will submit a pull-request.

Pooling large blocks

We are using the latest version of FastMM in our application and are very pleased with its capabilities, especially the full debug mode support.
However, in memory-intensive cases its performance is well below what we can achieve with TBBMalloc, although TBBMalloc has serious disadvantages of its own on machines with lots of cores (>16).
Digging through our own source code, we came up with a simple example that illustrates the issue:

program TestFastMM;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  FastMM4,
  System.Diagnostics,
  System.TimeSpan,
  System.SysUtils,
  System.Classes,
  System.Generics.Collections,
  Winapi.Windows;

type
  TTestThread = class(TThread)
  public
    procedure Execute; override;
  end;

var
  Stopwatch: TStopwatch;
  Elapsed: TTimeSpan;
  ThreadList: TList<TThread>;
  Threads: array of TTestThread;
  iGlobal: Integer;

const
  C_StrL = 16351;
{ TTestThread }

procedure TTestThread.Execute;
var
  CurrentStringList: TStringList;
  i: Integer;
  CurrentString: string;
begin
  CurrentStringList := TStringList.Create;
  try
    for I := 1 to 1571000 do
    begin
      SetLength(CurrentString, C_StrL);
      SetLength(CurrentString, 0);
      CurrentStringList.Add(IntToStr(Random(i)) + 'bob' + IntToStr(Random(i)));
    end;
  finally
    CurrentStringList.Free;
  end;
end;

begin
  try
    Stopwatch := TStopwatch.StartNew;

    SetLength(Threads, 40); // highly parallel
    ThreadList := TList<TThread>.Create;
    try
      for iGlobal := Low(Threads) to High(Threads) do
      begin
        Threads[iGlobal] := TTestThread.Create;
        ThreadList.Add(Threads[iGlobal]);
      end;

      while ThreadList.Count > 0 do
      begin
        if ThreadList[0].WaitFor = WAIT_OBJECT_0 then
          ThreadList.Delete(0);
        Sleep(10);
      end;
    finally
      ThreadList.Free;
    end;

    Elapsed := Stopwatch.Elapsed;
    Writeln(Format('FastMM took %n milliseconds', [Elapsed.TotalMilliseconds]));
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  ReadLn;
end.

On my Core i7 computer, this takes around 25s, while the same program with TBBMalloc takes only 5s.
Looking at the FastMM source code, I discovered that this is because our TStringList quickly grows above the maximum medium block size, which is 264768 bytes, and thus leads to lots of calls to VirtualAlloc inside AllocateLargeBlock. In the program above there are 448 such calls, which, if this is the only difference, accounts for roughly 40ms per call to VirtualAlloc (that sounds quite realistic).
I tried adjusting MediumBlockBinGroupCount to get a larger value for MaximumMediumBlockSize, but all I achieved was access violations very quickly.

In the end, I believe it would be much nicer if large blocks were also pooled like small and medium blocks, which would help us a lot as we are manipulating lots of objects in lists in our x64 applications.

Would anyone have any suggestion on this subject?

Access violation in 64 bits when calling LogStackTrace (using JCLDebug)

Hello,

While hunting memory leaks in my application in full debug mode in 64 bits, I get an access violation when LogStackTrace is called from the DLL. Reading the code, I suspect the following line of causing this error:
GetLocationInfo(Pointer(Cardinal(LAddress) - 1), LInfo);
Changing it to GetLocationInfo(Pointer(LAddress - 1), LInfo); fixes the access violation, but maybe there is more to it.
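
To illustrate why the cast is suspicious on Win64: Cardinal is 32 bits wide, so casting a 64-bit address to it drops the upper half before the subtraction happens. A minimal sketch, where TruncatingCast and SampleAddress are hypothetical names and the address is made up for illustration:

```pascal
program CardinalTruncation;
{$ifdef FPC}{$mode delphi}{$endif}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Mimics the cast in the suspect line: only the low 32 bits survive.
function TruncatingCast(AAddress: UInt64): UInt64;
begin
  Result := Cardinal(AAddress);
end;

const
  // Hypothetical 64-bit return address, chosen only for illustration.
  SampleAddress = UInt64($00007FF612345678);

begin
  // Full address minus one: what the lookup should receive.
  Writeln(IntToHex(Int64(SampleAddress - 1), 16));
  // Same computation after the Cardinal cast: the upper 32 bits are lost.
  Writeln(IntToHex(Int64(TruncatingCast(SampleAddress) - 1), 16));
end.
```

The second value points into the low 4 GB of the address space, which in a 64-bit process is generally not where the code being traced lives.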

Odd behavior when FullDebugMode is used

Hello,

We've been using FastMM to track down memory leaks with great success, and it has always pointed to issues in our code.
Today we are running into a situation where we get an access violation in FreeMem whose origin we cannot explain. The crash only occurs in "FullDebugMode" and is shown by the IDE, but it appears to be trapped somewhere, as it does not prevent the application from continuing just fine.
We suspect our architecture is the origin of the issue, which is why I'm describing it now:
The application is built (x86) using Delphi Seattle.
It loads a DLL via LoadLibrary and calls a method to tell the DLL about existing callbacks inside our application.
Then later on, the application calls a method in the DLL that does its work and calls our callback.
If we do any memory allocation in that callback, we get an access violation later on while freeing an unrelated TObjectList.
If we make sure no memory allocation occurs during the callback, then we do not get the access violation.
All these calls are done inside the same thread, which is not the main one.
Any DLL method and the callbacks use the register convention.

To sum up, here is the code lifetime:
Application Start
Thread creation
Thread code loads DLL
Thread code tells DLL about callbacks
Thread prepares work, creates a TObjectList
Thread calls DLL method
DLL method does its work, calls one of our callbacks
Callback runs
Callback does getmem(P, 1)
Callback exits
DLL method exits
Thread continues its execution (may call the same DLL method again)
Thread finalizes, frees the created TObjectList
Access violation is raised in the TObject.Destroy / FreeInstance call

If we remove the getmem(P, 1) call, we don't get the AV.
If we replace that call with a refcounted memory object (assign a string, set TBytes length...), we also get the AV.

The initial code was setting a string to a new value, but we got down to a simple getmem(P, 1) call to trigger the AV.

As the DLL is compiled with FPC, we initially suspected a memory manager replacement, but the DLL lives in its own world, no sharemem is involved whatsoever. And stepping through the assembly code, we do end up in FastMM code itself which the DLL does not include in any way.
If we run without tracking memory leaks, we don't get the AV either.

Here are the options for our FullDebugMode setup:

{$define FullDebugMode}
{$define EnableMemoryLeakReporting}
{$define NoMessageBoxes}
{$define LoadDebugDLLDynamically}
{$define DoNotInstallIfDLLMissing}
{$undef RequireDebuggerPresenceForLeakReporting}

I'm quite sure this comes from our own code, but for the life of me, I can't figure out what we have done wrong.
Any help would be greatly appreciated.

FPC support??

Browsing FastMM4.pas, I found many entries related to FPC (32-bit and 64-bit), yet the readme only mentions Delphi 4+ and BC4+.
So is FastMM4 FPC-ready?
What are the limitations, and how does it fare compared to FPC's internal memory manager?
How do I use it?

Functions returning PAnsiChar don't work in FullDebugMode

Platform: Win32
Delphi version: At least from XE6 to 10 Seattle (no more versions tested)

The following program works well when FullDebugMode is off. When you turn on FullDebugMode then function StringToPAnsiChar(const Value: string): PAnsiChar; returns nonsense.

program FullDebugModePAnsiCharError;
{$APPTYPE CONSOLE}

uses
  FastMM4,
  System.SysUtils;

function StringToPAnsiChar(const Value: string): PAnsiChar;
begin
  Result := PAnsiChar(AnsiString(Value));
end;

function StringToPAnsiCharInline(const Value: string): PAnsiChar; inline;
begin
  Result := PAnsiChar(AnsiString(Value));
end;

function StringToPWideChar(const Value: string): PWideChar;
begin
  Result := PWideChar(Value);
end;

var
  TestString: string;
  pAnsiTest: PAnsiChar;
  pWideTest: PWideChar;

begin
  TestString := 'Test';
  pAnsiTest :=  PAnsiChar(AnsiString(TestString));
  Writeln(pAnsiTest); // Writes: 'Test'
  pAnsiTest := StringToPAnsiChar(TestString);
  Writeln(pAnsiTest); // Writes: Nonsense in FullDebugMode -- Writes : 'Test' if not in FullDebugMode
  pAnsiTest := StringToPAnsiCharInline(TestString);
  Writeln(pAnsiTest); // Writes: 'Test'
  pWideTest := StringToPWideChar(TestString);
  Writeln(pWideTest); // Writes: 'Test'
  readln;
end.

Is there any chance to get this fixed?
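
A likely explanation, offered as an assumption rather than a confirmed diagnosis of this report: the AnsiString created by the cast inside StringToPAnsiChar is a temporary that is released when the function returns, so the returned PAnsiChar points at freed memory. Without FullDebugMode the old bytes usually remain readable; FullDebugMode overwrites freed blocks, which would make the stale pointer visible. A minimal sketch that keeps the converted string alive instead (StringToAnsi is a hypothetical helper name):

```pascal
program KeepAnsiStringAlive;
{$ifdef FPC}{$mode delphi}{$endif}
{$APPTYPE CONSOLE}

// Returning the AnsiString itself hands ownership of the converted
// buffer to the caller, so nothing is freed at function exit.
function StringToAnsi(const Value: string): AnsiString;
begin
  Result := AnsiString(Value);
end;

var
  AnsiCopy: AnsiString;
  pAnsiTest: PAnsiChar;

begin
  AnsiCopy := StringToAnsi('Test');
  // The pointer stays valid as long as AnsiCopy is in scope and unchanged.
  pAnsiTest := PAnsiChar(AnsiCopy);
  Writeln(pAnsiTest);
end.
```

This would also be consistent with the inlined and the direct variants in the program above working: in both, the temporary AnsiString belongs to the caller's scope and outlives the Writeln call.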

Fix for spurious GetRawStackTrace Access Violations when using FastMM_FullDebugMode.dll

I've been getting intermittent spurious Access Violations on shutdown of my application from GetRawStackTrace in FastMM_FullDebugMode.dll which are just plain annoying.

I'm aware that this has been a known issue with FullDebugMode since 2014 (or so).
https://stackoverflow.com/questions/22685386/occasional-access-violation-in-fastmm4-debuggetmem

In my case, I've determined that it's due to IsValidCallSite having an out-of-date MemoryPageAccessMap: a DLL that was loaded at the address in question has just been unloaded during a unit's finalization, but one of its return addresses is still sitting in a non-overwritten stack slot when DebugFreeMem is called.

I have a patch to FastMM_FullDebugMode.dpr that uses LdrRegisterDllNotification to add a DLL unload watcher to handle these cases and update the cache and thus avoid these access violations. It seems to work in that my reliably triggerable AV is now gone.

Ldr Dll Notifications are a documented internal NTDLL.DLL API that has existed since Windows Vista and are used by a number of low level tools and emulators for various purposes. Microsoft has scary warnings that they may remove them at any time but they are still there unchanged 13 years later in Windows 10 so you can make of that what you will.

https://docs.microsoft.com/en-us/windows/win32/devnotes/dll-load-notification

If others are bothered by these A/Vs like I am (and I can see a number of issues where developers have bumped into them and been confused by them), I've created a fork of FastMM4 and committed the code changes so others can see and evaluate them for their own use.

My code uses LoadLibrary/GetProcAddress against NTDLL.DLL to check for the necessary Ldr APIs, so anything earlier than Vista will just skip the new code. Similarly, if Microsoft does remove these APIs in a later Windows version for engineering reasons, the code will see that they are gone and not attempt to use them.

Via StackOverflow: FastMM should check DLL version number

From https://stackoverflow.com/questions/22461755/does-calling-fastmm4-logallocatedblockstofile-periodically-use-up-memory-space:

I have tracked this down to be a version mismatch of the support library FastMM_FullDebugMode.dll.

An older version of the library works with the newer version compiled into the executable. There seems to be no check that versions do match. However, modules don't really work together at run-time.

I think it is wise to add a version check so FastMM ensures the DLL is of the same version (or maybe a compatible version, but that makes version checking logic more cumbersome).

Question for optimisation code

Hello,

I've seen a line (12167) in the FastMM4.pas code like:

procedure AppendMemorySize(ASize: NativeUInt);
begin
if ASize < 10*1024 then….

I wonder if this could be replaced by

procedure AppendMemorySize(ASize: NativeUInt);
begin
if ASize < 10240 then…

That should be faster. If this is correct, can we apply it to other cases like this one?
(e.g. 10*1024*1024 replaced by 10485760...)
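
As far as I know, both forms should compile to the same code, because Object Pascal compilers evaluate constant expressions such as 10*1024 at compile time, so the more readable form should cost nothing at run time. A small sketch that checks the arithmetic (TenKB and TenMB are names made up for this example):

```pascal
program ConstantFolding;
{$APPTYPE CONSOLE}

const
  // Constant expressions: the compiler computes these, not the running program.
  TenKB = 10 * 1024;
  TenMB = 10 * 1024 * 1024;

begin
  Writeln(TenKB); // 10240
  Writeln(TenMB); // 10485760
end.
```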

With best regards,
MLCVISTA

Problem with shared MM between exe and dll

I compile both the DLL and the exe with the ShareMM option, but when I unload the DLL (FreeLibrary) from the exe, FastMM hangs in FinalizeMemoryManager->DestroyCleanupThread, called from the DLL finalization.

procedure DestroyCleanupThread;
begin
  if ReleaseStackCleanupThread <> 0 then
  begin
    SetEvent(ReleaseStackCleanupThreadTerminate);
    WaitForSingleObject(ReleaseStackCleanupThread, INFINITE);  <--- hangs on waiting
    CloseHandle(ReleaseStackCleanupThread);
    ReleaseStackCleanupThread := 0;
    CloseHandle(ReleaseStackCleanupThreadTerminate);
    ReleaseStackCleanupThreadTerminate := 0;
  end;
end;

UseReleaseStack hangs at DestroyCleanupThread when registering COM dll on Win 2016 server

When I try to register a COM DLL compiled with FastMM 4.992 with the UseReleaseStack option enabled, registration using tregsvr hangs. Adding the DLL to a COM+ application also hangs, as does registration using regsvr32.

The call stack created from a process dump points to FastMM4.DestroyCleanupThread:

.  0  Id: 1158.ee8 Suspend: 0 Teb: 00000000`00298000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`0014f7f8 00007fff`5e3c3ebf ntdll!NtWaitForSingleObject+0x14
*** WARNING: Unable to verify checksum for ValidateUser.dll
00000000`0014f800 00000000`021a1029 KERNELBASE!WaitForSingleObjectEx+0x8f
00000000`0014f8a0 00000000`021a1072 ValidateUser!FastMM4.DestroyCleanupThread+0x29
00000000`0014f8d0 00000000`021a112e ValidateUser!FastMM4.FinalizeMemoryManager+0x12
00000000`0014f900 00000000`0218d29a ValidateUser!FastMM4.Finalization+0x1e
00000000`0014f930 00000000`0218dc24 ValidateUser!System.FinalizeUnits+0x6a
00000000`0014f990 00000000`0218d4a3 ValidateUser!System.Halt0+0xc4
00000000`0014f9d0 00000000`02195662 ValidateUser!System.StartLib+0x123
00000000`0014fa20 00000000`024bbaa8 ValidateUser!SysInit.InitLib+0x92
00000000`0014fa90 00007fff`61c3389f ValidateUser!ValidateUser.ValidateUser+0x38
00000000`0014fb30 00007fff`61c26c89 ntdll!LdrpCallInitRoutine+0x4b
00000000`0014fb90 00007fff`61c28700 ntdll!LdrpProcessDetachNode+0xf5
00000000`0014fc60 00007fff`61c4a0b3 ntdll!LdrpUnloadNode+0x40
00000000`0014fcb0 00007fff`61c4a034 ntdll!LdrpDecrementModuleLoadCountEx+0x6b
00000000`0014fce0 00007fff`5e3a02cd ntdll!LdrUnloadDll+0x94
*** ERROR: Module load completed but symbols could not be loaded for tregsvr.exe
00000000`0014fd10 00000000`00543f8c KERNELBASE!FreeLibrary+0x1d
00000000`0014fd40 00000000`0054524c tregsvr+0x143f8c
00000000`0014fe40 00000000`0054ed47 tregsvr+0x14524c
00000000`0014fed0 00007fff`5f798364 tregsvr+0x14ed47
00000000`0014ff60 00007fff`61c7e851 kernel32!BaseThreadInitThunk+0x14
00000000`0014ff90 00000000`00000000 ntdll!RtlUserThreadStart+0x21

   1  Id: 1158.165c Suspend: 0 Teb: 00000000`0029a000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`0089fb58 00007fff`61c39e4e ntdll!NtWaitForWorkViaWorkerFactory+0x14
00000000`0089fb60 00007fff`5f798364 ntdll!TppWorkerThread+0x76e
00000000`0089ff60 00007fff`61c7e851 kernel32!BaseThreadInitThunk+0x14
00000000`0089ff90 00000000`00000000 ntdll!RtlUserThreadStart+0x21

   2  Id: 1158.a4 Suspend: 0 Teb: 00000000`0029c000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`02a4f8f8 00007fff`61c28691 ntdll!NtWaitForSingleObject+0x14
00000000`02a4f900 00007fff`61c31143 ntdll!LdrpDrainWorkQueue+0xe5
00000000`02a4f940 00007fff`61c8840d ntdll!LdrpInitializeThread+0xa3
00000000`02a4fa30 00007fff`61c8832e ntdll!_LdrpInitialize+0x89
00000000`02a4fab0 00000000`00000000 ntdll!LdrInitializeThunk+0xe

I guess there could be some kind of race condition.

The registration of the same DLL does not hang on Win 2012 Server or Win 7 machine.

Document default 32-bit values of `DebugFillPattern` and `DebugReservedAddress`

From FastMM4.pas, the default values are unclear:

{$ifdef 32Bit}
  DebugFillPattern = $01010101 * Cardinal(DebugFillByte);
  {The address that is reserved so that accesses to the address of the fill
   pattern will result in an A/V. (Not used under 64-bit, since the upper half
   of the address space is always reserved by the OS.)}
  DebugReservedAddress = $01010000 * Cardinal(DebugFillByte);
{$else}
  DebugFillPattern = $8080808080808080;

I will submit a pull request with values $80808080 and $80800000 for DebugFillPattern and DebugReservedAddress.

That makes the values easier to find when you search for them, since right now they only appear as expressions in the lines that declare DebugFillPattern and DebugReservedAddress.
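
The two values follow directly from the expressions quoted above. A small sketch that evaluates them, assuming DebugFillByte is $80 (the value implied by the proposed results; check it against your copy of FastMM4.pas):

```pascal
program FillPatternValues;
{$ifdef FPC}{$mode delphi}{$endif}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  DebugFillByte = $80; // assumed default
  // Same expressions as in the 32-bit branch quoted above.
  DebugFillPattern = $01010101 * Cardinal(DebugFillByte);
  DebugReservedAddress = $01010000 * Cardinal(DebugFillByte);

begin
  Writeln(IntToHex(DebugFillPattern, 8));     // 80808080
  Writeln(IntToHex(DebugReservedAddress, 8)); // 80800000
end.
```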

FullDebugMode does not work on OSX32

There is an access violation in GetStackRange (FastMM_FullDebugMode.dpr) when it executes mov ecx, fs:[4]

procedure GetStackRange(var AStackBaseAddress, ACurrentStackPointer: NativeUInt);
asm
{$if SizeOf(Pointer) = 8}
mov rax, gs:[abs 8]
mov [rcx], rax
mov [rdx], rbp
{$else}
mov ecx, fs:[4]
mov [eax], ecx
mov [edx], ebp
{$ifend}
end;

This code can only work on Windows; it queries the Win32 Thread Information Block.
I can go further if I change
mov ecx, fs:[4]
mov [eax], ecx
to
mov [eax], $ffffffff

Regards,

Michel Terrisse

D 10.1 Berlin: crash with empty FMX application

Hello,

Using FastMM4 in an empty FMX application in Delphi 10.1 Berlin results in a crash in procedure TMonitor.Destroy.

steps to reproduce:

  • new empty FMX application, target Win32
  • Add FastMM4 as first unit in dpr file
  • After closing see some FMX memory leaks
  • Crash in system.pas

Best regards
Dirk

BCB6 - no line numbers in memory leak report

I followed every step mentioned in FastMM4_FAQ.txt to get line numbers for my .cpp sources included in the memory leak report, but to no avail, although a .map file is generated.

The following options are selected and switches are on:

  • Code optimization -> None
  • Debugging -> Debug information
  • Debugging -> Line number information
  • Debugging -> Disable inline expansions
  • Linking -> Create debug information
  • Linking -> Use debug libraries
  • Map file -> Detailed
  • Pascal page: Debugging: Debug information

The following switches are off

  • Pascal page: Code generation -> Optimization

Render images in Windows server

Hello, I'm generating PDF files on Windows Server, but when a file contains an image, it is rendered incorrectly.
I have already tried with
pdfDoc.VCLCanvas.StretchDraw
pdfDoc.Canvas.DrawXObject
pdfDoc.VCLCanvas.Draw
with the same results
Windows server sample
pdf.pdf
Windows 10 sample
pdf2.pdf
Server is 2012 R2 Standard

SetMMLogFileName undefined

I have a Delphi ISAPI DLL program; in my project options I have the Debug configuration for the 32-bit platform set.
I also have the following compiler directives set.

{$INCLUDE FastMM4Options.inc}
{$DEFINE FullDebugMode}
{$Define LogMemoryLeakDetailToFile}
{$DEFINE EnableMemoryLeakReporting}

However, when I reference SetMMFileName it is not recognized. Any ideas why?

Problem with LogLockContention

See post:
https://plus.google.com/u/0/108264621339439827681/posts/NGXDpCcjjUZ

quote:
"I try use FastMM4 to tracking bottleneck in allocating memory.
I was inspired by this Primož movie:

https://www.youtube.com/watch?v=p-5mJyXvmrc

When I turn off FullDebugMode and turn on LogLockContention, I get a compiler error:

[dcc32 Error] FastMM4.pas(3274): E2003 Undeclared identifier: 'DebugFillMem'

But when I turn on FullDebugMode, the code with the collector is omitted:

function FastGetMem()
[...]
{$ifdef LogLockContention}
{$ifndef FullDebugMode} <--- this is correct?
if Assigned(ACollector) then
begin
GetStackTrace(@LStackTrace, StackTraceDepth, 1);
ACollector.Add(@LStackTrace[0], StackTraceDepth);
end;
{$endif}
{$endif}

How do I correctly check for bottlenecks in memory usage?"

UseReleaseStack does not work with ActiveX

Hello

I'm using the latest 4.992 version and I activated the UseReleaseStack option because our application is highly, and I mean HIGHLY, multi-threaded and we were suffering a lot from thread contention.

I tried a different memory manager, ScaleMM2, and performance went through the roof, great, but that library uses too much memory, so we decided to go back to FastMM, where we found this new flag, which dramatically reduced thread contention and cut CPU usage by about 30 to 40% compared to what we used to have...

The problem we are seeing now is with our ActiveX controls: if we use the UseReleaseStack flag, the ActiveX control does not work, it hangs... even Delphi cannot import a unit with the interface of the ActiveX control (once it's compiled).

If we register the control with regsvr32, regsvr32 will register it and then hang.

Is there a reason for the ActiveX control not to work with this flag activated?

Meanwhile, we will remove FastMM from our ActiveX projects because we need UseReleaseStack for our software; it works wonders!

Configurable callstack depth

const
  {The stack trace depth. (Must be an *uneven* number to ensure that the
   Align16Bytes option works in FullDebugMode.)}
  StackTraceDepth = 11;

Please add the possibility to configure the value of the StackTraceDepth constant by modifying FastMM4Options.inc.
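
One way this could look, as a rough sketch only: a conditional define selects the constant. DeepStackTraces is a hypothetical define name that does not exist in FastMM4, and 31 is just an example of another odd value; in practice the define would be placed in FastMM4Options.inc.

```pascal
program StackDepthSketch;
{$APPTYPE CONSOLE}

// Hypothetical switch; remove the dot to enable it.
{.$define DeepStackTraces}

const
{$ifdef DeepStackTraces}
  StackTraceDepth = 31; // hypothetical deeper setting, still uneven
{$else}
  StackTraceDepth = 11; // the current hard-coded value
{$endif}

begin
  // The comment in FastMM4.pas asks for an uneven value so that
  // Align16Bytes keeps working in FullDebugMode.
  Assert(Odd(StackTraceDepth), 'StackTraceDepth must be uneven');
  Writeln('StackTraceDepth = ', StackTraceDepth);
end.
```

A define per depth is clumsy if many values are needed, but it avoids declaring constants inside the options include file.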

Avoiding AVX-SSE transition penalties; Faster memory copy.

Our application uses AVX (VEX-prefixed) instructions. As you know, transition between SSE instructions that don’t have VEX prefix and VEX-prefixed AVX instructions involves huge state transition penalty. You may find more information at https://software.intel.com/en-us/articles/avoiding-avx-sse-transition-penalties

The VZEROUPPER instruction, which is supposed to help avoid the transition penalty, is very expensive (slow) on some processors, and there is no reliable way to detect whether it is expensive or cheap. Besides that, counterintuitively, testing shows that on Kaby Lake processors, calling VZEROUPPER at least once in an application makes subsequent non-VEX-prefixed SSE instructions 25% slower.

So the most reliable way is simply to avoid mixing AVX and SSE instructions: all the instructions should have the VEX prefix. See https://en.wikipedia.org/wiki/VEX_prefix for more on the VEX prefix. In short, all the assembly instructions will start with “v”, i.e. instead of “movdqa” there will be “vmovdqa”, etc. All instructions then become VEX-encoded vector instructions, so there will be no transitions and no need to call VZEROUPPER.

To accomplish the uniformity between the classes of instructions and avoid the mixture, we should detect if our CPU supports AVX, and, if it does, never call a single non-VEX-prefixed (legacy) SSE instruction.

I have added the corresponding code to FastMM4. There was SSE code in the memory copy routines inside FastMM4. I have written vector (AVX) counterparts for all SSE routines used in FastMM and added some more routines for larger block sizes, up to 128 bytes. As a positive side effect (free bonus), since AVX registers are twice as large as SSE registers, we can now use larger (32-byte) registers for memory copy. Since Skylake and later processors have two load units and one store unit, each able to process one 32-byte AVX register load/store per clock cycle, and the CPU effectively reorders instructions using superscalar out-of-order execution, we can effectively load 64 bytes (2*32) per clock and store 32 bytes (1*32) in that same clock cycle, while up to five simple register instructions execute simultaneously in that same cycle.

My modifications only apply to 64-bit code of FastMM4, since AVX only exists in 64-bit mode.
I have also improved the MoveX16LP routine so it now became up to 4 times faster – you can run your own testing to prove that – the results will vary on different microarchitectures. The MoveX16LP was particularly very slow on SkyLake/Kaby Lake processors, because these processors don’t like branches (loops) much when doing memory copy, and we had just the following loop “movdqa (load 16 bytes), movdqa (store 16 bytes), add (16 bytes to the counter), js “ – this is very slow on SkyLake/Kaby Lake - unrolling loops a little bit helps much!

Since FastMM aligns blocks by 16 bytes while an AVX ymm register is 32 bytes, and aligned AVX loads/stores need addresses aligned by 32, we cannot always use the aligned AVX move, which is a little bit faster. So I have added checks in MoveX16LP, and if the addresses are aligned, we use aligned load/store when possible.
Besides that, if a processor supports the Enhanced REP MOVSB/STOSB feature, we can also use it to gain a significant speed improvement. It is the fastest way to copy memory if the feature is reported by CPUID, but the startup cost is very high, thus it is only worth calling for larger block sizes.

I have added VEX memory fixed-block copy routines for both Windows and Unix, but as about the MoveX16LP, I only made it for Windows so far. I can make it for Unix as well if you wish.
At the end of each routine, we clear the ymm registers that we’ve just used: both for security reasons, to not expose the leftovers, and to avoid possible transition issues caused by dirty upper bits on some processors.

Unfortunately, Delphi's internal assembler doesn’t yet support AVX instructions, so I’ve put in byte codes.
Please consider adding AVX support to FastMM: just take the attached code that I’ve written and commit it to the repository. As I wrote before, memory copy is up to 4 times faster with that code, because the existing MoveX16LP wasn’t very fast.

I have also added the EnableAVX define to this code. You can make it disabled by default, if you wish.

Please note that this code relies on the CPUID structure defined in System.pas, not the CPUID called from within the FastMM itself.

LogMemoryManagerStateToFile doesn't log anything when used in a Dll

Hi and thanks for your great tools.

I'm having a rough time trying to track memory leaks for memory allocated in a dynamically loaded dll.

Trying to use the standard 'report on shutdown' facility, all objects are reported as 'unknown' with no stack trace information at all. I can accept that, as the DLL has already been unloaded at that point.

I then tried the LogMemoryManagerStateToFile function inside my DLL to get snapshot information and do manual comparisons. Unfortunately, the function does not report anything useful:

FastMM State Capture:
---------------------

0K Allocated
0K Overhead
100% Efficiency

Usage Detail:

It's exactly the same behavior when trying to do the same in your Dynamically Loaded DLL demo adding a call to LogMemoryManagerStateToFile in TfDLLMain.Button1Click

I'm using full debug mode and if I add the same call in the TestApplication.exe, I get a correct report.

Hope someone here can help.

Regards
Sylvain

FastMM_FullDebugMode64.dll defect

It seems the 64-bit version of the DLL still contains the defect that occurs when the executable does not contain debug information. This is reproducible with an empty VCL application that just creates a small leak of, say, a TStringList and then closes. You get an AV from the DLL.

If you recompile it using a rather recent JCL version the problem is gone.

Add conditional define EnableMemoryLeakReportingUsesQualifiedClassName and support for it

I propose a new define option EnableMemoryLeakReportingUsesQualifiedClassName:

Set this option to use QualifiedClassName equivalent instead of ClassName
equivalent during memory leak reporting.

This is useful for duplicate class names (like EConversionError, which is in
units Data.DBXJSONReflect, REST.JsonReflect and System.ConvUtils,
or TClipboard, which is in Vcl.Clipbrd and WinAPI.ApplicationModel.DataTransfer).

I will post a pull-request for this.
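
To illustrate the difference the proposed define is about, here is a minimal sketch using the RTL's own TObject.ClassName and TObject.QualifiedClassName. It is not FastMM code; it assumes a Delphi version with unit scope names (XE2 or later), and the program name is made up for the example.

```pascal
program QualifiedNameDemo;
{$APPTYPE CONSOLE}

uses
  System.ConvUtils; // one of the units named above as declaring EConversionError

begin
  // Short name: the same text for every class called EConversionError.
  Writeln(EConversionError.ClassName);
  // Qualified name: additionally carries the declaring unit's name.
  Writeln(EConversionError.QualifiedClassName);
end.
```

With the short name alone, a leak report cannot tell which of the three EConversionError classes leaked; the qualified name removes that ambiguity.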

Strange stacktrace when detecting memory leak on x64 compiled application

Hi,

I created a small Delphi XE10.2 application that explicitly uses FastMM4 with memory leak detection in debug mode. It creates a simple memory leak.
When compiling for Win32, the memory log looks fine:

A memory block has been leaked. The size is: 36

This block was allocated by thread 0x1398, and the stack trace (return addresses) at the time was:
419C66 [FastMM4][DebugAllocMem$qqri]
4070B2 [System.pas][System][AllocMem$qqri][4701]
642E75 [Unit1.pas][Unit1][TForm1.Button1Click][30]
591EE9 [Vcl.Controls.pas][Vcl.Controls][Controls.TControl.Click][7454]
5A9847 [Vcl.StdCtrls.pas][Vcl.StdCtrls][Stdctrls.TCustomButton.Click][5449]
5AA355 [Vcl.StdCtrls.pas][Vcl.StdCtrls][Stdctrls.TCustomButton.CNCommand][5910]
591979 [Vcl.Controls.pas][Vcl.Controls][Controls.TControl.WndProc][7338]
7750702C [GetWindowLongW]
77507038 [GetWindowLongW]
77512420 [NotifyWinEvent]
75028D07 [Unknown function at DPA_Merge]

But when compiling for Win64, the same result looks really strange:

A memory block has been leaked. The size is: 40

This block was allocated by thread 0xB44, and the stack trace (return addresses) at the time was:
426EE9 [FastMM4][_ZN7Fastmm411DebugGetMemEx]
427405 [FastMM4][_ZN7Fastmm413DebugAllocMemEx]
409380 [System.pas][System][_ZN6System8AllocMemEx][4672]
7546A4 [Unit1.pas][Unit1][_ZN5Unit16TForm112Button1ClickEPN6System7TObjectE][30]
64C9B3 [Vcl.Controls.pas][Vcl.Controls][_ZN3Vcl8Controls8TControl5ClickEv][7454]
67001B [Vcl.StdCtrls.pas][Vcl.StdCtrls][_ZN3Vcl8Stdctrls13TCustomButton5ClickEv][5449]
670F44 [Vcl.StdCtrls.pas][Vcl.StdCtrls][_ZN3Vcl8Stdctrls13TCustomButton9CNCommandERN6Winapi8Messages10TWMCommandE][5910]
40DA85 [System.pas][System][_ZN6System7TObject8DispatchEPv][17790]
64C2A0 [Vcl.Controls.pas][Vcl.Controls][_ZN3Vcl8Controls8TControl7WndProcERN6Winapi8Messages8TMessageE][7338]
6530B0 [Vcl.Controls.pas][Vcl.Controls][_ZN3Vcl8Controls11TWinControl7WndProcERN6Winapi8Messages8TMessageE][10209]
66FA65 [Vcl.StdCtrls.pas][Vcl.StdCtrls][_ZN3Vcl8Stdctrls14TButtonControl7WndProcERN6Winapi8Messages8TMessageE][5286]

Where do those strange characters come from? Is this a known problem? And how can I fix it (or where should I look)?

kind regards,
Ulrich

Weak references crash the application on exit when used with FastMM

I have a sample application here, compiled with Tokyo 10.2.3, where two interface implementations reference each other and to avoid a leak, I use the [Weak] attribute.
However, when used with FastMM, I get the "FastMM has detected a FreeMem call after FastMM was uninstalled" error message upon closing the application.
This happens with the latest FastMM GIT content taken from this repository.

Looking around here, I saw issues #41 and #18 which both refer to RSP-16796 : https://quality.embarcadero.com/browse/RSP-16796

However, that one tells me that it was fixed by Tokyo 10.2.2 and so I believe I should not be encountering it.

Have I stumbled onto something new? If yes, is this really an issue in FastMM itself?
To me it looks like it is not and rather a new issue in Tokyo. What do you think?

As I cannot upload zip files on GitHub, I have put the content here:
https://gist.github.com/obones/abbe67b58526d6decb425de202b37aef

Many thanks for your comments.

Update #BACKUP

GExperts uses a "#BACKUP" comment to indicate the names of files that should be included when backing up a project. FastMM4Messages.pas is already included in such a comment. I suggest including FastMM_OSXUtil.pas, FastMM4DataCollector.pas and FastMM4LockFreeStack.pas in such comments too.

{#BACKUP FastMM_OSXUtil.pas}
{#BACKUP FastMM4DataCollector.pas}
{#BACKUP FastMM4LockFreeStack.pas}

64bit Delphi 10.3 does not compile

Some new functions have been added to the vmt for the 64bit compile. system.pas just calls them 'cpp_abi_1' etc. so they're not documented, but fastmm won't compile unless you add them.

I added the following to StandardVirtualMethodNames after 'Destroy':

{$IFDEF WIN64}
,'CPP_ABI_1',
'CPP_ABI_2',
'CPP_ABI_3'
{$ENDIF}

Doesn't compile for OSX 64bit

I am trying to port all my code to OSX 64-bit (as required by Catalina). The code doesn't compile for 64-bit OSX: there is mixed asm in RaiseException in FastMM_OSXUtil, and an "Undeclared identifier: 'TRtlCriticalSection'" error in FastMM4.

Delphi 10.3.2, MAC = Mojave 10.14.5, XCode = 11.1 (11A1027)

Feature request regarding option "AlwaysClearFreedMemory"

Hello all, I have a request for a new feature. Unfortunately I don't know the ins and outs of FastMM4 well enough to implement it myself. It would be nice if someone would pick up this idea.

While the secrecy feature "AlwaysClearFreedMemory" (that overwrites memory with zeroes before it is released) works well, it is not terribly useful because it costs too much CPU power. Unfortunately this feature is controlled by a global $define so it's either on or off all the time.

It would be so much better if this option could simply be switched on and off in code! That way, a programmer could simply enable the feature before executing any sensitive code (password routines etc) and disable it again when secrecy is no longer vital. In order to be thread safe, the controlling variable would need to be an (atomic) counter.
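
A minimal sketch of the counter idea described above. None of these names exist in FastMM4; ClearFreedMemoryRequests, BeginClearFreedMemory, EndClearFreedMemory and ShouldClearFreedMemory are hypothetical, and the AtomicIncrement/AtomicDecrement intrinsics assume Delphi XE3 or later (older compilers would need InterlockedIncrement from the Windows unit instead).

```pascal
program ClearOnFreeToggleDemo;
{$APPTYPE CONSOLE}

var
  // Hypothetical counter; FastMM4 itself has no such variable.
  ClearFreedMemoryRequests: Integer = 0;

// Each caller that needs freed memory wiped takes one reference.
procedure BeginClearFreedMemory;
begin
  AtomicIncrement(ClearFreedMemoryRequests);
end;

// Releases the reference taken by BeginClearFreedMemory.
procedure EndClearFreedMemory;
begin
  AtomicDecrement(ClearFreedMemoryRequests);
end;

// What a free routine could test before zeroing a block.
function ShouldClearFreedMemory: Boolean;
begin
  Result := ClearFreedMemoryRequests > 0;
end;

begin
  Writeln(ShouldClearFreedMemory); // FALSE
  BeginClearFreedMemory;
  try
    // ... sensitive work, e.g. password handling ...
    Writeln(ShouldClearFreedMemory); // TRUE
  finally
    EndClearFreedMemory;
  end;
  Writeln(ShouldClearFreedMemory); // FALSE
end.
```

A counter rather than a Boolean means that two threads can request clearing independently: the feature stays on until the last of them has called the End routine.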

Setlength generates memory leak.

Hi,
This function generates a memory leak:
function AsString(const DataPtr: Pointer; const DataLen: Integer): string;
var
  Offset: NativeInt;
  i: Integer;
begin
  if DataPtr = nil then Result := '' else
  begin
    Offset := NativeInt(DataPtr) - 1;
    SetLength(Result, DataLen);
    for i := 1 to Length(Result) do Result[i] := Chr(PByte(Offset + i)^);
  end;
end;

And this is the report:
A memory block has been leaked. The size is: 36

This block was allocated by thread 0x22DC, and the stack trace (return addresses) at the time was:
4077A6 [System][System.@getmem]
40B137 [System][System.@NewUnicodeString]
40C38F [System][System.@UStrSetLength]
92B012 [uCore.pas][uCore][uCore.AsString][2405]
8E32FF [uCrypt.pas][uCrypt][uCrypt.GetNameEntryByNID][544]
8E4DB0 [uCrypt.pas][uCrypt][uCrypt.X509Parse][1174]
8E7E88 [uCrypt.pas][uCrypt][uCrypt.CertToPKI][3187]
8E8031 [uCrypt.pas][uCrypt][uCrypt.PKIFromCert][3217]
C0621B [uSocket.pas][uSocket][uSocket.TExTCPClient.DoOnConnected][2681]
C037AC [uSocket.pas][uSocket][uSocket.TTCPClientThread.InternalConnect][779]
C0390F [uSocket.pas][uSocket][uSocket.TTCPClientThread.Execute][817]

The block is currently used for an object of class: UnicodeString

The allocation number is: 527357

UseReleaseStack hangs on D2007

Hello,

When UseReleaseStack is defined, an empty program compiled with Delphi 2007 hangs during initialization in:

class function TLFStack.PopLink(var chain: TReferencedPtr): PLinkedData;

The same empty program works with XE10.1 Berlin.
Any ideas?
Best regards
Dirk

access violation in GetFrameBasedStackTrace

Hi
I replaced it with:

procedure GetFrameBasedStackTrace(AReturnAddresses: PNativeUInt;
  AMaxDepth, ASkipFrames: Cardinal);
var
  LStackTop, LStackBottom, LCurrentFrame: NativeUInt;
begin
  {Get the call stack top and current bottom}
  GetStackRange(LStackTop, LStackBottom);
  Dec(LStackTop, 2 * SizeOf(Pointer) );
  {Get the current frame start}
  LCurrentFrame := LStackBottom;
  {Fill the call stack}
  while (AMaxDepth > 0)
    and (LCurrentFrame >= LStackBottom)
    and (LCurrentFrame <= LStackTop) do
  begin

MemTest.zip

UsageTracker: Possible demo issue with integer division.

LIndTop := (Cardinal(LMBI.BaseAddress) + Cardinal(LMBI.RegionSize)) div 65536;

When using this demo with the default memory manager of Delphi 10.2 (with FastMM removed from the uses clauses), I noticed that the logic for calculating the total allocated pages uses integer division. In my use case, LMBI.RegionSize is not a multiple of the page size, so the integer division "misses" the last page, causing the demo to sit in an infinite loop.

Changing that line to

LIndTop := Ceil((Cardinal(LMBI.BaseAddress) + Cardinal(LMBI.RegionSize)) / 65536);

And adding Math to the uses clause fixed my issue.
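
As a possible alternative to Ceil, the rounding up can be done in pure integer arithmetic, which avoids the floating-point division and the Math dependency. The sketch below is an assumption-laden illustration, not code from the demo: CeilDiv and AllocGranularity are hypothetical names, and NativeUInt is used instead of the Cardinal casts in the original line.

```pascal
program CeilDivDemo;
{$APPTYPE CONSOLE}

const
  // 64 KB, the divisor used in the demo line above.
  AllocGranularity = 65536;

// Hypothetical helper: integer division rounded up instead of down.
function CeilDiv(AValue, ADivisor: NativeUInt): NativeUInt;
begin
  Result := (AValue + ADivisor - 1) div ADivisor;
end;

begin
  // An exact multiple gives the same answer as plain "div".
  Writeln(CeilDiv(2 * AllocGranularity, AllocGranularity));     // 2
  // One byte past a boundary rounds up to the next index.
  Writeln(CeilDiv(2 * AllocGranularity + 1, AllocGranularity)); // 3
end.
```

Note that AValue + ADivisor - 1 can overflow when AValue is within ADivisor of the type's maximum, so this form is only safe when the address range stays well below that limit.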

Unable to compile FastMM4BCB using Tokyo 10.2.1

Compiling FastMM4BCB using latest bcc32 ends with
[bcc32 Error] typeinfo.h(154): E2367 Can't inherit RTTI class from non-RTTI base 'exception'
class bad_cast : public std::exception {};
[bcc32 Error] typeinfo.h(155): E2367 Can't inherit RTTI class from non-RTTI base 'exception'
class bad_typeid : public std::exception {};
plus some more errors, perhaps as a consequence of these two.

What I have found out so far is that in FastMM4BCB RTTI is disabled (-RT-) and then enabled (-RT) in typeinfo.h after the exception header is included.

Are there any known solutions for this?

Regards
PerOle

Cannot compile with Delphi 2007 if define UseReleaseStack

In Delphi 2007 the NativeInt type is 8 bytes, so the FastMM4LockFreeStack unit cannot be compiled.
You need to redefine NativeInt as a 4-byte type:

{$if CompilerVersion<=18.5}
type
NativeInt = Integer;
{$ifend}

https://helloacm.com/integer-data-types-in-delphi/

"It is also worth to note that, the NativeInt (signed) and NativeUInt (unsigned) are two integer types that are platform dependent. They are supposed to be the size of the pointers e.g. On 32-bit, the pointer size is 4 bytes and on 64-bit the pointer size is 8 bytes. However, the implementation of NativeInt and NativeUInt are buggy until Delphi 2009. The sizes are incorrectly set before D2009."
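
To check whether a given compiler is affected, a small console program can compare the two sizes directly. This is only a diagnostic sketch (the program name is made up); it prints whatever the compiler in use reports and makes no claim about any specific Delphi version.

```pascal
program NativeIntSizeCheck;
{$APPTYPE CONSOLE}

begin
  // On a correct implementation both values are equal
  // (4 on 32-bit targets, 8 on 64-bit targets).
  Writeln('SizeOf(Pointer)   = ', SizeOf(Pointer));
  Writeln('SizeOf(NativeInt) = ', SizeOf(NativeInt));
  if SizeOf(NativeInt) <> SizeOf(Pointer) then
    Writeln('NativeInt is not pointer-sized on this compiler');
end.
```

If the last line is printed, a redefinition like the one shown above is needed before FastMM4LockFreeStack can rely on NativeInt.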
