Comments (19)
Original comment by `` on 18 Jan 2012 at 9:46
from gperftools.
One thing we could do is cache module information -- either the module name, or
ideally, the full path to the module (I don't know if that's available via
LoadLibraryExW). The idea is that if we see once that a library doesn't have
any libc symbols in it, we can mark that down so we ignore it at subsequent
loads and unloads.
The current LoadLibraryExW was written under the assumption it would be called
rarely. Clearly that's not always true, and in the situations where it's called
frequently, it's the same module being loaded and unloaded again.
(btw, thanks for tracking down what the problem is! I would have been stumped
otherwise.)
Would this caching idea work for you?
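A minimal sketch of the caching idea described above, not the actual
patch_functions.cc code. The class name `NoPatchCache` and its methods are
hypothetical, and lowercasing the key is an assumption made here because
Windows module names are case-insensitive:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_set>

// Hypothetical cache of modules already known to contain no libc symbols.
class NoPatchCache {
 public:
  // True if the module has not yet been recorded as libc-free, so it
  // still has to be scanned on this load.
  bool NeedsScan(const std::string& module_path) const {
    return no_libc_.count(Normalize(module_path)) == 0;
  }

  // Record that a scan found no libc symbols in this module.
  void MarkNoLibc(const std::string& module_path) {
    assert(!module_path.empty());
    no_libc_.insert(Normalize(module_path));
  }

 private:
  // Assumption: compare names case-insensitively by lowercasing them.
  static std::string Normalize(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
      return static_cast<char>(std::tolower(c));
    });
    return s;
  }

  std::unordered_set<std::string> no_libc_;
};
```

With something like this, a load hook would scan a module only while
`NeedsScan` returns true, and repeated loads of the same libc-free module
would cost one set lookup.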
Original comment by [email protected]
on 7 Jan 2010 at 2:03
- Changed state: Accepted
- Added labels: Priority-Medium, Type-Defect
from gperftools.
I discovered that tcmalloc was causing GL SwapBuffers to take 4.17 ms instead
of the 0.17 ms it takes with the default allocator. Since I did not expect
SwapBuffers to allocate any memory, I put a breakpoint at SwapBuffers and then
in tcmalloc, which revealed the module request/free issue.
I'm looking into the different options to address the performance issue.
Original comment by [email protected]
on 7 Jan 2010 at 7:58
from gperftools.
Try the following version of patch_functions.cc. I've changed it to only try
to patch/unpatch the module in question when it gets the LoadLibrary/FreeLibrary
Windows calls. I also keep a cache of modules that do not have any libc info,
so it's fast to load those after the first time.
All unittests pass, but I don't have good tests that load and unload lots of
Windows modules, so no promises. But try it out and let me know how it works.
Original comment by [email protected]
on 9 Jan 2010 at 3:47
- Changed state: Started
Attachments:
from gperftools.
There's some discussion of this patch going on in issue 201. It looks like
it's not quite ready to try yet. I hope to have a new candidate patch (or
rather, a new version of patch_functions.cc) up shortly.
Original comment by [email protected]
on 11 Jan 2010 at 3:56
from gperftools.
I did test out your patch_functions.cc from January 8, and it does fix our
application start up time issue. With the Microsoft malloc the program was
taking 7 seconds; with the 1.4 tcmalloc it was 11 to 12; with the subversion
tcmalloc and the above file it was back to 6.21 seconds. The caching is making
a big difference.
As for the OPENGL32 load/reference/unreference frame time, I'm looking into it.
LoadLibrary/FreeLibrary is no longer the bottleneck, but the performance isn't
what the Microsoft allocator gives: 3.7 ms frame time vs. 6.6 ms for tcmalloc.
Original comment by [email protected]
on 11 Jan 2010 at 7:02
from gperftools.
Looking over the shoulder of my colleague, I noticed that you don't call
PatchAllModules on a module that is new to the system. I mention this because
DLLs can implicitly load other modules that they depend on. Implicitly loaded
modules are handled by the OS loader and do not go through the LoadLibrary
routine. It might be best to call a function that compares the cache to the
current module list and only patches those functions that need to be patched.
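A rough sketch of the comparison suggested above, assuming the list of
currently loaded modules has already been obtained somehow. The function name
`TakeUnseenModules` and the use of plain strings as module identifiers are
illustrative assumptions, not gperftools code:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Returns the modules in `loaded` that are not yet in `seen`, and records
// them in `seen` so that a later call reports only newer arrivals. This
// also catches modules the OS loader pulled in implicitly.
std::vector<std::string> TakeUnseenModules(
    std::set<std::string>* seen, const std::vector<std::string>& loaded) {
  assert(seen != nullptr);
  std::vector<std::string> unseen;
  for (const std::string& module : loaded) {
    // insert() reports whether the module was new to the set.
    if (seen->insert(module).second) unseen.push_back(module);
  }
  return unseen;
}
```

Only the returned modules would then need to be examined for patching.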
Ultimately, I think this problem should be addressed by Microsoft. I am going
to try and submit some information detailing this problem, but I'm pretty sure
that this problem will not be addressed on their side. Loading and unloading
the same library just to call a function seems like a bad idea on their part,
but I am sure they had their reasons for doing what they did. There are still
valid reasons for having the cache though, as we did notice an impact on
application initialization time without the cache in place.
Original comment by [email protected]
on 11 Jan 2010 at 7:02
from gperftools.
On second thought, after looking at both our window make-current and swap
times, the total is comparable to the Microsoft allocator. Sometimes at
startup the frame time is greater in the 2D window than in the 3D window, and
sometimes it is the other way around; I shouldn't have looked at just one.
Ignore the above comment about a possible performance issue for the frame time.
Original comment by [email protected]
on 11 Jan 2010 at 10:23
from gperftools.
Ryan's comment is right on. Here's a new version of patch_functions.cc that
should work better. Again, it passes all the unittests, but I don't have good
module-loading and -unloading test cases to test it out on, so your feedback
is appreciated.
Original comment by [email protected]
on 11 Jan 2010 at 10:39
from gperftools.
Let me try attaching that again...
Original comment by [email protected]
on 11 Jan 2010 at 10:49
Attachments:
from gperftools.
Over in issue 201, there have been some reports of deadlock in the current
patch_functions.cc when multiple threads are loading/unloading libraries at
the same time. Here's a new version of patch_functions.cc that I think
addresses that. When you get a chance, can you try this patch and see how it
works for you?
Original comment by [email protected]
on 13 Jan 2010 at 10:12
from gperftools.
Boy, what is it with me and attaching files? Trying again...
Original comment by [email protected]
on 13 Jan 2010 at 10:13
Attachments:
from gperftools.
01-08-2009 update, startup 6 to 7 seconds, current+swap 2d&3d (from notes) = 5.71 ms
This was the PatchOneModule, ModuleEntryCopy version.
01-13-2009 update, startup 10 seconds, current+swap 2d&3d = 17.71 ms
Original comment by [email protected]
on 14 Jan 2010 at 4:11
from gperftools.
Can you remind me what the goal timings are?
Unfortunately, the 01-08 update got its speed (partially) by being incorrect. :-(
I wonder if the optimizations I've put in are working. Can you perhaps add
printfs to the LoadLibraryExW and FreeLibrary overload methods, to verify that
the module you keep loading/unloading actually makes it into the nopatch_set?
If not, let's try to understand why.
Original comment by [email protected]
on 14 Jan 2010 at 4:28
from gperftools.
We have about three times we are looking at: startup time, static screen
update time, and dynamic screen update time. The default Visual Studio malloc
was tripping over all the allocations in the dynamic screen update time.
Switching to tcmalloc addresses the dynamic case, but the last released version
is doubling the startup time and tripling our base (static) screen update. It's
slower but more deterministic.
I can see a problem in free library. It is only using the patching_map which
looks like it is only when the load library needed to patch something for that
module. I don't think the reference count will work anyway. The reference
counting doesn't take into account if the library was already loaded before the
first tcmalloc LoadLibrary, so it sees load ref=1, free ref=0. Nor does it
cover the case where a loaded library was linked against another library and
brought it in: FreeLibrary is called on the original reference, but the
library is still loaded.
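The reference-count mismatch described above can be illustrated with a toy
model. This is only a sketch: `LoaderModel` and its methods are hypothetical,
and the two maps stand in for the OS loader's real count and the count a
LoadLibrary/FreeLibrary hook is able to observe:

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model: os_refs_ is what the OS loader knows; hook_refs_ is what a
// LoadLibrary/FreeLibrary hook sees. All names here are hypothetical.
class LoaderModel {
 public:
  // A load the hook never observes (implicit dependency, or a load that
  // happened before the hook was installed).
  void ImplicitLoad(const std::string& m) { ++os_refs_[m]; }

  // A load that goes through the hook; returns the hook's view of the count.
  int HookedLoad(const std::string& m) {
    ++os_refs_[m];
    return ++hook_refs_[m];
  }

  // A free that goes through the hook; returns the hook's view of the count.
  int HookedFree(const std::string& m) {
    assert(os_refs_[m] > 0 && hook_refs_[m] > 0);
    --os_refs_[m];
    return --hook_refs_[m];
  }

  // Whether the OS would still have the module mapped.
  bool StillLoaded(const std::string& m) const {
    auto it = os_refs_.find(m);
    return it != os_refs_.end() && it->second > 0;
  }

 private:
  std::map<std::string, int> os_refs_;
  std::map<std::string, int> hook_refs_;
};
```

If a module is loaded implicitly first, one hooked load followed by one hooked
free leaves the hook's count at 0 while `StillLoaded` is still true.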
I have seen a deadlock in some of my test modifications. From what I was
seeing, one of the routines in PatchAllModules is executed with the spinlock
held, but was calling a routine that called LoadLibrary, which would find the
spinlock held. In that case the stack trace didn't show this recursion; I had
to step until it recursed. On Windows, mutex locks (critical sections) are
recursive and would solve that recursion deadlock, provided the data structures
aren't corrupted in that case.
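As a portable illustration of the recursive-lock point, here is a small sketch
using `std::recursive_mutex` in place of a Windows critical section. The class
`ReentrantSection` is hypothetical. It tracks nesting depth so that a
re-entrant call can be detected, which is one way code could avoid touching
shared structures while an outer call is mid-update:

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical wrapper: runs a body under a recursive lock and reports the
// nesting depth, so re-entry from the same thread does not deadlock and
// can be detected.
class ReentrantSection {
 public:
  // Returns the depth at which `body` ran (1 = outermost call).
  int Run(const std::function<void()>& body) {
    std::lock_guard<std::recursive_mutex> guard(mu_);
    const int depth = ++depth_;
    if (depth > max_depth_) max_depth_ = depth;
    body();
    --depth_;
    assert(depth_ >= 0);
    return depth;
  }

  // Deepest nesting observed so far.
  int max_depth() const { return max_depth_; }

 private:
  std::recursive_mutex mu_;
  int depth_ = 0;
  int max_depth_ = 0;
};
```

A nested `Run` from inside the body proceeds at depth 2 instead of blocking.
With a non-recursive lock, that same nested call would wait on a lock its own
thread already holds.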
What deadlock was being seen?
Original comment by [email protected]
on 14 Jan 2010 at 6:44
from gperftools.
} I can see a problem in free library. It is only using the patching_map which
} looks like it is only when the load library needed to patch something for
} that module.
Why is that a problem?
Did you manage to run with the printf's inserted, like I mentioned? That would
be good to see.
} The reference counting doesn't take into account if the library was
} already loaded before the first tcmalloc LoadLibrary
That's true, but it's ok. The reference count is just a small optimization;
without it FreeLibrary may do a bit of unnecessary work trying to unpatch
unnecessarily, when it could have known the unpatch was a noop. In the
situation you describe, one might see a bit of extra work from time to time
along those lines, as well.
However, it shouldn't affect the use-case you've described in your original bug
report, which is why I'm confused.
} What deadlock was being seen?
I document it in patch_functions.cc now. It wasn't a problem with recursive
locking; it was a problem with lock inversion with respect to some internal
Windows lock.
Original comment by [email protected]
on 14 Jan 2010 at 6:53
from gperftools.
Early Load exit OPENGL32 handle 5ed00000 time 0.007 ms
Late free exit handle 5ed00000 time 3.825 ms
Early Load exit OPENGL32 handle 5ed00000 time 0.008 ms
Late free exit handle 5ed00000 time 3.820 ms
Early Load exit OPENGL32 handle 5ed00000 time 0.006 ms
Late free exit handle 5ed00000 time 3.752 ms
Early Load exit OPENGL32 handle 5ed00000 time 0.008 ms
Late free exit handle 5ed00000 time 3.925 ms
I included the changes to produce the above output. As you can see, the
FreeLibrary call is expensive.
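Timings of this kind could be collected with a small wrapper along the
following lines. This is an assumed sketch, not the contents of the attached
patch: the helper `TimeMs` is hypothetical and uses `std::chrono` rather than
any Windows timer API.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <utility>

// Hypothetical helper: runs `fn`, prints "<label> time <x> ms", and returns
// the elapsed wall-clock time in milliseconds.
template <typename Fn>
double TimeMs(const char* label, Fn&& fn) {
  assert(label != nullptr);
  const auto start = std::chrono::steady_clock::now();
  std::forward<Fn>(fn)();
  const std::chrono::duration<double, std::milli> elapsed =
      std::chrono::steady_clock::now() - start;
  std::printf("%s time %.3f ms\n", label, elapsed.count());
  return elapsed.count();
}
```

Wrapping the body of a load or free hook in such a call gives one line of
output per invocation.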
Original comment by [email protected]
on 14 Jan 2010 at 7:31
Attachments:
from gperftools.
timing.patch is empty. Can you try attaching it again?
Original comment by [email protected]
on 15 Jan 2010 at 1:54
from gperftools.
Merging with issue 201, where the discussion seems to have moved.
Original comment by [email protected]
on 18 Jan 2010 at 11:03
- Changed state: Duplicate
from gperftools.