GithubHelp home page GithubHelp logo

Comments (19)

GoogleCodeExporter avatar GoogleCodeExporter commented on August 28, 2024

Original comment by `` on 18 Jan 2012 at 9:46

from gperftools.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 28, 2024
One thing we could do is cache module information -- either the module name, or 
ideally, the full path to the module (I don't know if that's available via 
LoadLibraryExW).  The idea is that if we see once that a library doesn't have 
any 
libc symbols in it, we can mark that down so we ignore it at subsequent loads 
and 
unloads.

The current LoadLibraryExW was written under the assumption it would be called 
rarely.  Clearly that's not always true, and in the situations it's called 
frequently, it's the same module being loaded and unloaded again.

(btw, thanks for tracking down what the problem is!  I would have been stumped 
otherwise.)

Would this caching idea work for you?

Original comment by [email protected] on 7 Jan 2010 at 2:03

  • Changed state: Accepted
  • Added labels: Priority-Medium, Type-Defect

from gperftools.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 28, 2024
I discovered that tcmalloc was causing GL SwapBuffers to take 4.17ms instead
of 0.17ms with the default allocator, and not expecting it to need to allocate
any memory I put a breakpoint at swap buffers then tcmalloc to discover the
module request/free issue.

I'm looking into the different options to address the performance issue.

Original comment by [email protected] on 7 Jan 2010 at 7:58

from gperftools.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 28, 2024
Try the following version of patch_functions.cc.  I've changed it to only try 
to 
patch/unpatch the module in question, when it gets the LoadLibrary/FreeLibrary 
windows 
calls.  I also keep a cache of modules that do not have any libc info, so it's 
fast to 
load those after the first time.

All unittests pass, but I don't have good test that load and unload lots of 
windows 
modules, so no promises.  But try it out and let me know how it works.

Original comment by [email protected] on 9 Jan 2010 at 3:47

  • Changed state: Started

Attachments:

from gperftools.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 28, 2024
There's some discussion of this patch going on in issue 201.  It looks like 
it's not 
quite ready to try yet.  I hope to have a new candidate patch (or rather, a new 
version of patch_functions.cc) up shortly.

Original comment by [email protected] on 11 Jan 2010 at 3:56

from gperftools.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 28, 2024
I did test out your patch_functions.cc from January 8, and it does fix our
application start up time issue.  With the Microsoft malloc the program was 
taking 7
seconds, with the 1.4 tcmalloc it was 11 to 12 with the subversion tcmalloc and 
above
file it was back to 6.21 seconds.  The caching is making a big difference.

As far as OPENGL32 frame time load/reference/unreference I'm looking into it. 
LoadLibrary, FreeLibrary is no longer the bottleneck, but the performance isn't 
what
the Microsoft allocator gives.  As in 3.7 ms frame time vs 6.6 ms for tcmalloc.

Original comment by [email protected] on 11 Jan 2010 at 7:02

from gperftools.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 28, 2024
Looking over the shoulder of my colleague, I noticed that you don't call
PatchAllModules on a module that is new to the system.  The only reason that I 
state
this is that DLLs can implicitly load other modules into the system that they 
depend
on.  Implicitly loaded modules are handled by the OS loader and do not call to 
the
LoadLibrary routine.  It might be best to call a function that compares the 
cache to
the current and only patch those functions that need to be patched.

Ultimately, I think this problem should be addressed by Microsoft.  I am going 
to try
and submit some information detailing this problem, but I'm pretty sure that 
this
problem will not be addressed on their side.  Loading and unloading the same 
library
just to call a function seems like a bad idea on their part, but I am sure they 
had
their reasons for doing what they did.  There are still valid reasons for 
having the
cache though, as we did notice an impact on application initialization time 
without
the cache in place.

Original comment by [email protected] on 11 Jan 2010 at 7:02

from gperftools.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 28, 2024
On second thought after I look at both our window make current and swap times 
the
total is compariable to the Microsoft allocator.  Sometimes when starting the 
frame
time is greater in the 2D vs 3D window, and sometimes it switches, I shouldn't 
have
looked at just one.  Ignore the above comment about a possible performance 
issue for
the frame time.

Original comment by [email protected] on 11 Jan 2010 at 10:23

from gperftools.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 28, 2024
Ryan's comment is right on.  Here's a new version of patch_functions.cc, that 
should 
work better.  Again, it passes all the unittests, but I don't have good 
module-loading 
and -unloading test cases to test it out on, so your feedback is appreciated.

Original comment by [email protected] on 11 Jan 2010 at 10:39

from gperftools.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 28, 2024
Let me try that attaching again...

Original comment by [email protected] on 11 Jan 2010 at 10:49

Attachments:

from gperftools.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 28, 2024
Over in issue 201, there's been some reports of deadlock in the current 
patch_functions.cc, when multiple threads are loading/unloading libraries at 
the same 
time.  Here's a new version of patch_functions.cc, that I think addresses that. 
 When 
you get a chance, can you try this patch and see how it works for you?

Original comment by [email protected] on 13 Jan 2010 at 10:12

from gperftools.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 28, 2024
Boy, what is it with me and attaching files?  Trying again...

Original comment by [email protected] on 13 Jan 2010 at 10:13

Attachments:

from gperftools.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 28, 2024
01-08-2009 update, startup 6 to 7 seconds, current+swap 2d&3d (from notes) = 
5.71 ms
This was the PatchOneModule, ModuleEntryCopy version.
01-13-2009 update, startup 10 seconds, current+swap 2d&3d = 17.71 ms

Original comment by [email protected] on 14 Jan 2010 at 4:11

from gperftools.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 28, 2024
Can you remind me what the goal timings are?

Unfortunately, the 01-08 update got its speed (partially) by being incorrect. 
:-(

I wonder if the optimizations I've put in are working.  Can you perhaps add 
printfs to 
the LoadLibraryExW and FreeLibrary overload methods, to verify that the module 
you 
keep loading/unloading actually makes it into the nopatch_set?  If not, let's 
try to 
understand why.

Original comment by [email protected] on 14 Jan 2010 at 4:28

from gperftools.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 28, 2024
We have about three times we are looking at.  There's startup time, static 
screen
update time, and dynamic screen update time.  The default visual studio malloc 
was
tripping over all the allocations in the dynamic screen update time.  Switching 
to
tcmalloc addresses the dynamic, but the last released version is doubling the 
startup
time and trippling our base (static) screen update.  It's slower but more 
deterministic.

I can see a problem in free library.  It is only using the patching_map which 
looks
like it is only when the load library needed to patch something for that 
module.  I
don't think the reference count will work anyway.  The reference counting 
doesn't
take into account if the library was already loaded before the first tcmalloc
LoadLibrary, so it sees load ref=1, free ref=0.  Or if a loaded library was 
linked
against another library and brought it in and the original reference free 
library was
called, and it was still loaded.

I have seen a deadlock in some of my test modifications.  From what I was 
seeing one
of the routines in PatchAllModules is executed with the spinlock held, but was
calling a routine that called LoadLibrary which would find the spinlock held.  
In
that case the stack trace didn't show this recursion, I had to step until it
recoursed.  On Windows their mutex locks (critical sections) are recursive and 
would
solve that recursion deadlock if the data structures wouldn't be corrupted in 
that case.

What deadlock was being seen?

Original comment by [email protected] on 14 Jan 2010 at 6:44

from gperftools.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 28, 2024
} I can see a problem in free library.  It is only using the patching_map which 
looks
} like it is only when the load library needed to patch something for that 
module.

Why is that a problem?

Did you manage to run with the printf's inserted, like I mentioned?  That would 
be 
good to see.

} The reference counting doesn't take into account if the library was
} already loaded before the first tcmalloc LoadLibrary

That's true, but it's ok.  The reference count is just a small optimization; 
without 
it FreeLibrary may do a bit of unnecessary work trying to unpatch necessarily, 
when 
it could have known the unpatch was a noop.  In the situation you describe, one 
might 
see a bit of extra work from time to time along those lines, as well.

However, it shouldn't affect the use-case you've described in your original bug 
report, which is why I'm confused.

} What deadlock was being seen?

I document it in patch_functions.cc now.  It wasn't a problem with recursive 
locking, 
it was a problem with lock inversion with respect to some internal windows lock.

Original comment by [email protected] on 14 Jan 2010 at 6:53

from gperftools.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 28, 2024
Early Load exit OPENGL32 handle 5ed00000 time  0.007 ms
Late free exit handle 5ed00000 time  3.825 ms
Early Load exit OPENGL32 handle 5ed00000 time  0.008 ms
Late free exit handle 5ed00000 time  3.820 ms
Early Load exit OPENGL32 handle 5ed00000 time  0.006 ms
Late free exit handle 5ed00000 time  3.752 ms
Early Load exit OPENGL32 handle 5ed00000 time  0.008 ms
Late free exit handle 5ed00000 time  3.925 ms

I included the changes to produce the above output.  As you can see the 
FreeLibrary
is expensive.

Original comment by [email protected] on 14 Jan 2010 at 7:31

Attachments:

from gperftools.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 28, 2024
timing.patch is empty.  Can you try attaching it again?

Original comment by [email protected] on 15 Jan 2010 at 1:54

from gperftools.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 28, 2024
Merging with issue 201, where the discussion seems to have moved.

Original comment by [email protected] on 18 Jan 2010 at 11:03

  • Changed state: Duplicate

from gperftools.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.