
Lots of tempfiles being left around (diffy, closed)

samg commented on August 21, 2024
Lots of tempfiles being left around


Comments (7)

jonkelleyatrackspace commented on August 21, 2024

The Ruby 1.9.3 documentation (http://www.ruby-doc.org/stdlib-1.9.3/libdoc/tempfile/rdoc/Tempfile.html) recommends calling unlink in an ensure block.

e.g.:

require 'tempfile'

file = Tempfile.new('my_temp')
begin
  # ...do something with file...
ensure
  file.close
  file.unlink # deletes the temp file so your /tmp directory doesn't fill up
end
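On Ruby 2.1 and later (newer than the 1.9.3 docs cited above), Tempfile.create with a block gives the same guarantee without a hand-written ensure: the file is closed and deleted when the block returns, even if the block raises. A minimal sketch; write_and_read_back is a hypothetical helper name used only for illustration:

```ruby
require 'tempfile'

# Hypothetical helper: Tempfile.create with a block (Ruby 2.1+) closes and
# deletes the file when the block returns, and returns the block's value.
def write_and_read_back(data)
  saved_path = nil
  contents = Tempfile.create('my_temp') do |file|
    saved_path = file.path
    file.write(data)
    file.flush
    File.read(file.path)
  end
  # By now the file has been removed from the filesystem.
  [contents, File.exist?(saved_path)]
end

contents, still_exists = write_and_read_back('hello')
puts contents     # prints "hello"
puts still_exists # prints "false"
```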

I could be wrong, but from the diffy source it looks as though diffy deliberately never unlinks or removes its old tempfiles: I do not see unlink called anywhere in the source tree.
I suggest not using this in production until this is fixed. :( If you have to, you'd better set up a cron job to clean up the leftover files.
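A cron-driven cleanup could look roughly like the sketch below. It assumes Diffy's leftover files live in Dir.tmpdir and start with the 'diffy' prefix (as Tempfile.new('diffy') produces); sweep_stale_tempfiles and its parameters are hypothetical names, not part of Diffy:

```ruby
require 'tmpdir'

# Hypothetical cron cleanup: delete regular files in +dir+ whose names start
# with +prefix+ and whose mtime is older than +max_age+ seconds.
# Returns the list of paths that were removed.
def sweep_stale_tempfiles(dir: Dir.tmpdir, prefix: 'diffy', max_age: 3600)
  cutoff = Time.now - max_age
  removed = []
  Dir.glob(File.join(dir, "#{prefix}*")).each do |path|
    begin
      next unless File.file?(path)
      next if File.mtime(path) > cutoff # still fresh; may be in use
      File.unlink(path)
      removed << path
    rescue Errno::ENOENT
      # Removed by another process (or a finalizer) first; nothing to do.
    end
  end
  removed
end
```

Running it from an hourly cron entry would work, as long as max_age is comfortably longer than any single diff can take, so files still in use are not swept.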

Diffy source (an unlink call needs to be added):

def tempfile(string)
  t = Tempfile.new('diffy')
  # ensure tempfiles aren't unlinked when GC runs by maintaining a
  # reference to them.
  @tempfiles ||=[]
  @tempfiles.push(t)
  t.print(string)
  t.flush
  t.close
  t.path
end
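One way to add that unlink is to scope the tempfiles to a block and delete them in an ensure clause, rather than keeping references until GC. This is a hypothetical sketch (with_tempfiles is an illustrative name, not Diffy's actual 3.0.3 change); a real implementation would run diff on the yielded paths:

```ruby
require 'tempfile'

# Hypothetical variant of the helper above: write each string to a tempfile,
# yield the paths, then unlink every file once the block finishes or raises.
def with_tempfiles(*strings)
  files = []
  strings.each do |string|
    t = Tempfile.new('diffy')
    files << t # track it immediately so ensure can clean it up
    t.print(string)
    t.flush
    t.close
  end
  yield(*files.map(&:path))
ensure
  files.each(&:unlink) # Tempfile#unlink also works on a closed tempfile
end

result = with_tempfiles("a\n", "b\n") do |left, right|
  # e.g. shell out to `diff` on left/right here
  File.read(left) + File.read(right)
end
```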

Unlink comes from the POSIX API. unlink() deletes a name from the filesystem. If that name was the last link to a file and no processes have the file open, the file is deleted and the space it was using is made available for reuse.

If the name was the last link to a file but any processes still have the file open, the file will remain in existence until the last file descriptor referring to it is closed.

If the name referred to a socket, FIFO, or device, the name is removed, but processes that have the object open may continue to use it.

http://www.tutorialspoint.com/unix_system_calls/unlink.htm
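The second rule above (an open descriptor keeps the data alive after the name is gone) can be observed directly from Ruby. A minimal sketch, assuming a POSIX system (Windows generally refuses to unlink open files); read_after_unlink is a hypothetical name:

```ruby
require 'tmpdir'

# Hypothetical demo of POSIX unlink() semantics: after the name is removed,
# an already-open descriptor can still read the file's contents.
def read_after_unlink
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'demo.txt')
    File.write(path, "still here\n")
    File.open(path) do |io|
      File.unlink(path)           # the directory entry is gone...
      visible = File.exist?(path)
      data = io.read              # ...but the open descriptor still works
      [visible, data]
    end
  end
end

p read_after_unlink # prints [false, "still here\n"]
```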


jonkelleyatrackspace commented on August 21, 2024

There's a class out there that improves on this, called Better::Tempfile:
http://better.rubyforge.org/classes/Better/Tempfile.html

It seems the Ruby Tempfile stdlib has issues:

  • Ruby 1.8’s version can generate “weird” path names that can confuse certain command-line tools such as curl. Better::Tempfile is based on Ruby 1.9’s version and generates saner filenames.
  • Ruby 1.8’s version has a bug that makes unlink-before-close (unlinking the file while it is still open) unusable: it raises an exception when close is called if the tempfile was unlinked before.
  • Ruby 1.9.1’s version closes the file when unlink is called. This makes unlink-before-close unusable.
  • Ruby’s bundled version deletes the temporary file in its finalizer, even when unlink was called before. As a result, it may delete other Ruby processes’ temp files when it’s not supposed to.
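For comparison, on a current Ruby and a POSIX system the unlink-before-close pattern works with the stock Tempfile: the name disappears immediately while the open handle stays usable. A minimal sketch; unlink_before_close is a hypothetical name, and the Ruby versions criticized in the list above are assumed not to be in use:

```ruby
require 'tempfile'

# Hypothetical demo: unlink right after creation, keep using the handle,
# close it in ensure. Tempfile.new opens the file for reading and writing.
def unlink_before_close(data)
  file = Tempfile.new('my_temp')
  original_path = file.path # capture it: Tempfile#path is nil after unlink
  file.unlink               # on POSIX, the name is removed immediately
  begin
    file.write(data)
    file.rewind
    [file.read, File.exist?(original_path)]
  ensure
    file.close
  end
end

p unlink_before_close('abc') # prints ["abc", false]
```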


samg commented on August 21, 2024

I initially wrote Diffy in the 1.8 days, when tempfiles were automatically unlinked by a finalizer, and it seemed somewhat dangerous to attempt to unlink them explicitly. It seems this behavior has changed (presumably for the better in more recent Rubies), and Diffy should be updated to explicitly remove the tempfiles it creates once it no longer needs them.

I'll look into making this change, though I'd also be happy to accept a pull request.


dkowis commented on August 21, 2024

Interestingly enough, @samg, we're still running 1.8, and it's still happening. It may be because we're using Passenger, this code is part of our display rendering, and it generates LOTS of diffs.

It probably fits somewhat outside the original use case you had for this thing :)


samg commented on August 21, 2024

Yeah, there's definitely some weirdness around how Ruby handles Tempfiles.

In theory the files should be cleaned up when GC runs on the Tempfile objects (i.e. after the Diffy::Diff objects are dereferenced and GC'd). It's possible that, due to the nature of your app, these diffs are staying in memory, so the unlink never occurs (or only occurs after it has caused a system problem). It's also possible that the "magic" unlinking is failing for some other reason (e.g. processes being shut down or killed before the finalizers can run).

I think a better approach would be for Diffy to clean up the files explicitly (as you suggest) after it's generated the diff content, being careful to catch any errors that result from files that have already been removed.
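The "catch any errors from files that have already been removed" part can be as simple as rescuing Errno::ENOENT around each unlink. A minimal sketch of such a cleanup step; remove_tempfiles is a hypothetical name and is not Diffy's actual code:

```ruby
require 'tmpdir'

# Hypothetical cleanup step to run once the diff output has been generated:
# delete each path, treating "already gone" as success rather than an error.
# Returns how many files this call actually removed.
def remove_tempfiles(paths)
  paths.count do |path|
    begin
      File.unlink(path)
      true
    rescue Errno::ENOENT
      false # already removed (e.g. by a finalizer or an earlier pass)
    end
  end
end
```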

I wrote Diffy a while back for a small personal project and never used it at any significant scale. It's gained quite a bit of popularity since then, so I'd like to make sure it's a reasonably responsible citizen in others' production environments. ;-)


samg commented on August 21, 2024

I've just pushed version 3.0.3, which I believe will prevent the issue you experienced: it's no longer necessary to wait for GC to delete the tempfiles; instead, they're unlinked as soon as the diff has been generated.

Let me know if you experience any other issues, or if this doesn't resolve the problem, and thanks for reporting this.


dkowis commented on August 21, 2024

@jonkelleyatrackspace Woot! Let's get this bad boy into prod.
