Comments (50)

JuanCaicedo commented on June 19, 2024

Not at all, thanks for the reminder. I've been prioritizing making a gh-pages version of the website, but that should be up soon, and then I'll look into this. I'm going to assign this so I'll remember it.

badisa commented on June 19, 2024

+1

When I do a regular HTTP request and take a heap snapshot on one of my pages in Chrome I get 20 MB, but when I run oboe the snapshot jumps up to 100 MB. And that is on a very small JSON object; on a large one I get up to 1000 MB.

Amberlamps commented on June 19, 2024

+1

Also, awesome issue reporting.

goloroden commented on June 19, 2024

@Amberlamps Thanks :-)

goloroden commented on June 19, 2024

BTW… for everyone who has the same problems we had: we used Oboe.js for transferring data using the JSON Lines protocol.

For this very specific use case we have written a replacement that does not suffer from the memory issue. If you have the same use case, you might be interested in our modules json-lines and json-lines-client.
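
For context, JSON Lines is simply one complete JSON value per line of a streaming response. A minimal consumer sketch, assuming a newline-delimited body and the fetch streaming API — this is illustrative only, not the json-lines-client API, and consumeJsonLines/handle are hypothetical names:

async function consumeJsonLines(url, handle) {
    const response = await fetch(url);
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffered = '';

    for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        buffered += decoder.decode(value, { stream: true });
        const lines = buffered.split('\n');
        buffered = lines.pop(); // keep any trailing partial line for the next read
        for (const line of lines) {
            if (line.trim()) handle(JSON.parse(line)); // one complete JSON value per line
        }
    }
}

Because each line is parsed and handed off independently, nothing forces the whole response to stay in memory at once.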

badisa commented on June 19, 2024

So @ar7em figured this out for me. When I pushed the node data to an array, the node itself wasn't being garbage collected, even with return oboe.drop.

However, by doing the following, the memory leak disappeared.

.node('{scores info}', function (node) {
    // Round-trip through JSON to store a detached copy
    // rather than oboe's original node
    node = JSON.stringify(node);
    node = JSON.parse(node);
    resultsData.push(node);
    return oboe.drop;
})

Not entirely sure why this changes anything, but it reduces memory use by up to 300 MB.

@jimhigson Is there any plan to fix this?

JuanCaicedo commented on June 19, 2024

@badisa Here's my thought: I think it might be because your array stores a reference to the node, which keeps the node from being garbage collected. Doing node = JSON.parse(node) means that node now references a new object created from the original one, which would allow the original to be garbage collected. Not entirely sure why that would be less memory intensive, though.
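
To make that concrete, here is a minimal sketch of the two variants, assuming the hypothesis above is right (resultsData and the pattern come from @badisa's snippet):

// Leaks: resultsData keeps pointing at oboe's own node object,
// so returning oboe.drop cannot free it.
.node('{scores info}', function (node) {
    resultsData.push(node);
    return oboe.drop;
})

// Should not leak: only a detached deep copy is retained, so the
// original node can be collected after the drop.
.node('{scores info}', function (node) {
    resultsData.push(JSON.parse(JSON.stringify(node)));
    return oboe.drop;
})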

egervari commented on June 19, 2024

I am experiencing the same issue. I have a 400 MB JSON file. It contains an array of arrays. Each sub-array contains anywhere from 1 to 500 objects. There are probably 12,000-13,000 objects in total if you hypothetically flattened the arrays.

Depending on how I have my server chunk it, I can read in around 4,300 of those 13,000 objects before I get an "Aw, Snap" message in Chrome. It crashes because of the same problem: I am returning oboe.drop, just as above, but the memory is not getting garbage collected.

This is a very serious bug. Is it fixed?

JuanCaicedo commented on June 19, 2024

@egervari Have you tried the JSON stringify/parse trick that @badisa used? It would be interesting to see if that works.

egervari commented on June 19, 2024

I will try it now and report back. However - like you - I don't see how this would actually fix the problem, and it feels hacky. I'm concerned enough that I'm tempted to write a very bare-bones, low-level parser that simply nulls and deletes the values manually, just to see what it actually does and whether it behaves differently from what oboe is doing. If that also doesn't work - or if that's what oboe is already doing - then maybe there's a very bad bug in V8 itself.

egervari commented on June 19, 2024

Okay, I tried the above solution and it still didn't solve it :( So, so sad.

egervari commented on June 19, 2024

Does the format of the JSON file have anything to do with things being released? For example, I am sending JSON such as:

[
   [{..},{..},{..}],
   ...
   [{..},{..},{..}]
]

Should I instead send it as:

{
   "partition1:": [{..},{..},{..}],
   "partition2:": [{..},{..},{..}],
   ....
   "partitionN:": [{..},{..},{..}]
}

?
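
For what it's worth, the two shapes would be consumed with different node patterns (a sketch using oboe's pattern syntax, where ! is the document root, as in the snippets elsewhere in this thread):

// Shape 1: top-level array of arrays - each sub-array arrives as a node.
oboe(url).node('![*]', function (partition) {
    // partition is one inner array of objects
    return oboe.drop;
});

// Shape 2: top-level object of named arrays - each value arrives as a node.
oboe(url).node('!.*', function (partition) {
    // partition is the array stored under "partition1", "partition2", ...
    return oboe.drop;
});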

badisa commented on June 19, 2024

@egervari What exactly are you doing with the nodes? I had no memory leak until I was pushing the nodes into a scoped Angular array.

egervari commented on June 19, 2024

Each node (in my case) is an array, since I am trying to process an array of arrays. For the sake of clarity, let's call these partitions. What I'd like to do is put all of the objects in each partition into PouchDB. However, even if I ignore Pouch altogether and simply do a console.log(partition[0].whatever), it'll crash after 6,000+ objects are processed. That's just a little more than 1/3 of the objects to process.

When it's processing the nodes, it goes really fast at first. Then it slows down and keeps getting slower until the "Aw, Snap" message shows up. Essentially, my oboe code is doing nothing:

oboe(url, {
    cached: false // tried it with and without this option
}).node('![*]', function (documents) {
    console.log(documents[0].documentType);
    return oboe.drop;
}).done(function (leftOver) {
    console.log(leftOver);
});

badisa commented on June 19, 2024

@egervari So, reading more closely, it sounds like you are managing to read about 250 MB into your browser before it quits. That is quite a bit of memory, so it may be dying simply from that and not from a memory leak. Have you done what the first poster did and taken a heap snapshot?

Also, did you try:

.node('![*]', function (documents) {
    documents = JSON.stringify(documents);
    documents = JSON.parse(documents);
    console.log(documents[0].documentType);
    return oboe.drop;
})

egervari commented on June 19, 2024

Yes, I tried exactly that :) It did not work.

I would say, though, that I am not getting 250 MB on each pass... each 'documents' variable probably holds about 13 MB of data on average, although I've tried smaller chunks too.

But here's the thing - I have tried streaming and parsing one document at a time too, and it still bombs. It just processes more documents before it bombs (perhaps 3,000 more, but there were still so many left that it never got to).

I didn't get a graph of the heap, although I saw the Buffer % in Chrome slowly go up to 100% before it crashed.

egervari commented on June 19, 2024

Okay, I saw the heap graph and it was exactly the same - the one you can see in Chrome.

magic890 commented on June 19, 2024

Any news? @egervari have you solved your issue? If yes, how?

JuanCaicedo commented on June 19, 2024

In my mind, this is a pretty important issue, because Oboe claims on its website to be able to handle JSON that is bigger than the available memory. That is an awesome claim, and I think it'll be totally true once this bug is fixed.

I'll take a look and see what I can figure out!

egervari commented on June 19, 2024

@magic890 No, I never solved it, and I gave up on it. I implemented my own from scratch - it was just easier for me - and I got it to work that way.

JuanCaicedo commented on June 19, 2024

@egervari Do you have it up on GitHub, or would you be willing to put it up? I'd love to compare what you have and what Oboe does to try to figure out where this memory leak is.

egervari commented on June 19, 2024

@JuanCaicedo Mine is not a framework or anything like that - it is just something small I put directly into my project, not an all-encompassing solution. It's also not a personal project, so I'm reluctant to share it. Honestly, I just did the simplest possible thing: I had the server send the JSON in chunks, converted each chunk to a real JSON object when it got to the client, sent those objects to PouchDB, and then removed them from memory with null. It works for data up to 1.8 GB on Chrome, Firefox and Safari.

A good tip is not to deal with 1,000+ non-trivial objects at the same time. That will kill it on small devices. Chunk up the data and stream it and you will be fine. You don't need a framework/library.
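
A rough sketch of what that might look like, for anyone wanting to follow the same route (handleChunk is a hypothetical name and the chunk delivery mechanism is left out; bulkDocs is PouchDB's real bulk-insert call):

// Parse one self-contained chunk of documents, hand it to PouchDB,
// then null the only reference so the GC can reclaim it.
function handleChunk(text, db) {
    let docs = JSON.parse(text);
    db.bulkDocs(docs).then(function () {
        docs = null;
    });
}

No streaming-JSON parser is involved at all; the server does the chunking, and the client never holds more than one chunk's worth of documents.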

JuanCaicedo commented on June 19, 2024

@egervari I'm trying to get some data to try to recreate your issue and some of the other ones on here. Do you have any tips on acquiring something that size? And then how to reformat it once I have it? Or were you producing your own data?

egervari commented on June 19, 2024

I am exporting large JSON documents intended to be put into PouchDB from a large MS SQL database. Sometimes the objects are quite large, and having any more than 1,000 of them in memory causes heap errors/crashes. I would say that some of the largest documents have a dozen properties with 5 or 6 levels of collections. I can't give this JSON data out though - the data itself is valuable and obviously needs to be protected from non-clients. In the largest cases, I am sending around 1.8 GB of JSON.

JuanCaicedo commented on June 19, 2024

@egervari Totally understand that. I'm going to try with https://github.com/zeMirco/sf-city-lots-json. From what I can tell, it's a JSON document with a property features that is a really big array. Think that might be similar enough to your situation?

egervari commented on June 19, 2024

My situation is about 12x worse than that, haha. In my case, a lot of the text properties contain HTML content with a lot of text, and each object/document has at least 6 or 8 of those. But there are also a lot of arrays containing objects that contain more arrays that contain more objects, etc. It is a very large object graph.

JuanCaicedo commented on June 19, 2024

I'm working on a repo (https://github.com/JuanCaicedo/oboe-memory-bug) to reproduce these errors. Right now, as a sanity check, I've been able to establish that oboe.drop does work in at least some scenarios.

I'll have to play around with either the data or the front-end code to try to reproduce it. GitHub doesn't let you upload files larger than 100 MB, so I'm going to have to find an alternate way of hosting them if it comes down to needing bigger data. Any ideas welcome!

egervari commented on June 19, 2024

In my case, I have a Java application using Spring, running on Tomcat, that is exposing the JSON as a REST-based API.

JuanCaicedo commented on June 19, 2024

Hmm, setting up Spring and Tomcat is a whole extra level of complexity (I can probably manage it, though - I come from a Java background), so I think I'll try to repro @goloroden's case first.

lukeasrodgers commented on June 19, 2024

For what it's worth, I have some code using oboe.js to parse about 750 MB of JSON with Node.js. Without return oboe.drop; memory usage steadily climbs until it crashes with:

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Abort trap: 6

When I add return oboe.drop; everything works fine, and the process runs to completion.
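
For reference, the working setup is essentially this shape (a sketch only; the file name and handle() are stand-ins, and oboe accepts a Node readable stream):

var oboe = require('oboe');
var fs = require('fs');

// Stream a large file through oboe, dropping each node after handling it.
oboe(fs.createReadStream('./big.json'))
    .node('!.*', function (node) {
        handle(node);       // stand-in for the real per-node work
        return oboe.drop;   // without this line, memory climbs until the crash above
    })
    .done(function () {
        console.log('finished');
    });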

JuanCaicedo commented on June 19, 2024

@lukeasrodgers Awesome, I got the same results. I suspect there might be something else going on that's causing the problem.

@Amberlamps @badisa did the two of you also have this problem? If so, could you share any other information that could help pin down what's happening (e.g. what type of server you're running, how you're sending the data, and if possible what your oboe client-side code looks like)? Thanks!

JuanCaicedo commented on June 19, 2024

@egervari Do you still have a version of your oboe code you could try something on? If so, on this line of code

}).node('![*]', function(documents) {

Could you change ![*] to !.[*] and see if that fixes your problem?

JuanCaicedo commented on June 19, 2024

@goloroden I have a suspicion your error might be because of the event notation you're using on the client side.

If you check out that test repo I made, there's a branch named goloroden where I change

.node('!.features[*]', function(feature) {

to

.on('node:!.features[*]',function(feature) {

Doing that causes Chrome to run out of memory and display an error message.

By the way, once you start the server off that repo, be sure to go to http://localhost:3000/home?drop=true, which causes the client side to use oboe.drop

JuanCaicedo commented on June 19, 2024

I can't find a way to recreate this, so I'm going to close the issue. I'll wait until Feb 28 in case anyone in the thread can help me reproduce the bug 😃

@goloroden @badisa @Amberlamps @egervari @magic890 @lukeasrodgers

pavelerofeev commented on June 19, 2024

@JuanCaicedo For me the issue reproduces on slow connections; you can use a recent Chrome to set network throttling at 4 Mb/s. Currently I work around it with the stringify/parse trick mentioned above.

goloroden commented on June 19, 2024

Sorry for the late answer, I'm currently investigating a few ideas and will report back… thanks so far for your help :-)

goloroden commented on June 19, 2024

Yay, we have a result :-)))

When you run the old demo code as shown in the original post, the memory leak is still there:

[graph: memory usage climbs steadily - the leak is still present]

When you change the line

}).on('node:!.*', function (event) {

to

}).node('!.*', function (event) {

the memory leak is gone:

[graph: memory usage stays flat - the leak is gone]

This is really good news, as it not only means that there is a workaround, but also that oboe.drop actually works - but only if you use the node function directly.

So the essential question is: what is (and why is there) the difference between on and node?
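
For anyone skimming the thread, the two registration forms are documented as interchangeable, which makes the measured difference above all the more surprising (sketch only):

// Leaks, even though the callback returns oboe.drop:
oboe(url).on('node:!.*', function (event) {
    return oboe.drop;
});

// Honors oboe.drop - no leak:
oboe(url).node('!.*', function (event) {
    return oboe.drop;
});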

JuanCaicedo commented on June 19, 2024

Oh wow, that's really interesting and definitely a bug!

Would it be possible for you to share the code you used to profile this? Ideally, if I could clone a repo and reproduce the same results as you, I could look into this 😃

goloroden commented on June 19, 2024

Oh, it's just the code of the original post to this issue.

What I did is the following:

  • Run docker run -d --name wolkenkit-profiling -p 80:80 -p 2003:2003 -p 8125:8125/udp hopsoft/graphite-statsd to get a statsd/Graphite container up and running.
  • Save the server file as server.js and the client file as client.js.
  • Run node server.js.
  • Run node client.js.
  • Open a browser and point it to http://192.168.9.130 (or whatever the IP of your Docker host is) to get to Graphite.
  • Switch to the dashboard view and add the graphs from stats.gauges.client.js.Schneehase.local.memory.* to it (and merge them to a single view if you want).
  • Wait for a few hours… ;-)
  • After that, in the client, change the questionable line of code, and run it again.
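
The memory gauges in the dashboard step presumably come from something like the following in client.js (a guess on my part; the node-statsd client and the metric names are assumptions):

var StatsD = require('node-statsd');
var statsd = new StatsD();

// Push process memory to statsd once a second so Graphite can graph it.
setInterval(function () {
    var memory = process.memoryUsage();
    statsd.gauge('memory.rss', memory.rss);
    statsd.gauge('memory.heapTotal', memory.heapTotal);
    statsd.gauge('memory.heapUsed', memory.heapUsed);
}, 1000);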

badisa commented on June 19, 2024

I am experiencing the memory leak with

.node('{scores info}', function (node) {
    resultsData.addData(node);
    $scope.$evalAsync();
    return oboe.drop;
})

unless I also do the stringify/parse trick. So it seems the leak is not limited to just .on(), though my case might be due to the {scores info} portion of my pattern.

goloroden commented on June 19, 2024

@badisa I guess it's because of your line

resultsData.addData(node);

where you explicitly keep a reference to the node you just received. Dropping it then, of course, has no effect.

goloroden commented on June 19, 2024

@JuanCaicedo Any insights on this?

JuanCaicedo commented on June 19, 2024

Haven't been able to look at it, I'm hoping for some time on Saturday 😃

goloroden commented on June 19, 2024

Don't want to be pushy, but I am curious: Any news on this?

will-l-h commented on June 19, 2024

I'm also curious if there has been any progress on this.

cnyzgkn commented on June 19, 2024

Any progress on it? I still can't load big data, even using }).node('!.*', function (event) {

mweimer commented on June 19, 2024

I've also been experiencing this memory leak. However, I have found that making a copy of the node seems to provide a workaround:

.node('!.*', node => {
   const copy = JSON.parse(JSON.stringify(node));
   events.push(copy);
   return oboe.drop;
})

JuanCaicedo commented on June 19, 2024

I thought I'd share an update. I'm currently the only one actively working on the project, and I've been dedicating most of my open source time towards a workshop I'm giving. I expect I should have more time to dedicate to oboe by the end of next week.

My first priority after that is to improve how the tests and build processes work. Right now they make it fairly challenging to work on the source code, and I think improving them will make this issue easier to diagnose.

If anyone is interested in helping me do that, especially to get to know the codebase to narrow down where this might be, I would love the help 😄

Parboard commented on June 19, 2024

@JuanCaicedo Any updates here?

JuanCaicedo commented on June 19, 2024

Please refer to #137 (comment)
