Comments (50)
Not at all, thanks for the reminder. I've been prioritizing making a gh-pages version of the website, but that should be up soon, and then I'll look into this. I'm going to assign this so I'll remember it.
from oboe.js.
+1
When I do a regular HTTP request and take a heap snapshot of one of my pages in Chrome I get 20 MB, but when I run oboe the snapshot jumps up to 100 MB. And that is on a very small JSON object; on a large one I get up to 1000 MB.
+1
Also, awesome issue reporting.
@Amberlamps Thanks :-)
BTW, for everyone who has the same problem we had: we used Oboe.js for transferring data using the JSON Lines protocol.
For this specific use case we have written a replacement that does not suffer from the memory issue. If you have the same use case, you might be interested in our modules json-lines and json-lines-client.
So @ar7em figured this out for me. When I pushed the node data to an array, the node itself wasn't being garbage collected, even with return oboe.drop.
However, by doing the following, the memory leak disappeared:
.node('{scores info}', function (node) {
  node = JSON.stringify(node);
  node = JSON.parse(node);
  resultsData.push(node);
  return oboe.drop;
})
Not entirely sure why this changes anything, but it reduces memory use by up to 300 MB.
@jimhigson Is there any plan to fix this?
@badisa Here's my thought: your array stores a reference to the node, which keeps the node from being garbage collected. Doing node = JSON.parse(node) means that node now references a new object created from the original node, which allows the original one to be garbage collected. Not entirely sure why that would be less memory intensive, though.
I am experiencing the same issue. I have a 400 MB JSON file. It contains an array of arrays; each sub-array contains anywhere from 1 to 500 objects, and there are probably 12,000-13,000 objects in total if you hypothetically flattened the arrays.
Depending on how I have my server chunk it, I can read in around 4,300 of those 13,000 objects before I get an "Aw, Snap" message in Chrome, and it fails for the same reason: I am using the drop return value, just as above, but the memory is not getting garbage collected.
This is a very serious bug. Is it fixed?
@egervari Have you tried the JSON stringify/parse trick that @badisa did? It would be interesting to see if that works.
I will try it now and report back. However, like you, I don't see how this would actually fix the problem. It also feels hacky. I'm kind of concerned about it, and tempted to create a very sub-standard, low-level parser that simply nulls and deletes the values manually, just to see what it actually does and whether it differs from what oboe is doing. If that also doesn't work, or if that's what oboe is doing, then maybe there's a very bad bug in V8 itself.
Okay, I tried the above solution and it still didn't solve it :( So, so sad.
Does the format of the JSON file have anything to do with things being released? For example, I am sending JSON such as:
[
  [{..},{..},{..}],
  ...
  [{..},{..},{..}]
]
Should I instead send it as:
{
  "partition1": [{..},{..},{..}],
  "partition2": [{..},{..},{..}],
  ...
  "partitionN": [{..},{..},{..}]
}
?
@egervari What exactly are you doing with the nodes? I had no memory leak until I was pushing the nodes into a scoped Angular array.
Each node (in my case) is an array, since I am trying to process an array of arrays; for the sake of clarity, let's call these partitions. What I'd like to do is put all of the objects in each partition into PouchDB. However, even if I ignore Pouch altogether and simply do a console.log(partition[0].whatever), it crashes after 6,000+ objects are processed. That's just a little more than 1/3 of the objects to process.
When it's processing the nodes, at first it goes really fast. Then it slows down and keeps getting slower until the "Aw, Snap" message shows up. Essentially, my oboe code is doing nothing:
oboe(url, {
  cached: false // tried it with and without this option
}).node('![*]', function (documents) {
  console.log(documents[0].documentType);
  return oboe.drop;
}).done(function (leftOver) {
  console.log(leftOver);
});
@egervari So, reading more closely, it sounds like you are managing to read about 250 MB into your browser before it quits. That is quite a bit of memory to use, so there is a possibility that it is dying due to that and not because of a memory leak. Have you done what the first poster did, with the heap snapshot?
Also, did you try:
.node('![*]', function (documents) {
  documents = JSON.stringify(documents);
  documents = JSON.parse(documents);
  console.log(documents[0].documentType);
  return oboe.drop;
})
Yes, I tried exactly that :) It did not work.
I would say, though, that I am not getting 250 MB on each pass; each documents variable probably has about 13 MB worth of data on average, although I've tried smaller chunks too.
But here's the thing: I have tried streaming and parsing one document at a time too, and it still bombs. It can just process more documents before it does (perhaps 3,000 more, but there were still so many left that it didn't get to).
I didn't get a graph of the heap, although I watched the buffer percentage in Chrome slowly go up to 100% before it crashed.
Okay, I saw the heap graph, and it was the exact same: the graph you can see in Chrome.
Any news? @egervari have you solved your issue? If yes, how?
In my mind, this is a pretty important issue, because Oboe claims on its website that it can handle JSON bigger than the available memory. This is an awesome claim, and I think it'll be totally true once this bug is handled.
I'll take a look and see what I can figure out!
@magic890 No, I never solved it, and I gave up on it. I implemented my own from scratch (it was just easier for me) and got it to work that way.
@egervari Do you have it up on GitHub, or would you be willing to put it up? I'd love to compare what you have and what Oboe does to try to figure out where this memory leak is.
@JuanCaicedo Mine is not a framework or anything like that; it is just something small I put directly into my project, not an all-encompassing solution. It's not a personal project, regardless, so I'm reluctant to share it. Honestly, I just did the simplest possible thing: I had the server send the JSON in chunks, converted each chunk to a real JSON object when it got to the client, sent those objects to PouchDB, and then removed them from memory with null. It works for data up to 1.8 GB on Chrome, Firefox, and Safari.
A good tip is not to deal with 1,000+ non-trivial objects at the same time; that will kill it on small devices. Chunk up the data and stream it and you will be fine. You don't need a framework/library.
@egervari I'm trying to get some data to try to recreate your issue and some of the other ones on here. Do you have any tips on acquiring something that size, and then how to reformat it once I have it? Or were you producing your own data?
I am exporting large JSON documents intended to be put into PouchDB from a large MS SQL database. Sometimes the objects are quite large, and having any more than 1000 of them in memory causes heap errors/crashes. I would say that some of the largest documents have a dozen properties with 5 or 6 levels of collections. I can't give this JSON data out, though; the data itself is valuable and needs to be protected from non-clients, obviously. In the largest cases, I am sending around 1.8 GB of JSON.
@egervari Totally understand that. I'm going to try with https://github.com/zeMirco/sf-city-lots-json. From what I can tell, it's a JSON document with a property called features, which is a really big array. Think that might be similar enough to your situation?
My situation is about 12x worse than that, haha. In my case, a lot of the text properties contain HTML content with a lot of text, and each object/document might have at least 6 or 8 of those. But there are also a lot of arrays containing objects that contain more arrays that contain more objects, and so on. It is a very large object graph.
I'm working on a repo (https://github.com/JuanCaicedo/oboe-memory-bug.git) to reproduce these errors. Right now, as a sanity check, I've been able to establish that oboe.drop does work in at least some scenarios.
I'll have to play around with either the data or the front-end code to try to reproduce it. GitHub doesn't let you upload files larger than 100 MB, so I'm going to have to find an alternate way of hosting them if it comes down to needing bigger data. Any ideas welcome!
In my case, I have a Java application using Spring, running on Tomcat, that is exposing the JSON as a REST-based API.
Hmm, setting up Spring and Tomcat is a whole extra level of complexity (I can probably manage it, though; I come from a Java background), so I think I'll try to reproduce @goloroden's case first.
For what it's worth, I have some code using oboe.js to parse about 750 MB of JSON with Node.js. Without return oboe.drop; memory usage steadily climbs until it crashes with:
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Abort trap: 6
When I add return oboe.drop; everything works fine, and the process runs to completion.
@lukeasrodgers Awesome, I got the same results. I suspect there might be something else going on that's causing the problem.
@Amberlamps @badisa did the two of you also have a problem with this? If so, could you share any other information that could help pin down what's happening (i.e. what type of server you're running, how you're sending the data, and, if possible, what your oboe client-side code looks like)? Thanks!
@egervari Do you still have a version of your oboe code you could try something on? If so, on this line:
}).node('![*]', function(documents) {
could you change ![*] to !.[*] and see if that fixes your problem?
@goloroden I have a suspicion your error might be caused by the event notation you're using on the client side.
If you check out that test repo I made, there's a branch named goloroden where I change
.node('!.features[*]', function(feature) {
to
.on('node:!.features[*]', function(feature) {
Doing that causes Chrome to run out of memory and display an error message.
By the way, once you start the server off that repo, be sure to go to http://localhost:3000/home?drop=true, which causes the client side to use oboe.drop.
I can't find a way to recreate this, so I'm going to close the issue. I'll wait until Feb 28 in case anyone in the thread can help me reproduce the bug 😃
@goloroden @badisa @Amberlamps @egervari @magic890 @lukeasrodgers
@JuanCaicedo For me the issue reproduces on slow connections; you can use a recent Chrome to set network throttling to 4 Mb/s. Currently I work around it with the stringify/parse trick mentioned above.
Sorry for the late answer, I'm currently investigating a few ideas and will report back… thanks so far for your help :-)
Yay, we have a result :-)))
When you run the old demo code as shown in the original post, the memory leak is still there. When you change the line
}).on('node:!.*', function (event) {
to
}).node('!.*', function (event) {
the memory leak is gone.
This is really good news, as it not only means that there is a workaround, but also that oboe.drop actually works, provided you use the node function directly.
So the essential question is: what is (and why is there) a difference between on and node?
Oh wow, that's really interesting and definitely a bug!
Would it be possible for you to share the code you used to profile this? Ideally if I could clone a repo and be able to reproduce the same results as you, I could look into this 😃
Oh, it's just the code of the original post to this issue.
What I did is the following:
- Run docker run -d --name wolkenkit-profiling -p 80:80 -p 2003:2003 -p 8125:8125/udp hopsoft/graphite-statsd to get a statsd/Graphite container up and running.
- Save the server file as server.js and the client file as client.js.
- Run node server.js.
- Run node client.js.
- Open a browser and point it to http://192.168.9.130 (or whatever the IP of your Docker host is) to get to Graphite.
- Switch to the dashboard view and add the graphs from stats.gauges.client.js.Schneehase.local.memory.* to it (and merge them into a single view if you want).
- Wait for a few hours… ;-)
- After that, in the client, change the questionable line of code and run it again.
I am experiencing the memory leak with
.node('{scores info}', function (node) {
  resultsData.addData(node);
  $scope.$evalAsync();
  return oboe.drop;
})
unless I do the stringify/parse, so it seems like it is not limited to just .on(). My case might be due to the {scores info} portion of my code, though.
@badisa I guess it's because of your line
resultsData.addData(node);
where you explicitly keep a reference to the node you just received. Dropping it then, of course, has no effect.
@JuanCaicedo Any insights on this?
Haven't been able to look at it, I'm hoping for some time on Saturday 😃
Don't want to be pushy, but I am curious: Any news on this?
I'm also curious if there has been any progress on this.
Any progress on it? I still can't load big data, even using }).node('!.*', function (event) {
I've also been experiencing this memory leak; however, I have found that making a copy of the node seems to provide a workaround:
.node('!.*', node => {
  const copy = JSON.parse(JSON.stringify(node));
  events.push(copy);
  return oboe.drop;
})
I thought I'd share an update. I'm currently the only one actively working on the project, and I've been dedicating most of my open source time towards a workshop I'm giving. I expect I should have more time to dedicate to oboe by the end of next week.
My first priority after that is to improve how the tests and build processes work. Right now these things make it fairly challenging to work on the source code, and I think fixing them will make the issue easier to diagnose.
If anyone is interested in helping me do that, especially to get to know the codebase to narrow down where this might be, I would love the help 😄
@JuanCaicedo Any updates here?
Please refer to #137 (comment)