
Comments (3)

di commented on August 30, 2024

Hi @syxolk! Thanks for the report, and sorry that vladiate was eating up your memory.

Is there any way I can stop the validation early? Can Vladiate detect if there were too many wrong fields and stop validating?

This is currently not possible, but it could be a valuable optional feature.
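A minimal sketch of what such an optional early stop could look like. It is not vladiate code: `validate_with_limit`, its `max_failures` parameter and the toy `must_be_positive` validator are hypothetical, and the sketch assumes a validator signals a bad row by raising `ValueError`.

```python
def must_be_positive(row):
    """Toy validator used for illustration (hypothetical)."""
    if row <= 0:
        raise ValueError(f"{row!r} is not positive")


def validate_with_limit(rows, validator, max_failures=None):
    """Validate rows, stopping once max_failures failures have been seen.

    Hypothetical helper: returns a list of (row_number, message) pairs.
    With max_failures=None every row is checked.
    """
    failures = []
    for row_number, row in enumerate(rows, start=1):
        try:
            validator(row)
        except ValueError as exc:
            # Keep only the message, not the whole exception object.
            failures.append((row_number, str(exc)))
            if max_failures is not None and len(failures) >= max_failures:
                break  # stop early instead of reading the rest of the input
    return failures


print(validate_with_limit([1, -2, -3, 4, -5], must_be_positive, max_failures=2))
# [(2, '-2 is not positive'), (3, '-3 is not positive')]
```

Stopping at a threshold also bounds memory, because at most `max_failures` entries are ever kept.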

I'm a little more interested in figuring out why so much memory is being consumed and whether the footprint can be reduced (I'm guessing it can, as I haven't really tested the upper bounds of this tool).
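One way to start measuring the footprint is Python's standard `tracemalloc` module. The sketch below is only an illustration: `peak_bytes`, `keep_exceptions` and `count_failures` are hypothetical stand-ins that compare keeping every failure's exception object against merely counting failures; they do not call into vladiate.

```python
import tracemalloc


def peak_bytes(func, *args, **kwargs):
    """Run func and return the peak traced memory (bytes) while it ran."""
    tracemalloc.start()
    try:
        func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # also clears the traces, so the next call starts fresh
    return peak


def keep_exceptions(n):
    # Mimics collecting every failing row's exception object.
    return [ValueError(f"row {i} failed: " + "x" * 1000) for i in range(n)]


def count_failures(n):
    # Mimics only counting failures; each exception is discarded right away.
    count = 0
    for i in range(n):
        ValueError(f"row {i} failed: " + "x" * 1000)
        count += 1
    return count


kept = peak_bytes(keep_exceptions, 10_000)
counted = peak_bytes(count_failures, 10_000)
print(kept, counted)  # the first number should be far larger than the second
```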

from vladiate.

syxolk commented on August 30, 2024

Hey, I found two issues that lead to high memory usage in my particular case:

First, validate() collects all exceptions in the failures dictionary. In my case every row failed, so every row's exception was collected there.

Second, the SetValidator raises a ValidationException whose message stringifies every element of valid_set. In my case, valid_set contained 140k UUIDs, which resulted in a huge string.
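A quick back-of-the-envelope check of how big that string gets, assuming the message embeds `str(valid_set)` (the exact message format here is an assumption, not copied from vladiate). A UUID string is 36 characters; its `repr` adds two quotes, and each element is followed by `", "` or the closing brace, so the set renders at 40 characters per element.

```python
import uuid

# 140k UUID strings, matching the size reported in this thread.
valid_set = {str(uuid.uuid4()) for _ in range(140_000)}

# Assumption: the exception message includes the whole set via str().
message = "'not-a-uuid' is not in {}".format(valid_set)

# "{" + 140k reprs of 38 chars + 139,999 ", " separators + "}" = 40 chars/element
print(len(str(valid_set)))  # 5600000
```

At roughly 5.6 MB per message, and assuming each failing row keeps its own copy in the collected exception, 1,000 failing rows would already hold around 5.6 GB.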

If we can fix at least one of the two issues, the memory problem should go away:

  • The exceptions are printed as debug messages and the logger defaults to the info level, so no exceptions are printed by default. Can we disable collecting all the exception objects?
  • Don't stringify the entire valid_set. Instead cut it at 100 elements, similarly to what is done in _log_validator_failures
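A minimal sketch of the truncation idea. `format_valid_set` and its `limit` parameter are hypothetical names, and the output format is an assumption; it does not reproduce what _log_validator_failures actually prints. Note that set iteration order is arbitrary, so which 100 elements appear is unspecified.

```python
from itertools import islice


def format_valid_set(valid_set, limit=100):
    """Render at most `limit` elements of valid_set for an error message."""
    shown = ", ".join(repr(v) for v in islice(valid_set, limit))
    hidden = len(valid_set) - limit
    if hidden > 0:
        return f"{{{shown}, ... ({hidden} more)}}"
    return f"{{{shown}}}"


valid_set = {f"id-{i}" for i in range(140_000)}
message = f"'bogus' is not in {format_valid_set(valid_set)}"
print(len(message))  # on the order of a kilobyte instead of megabytes
```

Using `islice` avoids building a sorted or listed copy of the whole set just to print part of it.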

PS: I'm happy to contribute!


di commented on August 30, 2024

The exceptions are printed as debug messages and the logger defaults to the info level, so no exceptions are printed by default. Can we disable collecting all the exception objects?

I think there are two things we could do here:

  1. We could collect the failure exceptions only if debugging is turned on. We'd need to add a --verbose flag or something similar, and I think we'd also need an additional variable to determine whether there were failures when not in debug mode (since we can't just check for a non-empty failures).

  2. We could collect something other than the entire exception (which is probably huge) into failures, like maybe just the exception's message?
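A minimal sketch of option 1, under stated assumptions: `validate_rows` is a hypothetical function (not vladiate's validate()), its `verbose` argument stands in for the proposed --verbose flag, and validators are assumed to raise `ValueError`. The separate `failure_count` plays the role of the additional variable, so pass/fail can be decided without a non-empty `failures`.

```python
import logging

logger = logging.getLogger(__name__)


def must_be_positive(row):
    """Toy validator used for illustration (hypothetical)."""
    if row <= 0:
        raise ValueError(f"{row!r} is not positive")


def validate_rows(rows, validator, verbose=False):
    """Keep exception objects only in verbose mode.

    Returns (passed, failure_count, failures). `failures` stays empty
    unless verbose=True; failure_count is always tracked.
    """
    failure_count = 0
    failures = []
    for row_number, row in enumerate(rows, start=1):
        try:
            validator(row)
        except ValueError as exc:
            failure_count += 1
            if verbose:
                # Only debug runs pay the memory cost of the full exception.
                failures.append((row_number, exc))
                logger.debug("row %d: %s", row_number, exc)
    return failure_count == 0, failure_count, failures


passed, count, kept = validate_rows([1, -2, 3, -4], must_be_positive)
print(passed, count, len(kept))  # False 2 0
```

Option 2 would instead append `str(exc)` unconditionally; that is cheaper than the exception object but still large if the message itself is huge, as with the SetValidator message.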

I'm leaning towards the first one, since it seems like less work and still preserves the entire exception for debugging, but I could be convinced otherwise.

Don't stringify the entire valid_set. Instead cut it at 100 elements, similarly to what is done in _log_validator_failures

I think this is a great idea.

PS: I'm happy to contribute!

PRs are welcome!
