
Error on exit from mzML (psims, closed)

bretttully commented on May 27, 2024
Error on exit from mzML

from psims.

Comments (10)

mobiusklein commented on May 27, 2024

I'll put this in the XMLDocumentWriter.end method to mitigate the error. It's odd that I never encountered it during the three years before psims was formally published, when I wrote some very big files.

I'll likely make another release within the next day, but the change is in 8e7ad24 if you can work with the development version.


mobiusklein commented on May 27, 2024

Hmm. This never seems to happen on the CI server, but I have seen it on one of the Windows machines I use, with a different long integer as the error code. Unfortunately, I've only seen it intermittently there as well. Does it leave the XML file incomplete or malformed when it does happen?


bretttully commented on May 27, 2024

The error code changes each time, which makes me think it might be dereferencing a null pointer somewhere? I will check the mzML when I'm at my work machine.


bretttully commented on May 27, 2024

Yes, the mzML file is complete and readable.

Looks like others have seen this too: https://pastebin.com/xJcEME60
and https://bitbucket.org/openpyxl/openpyxl/issues/1046/serialisationerror-unknown-error
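
For anyone wanting to repeat the "complete and readable" check on their own output, here is a minimal sketch using only the standard library. It only tests XML well-formedness by streaming through the document; it does not validate against the mzML schema, and the helper name is_well_formed is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def is_well_formed(source):
    """Return True if the XML in `source` (a path or binary file object) parses to the end."""
    try:
        # iterparse streams the document; clearing each finished element
        # keeps memory use low on multi-gigabyte files
        for _event, elem in ET.iterparse(source, events=("end",)):
            elem.clear()
    except ET.ParseError:
        # Raised for truncated or otherwise malformed documents
        return False
    return True


# Tiny in-memory stand-ins for a complete and a truncated file
complete = io.BytesIO(b"<mzML><run><spectrum/></run></mzML>")
truncated = io.BytesIO(b"<mzML><run><spectrum/>")
print(is_well_formed(complete))   # True
print(is_well_formed(truncated))  # False
```

Passing the path of the written mzML file instead of a BytesIO object should work the same way.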


bretttully commented on May 27, 2024

To clarify on the above, the error code is constant for a given file, but it changes for different files. I will keep investigating.


mobiusklein commented on May 27, 2024

This problem bubbles up from libxml2, and it is shielded from view at the Python level: the attribute where the error is stored is only visible from C, and the C-extension class is declared with @cython.internal, so it can't be accessed without rebuilding lxml from scratch.


bretttully commented on May 27, 2024

It seems to happen pretty frequently once the mzML file reaches about 2 or 3 GB when running in our Docker container. I'm thinking the easiest way forward might be this:

from lxml import etree
from psims.mzml.writer import MzMLWriter

try:
    with MzMLWriter(open(self._mzml_fname, 'wb')) as writer:
        # do work...
        pass
except etree.SerialisationError as e:
    self.logger.warning(f'Error closing file\n\n{e}\n')


bretttully commented on May 27, 2024

Looks like it might be a known problem for lxml for large files: https://bugs.launchpad.net/lxml/+bug/1570388

Slightly safer try/except:

from lxml import etree
from psims.mzml.writer import IndexedMzMLWriter

try:
    with IndexedMzMLWriter(open(self.mzml_fname, 'wb'), close=True) as writer:
        pass
except etree.SerialisationError as e:
    if str(e).startswith('unknown error'):
        # There seems to be a bug when closing the mzML file, usually when they get large
        # see here: https://bugs.launchpad.net/lxml/+bug/1570388
        self.logger.debug(f'Error closing file: {e}')
    else:
        raise
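
The same filter can be factored into a reusable context manager. This is only a sketch of the pattern: the name ignore_unknown_close_error is hypothetical, and FakeSerialisationError is a stand-in so the example runs without lxml installed. In real use one would pass etree.SerialisationError instead.

```python
import contextlib
import logging

logger = logging.getLogger(__name__)


@contextlib.contextmanager
def ignore_unknown_close_error(exc_type, log=logger):
    """Swallow `exc_type` only when its message starts with 'unknown error'."""
    try:
        yield
    except exc_type as err:
        if not str(err).startswith("unknown error"):
            # Anything else is a genuine failure and must propagate
            raise
        log.debug("Error closing file: %s", err)


class FakeSerialisationError(Exception):
    """Stand-in for lxml.etree.SerialisationError (assumption for this demo)."""


# The 'unknown error' case is logged at debug level and suppressed
with ignore_unknown_close_error(FakeSerialisationError):
    raise FakeSerialisationError("unknown error -1476144936")
print("still running")
```

Wrapping the writer's with block in this context manager keeps the message check in one place if several files are written.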


mobiusklein commented on May 27, 2024

Fix is live in https://pypi.org/project/psims/0.1.27/


bretttully commented on May 27, 2024

I have tested it, and it now warns as expected on a file that used to crash:

/Users/dev/anaconda3/envs/dev/lib/python3.7/site-packages/psims/xml.py:882: UserWarning: Error closing file: unknown error -1476144936
  warnings.warn("Error closing file: {}".format(err))

Thanks for the fix
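
Since the error now surfaces as a UserWarning, callers who want to record it rather than let it print to stderr could capture it with the standard warnings module. The sketch below assumes only the message format shown in the output above; close_and_collect_warnings and fake_close are hypothetical names, and fake_close stands in for whatever call closes the real writer.

```python
import warnings


def close_and_collect_warnings(close):
    """Call `close()` and return the text of any 'Error closing file' warnings it emits."""
    with warnings.catch_warnings(record=True) as caught:
        # Make sure repeated warnings are not silently deduplicated
        warnings.simplefilter("always")
        close()
    return [str(w.message) for w in caught
            if str(w.message).startswith("Error closing file")]


def fake_close():
    # Stand-in for closing the writer; mimics the warning text shown above
    warnings.warn("Error closing file: {}".format("unknown error -1476144936"))


print(close_and_collect_warnings(fake_close))
# ['Error closing file: unknown error -1476144936']
```

The returned messages can then be sent to a logger or attached to a run report.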

