GithubHelp home page GithubHelp logo

Comments (11)

carlosafonso avatar carlosafonso commented on August 17, 2024 2

Can confirm. Our Symfony app breaks after using PhpSpreadsheet because it fails to properly read XML files.

from phpspreadsheet.

amcsi avatar amcsi commented on August 17, 2024

FYI it's because of these lines:
https://github.com/PHPOffice/PhpSpreadsheet/blob/develop/src/PhpSpreadsheet/Settings.php#L346
https://github.com/PHPOffice/PhpSpreadsheet/blob/develop/src/PhpSpreadsheet/Settings.php#L363

For anyone that's looking for a workaround for your SOAP calls or other external xml reading, call libxml_disable_entity_loader(false) before instantiating SoapClient or other things that involve reading external xml

from phpspreadsheet.

inspiran avatar inspiran commented on August 17, 2024

@amcsi thx for mentioning a work around, works fine.

from phpspreadsheet.

PowerKiKi avatar PowerKiKi commented on August 17, 2024

It would be appreciated it someone can come up with a proper fix that is covered by unit tests.

Without digging into it I suppose something like remembering previous global settings and re-set them afterwards might be the way to go. What do you think ?

from phpspreadsheet.

agopaul avatar agopaul commented on August 17, 2024

I did a bit or research on this.

Apparently, the incriminated piece of code calls the libxml_disable_entity_loader() function which is global and affects all libxml-bases libraries, so basically, every other library/code that works with XML is impacted by this. The function call was added 3 years ago for security reasons (CVE-2014-2054).

It seems that that call should be left in the codebase, so maybe the best workaround would be to make an XML reader "factory" for each XML library (we have all of them here: SimpleXML, DOMDocument, and XMLReader) that wraps the code required to open the XML file. Here is an example taken from the last link:

$oldValue = libxml_disable_entity_loader(true);
$dom = new DOMDocument();
$dom->loadXML($xml);
libxml_disable_entity_loader($oldValue);

Settings::getLibXmlLoaderOptions() has 40 occurrences in total, so this would be a fairly big change.

Edit: Just wanted to clarify. The change wouldn't be that big per se, but of the four classes involved, only one is covered by unit tests. I'm not sure if this is something that could be changed with confidence without having all the necessary tests in place.

from phpspreadsheet.

PowerKiKi avatar PowerKiKi commented on August 17, 2024

Unfortunately unit tests are lacking and there isn't much we can do except to add new one progressively.

Your suggestion sounds good to me as long as we are sure that those options only have effect when loading the document, and not later when using the objects. If that would be the case, that would mean we would have to keep "our" options until the end of usage of the reader/writer, which may be hard to do. Is there are a to confirm or infirm that ?

Also on a side note and slightly out of scope, but I wouldn't mind getting rid of SimpleXML in favor of DOMDocument. If it's not too much work, not too much risky, and could simplify our code base (future factories), then that could/should be done.

from phpspreadsheet.

amcsi avatar amcsi commented on August 17, 2024

@agopaul Would you really have to unit test every usage? I would think that you could just create a wrapper class around DOMDocument/SimpleXML, use and unit test those. You already assume that DOMDocument and SimpleXML work as expected due to being external classes, so if you just created a wrapper, you could get away with just testing the differences.

from phpspreadsheet.

agopaul avatar agopaul commented on August 17, 2024

Theoretically, that is true, but there is a marginal risk of breaking something due to unexpected behavior of the XML libraries involved.

Also, I'm not 100% sure that a wrapper would do the job: like @PowerKiKi said, we don't know if the options are only used while loading the XML or if they are also used in other parts of the libraries.

Maybe I'll do some testing on the weekend.

In the meantime, if this is breaking your application, a quick workaround would be using the libxml_disable_entity_loader() function after you've used PhpSpreadsheet and re-enable the entity loader.

from phpspreadsheet.

agopaul avatar agopaul commented on August 17, 2024

Coincidently I ran into an error due to this very issue just yesterday when a test suite on a project that I'm working on was failing unexpectedly and sure enough, it was because of this function calls.

Anyway, the whole point of disabling the external entities loader via libxml_disable_entity_loader() is to prevent XXE/XEE attack vectors.

Digging inside the codebase I've found that @MarkBaker actually fixed the XXE/XEE vulnerability in another way after the commit that added the calls to the libxml_disable_entity_loader() function: PHPOffice/PHPExcel@0ab614f

So there are 2 fixes for the same problem.

A few considerations on @MarkBaker's solution:

  • It's thread safe, whereas libxml_disable_entity_loader() is not: https://bugs.php.net/bug.php?id=64938
  • It doesn't touch libxml global configuration so it doesn't conflict with other libraries that interact with libxml
  • It's more aggressive since libxml_disable_entity_loader() just disables the external entity loading whereas this implementation just scans for entities and throws an Exception if one is found. That, of course, is because (I assume) none of the XML files parsed by PhpSpreadsheet should use XML entities.

I looked up how the Zend Framework team fixed this and they created a library that parses XML safely and they adopted both approaches (libxml_disable_entity_loader() and Mark's). The lib uses one or the other depending on the thread safety of the current PHP context.

Links:

In other words, I think it's safe to say that these libxml_disable_entity_loader() calls inside the Settings class can just be removed.

I think this is probably a safe change and we wouldn't need to have a complete test coverage of the readers to do it. I can work on a PR if needed.

from phpspreadsheet.

PowerKiKi avatar PowerKiKi commented on August 17, 2024

Thanks for your in-depth analysis. If I understand things correctly we can remove libxml_disable_entity_loader() if and only if every single loaded XML is checked with BaseReader::securityScanFile() beforehand. It seems it was done by Mark, but it would be best to double-check. If you can do so, then I'll merge your PR.

from phpspreadsheet.

PowerKiKi avatar PowerKiKi commented on August 17, 2024

Closed via #113

from phpspreadsheet.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.