note / xml-lens

XML Optics library for Scala

Home Page: https://note.github.io/xml-lens/

License: MIT License

Scala 100.00%
scala xml lenses optics

xml-lens's Introduction

XML Optics library for Scala. Documentation available here: https://note.github.io/xml-lens/

Motivation

XML libraries for Scala are rather neglected. That stands in stark contrast to JSON, for which Scala has dozens of projects. Of course, JSON is much more popular, and XML is regarded as a legacy standard, but there are still many situations where you need to work with XML.

Status of project

Some early versions of the project have been released. It's definitely not very mature yet. In the next releases I would like to focus on the DSL and optics aspects, since, surprisingly, a lot of the time spent on the first release went into the io module. Don't expect rapid development, as it's just a side project made in my free time.

Various

How to generate documentation

sbt docs/makeMicrosite

After the docs have been generated successfully, you can serve them with:

cd docs/target/site
jekyll serve

Pushing documentation to GitHub Pages

You can push generated documentation with:

docs/ghpagesPushSite

Note that you need push access to the repository defined in build.sbt for the command above to work.

How to run JMH benchmark

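Example, run from the sbt shell (10 measurement iterations, 10 warmup iterations, one fork, one thread, with the GC profiler):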

bench/jmh:run -i 10 -wi 10 -f1 -t1 -prof gc .*Roundtrip*.

Contributing

Contributions are very welcome. All code or documentation you contribute must be licensed under the same license as xml-lens (the MIT license, available here).

License

All code is available to you under the MIT license, available here.

Acknowledgements

Many thanks to Scalac, which funded the early development of xml-lens.

xml-lens's People

Contributors

lolgab, note, scalolli

xml-lens's Issues

Normalization of XML

At some point we will want reasonably normalized output. Beyond pure formatting, it would be nice to, e.g., avoid multiple declarations of the same namespace. Probably all namespace declarations should be moved to the root element.

Such operations should be optional: there may be cases where the user wants to avoid unnecessary transformations and keep the output as similar to the input as possible.

There's an example of such behavior (namely, many declarations of one namespace) in the test replaceOrAddAttr for ResolvedNameMatcher in OpticsBuilderSpec.
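A minimal sketch of the "move declarations to the root" idea, over a hypothetical node model (NsDecl, Elem and hoistToRoot are illustrative names, not xml-lens's AST or API). It assumes namespace-well-formed input and deliberately hoists only prefixed declarations: moving a default xmlns="..." to the root would pull unprefixed elements elsewhere in the tree into that namespace. If one prefix is bound to two different URIs, the tree is returned unchanged.

```scala
object NsNormalizationSketch {
  // Hypothetical minimal model for illustration; NOT xml-lens's AST.
  final case class NsDecl(prefix: String, uri: String) // prefix "" = default namespace
  final case class Elem(
      name: String,
      nsDecls: Seq[NsDecl] = Seq.empty,
      children: Seq[Elem] = Seq.empty
  )

  // Every declaration in the tree, in document order.
  private def allDecls(e: Elem): Seq[NsDecl] =
    e.nsDecls ++ e.children.flatMap(allDecls)

  // Removes prefixed declarations everywhere; default-namespace ones stay put.
  private def stripPrefixed(e: Elem): Elem =
    e.copy(
      nsDecls = e.nsDecls.filter(_.prefix.isEmpty),
      children = e.children.map(stripPrefixed)
    )

  /** Moves prefixed namespace declarations to the root, deduplicated.
    * Returns the tree unchanged if a prefix is bound to more than one URI,
    * since hoisting would then change which namespace some names resolve to.
    */
  def hoistToRoot(root: Elem): Elem = {
    val prefixed    = allDecls(root).filter(_.prefix.nonEmpty).distinct
    val conflicting = prefixed.groupBy(_.prefix).exists { case (_, ds) => ds.size > 1 }
    if (conflicting) root
    else {
      val stripped = stripPrefixed(root)
      stripped.copy(nsDecls = prefixed ++ stripped.nsDecls)
    }
  }
}
```

Because such a pass would be opt-in, it fits naturally as an optional post-processing step rather than something the printer always does.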

Equivalent to `javax.xml.stream.isCoalescing` in `XmlParser`

When replacingEntityReferences is enabled, several consecutive Text nodes may appear. In theory, javax.xml.stream.isCoalescing should control this behavior, but unfortunately, while setting it to true solves that issue, it has some unexpected side effects: namely, entity references are no longer parsed as EntityReferences. It may seem that we could set isCoalescing only when replacingEntityReferences is set to true, but that will not work either, as it also stops CDATA sections from being parsed. This is described here: https://docs.oracle.com/cd/E17802_01/webservices/webservices/docs/1.5/sjsxp/ReleaseNotes.html

To avoid relying on the quirks of Java parsers, I think xml-lens should provide coalescing itself, either as part of the parser or as a post-processing step (the same way minimize is done).
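A minimal sketch of the post-processing variant, over a hypothetical node model (Text, CData, Element and coalesce here are illustrative, not xml-lens's actual types). Runs of adjacent Text nodes are merged recursively, while CData nodes are left untouched so they can still be printed as CDATA sections:

```scala
object CoalesceSketch {
  // Hypothetical minimal node model for illustration; NOT xml-lens's AST.
  sealed trait Node
  final case class Text(value: String) extends Node
  final case class CData(value: String) extends Node
  final case class Element(name: String, children: Seq[Node] = Seq.empty) extends Node

  /** Merges each run of adjacent Text nodes into a single Text, recursively. */
  def coalesce(nodes: Seq[Node]): Seq[Node] =
    nodes.foldLeft(Vector.empty[Node]) {
      // Previous node is Text and so is the current one: append to it.
      case (acc :+ Text(prev), Text(next)) => acc :+ Text(prev + next)
      // Recurse into elements so nested runs are merged too.
      case (acc, e: Element)               => acc :+ e.copy(children = coalesce(e.children))
      // Anything else (including CData) is kept as-is and breaks a run.
      case (acc, other)                    => acc :+ other
    }
}
```

Doing this after parsing keeps the parser configuration simple (isCoalescing can stay false) at the cost of one extra pass over the tree.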

How would I focus on only the instances of an element which have a particular attribute value?

This lib looks great!

But I'm a bit stuck on how to do something that I'd expected to be straightforward. Maybe I just don't know enough about optics. What I want is to select only the elements that have an attribute with a given value.

So given this example:

val xml =
  """
    |<a>
    |  <b>
    |    <c example="1">1234</c>
    |    <c example="2">5678</c>
    |    <c example="3">9123</c>
    |  </b>
    |</a>
  """.stripMargin

What I'm after is only the <c> whose example attribute has the value 2. Using this expression gives me all of the <c> elements:

val c = root \ "b" \ "c"
println(pl.msitko.xml.parsing.XmlParser.parse(xml).map(c.getAll))

//prints:
//Right(List(Element(Vector(Attribute(ResolvedName(,,example),1)),List(Text(1234)),Vector()), Element(Vector(Attribute(ResolvedName(,,example),2)),List(Text(5678)),Vector()), Element(Vector(Attribute(ResolvedName(,,example),3)),List(Text(9123)),Vector())))

I've tried using having, something like this:

val c = root \ "b" \ "c" having {
  case LabeledElement(_, Element(attr, _, _)) => attr.find(_.key.localName == "example").exists(_.value == "1")
}

but that seems to pass only the child elements of <c> to the partial function, which gives me no chance to inspect the attributes.
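To pin down the semantics being asked for, here is a minimal sketch over a hypothetical model (Elem, hasAttr, getWhere and modifyWhere are illustrative names, not xml-lens's AST or optics API). The key difference from what `having` appears to do is that the predicate receives the focused element itself, so its attributes are visible, and the same predicate drives both a read (like getAll) and a targeted modification:

```scala
object AttributeFilterSketch {
  // Hypothetical minimal model for illustration; NOT xml-lens's AST or optics API.
  final case class Elem(
      label: String,
      attributes: Map[String, String] = Map.empty,
      children: Seq[Elem] = Seq.empty,
      text: String = ""
  )

  /** Predicate on the focused element itself, so its attributes are visible. */
  def hasAttr(key: String, value: String): Elem => Boolean =
    e => e.attributes.get(key).contains(value)

  /** Follows `path` through child labels and returns the matches satisfying `p`. */
  def getWhere(root: Elem, path: List[String], p: Elem => Boolean): Seq[Elem] =
    path match {
      case Nil           => if (p(root)) Seq(root) else Seq.empty
      case label :: rest => root.children.filter(_.label == label).flatMap(getWhere(_, rest, p))
    }

  /** Same traversal, but applies `f` only to matches; everything else is kept as-is. */
  def modifyWhere(root: Elem, path: List[String], p: Elem => Boolean)(f: Elem => Elem): Elem =
    path match {
      case Nil => if (p(root)) f(root) else root
      case label :: rest =>
        root.copy(children = root.children.map { child =>
          if (child.label == label) modifyWhere(child, rest, p)(f) else child
        })
    }
}
```

With the example document modelled this way, `getWhere(xml, List("b", "c"), hasAttr("example", "2"))` returns only the <c> containing 5678, mirroring `root \ "b" \ "c"` plus a filter on the element's own attributes.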

`Element`s don't compare equal when attributes are in a different order

Possibly related to #7

We find that two parsed ASTs often don't compare equal because the attributes/namespace declarations are in a different order. The order of attributes should be irrelevant: https://www.w3.org/TR/REC-xml/#sec-starttags

This seems to be caused by the attributes/namespace declarations being stored in a Seq:

final case class Element(attributes: Seq[Attribute] = Seq.empty, children: Seq[Node] = Seq.empty, namespaceDeclarations: Seq[NamespaceDeclaration] = Seq.empty)

Could this be solved by using a Map? For instance:

final case class Element(attributes: Map[ResolvedName, String] = Map.empty, children: Seq[Node] = Seq.empty, namespaceDeclarations: Map[String, String] = Map.empty)

I guess this loses the explicit Attribute and NamespaceDeclaration types, but it is worth the trade-off IMHO.
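An alternative that keeps the Seq representation and the explicit types is an order-insensitive comparison. A minimal sketch, using simplified stand-ins for the types above (keys are plain strings here rather than ResolvedName, and `equivalent` is a hypothetical helper, not part of xml-lens). Attributes and namespace declarations are compared as multisets, so order is ignored but duplicates still count; children order stays significant, as it is in XML:

```scala
object UnorderedEqualitySketch {
  // Hypothetical simplified types for illustration; NOT xml-lens's definitions.
  final case class Attribute(key: String, value: String)
  final case class NamespaceDeclaration(prefix: String, uri: String)
  final case class Element(
      attributes: Seq[Attribute] = Seq.empty,
      children: Seq[Element] = Seq.empty,
      namespaceDeclarations: Seq[NamespaceDeclaration] = Seq.empty
  )

  // Multiset view: order is ignored, but duplicates are still counted.
  private def bag[A](xs: Seq[A]): Map[A, Int] =
    xs.groupBy(identity).map { case (k, v) => k -> v.size }

  /** Structural equality ignoring attribute and namespace-declaration order. */
  def equivalent(a: Element, b: Element): Boolean =
    bag(a.attributes) == bag(b.attributes) &&
      bag(a.namespaceDeclarations) == bag(b.namespaceDeclarations) &&
      a.children.size == b.children.size &&
      a.children.zip(b.children).forall { case (x, y) => equivalent(x, y) }
}
```

Unlike a Map, this still distinguishes an element that carries the same attribute twice, which matters if the printer is ever asked to handle duplicate attributes.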

Add more options to PrinterConfig

Ideas for additional options in PrinterConfig:

  • add a Boolean option for repairing namespaces (i.e. automatically declaring namespaces that are used but not yet declared)
  • add an option that defines how to treat multiple attributes with the same name on one element (e.g. <a attr="val1" attr="val2"></a>). Possible behaviors: ignore the problem and print all of them, flatten them into one value separated by spaces, use the last value, or use the first value
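The duplicate-attribute option could be modelled as a small ADT. A minimal sketch; DuplicateAttributePolicy and resolve are hypothetical names, not existing PrinterConfig members. Note that PrintAll produces output that is not well-formed XML, since the spec's "Unique Att Spec" constraint forbids an attribute name from appearing twice in the same start tag:

```scala
object DuplicateAttributesSketch {
  // Hypothetical option type; PrinterConfig does not necessarily have it.
  sealed trait DuplicateAttributePolicy
  object DuplicateAttributePolicy {
    case object PrintAll   extends DuplicateAttributePolicy // keep every occurrence
    case object JoinSpaces extends DuplicateAttributePolicy // "val1 val2"
    case object KeepLast   extends DuplicateAttributePolicy
    case object KeepFirst  extends DuplicateAttributePolicy
  }
  import DuplicateAttributePolicy._

  /** Applies the policy to (name, value) pairs, keeping first-occurrence order of names. */
  def resolve(attrs: Seq[(String, String)], policy: DuplicateAttributePolicy): Seq[(String, String)] =
    policy match {
      case PrintAll => attrs
      case _ =>
        attrs.map(_._1).distinct.map { n =>
          val values = attrs.collect { case (`n`, v) => v }
          val merged = policy match {
            case JoinSpaces => values.mkString(" ")
            case KeepLast   => values.last
            case _          => values.head // KeepFirst
          }
          n -> merged
        }
    }
}
```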

Performance tests mimicking real usage

There are already some simple tests, but they're very synthetic. They're useful in that they make it easy to find the bottleneck. Besides them, we should have tests mimicking real-world usage: doing some transformations on real-world XML and operating on fairly large files (e.g. a few MB).

It would be nice to add the test results to the docs (probably in a separate MD file, so as not to clutter the main docs).
