css4j / css4j

CSS parser with Event and Object Model APIs, a DOM wrapper and a CSS-aware DOM implementation. Written in the Java™ language.

Home Page: https://css4j.github.io/

License: BSD 3-Clause "New" or "Revised" License

Java 85.93% CSS 13.12% HTML 0.90% Groovy 0.03% Shell 0.01%
css dom cssom java css-parser

css4j's Issues

Escaped hex serialization is not list-safe

The current serialization for hex-escaped CSS identifiers is convenient because it mostly preserves the way that the identifier was specified, but unfortunately this has a few side effects. When a hex-escaped identifier is specified in a context where the trailing white space is not necessary, and is then reused in another context, this can lead to bugs.

Consider the following sequence of identifiers being used to specify font-family:

font-family: Font Awesome \35;

The last identifier does not have a trailing space because it is not needed there. However, if we retrieve the property value as a list and add a new value:

ValueList list = (ValueList) style.getPropertyCSSValue("font-family");
StyleValue value = (new ValueFactory()).parseProperty("Free");
list.add(value);
System.out.println(list.getCssText());

the result would be Font Awesome \35 Free, which unescaped is Font Awesome 5Free instead of the expected Font Awesome 5 Free.

The above example is a bit forced, but the behaviour is not compliant with the specification. The problem affects whitespace-separated value lists, including space-separated bracket lists (although I do not expect that a lot of grid lines are going to have hex-escaped names).
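To make the failure concrete, here is a minimal, self-contained sketch that does not use css4j at all. The class HexEscapeDemo and its methods are hypothetical names introduced for illustration. It decodes escapes in a simplified way (a single white space after a hex escape is consumed as its terminator) and shows one possible list-safe way of appending an item. It is not meant to describe how the library actually fixes the problem.

```java
final class HexEscapeDemo {

    /** Decodes CSS escapes (simplified: newline and EOF edge cases are ignored). */
    static String unescape(String css) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < css.length()) {
            char c = css.charAt(i++);
            if (c != '\\' || i == css.length()) {
                out.append(c);
                continue;
            }
            // Collect up to six hex digits after the backslash.
            int start = i;
            while (i < css.length() && i - start < 6
                    && Character.digit(css.charAt(i), 16) >= 0) {
                i++;
            }
            if (i == start) {
                // Not a hex escape: the escaped character is taken literally.
                out.append(css.charAt(i++));
                continue;
            }
            int cp = Integer.parseInt(css.substring(start, i), 16);
            if (cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
                cp = 0xFFFD; // replacement character for invalid code points
            }
            out.appendCodePoint(cp);
            // One white space after a hex escape terminates it and is consumed.
            if (i < css.length() && Character.isWhitespace(css.charAt(i))) {
                i++;
            }
        }
        return out.toString();
    }

    /**
     * Appends an item to a space-separated list. If the list ends with a hex
     * escape, an extra space is added so that the separator survives decoding.
     * The regex is a rough heuristic (it ignores escaped backslashes).
     */
    static String appendListSafe(String list, String item) {
        boolean endsWithHexEscape = list.matches("(?s).*\\\\[0-9a-fA-F]{1,6}");
        return list + (endsWithHexEscape ? "  " : " ") + item;
    }

    public static void main(String[] args) {
        // Naive join: the separator is swallowed by the escape.
        System.out.println(unescape("Font Awesome \\35" + " " + "Free"));
        // List-safe join: the separator is preserved.
        System.out.println(unescape(appendListSafe("Font Awesome \\35", "Free")));
    }
}
```

The first line printed is Font Awesome 5Free and the second is Font Awesome 5 Free. An alternative would be to always serialize hex escapes with their trailing space, at the cost of not preserving the original text.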

[DOM] Add a getter for innerText

The innerText DOM property was introduced by MS Internet Explorer and adopted by the other major browsers without a lot of enthusiasm. However, it has its defenders and its use cases. See The poor, misunderstood innerText by Juriy Zaytsev for background information.

The property is specified in HTML, although I did not use it to write the code:

https://html.spec.whatwg.org/multipage/dom.html#the-innertext-idl-attribute

As said in the aforementioned blog post, a typical use case is rich text editing in a browser. However, it is useful for any application where a rich-text representation of a document fragment (with all its HTML tags and styles) is used, but one also wants a plain text version of that content (to store in a database field or a plain text document).

This library focuses on non-browser use cases, and this implementation is not intended to be completely equivalent to what web browsers do. For example, I try to avoid some empty lines that often appear in innerText as given by browsers, and a white space is added before a list item (as suggested by J. Zaytsev's blog post, which has been very useful), although only if the list-style-position CSS property is set to inside.

The fe-innertext branch contains the patch intended for merging.

Implement the `color()` function from Color Level 4

The color() function is specified by Color Level 4, and is closely related to the @color-profile rule that nobody has implemented yet.

Browser support

The function was implemented by WebKit in 2016 (for the P3 color space). The following are the relevant trackers for Chrome and Firefox:

I could not find any bug for implementing @color-profile in either Chrome or WebKit, and the Firefox bug is quiet, so at this point I'm not implementing a specific @color-profile rule (it can always be handled as a generic "ignorable rule" in NSAC, or an "unknown rule" in css4j's CSSOM).

Implementation details

The idea is that RGB colors implement the RGBAColor interface regardless of how they were specified. That is, the following color values all implement RGBAColor:

  • #00f.
  • rgb(0 0 255).
  • color(srgb 0 0 1).
  • color(display-p3 0 0 1).

This differs from the approach that Houdini's TypedOM is following, where RGB values specified via the color() function implement the CSSColor interface but not the CSSRGB one, which is only implemented by rgb() values. See https://drafts.css-houdini.org/css-typed-om-1/#dom-csscolorvalue-colorspace for the color value-interface mapping in TypedOM.

In the implementation that this project is adopting, one cannot know whether an sRGB color was specified through a color(srgb ...) or an rgb(...) function unless the color is serialized, but that's not really something new. Even with Houdini's TypedOM one cannot tell whether a color implementing CSSRGB was specified as #00f or as rgb(0 0 255).

One consequence of the approach that I'm following is that the RGBColorValue interface is now deprecated. In fact, one no longer needs to cast color values to any CSSColorValue sub-interface: just cast the CSSColor returned by CSSColorValue.getColor() to the relevant RGBAColor, XYZColor etc., according to the value returned by CSSColorValue.getColorModel(). Or do not cast at all, and access the color components through CSSColor.item().
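The access pattern can be sketched with stand-in types. Everything below (ColorAccessSketch, Model, Color, RgbColor) is a hypothetical simplification written for this illustration and is not the css4j API. In particular, the real item() is not assumed to return a double, and the real model accessor lives on the value rather than on the color. The point is only that the caller dispatches on the color model, not on the notation that was used in the style sheet.

```java
import java.util.Locale;

final class ColorAccessSketch {

    /** Stand-in for the color model reported by a color value. */
    enum Model { RGB, XYZ }

    /** Stand-in for a color whose components are reachable by index. */
    interface Color {
        Model getColorModel();
        String getColorSpace();
        double item(int index);
    }

    /** Any RGB-model color: #00f, rgb(), color(srgb ...), color(display-p3 ...). */
    static final class RgbColor implements Color {
        private final String space;
        private final double[] components;

        RgbColor(String space, double r, double g, double b) {
            this.space = space;
            this.components = new double[] { r, g, b };
        }

        public Model getColorModel() { return Model.RGB; }
        public String getColorSpace() { return space; }
        public double item(int index) { return components[index]; }
    }

    /** Dispatches on the model only; no notation-specific cast is involved. */
    static String describe(Color color) {
        switch (color.getColorModel()) {
        case RGB:
            return String.format(Locale.ROOT, "%s r=%.1f g=%.1f b=%.1f",
                    color.getColorSpace(), color.item(0), color.item(1), color.item(2));
        default:
            return color.getColorSpace() + " (non-RGB model)";
        }
    }

    public static void main(String[] args) {
        // #00f, rgb(0 0 255) and color(srgb 0 0 1) would all end up like this:
        System.out.println(describe(new RgbColor("srgb", 0, 0, 1)));
        // color(display-p3 0 0 1) shares the model but not the color space:
        System.out.println(describe(new RgbColor("display-p3", 0, 0, 1)));
    }
}
```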

OM expressions need stricter sanity checks for missing operands

Object Model expressions assume that certain sanity checks were performed at the NSAC level. However, calc() expressions may have been parsed with a var() value inside. Once that value is substituted, the resulting expression may not meet these expectations.

To deal with those cases, the Object Model expressions need stricter sanity checks about missing operands.
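As a rough illustration of the kind of check involved, the following sketch works on a flat list of tokens rather than on the library's real expression tree. The class OperandCheckSketch and its methods are hypothetical, and parentheses and unary signs are deliberately left out. It substitutes a var() token and then verifies that operands and operators still alternate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class OperandCheckSketch {

    static boolean isOperator(String token) {
        return token.equals("+") || token.equals("-")
                || token.equals("*") || token.equals("/");
    }

    /** Replaces every occurrence of varToken with the replacement tokens. */
    static List<String> substitute(List<String> tokens, String varToken,
            List<String> replacement) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (token.equals(varToken)) {
                out.addAll(replacement);
            } else {
                out.add(token);
            }
        }
        return out;
    }

    /**
     * True if operands and operators strictly alternate, starting and
     * ending with an operand (so no operand is missing).
     */
    static boolean hasAllOperands(List<String> tokens) {
        boolean expectOperand = true;
        for (String token : tokens) {
            if (isOperator(token) == expectOperand) {
                return false; // operator where an operand was due, or vice versa
            }
            expectOperand = !expectOperand;
        }
        return !expectOperand; // the last token must have been an operand
    }

    public static void main(String[] args) {
        List<String> expr = Arrays.asList("2em", "*", "var(--n)");
        // --n: 3 gives a well-formed expression.
        System.out.println(hasAllOperands(
                substitute(expr, "var(--n)", Collections.singletonList("3"))));
        // An empty --n leaves "2em *", which lacks its right operand.
        System.out.println(hasAllOperands(
                substitute(expr, "var(--n)", Collections.<String>emptyList())));
    }
}
```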

Support matching level 4 `:lang` pseudoclass in CSSOM

The library fully supports the :lang pseudoclass from level 3 selectors, both in parsing/serializing and in selector matching (style computation). Level 4 adds comma-separated lists of language specs, and wildcards. The library can parse and serialize level 4 correctly, but won't match the new level 4 selectors when computing styles.

No major browser supports this yet, but things are starting to move in Chrome. Here are the relevant trackers for the main user agents:

Perhaps css4j should match :lang according to the new level 4 features.
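For reference, this is a minimal sketch of what level 4 matching could look like, assuming (as I understand the specification) that language ranges are matched with the extended filtering algorithm of RFC 4647, section 3.3.2. The class LangRangeSketch is a hypothetical helper, not part of css4j, and it ignores details such as quoted ranges and the empty range.

```java
import java.util.Locale;

final class LangRangeSketch {

    /** Extended filtering (RFC 4647, 3.3.2) of one language range against a tag. */
    static boolean matches(String range, String tag) {
        String[] r = range.toLowerCase(Locale.ROOT).split("-");
        String[] t = tag.toLowerCase(Locale.ROOT).split("-");
        // The first subtags must match, unless the range starts with a wildcard.
        if (!r[0].equals("*") && !r[0].equals(t[0])) {
            return false;
        }
        int ri = 1;
        int ti = 1;
        while (ri < r.length) {
            if (r[ri].equals("*")) {
                ri++;                       // wildcards match any sequence
            } else if (ti >= t.length) {
                return false;               // the tag ran out of subtags
            } else if (r[ri].equals(t[ti])) {
                ri++;
                ti++;
            } else if (t[ti].length() == 1) {
                return false;               // singletons cannot be skipped
            } else {
                ti++;                       // skip a non-matching tag subtag
            }
        }
        return true;
    }

    /** True if any range in a comma-separated list matches the tag. */
    static boolean matchesAny(String commaSeparatedRanges, String tag) {
        for (String range : commaSeparatedRanges.split(",")) {
            if (matches(range.trim(), tag)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("de-DE", "de-Latn-DE"));   // true
        System.out.println(matches("de-DE", "de-x-DE"));      // false
        System.out.println(matchesAny("fr, *-CH", "de-CH"));  // true
    }
}
```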

Regression parsing custom properties that contain a top-level asterisk

A regression was introduced in version 3.5.2: custom properties that contain a top-level asterisk, like

--foo: 2em * 3;

are triggering an error (the above value would be invalid if it were a regular property). The problem is that the low-level NSAC parser was applying essentially the same error processing to both regular and custom properties (and that error processing became stricter in 3.5.2).

I'm preparing a fix, together with several unit tests to help catch similar regressions in the future.

Css4j lacks a single-step build

Currently, to build the library one has to fetch the css4j-dist repository, execute a script and then execute Gradle. This is not only annoying for developers, it also makes it difficult to set up a good CI system.

This library has three other modules in addition to the core module: css4j-agent, css4j-awt and css4j-dom4j. The idea has always been to promote them to independent projects, and in fact they are rarely updated. But downstream users seem to prefer that they all come in a single release, together with the main module. And admittedly, quite a few people would never learn about the other modules if they were kept as completely independent projects.

So my plan is to integrate css4j-agent, css4j-awt and css4j-dom4j into the main css4j repository (this one). If you'd prefer to have them as completely independent projects (or keep things as they are now) please comment or downvote this post.

I'm going to post a heads-up about this in the css4j Google Group.

var() substitution in computed values was vulnerable to DoS attack

When computing a value that depended on the evaluation of a var() function, a denial of service attack similar to the "billion laughs attack" was possible.

Affected versions:

  • 2.0: from 2.0.0 to 2.0.4.
  • 1.0: from 1.0.0 to 1.0.6.

The issue was fixed in 2.0.5 and 1.0.7, released on July 28, 2020.

A new test, VarBLATest.java, was added to verify that the code is no longer vulnerable. The following CSS is tested:

body {
  --bla0: lol lol lol lol lol lol lol lol lol lol lol lol lol lol lol lol lol lol lol lol;
  --bla1: var(--bla0) var(--bla0) var(--bla0) var(--bla0) var(--bla0) var(--bla0) var(--bla0);
  --bla2: var(--bla1) var(--bla1) var(--bla1) var(--bla1) var(--bla1) var(--bla1) var(--bla1);
  --bla3: var(--bla2) var(--bla2) var(--bla2) var(--bla2) var(--bla2) var(--bla2) var(--bla2);
  --bla4: var(--bla3) var(--bla3) var(--bla3) var(--bla3) var(--bla3) var(--bla3) var(--bla3);
  --bla5: var(--bla4) var(--bla4) var(--bla4) var(--bla4) var(--bla4) var(--bla4) var(--bla4);
  --bla6: var(--bla5) var(--bla5) var(--bla5) var(--bla5) var(--bla5) var(--bla5) var(--bla5);
  --bla7: var(--bla6) var(--bla6) var(--bla6) var(--bla6) var(--bla6) var(--bla6) var(--bla6);
  --bla8: var(--bla7) var(--bla7) var(--bla7) var(--bla7) var(--bla7) var(--bla7) var(--bla7);
  --bla9: var(--bla8) var(--bla8) var(--bla8) var(--bla8) var(--bla8) var(--bla8) var(--bla8);
  --bla10: var(--bla9) var(--bla9) var(--bla9) var(--bla9) var(--bla9) var(--bla9) var(--bla9);
  --bla11: var(--bla10) var(--bla10) var(--bla10) var(--bla10) var(--bla10) var(--bla10) var(--bla10);
  --bla12: var(--bla11) var(--bla11) var(--bla11) var(--bla11) var(--bla11) var(--bla11) var(--bla11);
  --bla13: var(--bla12) var(--bla12) var(--bla12) var(--bla12) var(--bla12) var(--bla12) var(--bla12);
  --bla14: var(--bla13) var(--bla13) var(--bla13) var(--bla13) var(--bla13) var(--bla13) var(--bla13);
  --bla15: var(--bla14) var(--bla14) var(--bla14) var(--bla14) var(--bla14) var(--bla14) var(--bla14);
  --bla16: var(--bla15) var(--bla15) var(--bla15) var(--bla15) var(--bla15) var(--bla15) var(--bla15);
  --bla17: var(--bla16) var(--bla16) var(--bla16) var(--bla16) var(--bla16) var(--bla16) var(--bla16);
  --bla18: var(--bla17) var(--bla17) var(--bla17) var(--bla17) var(--bla17) var(--bla17) var(--bla17);
  --bla19: var(--bla18) var(--bla18) var(--bla18) var(--bla18) var(--bla18) var(--bla18) var(--bla18);
  --bla:  var(--bla19) var(--bla19) var(--bla19) var(--bla19) var(--bla19) var(--bla19) var(--bla19);
}
div {
  margin-left: var(--bla) var(--bla) var(--bla) var(--bla) var(--bla) var(--bla) var(--bla);
}

With CSS like the above and old versions of this library, an attacker could cause serious trouble for any VM trying to compute styles from it.
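To get a sense of the scale, the sketch below counts how many tokens the margin-left value would contain after full substitution, without ever building the expanded string. The class VarExpansionSketch is a hypothetical helper, the parsing is reduced to splitting on white space, and circular references are assumed to be absent. A guard of this kind (compute or bound the size first, refuse above a limit) is one possible defence, and is not necessarily what the actual fix does.

```java
import java.math.BigInteger;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class VarExpansionSketch {

    /** Token count of a value after full var() substitution (memoized per property). */
    static BigInteger expandedSize(String value, Map<String, String> props,
            Map<String, BigInteger> memo) {
        BigInteger total = BigInteger.ZERO;
        for (String token : value.trim().split("\\s+")) {
            if (token.startsWith("var(") && token.endsWith(")")) {
                String name = token.substring(4, token.length() - 1);
                BigInteger size = memo.get(name);
                if (size == null) {
                    size = expandedSize(props.getOrDefault(name, ""), props, memo);
                    memo.put(name, size);
                }
                total = total.add(size);
            } else if (!token.isEmpty()) {
                total = total.add(BigInteger.ONE);
            }
        }
        return total;
    }

    static String repeat(String item, int times) {
        return String.join(" ", Collections.nCopies(times, item));
    }

    /** Rebuilds the custom properties of the test style sheet. */
    static Map<String, String> sampleProps() {
        Map<String, String> props = new HashMap<>();
        props.put("--bla0", repeat("lol", 20));
        for (int i = 1; i <= 19; i++) {
            props.put("--bla" + i, repeat("var(--bla" + (i - 1) + ")", 7));
        }
        props.put("--bla", repeat("var(--bla19)", 7));
        return props;
    }

    public static void main(String[] args) {
        BigInteger limit = BigInteger.valueOf(1_000_000); // arbitrary budget
        BigInteger size = expandedSize(repeat("var(--bla)", 7), sampleProps(),
                new HashMap<>());
        System.out.println("expanded tokens: " + size);
        System.out.println("rejected: " + (size.compareTo(limit) > 0));
    }
}
```

For the style sheet above the count is 20 × 7^21, roughly 1.1 × 10^19 tokens, which is why a naive expansion cannot complete.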

It is believed that most users have already upgraded to more recent versions, and anyone who didn't is encouraged to use the latest available releases (currently 3.1, 2.2 and 1.1).

Problem with table indexing and pseudo classes

Hello,

I am trying to generate XHTML with CSS applied to it. I am struggling with the following CSS code:

table#corporate tr:first-child { background-color:#002a55; color:#fff; font-weight:bold }
table#corporate tr:nth-child(odd):not(:first-child) { background-color:#f5f5f5; color:#000; }
table#corporate tr:nth-child(even) { background-color:#fff; color:#000 }

This should be applied to a table:

<table id="corporate">
 <tbody>
  <tr> <td>Test</td> <td>.</td> </tr>
  <tr> <td>.</td> <td>.</td> </tr>
  <tr> <td>.</td> <td>.</td> </tr>
  <tr> <td>.</td> <td>.</td> </tr>
  <tr> <td>.</td> <td>.</td> </tr>
 </tbody>
</table>

but the CSS only gets applied to the first 4 table rows; the last row is skipped every time I use some kind of pseudo-class selector.
I also tried specifically selecting the 5th row with nth-child, but with no luck. I think the algorithm stops at the last entry of the table rows.

The HTML after applying CSS:

<table id="corporate">
 <tbody>
  <tr style="background-color: #002a55; color: #fff; font-weight: bold; "> <td>Test</td> <td>.</td> </tr>
  <tr style="background-color: #fff; color: #000; "> <td>.</td> <td>.</td> </tr>
  <tr style="background-color: #f5f5f5; color: #000; "> <td>.</td> <td>.</td> </tr>
  <tr style="background-color: #fff; color: #000; "> <td>.</td> <td>.</td> </tr>
  <tr> <td>.</td> <td>.</td> </tr>
 </tbody>
</table>

The css4j version I'm using: css4j 3.7.1
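For reference, the three rules are mutually exclusive, so the expected background of each row depends only on its 1-based position among its siblings. The small sketch below (NthChildSketch is a hypothetical helper, independent of css4j) works that out and shows what the fifth row should have received.

```java
final class NthChildSketch {

    /** Background expected for the row at the given 1-based sibling index. */
    static String expectedBackground(int index) {
        if (index == 1) {
            return "#002a55"; // tr:first-child
        }
        if (index % 2 == 1) {
            return "#f5f5f5"; // tr:nth-child(odd):not(:first-child)
        }
        return "#fff";        // tr:nth-child(even)
    }

    public static void main(String[] args) {
        for (int row = 1; row <= 5; row++) {
            System.out.println("row " + row + ": " + expectedBackground(row));
        }
    }
}
```

Row 5 is odd and is not the first child, so it should end up with background-color #f5f5f5 and color #000, like row 3.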

Support HWB colors as NSAC lexical units

Currently, HWB colors are parsed as generic functions at the NSAC level, then assigned to an HWBColorValue by the CSSOM value factory. This made sense as long as HWB colors had been specified for a long time but not implemented by the main browsers.

However, now all the main user agents have implemented HWB (except Opera). It is time to add HWB colors as a full NSAC lexical type.

Release version 1.3.0

Version 1.2.0 was released in December 2020, and although the 1-stable branch is nowhere near the level of features of master, it is still the easiest upgrade path for people that use other SAC implementations, or that are stuck with Java 7.

Therefore, I have been backporting a few master fixes to 1-stable, and aim to release 1.3.0 at the end of this month or in early February. A new release from master (3.7.1) should follow shortly after, to avoid confusing people with the "latest" GitHub release being 1.3.0.

If there is a specific feature or fix that you would like to see in 1.3.0, this is the place to comment.

Customizable minified serialization of computed styles

Currently, DeclarationFormattingContext allows customizing the normal serialization of computed styles, given by computedStyle.getCssText(). But it would be useful to be able to customize computedStyle.getMinifiedCssText() as well.

Add a writeMinifiedValue() method to DeclarationFormattingContext.

Migrate to a Maven/Gradle source directory layout

The current directory layout is simple and just works. However, some tools (including Jazzer) have the Maven directory structure hard-coded. The current Jazzer integration (under the src directory) is problematic and I'd rather migrate to a Maven/Gradle layout than wait for a mistake to happen.

Make `MediaQueryList.getMediaQuery(int)` publicly visible

This library's implementations of MediaQueryList internally use a getMediaQuery(int) method to gain access to MediaQuery instances. The current MediaQueryList is a text-only interface (as specified by W3C), so the advantage of fine-grained object-model access is lost.

With getMediaQuery(int) being part of MediaQueryList, developers could obtain MediaQuery objects and use getCondition() to obtain the query conditionals. A new MediaQueryPredicate interface would help in determining the type of predicate and the name of the media type or feature involved.

Clamping out-of-gamut lab/lch colors

The current procedure to clamp out-of-gamut colors when converting from lab() and lch() to RGB (sRGB color space) in toRGBColorValue() and CSSColorValue.toRGBColorValue(boolean) is inadequate, and a more elaborate process should be used.

This issue is opened as a reminder that whatever comes out of CSSWG's (csswg-drafts) issue 5191 should be looked at.
