
pup's Introduction

pup

pup is a command line tool for processing HTML. It reads from stdin, prints to stdout, and allows the user to filter parts of the page using CSS selectors.

Inspired by jq, pup aims to be a fast and flexible way of exploring HTML from the terminal.

Install

Direct downloads are available through the releases page.

If you have Go installed on your computer just run go get.

go get github.com/ericchiang/pup

If you're on OS X, use Homebrew to install (no Go required).

brew install https://raw.githubusercontent.com/EricChiang/pup/master/pup.rb

Quick start

$ curl -s https://news.ycombinator.com/

Ew, HTML. Let's run that through some pup selectors:

$ curl -s https://news.ycombinator.com/ | pup 'table table tr:nth-last-of-type(n+2) td.title a'

Okay, how about only the links?

$ curl -s https://news.ycombinator.com/ | pup 'table table tr:nth-last-of-type(n+2) td.title a attr{href}'

Even better, let's grab the titles too:

$ curl -s https://news.ycombinator.com/ | pup 'table table tr:nth-last-of-type(n+2) td.title a json{}'

Basic Usage

$ cat index.html | pup [flags] '[selectors] [display function]'

Examples

Download a webpage with wget.

$ wget http://en.wikipedia.org/wiki/Robots_exclusion_standard -O robots.html

Clean and indent

By default pup will fill in missing tags and properly indent the page.

$ cat robots.html
# nasty looking HTML
$ cat robots.html | pup --color
# cleaned, indented, and colorful HTML

Filter by tag

$ cat robots.html | pup 'title'
<title>
 Robots exclusion standard - Wikipedia, the free encyclopedia
</title>

Filter by id

$ cat robots.html | pup 'span#See_also'
<span class="mw-headline" id="See_also">
 See also
</span>

Filter by attribute

$ cat robots.html | pup 'th[scope="row"]'
<th scope="row" class="navbox-group">
 Exclusion standards
</th>
<th scope="row" class="navbox-group">
 Related marketing topics
</th>
<th scope="row" class="navbox-group">
 Search marketing related topics
</th>
<th scope="row" class="navbox-group">
 Search engine spam
</th>
<th scope="row" class="navbox-group">
 Linking
</th>
<th scope="row" class="navbox-group">
 People
</th>
<th scope="row" class="navbox-group">
 Other
</th>

Pseudo Classes

CSS selectors have a group of specifiers called "pseudo classes" which are pretty cool. pup implements a majority of the relevant ones.

Here are some examples.

$ cat robots.html | pup 'a[rel]:empty'
<a rel="license" href="//creativecommons.org/licenses/by-sa/3.0/" style="display:none;">
</a>
$ cat robots.html | pup ':contains("History")'
<span class="toctext">
 History
</span>
<span class="mw-headline" id="History">
 History
</span>
$ cat robots.html | pup ':parent-of([action="edit"])'
<span class="wb-langlinks-edit wb-langlinks-link">
 <a action="edit" href="//www.wikidata.org/wiki/Q80776#sitelinks-wikipedia" text="Edit links" title="Edit interlanguage links" class="wbc-editpage">
  Edit links
 </a>
</span>

For a complete list, view the implemented selectors section.

+, >, and ,

These characters combine selectors with special instructions: > selects a direct child, + selects an adjacent sibling, and a comma (,) lets pup match multiple groups of selectors.

$ cat robots.html | pup 'title, h1 span[dir="auto"]'
<title>
 Robots exclusion standard - Wikipedia, the free encyclopedia
</title>
<span dir="auto">
 Robots exclusion standard
</span>
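
The > (direct child) and + (adjacent sibling) combinators follow standard CSS semantics. A hedged sketch against the same page, with output omitted since the exact selectors depend on the page's markup:

$ cat robots.html | pup 'div#content > h1'
$ cat robots.html | pup 'h2 + p'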

Chain selectors together

When combining selectors, the HTML nodes selected by the previous selector will be passed to the next ones.

$ cat robots.html | pup 'h1#firstHeading'
<h1 id="firstHeading" class="firstHeading" lang="en">
 <span dir="auto">
  Robots exclusion standard
 </span>
</h1>
$ cat robots.html | pup 'h1#firstHeading span'
<span dir="auto">
 Robots exclusion standard
</span>

Implemented Selectors

For further examples of these selectors head over to MDN.

pup '.class'
pup '#id'
pup 'element'
pup 'selector + selector'
pup 'selector > selector'
pup '[attribute]'
pup '[attribute="value"]'
pup '[attribute*="value"]'
pup '[attribute~="value"]'
pup '[attribute^="value"]'
pup '[attribute$="value"]'
pup ':empty'
pup ':first-child'
pup ':first-of-type'
pup ':last-child'
pup ':last-of-type'
pup ':only-child'
pup ':only-of-type'
pup ':contains("text")'
pup ':nth-child(n)'
pup ':nth-of-type(n)'
pup ':nth-last-child(n)'
pup ':nth-last-of-type(n)'
pup ':not(selector)'
pup ':parent-of(selector)'

You can mix and match selectors as you wish.

cat index.html | pup 'element#id[attribute="value"]:first-of-type'

Display Functions

Non-HTML selectors which affect the output type are implemented as functions which can be provided as a final argument.

text{}

Print all text from selected nodes and their children in depth-first order.

$ cat robots.html | pup '.mw-headline text{}'
History
About the standard
Disadvantages
Alternatives
Examples
Nonstandard extensions
Crawl-delay directive
Allow directive
Sitemap
Host
Universal "*" match
Meta tags and headers
See also
References
External links

attr{attrkey}

Print the values of all attributes with a given key from all selected nodes.

$ cat robots.html | pup '.catlinks div attr{id}'
mw-normal-catlinks
mw-hidden-catlinks

json{}

Print HTML as JSON.

$ cat robots.html  | pup 'div#p-namespaces a'
<a href="/wiki/Robots_exclusion_standard" title="View the content page [c]" accesskey="c">
 Article
</a>
<a href="/wiki/Talk:Robots_exclusion_standard" title="Discussion about the content page [t]" accesskey="t">
 Talk
</a>
$ cat robots.html | pup 'div#p-namespaces a json{}'
[
 {
  "accesskey": "c",
  "href": "/wiki/Robots_exclusion_standard",
  "tag": "a",
  "text": "Article",
  "title": "View the content page [c]"
 },
 {
  "accesskey": "t",
  "href": "/wiki/Talk:Robots_exclusion_standard",
  "tag": "a",
  "text": "Talk",
  "title": "Discussion about the content page [t]"
 }
]

Use the -i / --indent flag to control the indent level.

$ cat robots.html | pup -i 4 'div#p-namespaces a json{}'
[
    {
        "accesskey": "c",
        "href": "/wiki/Robots_exclusion_standard",
        "tag": "a",
        "text": "Article",
        "title": "View the content page [c]"
    },
    {
        "accesskey": "t",
        "href": "/wiki/Talk:Robots_exclusion_standard",
        "tag": "a",
        "text": "Talk",
        "title": "Discussion about the content page [t]"
    }
]

If the selectors return only one element, the result is printed as a JSON object, not a list.

$ cat robots.html  | pup --indent 4 'title json{}'
{
    "tag": "title",
    "text": "Robots exclusion standard - Wikipedia, the free encyclopedia"
}

Because there is no universal standard for converting HTML/XML to JSON, a method has been chosen which hopefully fits. The goal is simply to get the output of pup into a more consumable format.
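
For example, because json{} emits a plain array of objects, its output can be piped into jq (pup's inspiration) for further processing. A minimal sketch, assuming jq is installed; the href values come from the json{} example above:

$ cat robots.html | pup 'div#p-namespaces a json{}' | jq -r '.[].href'
/wiki/Robots_exclusion_standard
/wiki/Talk:Robots_exclusion_standard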

Flags

Run pup --help for a full list of options.
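
For instance, flags can be combined with a selector on the command line; a minimal sketch using only the flags demonstrated above (--color and -i / --indent):

$ cat robots.html | pup --color -i 2 'title'
# colorized <title> element, indented with 2 spaces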

pup's People

Contributors

bryant1410, claudioc, duncanbeevers, ericchiang, jwilk, mattn, philoserf, vitorgalvao


pup's Issues

streaming implementation request

When HTML with infinite streaming content is fed to pup through a pipe, pup does not output anything and just eats up memory. This behavior restricts pup's use in shell pipes when large input is encountered.

Accessible via Homebrew?

This is such a neat program! I usually opt for writing python/node.js scripts; but pup is perfect for simple bash scripts.

It'd be neat if pup were installable via Homebrew (http://brew.sh/) on OS X.

i.e. brew install pup

afaik, the name is not taken: http://braumeister.org/search/pup


Just a suggestion. Getting dist releases via github is not really a big deal.

Matching classes

With this HTML file:

<div class="a b c">
  Hello
</div>

If I want to match that <div>, I have to do:

cat file.html | pup 'div.a b c'

Yeah, spaces included. Doing div.a or div.b or div.c or div.a.b.c doesn't work D:

REPL mode

I think it would be very useful to have a REPL mode in pup, for exploring the output of different selectors, or for drilling down through the content by applying one selector, and then applying another selector to the output of the first, etc.

I don't have any personal experience with REPL libraries, but as a starting point, this one looks good and was used by PhantomJS. It looks like there is a golang wrapper here

Separate CSS selector logic into its own package

The CSS parsing in pup is a bit of a mess right now. It's extremely ad hoc, hard to test, and sometimes just incorrect (#46 and #52).

A proposed improvement is a regexp-style compiler + executor, something along the lines of:

package css

import "golang.org/x/net/html"

type Selector struct {
    // fields
}

func Compile(expr string) (*Selector, error) {
    // compile css selector
}

func (s *Selector) Select(node *html.Node) []*html.Node {
    // run the selector
}

This would massively simplify pup and help with testing.

Please show me how the '+' operator is used

I cannot understand the usage of the '+' operator as described in the wiki:
pup 'selector + selector'
Is pup 'div + a' the right usage? It does not show any results.
Please give more examples. Thank you.

Using Pup as a library/Feature Request

Hi, is there a way to use pup as a library? E.g.

package main

import (
	"fmt"
	"net/http"

	"github.com/ericchiang/pup"
)

var cssSelector = `...` // CSS selectors

func main() {
	page, _ := http.Get("https://news.ycombinator.com")
	// hypothetical API sketched in this issue
	result := pup.Load(page).Select(cssSelector)
	fmt.Println(result)
}

Thanks in advance!

exact match class selector

I am having an "issue": it seems pup only likes exact matches on element class?

example.html

<div class="nav clearfix">
  <!-- Other elements Here -->
</div>
$ pup < example.html .nav                                                                                 
$ pup < example.html ".nav clearfix"                                                                               
<div class="nav clearfix">
</div>

json{} is not isomorphic

This one is a bit troublesome: it's a big issue (I think), but it may break backward compatibility :/

The issue is that pup will return a JSON array in most cases, except if there is a single element. It's an issue because it makes it harder to work with the general case (when we don't know how many results we will get). Could it be possible to change this behavior?

EOF when using find and xargs

find ./ -name "*" -type f -print0 | xargs -0 pup 'style text{}'
Expected: returns style text
Actual: returns EOF for every file passed to pup

String encoding error ?

Hi,

When I download an html file with the following command: curl -O www.mysite.com/index.html, I get the following snippet:

[...]
<div class="quote">
    “Normal si c'est lent, on est sur le serveur de dev, pas de prod.”
</div>
[...]

Now if I want to parse that with pup like cat index.html | pup 'div[class="quote"] text{}', I get the following result:
“Normal si c&#39;est lent, on est sur le serveur de dev, pas de prod.”

Is it normal that a single quote becomes '&#39 ;'? It also happens with text containing accents.

And if I want to parse my file to get a Json output, I get the following result:

{
 "class": "quote",
 "tag": "div",
 "text": "“Normal si c\u0026#39;est lent, on est sur le serveur de dev, pas de prod.”"
}%

Now the quote changes from '&#39 ;' to '\u0026#39;'. Is there a way to have proper encoding for accents, quotes, etc. so that I don't get strange characters in my files?

Namaste,
Mehdi

Attribute value condition escaping

The following gives me an error:

$ curl -sL URL | pup 'a[href^=product.php?sku=] attr{href}'

I tried enclosing product.php?sku= in single and double quotes, but it doesn't work.

The error I get is:

Selector parse error: more than one '='

Not selector?

Is it possible to select all the nodes except something matching a particular selector?

For instance, I’d like to turn:

<span>something <br style="page-break-before:auto" clear="all"></span>

Into:

<span>something</span>

By selecting with something like (just randomly inventing this syntax):

cat myfile.html | pup *:not:br[style="page-break-before:auto"]

(Thanks for pup, it’s awesome.)

+ and > swapped?

When I use the Copy Unique selector function of Firefox Inspector, and try to use the selector with pup on the same html file, it doesn't give any results. After changing all >s to +s it returns the expected element.

:nth-child pseudo selectors

I'm trying to parse an HTML table with some interesting data on it, pup works great for one part of it

curl -s "www.talkenglish.com/Vocabulary/Top-500-Adjectives.aspx" | pup "#GridView3 a text{}"

however, when trying to pull out how common they were, (the third TD element)

curl -s "www.talkenglish.com/Vocabulary/Top-500-Adjectives.aspx" | pup "#GridView3 td:nth-child(3) text{}"

no workey 😭

I think I can grab it using the align="right" attribute as a workaround; however, I think nth-child selectors should be a priority for you, because if you inspect a webpage with Firefox and right click on an element, you can select "Copy Unique Selector", which will give you this in your clipboard: #GridView3 > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(3), which would be a very handy thing for pup to work with.
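
As a concrete illustration of the attribute workaround mentioned above, a hedged sketch (not verified against the live page, whose markup may have changed):

curl -s "www.talkenglish.com/Vocabulary/Top-500-Adjectives.aspx" | pup '#GridView3 td[align="right"] text{}'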

Templated Transformations

Rationale

pup has incredible potential as a quick web scraping tool.

To extract and reformat information from an HTML stream, the user would have to:

  • Download the HTML file
  • Filter it with pup multiple times, saving output to one or more additional files
  • Compile the extracted text in a structured format

To reach the next level of awesome, this entire workflow could be automated by pup.
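
For context, that manual workflow might look like the following hedged sketch, built from pup invocations like those shown earlier plus standard shell tools (the URL, file names, and selectors are hypothetical):

$ wget http://example.com/page.html -O page.html
$ cat page.html | pup 'h1 text{}' > titles.txt
$ cat page.html | pup 'a attr{href}' > links.txt
$ paste -d, titles.txt links.txt > combined.csv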

Proposal

I propose adding a new flag to pup: -t | --template [file].

When the -t flag is present, pup ignores command-line selectors. Instead, it reads a file written in a simple templating language and renders the template replacing some special syntax with pup selector query results.

For example:

HTML

<html>
<head>
    <title>Some Page</title>
</head>
<body>
    <p>
        A link to <a href="http://www.google.com">Google</a>
    </p>
</body>
</html>

Template

{
    'title': '<# title text{} >',
    'link' : '<# body p a [href] >'
}

Pup output

{
    'title': 'Some Page',
    'link' : 'http://www.google.com'
}

decode html entities when using text{}

Maybe it would make sense to provide an option to decode html entities when using text{} output, and thus output & / < / ... instead of &amp; / &lt; / ....

I'm now piping the output of pup in 'recode html' which seems to do the trick.

Move compiled binaries to a different repo

Please.

They increase the size of this repo, and some people (me, at least) just want to get the code.

If I run go get the whole repo will be fetched, including the bins and all their committed versions.

Result in UTF-8

Hi,

Is there any setting to set result in UTF-8?
I execute this command:
curl http://wiki.seg.org/wiki/Dictionary:Deconvolution | pup div#mw-content-text
it gives me some weird characters.
i.e.
(de kon vō’ lū sh∂n) is translated as â lÅ« shân) kon vÅ
(1995, 285 and 292–303) is translated as (1995, 285 and 292â303)

Thanks

print each matching element on one line

When I do curl -s https://news.ycombinator.com/item?id=9996333 | pup '.comment' I'm having a hard time taking the output and passing it along, maybe grepping for text in the matched elements or sorting or something.
Do you see an easier way to use the output than having an option to put each matched element on one line?

json{} doesn't decode html entities

Hi, maybe it would be interesting to automatically decode the html entities when building a json object, don't you think?

ie. that

<div>&amp;</div>

would give

{
    "text" : "&",
    ...
}

rather than

{
    "text" : "&amp;",
    ...
}

Since people will almost always have to convert it anyway (and jq doesn't seem to have any builtin for this :()

Change character after 0&nbsp;-

Try to execute the following:

echo '0&nbsp;-' | pup

This gives you:

0Â -

I expect no strange 'Â' character. Try for example:
echo '0&nbsp;-fsdkfdsf&nbsp;dasd' | pup

See issue: #55

Feature request: line numbers in JSON output

How about adding a line_number key to the json object generated by the json{} display function?

The use-case I have in mind is calling out to pup from vim and loading the results back into vim's quickfix list to quickly jump between matched lines.

Support for generic xml

If I try to match a tag which is not defined in html, pup will silently fail.
For example:
echo "" | pup "x"

will fail and print nothing. But both of

echo "<x class="y" />" | pup ".y"

echo "<a class="y" />" | pup "a"

will print the tag.

I would love to be able to use pup for filtering arbitrary xml-files, and not just actual html-files.

Unrecognized flag '--number'

On version 0.3.7 (and Ubuntu 14.04, if that matters):

   curl -s http://example.com | pup --number 'p'
  Unrecognized flag '--number'
  (23) Failed writing body

Some selectors do not work properly.

  • pup 'html > body > *' <<<'<a>test</a>'

    <body>
    <a>
    test
    </a>
    </body>
    

    should not output <body> tags.

  • pup 'html>body' <<<'<a>test</a>'

    No output?

  • pup 'head ~ body' <<<'<a>test</a>'

    No output?

  • pup 'head+body' <<<'<a>test</a>'

    No output?

  • pup 'another' <<<'<another>test</another>'

    No support for custom tags.

Using pup to modify html

I have a use case where I want to add class="table" to all table tags that don't already have that class specified. Currently I use a hacky sed to do it, but I was wondering if pup could be a more robust way of doing that.
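
For reference, the kind of hacky sed mentioned above might look like this minimal sketch; it only rewrites bare <table> tags and skips tables that already carry attributes, which is exactly why it is fragile:

sed 's/<table>/<table class="table">/g' input.html > output.html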

Feature Request: Iterate over selected nodes and print out some data

Example HTML:

<h1 id="firstHeading" class="firstHeading" lang="en">
 <span dir="auto">
  Robots exclusion standard
 </span>
 <span dir="auto2">
  date: xyz
 </span>
</h1>

Now it would be great if pup could be used in a way that:

  • iterates over all h1 elements
  • for each h1, prints out "Robots exclusion standard, date: xyz", i.e. the value of the span with dir attribute auto, followed by a comma and the value of the span with dir attribute auto2.

Multiple commands don't work properly

I'm trying to execute this multiple command rule without success:

> cat example.html | pup "h1 span text{} , img#poster attr{src}"
http://images...
> cat example.html | pup "h1 span text{}"
Demo
> cat example.html | pup "img#poster attr{src}"
http://images...

I don't know if I'm doing something wrong or there is a bug, but I cannot get the expected result:

> cat example.html | pup "h1 span text{} , img#poster attr{src}"
Demo http://images...

Use several selectors?

Hello, is it possible to use several selectors one after the other? For example, from here:

http://voxeu.org/article/economics-secession

I can separately get h1, .article-teaser, and .article-content

with

curl http://voxeu.org/article/economics-secession | ./pup 'h1 text{}'

and so on, but I would like the text to appear one after the other. That is, the text from h1, then the text from .article-teaser, and then the text from .article-content.
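
One way to approximate this today with plain shell, as a hedged sketch (download once, then run pup once per selector):

curl -s http://voxeu.org/article/economics-secession > article.html
for sel in 'h1' '.article-teaser' '.article-content'; do cat article.html | ./pup "$sel text{}"; done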

import pup in code

Hi,

Fairly new to Go, I don't know how to use pup in my code without relying on the pup binary.
(I would like to transform a script using curl to a pure Go solution).

How can I call ParseHTML(pupIn, pupCharset), given that it's in the main package?

What is the best way to do this? Sorry if the question is trivial, or if it is more about documentation or packaging as a dependency than a real issue.

Thanks in advance.

Slice function does not return the expected result

First of all, thanks for this great work! Pup is an awesome tool!

I'm using the arm linux version of pup on my Raspberry Pi. The slice function always returns empty in my tests, even with the examples from the readme like: pup < robots.html a slice{0}

Thanks in advance

Does pup support XML?

I'm trying to extract an attribute from a node in this XML (taken from http://www.radio-canada.ca/Medianet/2010/CBF/DeRemarquablesOublies201001210200_m.asx):

<ASX xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <TITLE>Écoutez la tribune sur François-Xavier Aubry</TITLE>
  <PARAM NAME="isJeunesse" VALUE="False" />
  <PARAM NAME="isVirtualClip" VALUE="False" />
  <ENTRY>
    <REF HREF="http://medias-wm.radio-canada.ca/diffusion/2010/medianet/CBF/DeRemarquablesOublies201001210200_m.wma" />
    <DURATION VALUE="00:55:00.000" />
    <PARAM NAME="ASX" VALUE="http://www.radio-canada.ca/Medianet/2010/CBF/DeRemarquablesOublies201001210200_m.asx" />
    <PARAM NAME="isVideo" VALUE="False" />
    <PARAM NAME="fileType" VALUE="wma" />
    <PARAM NAME="isLive" VALUE="false" />
    <PARAM NAME="emission" VALUE="De remarquables oubliés" />
    <PARAM NAME="chaine" VALUE="CBF" />
    <PARAM NAME="diffusion" VALUE="integral" />
    <PARAM NAME="station" VALUE="Montréal" />
    <STARTTIME VALUE="00:00:00.000" />
  </ENTRY>

I couldn't find any means to extract REF's HREF attribute. I tried 'ref attr{href}' and many other things ('entry:first-child attr{href}' comes to mind).

The -c flag works but mangles the DOM a little, which led me to think that parsing that XML would work fine. Extracting title works.

Is slice{} no longer included?

on laptop

[foo@bar ]$ pup --version
0.3.2
[foo@bar ]$ curl -s www.google.com | pup a slice{0}
<a class="gb1" href="http://www.google.com/imghp?hl=en&amp;tab=wi">
  Images
  </a>
[foo@bar ]$

on desktop

[foo@bar pup]$ ./pup --version
0.3.7
[foo@bar pup]$ curl -s www.google.com | ./pup a slice{0}
[foo@bar pup]$

Single/double quotes appear as html entities

Single and double quotes are displayed as html entities even if the text{} modifier is used
Actual:

$ echo "<div>\"quote'd\"</div>" | pup "div text{}"
&#34;quote&#39;d&#34;

Expected

$ echo "<div>\"quote'd\"</div>" | pup "div text{}"
"quote'd"

':parent-of' & ':contains' pseudo-selectors

Hi,

I suggest adding a :parent-of(selector) (non-standard) and :contains("text") (standard) to the selector list.

It would be very useful to select the right elements when they don't have any specific class.
