
The root element doesn't change. (grab issue, CLOSED)

lorien avatar lorien commented on May 18, 2024
The root element doesn't change.

from grab.

Comments (6)

lorien avatar lorien commented on May 18, 2024

Sorry, I can't change that. I have used lxml for years and this behaviour seems natural to me. This is how lxml works:

>>> from lxml.html import fromstring
>>> HTML = "<body><table><tr><td>1</td><td>2</td></tr></table></body>"
>>> elem = fromstring(HTML).xpath('//td')[0]
>>> elem.text
'1'
>>> elem.xpath('//td')
[<Element td at 0x1b89b30>, <Element td at 0x1b89bf0>]

You can see that elem.xpath('//td') searches from the root of the document, not from the current element.

If you want to search from the current element use: .//
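
To make the difference concrete, here is a minimal sketch comparing the two forms on the same element. The two-row table is a made-up input for illustration, not something from the original report:

```python
from lxml.html import fromstring

# Hypothetical two-row table, used only for illustration.
HTML = (
    "<body><table>"
    "<tr><td>1</td><td>2</td></tr>"
    "<tr><td>3</td><td>4</td></tr>"
    "</table></body>"
)

doc = fromstring(HTML)
second_row = doc.xpath('//tr')[1]

# '//td' is an absolute path: it matches every <td> in the document,
# even though it is called on a single row.
all_cells = [td.text for td in second_row.xpath('//td')]

# './/td' is relative: it matches only <td> descendants of this row.
row_cells = [td.text for td in second_row.xpath('.//td')]

print(all_cells)  # ['1', '2', '3', '4']
print(row_cells)  # ['3', '4']
```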


madcat1991 avatar madcat1991 commented on May 18, 2024

It's easier than you think :), just make your element the root of its own tree:

In [1]: from lxml.html import fromstring

In [2]: HTML = "<body><table><tr><td>1</td><td>2</td></tr></table></body>"

In [3]: elem = fromstring(HTML).xpath('//td')[0]

In [4]: elem.text
Out[4]: '1'

In [5]: elem.xpath('//td')
Out[5]: [<Element td at 0x994c11c>, <Element td at 0x994c38c>]

In [9]: import lxml

In [11]: root = lxml.etree.ElementTree(elem)

In [12]: root.xpath('//td')
Out[12]: [<Element td at 0x994c11c>]


lorien avatar lorien commented on May 18, 2024

It was not an example of why that could not be done. It was an example showing that when you get an Element from the results of an lxml xpath call, "//" queries on that element are evaluated from the root of the tree.

Besides that, there is a really bad drawback (at least for me) in making an element the root of a tree. In that case you lose the connection to the other parts of the tree:

>>> root.xpath('..')
[]
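
A small sketch of that trade-off, reusing the single-row table from earlier in the thread. The variable names are only for illustration:

```python
from lxml import etree
from lxml.html import fromstring

HTML = "<body><table><tr><td>1</td><td>2</td></tr></table></body>"
elem = fromstring(HTML).xpath('//td')[0]

# On the original element, '..' still reaches the parent <tr>.
parent_tags = [e.tag for e in elem.xpath('..')]

# Wrapping the element in its own ElementTree makes it the root for XPath:
# '//td' sees only this cell, and '..' has nothing above it to return.
subtree = etree.ElementTree(elem)
cells_in_subtree = len(subtree.xpath('//td'))
parents_in_subtree = subtree.xpath('..')

print(parent_tags)         # ['tr']
print(cells_in_subtree)    # 1
print(parents_in_subtree)  # []
```

So the wrapped tree gives a local "//" search, but at the cost of the parent and sibling axes.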


madcat1991 avatar madcat1991 commented on May 18, 2024

OK, I understand your point of view. Nevertheless, I think that when somebody iterates through the selected elements, they are not interested in going back to the parent node. Like here:

for item in grab.doc.select('//div[@class="item"]'):
    # I am interested only in the item's content
    pass

If I wanted the parent node, I would just store it in a variable and then iterate through its children. But of course this is just my programming style.


lorien avatar lorien commented on May 18, 2024

I can't find source code examples right now, but I remember that I have addressed parent or sibling elements via XPath expressions many times. BTW, do you know how Scrapy selectors work? Do they use Grab's approach or yours?


madcat1991 avatar madcat1991 commented on May 18, 2024

Unfortunately, I have never worked with Scrapy :)

