
Error when no matching nodes about rvest HOT 13 CLOSED

tidyverse commented on August 22, 2024

Error when no matching nodes

from rvest.

Comments (13)

hrbrmstr commented on August 22, 2024

I was thinking about this from an SO post (http://stackoverflow.com/a/26702887/1457051) a while back. Perhaps either (a) there could be a separate html_nodes_exists function (similar to the one I hacked up at that link), or (b) html_nodes could have something like a check_first logical parameter that wraps the resultant XPath with a boolean() and does a test first, or (c) some combo of both. I can take a stab at this depending on which way you'd be leaning (or if you have other ideas).

hadley commented on August 22, 2024

I think it would be better to return NULL for html_nodes(). That then gives an easy existence check.
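
A minimal sketch of what such an existence check can look like. It assumes a later, xml2-based rvest in which html_nodes() returns a zero-length node set when nothing matches (rather than NULL or an error); the inline HTML and variable names are made up for illustration.

```r
library(rvest)

# Made-up inline document, so the example needs no network access.
page <- read_html(
  "<html><body><table><tr><td><a href='#'>EOS M</a></td></tr></table></body></html>"
)

# Assumption: html_nodes() yields a zero-length node set when nothing matches,
# so length() doubles as an existence test.
has_links <- length(html_nodes(page, "td a")) > 0
has_spans <- length(html_nodes(page, "span.missing")) > 0

print(c(links = has_links, spans = has_spans))
```

If the installed version really does return NULL for no match, the same length() test still works, since length(NULL) is 0.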

FrankCData commented on August 22, 2024

Not sure if this belongs here (or should be noted separately), but I'm having a related issue scraping a page with one missing value, i.e. for one "row" of data, a value is missing. The node is completely absent from the HTML (not just empty or containing NA), so I get the error "arguments imply differing number of rows".

e.g.

    Flickr_One_Camera_Data <- html("https://www.flickr.com/cameras/canon")

    FACD <- data.frame(brand = "canon",
                       name = Flickr_One_Camera_Data %>% html_nodes("td a") %>% html_text(),
                       type = Flickr_One_Camera_Data %>% html_nodes("td:nth-child(6)") %>% html_text(),
                       stringsAsFactors = FALSE)

Rank (row) 28 has no entry for type, and I can't find any way to trap this and replace it with a default value. Presumably it's something related to html_nodes().

hrbrmstr commented on August 22, 2024

You can work around that by using html_table and setting fill=TRUE:

html_table(Flickr_One_Camera_Data, fill=TRUE)[[1]][28,]
##    Rank ▾  Name # of items Avg. daily users Activity Factor Type
## 28     28 EOS M  3,422,744              224               6 <NA>

Then clean up the columns afterwards. The resultant XPath to retrieve the td nodes in the html_nodes call is behaving as it's expected to. I'm not sure html_nodes needs a fill option.
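
A small sketch of the fill-then-clean-up workflow on a made-up table (not the real Flickr page), where the second row has no Type cell at all. The default value "unknown" is an arbitrary choice for illustration.

```r
library(rvest)

# Made-up table: the last row is missing its "Type" cell entirely.
page <- read_html("
  <table>
    <tr><th>Rank</th><th>Name</th><th>Type</th></tr>
    <tr><td>27</td><td>EOS 5D</td><td>Digital SLR</td></tr>
    <tr><td>28</td><td>EOS M</td></tr>
  </table>")

# fill = TRUE pads short rows with NA. (Recent rvest releases appear to
# always fill, and accept fill = TRUE without complaint.)
cameras <- html_table(html_nodes(page, "table"), fill = TRUE)[[1]]

# Clean-up step: replace the padded NA with a default value.
cameras$Type[is.na(cameras$Type)] <- "unknown"

print(cameras)
```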

FrankCData commented on August 22, 2024

Thanks - that deals nicely with the missing value.

However, not being able to use html_nodes is giving me a lot of unwanted data in some columns I will need to clean up (as you mention).
So, if "fill" is undesirable, is there some other way of detecting a missing node value?

(In my original version, I only select the actual data I need - using SelectorGadget and CSS selectors. The node contains less data than a column, so no cleaning-up is needed - but I have the problem of missing nodes, hence my original post)
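
One possible way to keep the CSS selectors and still detect the missing node, sketched here on made-up HTML: select the rows first, then use html_node() (singular) within each row. This assumes an xml2-based rvest, where html_node() returns one result per input node and html_text() gives NA for rows with no match. The selectors and the default value are illustrative, not taken from the Flickr page.

```r
library(rvest)

# Made-up table: the second row has a name but no type cell.
page <- read_html("
  <table>
    <tr><td><a href='#'>EOS 5D</a></td><td>Digital SLR</td></tr>
    <tr><td><a href='#'>EOS M</a></td></tr>
  </table>")

# One node per table row, so every extracted column has the same length.
rows <- html_nodes(page, "tr")

cameras <- data.frame(
  brand = "canon",
  name  = rows %>% html_node("td a") %>% html_text(),
  # html_node() keeps a placeholder for rows without a match;
  # html_text() turns that placeholder into NA.
  type  = rows %>% html_node("td:nth-child(2)") %>% html_text(),
  stringsAsFactors = FALSE
)

# The missing node is now an ordinary NA that can be trapped with is.na().
cameras$type[is.na(cameras$type)] <- "unknown"

print(cameras)
```

Because each column is extracted relative to its row, a missing cell no longer shortens the vector, which is what caused the "differing number of rows" error.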

hadley commented on August 22, 2024

@FrankCPhoto how can rvest detect something that is missing? I don't see how you could tell which column is the one that's missing.

FrankCData commented on August 22, 2024

(Disclaimer first - I'm a relative beginner in R)

I was thinking along these lines: when reading tabular data, each column should be present, or there should be some way to trap it if not. Of course, I appreciate that not all scraped data is tabular, and a node could just be a single value from anywhere on a web page. So this would probably have to apply to tabular data only.

It's not a problem I've had before, i.e. the complete absence of a node, as opposed to an empty "field". But I imagine it might be a common enough issue for web scraping. If I can trap it somehow (I was trying things like is.na etc.) then I can handle it.

hadley commented on August 22, 2024

@FrankCPhoto as @hrbrmstr suggested, you'll need to use fill = TRUE and then clean up yourself. I can't see any way that rvest could help you more than that.

FrankCData commented on August 22, 2024

@hadley (& @hrbrmstr) Thanks for the help. I have managed to work around it using fill.

sebastianbarfort commented on August 22, 2024

I really think an html_nodes_exist function would be useful. I get the error message Error in class(out) <- "XMLNodeSet" : attempt to set an attribute on NULL a lot, and I can't figure out how to write such a function myself. The SO answer uses XPath, but that really defeats the purpose of using CSS selectors.
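
A rough sketch of how such a helper could accept CSS selectors: translate the selector to XPath and wrap it in boolean(). The function name html_nodes_exist is hypothetical (it is not part of rvest), and the sketch assumes the xml2 and selectr packages are installed; the inline HTML is made up.

```r
library(xml2)

# Hypothetical helper, not part of rvest: TRUE if any node matches `css`.
html_nodes_exist <- function(doc, css) {
  # selectr translates the CSS selector into an equivalent XPath expression.
  xpath <- selectr::css_to_xpath(css)
  # boolean() makes the XPath engine return TRUE/FALSE instead of a node set.
  xml_find_lgl(doc, paste0("boolean(", xpath, ")"))
}

# Made-up inline document, so the example needs no network access.
page <- read_html("<table><tr><td><a href='#'>EOS M</a></td></tr></table>")

print(html_nodes_exist(page, "td a"))
print(html_nodes_exist(page, "td span"))
```

The caller only ever writes CSS; the XPath is an internal detail of the helper.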

wqr786 commented on August 22, 2024

I think html_nodes() should return NULL if a value doesn't exist.

I see @hadley committed a change and closed the issue, but I'm trying it out and it doesn't seem to work for me. I am trying to fetch some values from several pages, and for one of the pages it gives this error:
L::xmlValue, ..., .type = character(1)) :
Unknown input of class: NULL

Anyone know of this?

hrbrmstr commented on August 22, 2024

Can you post an example?

On Sat, Aug 1, 2015 at 7:44 AM, Waqar wrote:

I think the html_nodes() should return a NULL value in case if a value
doesn't exist.

I see @hadley https://github.com/hadley committed and closed the issue.
But I'm trying it out and it doesn't seem to work for me. I am trying to
fetch some values from pages and for one of the page it is giving this
error:
L::xmlValue, ..., .type = character(1)) :
Unknown input of class: NULL

Anyone know of this?

hadley commented on August 22, 2024

Please file a reproducible example as a new issue
