Comments (9)

spekulatius commented on May 10, 2024

Hey @robertgarrigos

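I can replicate the problem. The error comes from the DOM crawler, not PHPScraper itself. The XPath is missing a node test after the //: //[@class='prose'] is not a valid expression, whereas //*[@class='prose'] matches any element whose class attribute is exactly prose: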

$myClassElements = $web->filter("//*[@class='prose']");

With ->text() you should get the text of the sub-nodes:

$myClassElements = $web->filter("//*[@class='prose']")->text();
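To see why the node test matters, here is a minimal sketch that uses PHP's built-in DOMDocument and DOMXPath rather than PHPScraper. The HTML string is a made-up stand-in for the scraped page and the variable names are illustrative only. It assumes the dom extension is enabled.

```php
<?php
// Hypothetical HTML standing in for the scraped page.
$html = '<html><body><div class="prose"><p>First</p><p>Second</p></div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// No node test after "//": the expression is malformed, so query()
// emits a warning (suppressed here with @) and returns false.
$invalid = @$xpath->query("//[@class='prose']");

// "*" is the node test meaning "any element".
$valid = $xpath->query("//*[@class='prose']");

var_dump($invalid);                         // bool(false)
echo $valid->length, PHP_EOL;               // 1
echo $valid->item(0)->textContent, PHP_EOL; // FirstSecond
```

The false returned for the malformed expression is consistent with the "but got "bool"" message in the stack trace reported in this thread. Note that @class='prose' only matches when the attribute value is exactly prose; elements carrying additional classes would need a different predicate.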

I've also tried other built-in PHPScraper selectors and they worked. For example, $web->lists returns the lists as expected.

I hope this helps,
Peter

from phpscraper.

spekulatius commented on May 10, 2024

Hey everyone,

I've added a page to document the way custom selectors can be used: https://phpscraper.de/examples/custom-selectors.html

There are also some new tests for this: https://github.com/spekulatius/PHPScraper/blob/master/tests/CustomSelectorTest.php

Please let me know if you think anything is missing.

Cheers,
Peter

gcijuentes commented on May 10, 2024

I have the same question :(

spekulatius commented on May 10, 2024

Hello @Kkiomen and @gcijuentes,

Sorry for the late reply.

Have you tried the filterXPath method? It should allow you to simply filter by any class name using an xPath like $myClassElements = $web->filterXPath("//[@class='my-class']");.

Cheers,
Peter

robertgarrigos commented on May 10, 2024

While trying it, I'm getting this error:

Call to undefined method spekulatius\core::filterXPath()

I just installed PHPScraper (0.6.2) with Composer, and the first example of getting a website's title worked fine.

What am I missing?

spekulatius commented on May 10, 2024

Hey @robertgarrigos

Oh sorry, I mixed up the naming with the underlying package. The PHPScraper method is filter, not filterXPath. filterXPath belongs to the DOM crawler package: https://github.com/symfony/dom-crawler/blob/8cb4c6e6c8d30c26f70529ed5e50d79a09576c0c/Crawler.php#L686

Please try again with filter. CC @Kkiomen and @gcijuentes

Cheers,
Peter

robertgarrigos commented on May 10, 2024

Still not working:

Warning: DOMXPath::query(): Invalid expression in /app/vendor/symfony/dom-crawler/Crawler.php on line 1013

Fatal error: Uncaught InvalidArgumentException: Expecting a DOMNodeList or DOMNode instance, an array, a string, or null, but got "bool". in /app/vendor/symfony/dom-crawler/Crawler.php:145
Stack trace:
#0 /app/vendor/symfony/dom-crawler/Crawler.php(1013): Symfony\Component\DomCrawler\Crawler->add(false)
#1 /app/vendor/symfony/dom-crawler/Crawler.php(771): Symfony\Component\DomCrawler\Crawler->filterRelativeXPath('descendant-or-s...')
#2 /app/vendor/spekulatius/phpscraper/src/phpscraper.php(165): Symfony\Component\DomCrawler\Crawler->filterXPath('descendant-or-s...')
#3 /app/vendor/spekulatius/phpscraper/src/phpscraper.php(60): spekulatius\core->filter('//[@class='pros...')
#4 /app/phpscraper.php(11): spekulatius\phpscraper->__call('filter', Array)
#5 {main}
  thrown in /app/vendor/symfony/dom-crawler/Crawler.php on line 145

spekulatius commented on May 10, 2024

robertgarrigos commented on May 10, 2024

require __DIR__ . '/vendor/autoload.php';

$web = new \spekulatius\phpscraper;

$web->go('https://www.lieder.net/lieder/get_settings.html?ComposerId=2520');

// print_r($web->title);

$myClassElements = $web->filter("//[@class='prose']");

print_r($myClassElements);
