Comments (5)

vesper8 commented on May 22, 2024

So I've been debugging, and I can confirm that all the data is returned from the API.

I can also confirm that the wikitables get stripped as a result of this line:

$sections = (new SectionsParser($title, $body))->sections();

That is, if I comment out this line, the tables are not removed. I'm still trying to understand exactly why this is happening.

from laravel-wikipedia-grabber.

dmitry-ivanov commented on May 22, 2024

It looks like Wikipedia responds this way for that page (yes, it happens for some pages).

You can check the raw Wikipedia response with this query:
https://en.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&redirects=1&prop=extracts&exlimit=1&explaintext=1&exsectionformat=wiki&titles=2019_Australian_Open
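For anyone who wants to repeat that check from a script, here is a minimal Python sketch that rebuilds the same query and pulls out the plain-text extract. The parameters are copied from the URL above; the function names and the User-Agent string are made up for illustration and are not part of the package.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title):
    """Build the plain-text extract query shown above for one page title."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "redirects": "1",
        "prop": "extracts",
        "exlimit": "1",
        "explaintext": "1",
        "exsectionformat": "wiki",
        "titles": title,
    }
    return API_URL + "?" + urlencode(params)


def fetch_extract(title):
    """Fetch the page and return its plain-text extract ('' if absent).

    Performs a network request. The User-Agent value is a placeholder.
    """
    request = Request(
        build_extract_url(title),
        headers={"User-Agent": "extract-check/0.1 (example script)"},
    )
    with urlopen(request, timeout=30) as response:
        data = json.load(response)
    # With formatversion=2, "pages" is a list rather than a dict keyed by page id.
    pages = data.get("query", {}).get("pages", [])
    return pages[0].get("extract", "") if pages else ""


# Example (needs network access):
#   text = fetch_extract("2019_Australian_Open")
#   print("Matches on main courts" in text)
```

Searching the returned text for a string that only appears inside a table is a quick way to see whether the extract itself carries the table content.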

Even before the sections are parsed, there is no Matches info either.

Maybe you saw the "day info", which is above the table, and took it for the table contents?

Because I don't see it either in the response or in the $body dump (which is the same).

dmitry-ivanov commented on May 22, 2024

PS: Actually, I've seen many times that table content is not present in the output from their API.

I'm requesting plain-text extracts (explaintext=1, with "wiki" section formatting). Maybe they consider tables to be some kind of rich content, which is NOT plain text.

vesper8 commented on May 22, 2024

Oops. I was running your package as part of a console command and using grep to check whether the wikitable content was included. If you comment out the line I pointed to above, you get an error, of course, since the $sections variable is missing. The error message includes the full body, which does contain the wikitables, so my grep was matching on that.

So it isn't necessarily that part of your code that strips away the wikitables.

However, the API call IS returning the wikitables. Your grabber is somehow stripping or discarding them at some point.

Here's the API call generated by your package (which I was able to get by adding debug=true to the Guzzle params):

https://en.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&redirects=1&prop=pageprops%7Cextracts%7Crevisions%7Cpageimages%7Cimages&titles=2019%20Australian%20Open&ppprop=disambiguation&exlimit=1&explaintext=1&exsectionformat=wiki&rvprop=content&rvcontentformat=text%2Fx-wiki&piprop=thumbnail%7Coriginal&pithumbsize=300&imlimit=max

You can search for the string "Matches on main courts" inside that response to see that the matches are included. However, they are included as part of the revisions, and I'm not sure how that affects their availability as part of the Page object.
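To make that kind of check less error-prone than grepping the whole output, a small Python sketch can report which field of the decoded response actually contains the marker string. It assumes the formatversion=2 layout, where "pages" is a list and the revision wikitext sits in "content" (or under "slots" when rvslots is used). The helper name and the sample data are hypothetical; the sample is a hand-written stub, not real API output.

```python
def find_marker(response, marker):
    """Return the names of the page fields whose text contains `marker`."""
    hits = []
    for page in response.get("query", {}).get("pages", []):
        if marker in page.get("extract", ""):
            hits.append("extract")
        for revision in page.get("revisions", []):
            # Legacy layout: revision["content"].
            # With rvslots: revision["slots"]["main"]["content"].
            content = revision.get("content") or (
                revision.get("slots", {}).get("main", {}).get("content", "")
            )
            if marker in content:
                hits.append("revisions")
    return hits


# Hand-written stub shaped like a formatversion=2 response (NOT real API output).
sample = {
    "query": {
        "pages": [
            {
                "title": "Example page",
                "extract": "== Day-by-day summaries ==\nDay 1 (14 January)",
                "revisions": [
                    {
                        "content": (
                            "== Day-by-day summaries ==\n"
                            '{| class="wikitable"\n'
                            "! Matches on main courts\n"
                            "|}"
                        )
                    }
                ],
            }
        ]
    }
}

print(find_marker(sample, "Matches on main courts"))  # ['revisions']
```

In the stub, the marker shows up only under the revisions and never in the extract, which is the situation described in this thread.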

dmitry-ivanov commented on May 22, 2024

I use revisions to link sections with the proper images.

The content of the page is taken from the plain "extract", and there is no table data there, unfortunately.

That's why you don't see it in the output.
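If the table data is needed anyway, one possible workaround outside the package is to take the raw wikitext from the revisions and cut the table blocks out of it. The Python sketch below is only an illustration under simple assumptions: tables start on a line beginning with `{|` and end on a line beginning with `|}`. It does not parse cells or templates, and a line such as `|}}` that closes a template would confuse it. The function name and the sample wikitext are made up.

```python
def extract_wikitables(wikitext):
    """Return the raw text of each top-level wikitable found in `wikitext`.

    Nested tables are kept inside their enclosing table.
    """
    tables = []
    current = []
    depth = 0
    for line in wikitext.splitlines():
        stripped = line.strip()
        if stripped.startswith("{|"):
            depth += 1
        if depth > 0:
            current.append(line)
        if stripped.startswith("|}") and depth > 0:
            depth -= 1
            if depth == 0:
                tables.append("\n".join(current))
                current = []
    return tables


# Made-up wikitext for illustration only.
sample_wikitext = (
    "Some intro text\n"
    '{| class="wikitable"\n'
    "! Event\n"
    "! Winner\n"
    "|-\n"
    "| Match 1\n"
    "| Player A\n"
    "|}\n"
    "Some closing text"
)

for table in extract_wikitables(sample_wikitext):
    print(table)
```

The result is still wikitext, so turning it into rows and cells would need a real wikitext parser on top of this.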
