
Comments (7)

oltarasenko commented on June 9, 2024

@Ziinc can probably give more info here.

But could you please describe the use case? Why can't you use parse_item?


ziyouchutuwenwu commented on June 9, 2024

Here is my usage scenario:

For site demo.com, I need to get some info, such as title and category, from the main page, and collect sub-URLs from some of the links.
When I get a sub-URL, I send a request and then parse data from the response; here I need to get detail info such as author, price, etc.

The data parser for the sub page should be different from the one for the main page, and I don't know how to do that with Crawly.

Thanks a lot.
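For reference, a minimal Crawly sketch of this scenario might look like the following. DemoSpider, the demo.com URL pattern, and every selector below are hypothetical; the point is only to show parse_item branching on the page type and emitting different items plus follow-up requests.

# Hypothetical spider: main page yields title/category and detail links,
# detail pages yield author/price.
defmodule DemoSpider do
  use Crawly.Spider

  @impl Crawly.Spider
  def base_url(), do: "https://demo.com"

  @impl Crawly.Spider
  def init(), do: [start_urls: ["https://demo.com/"]]

  @impl Crawly.Spider
  def parse_item(response) do
    {:ok, document} = Floki.parse_document(response.body)

    if detail_page?(response.request_url) do
      # Detail page: extract author, price, etc. (selectors are assumptions).
      item = %{
        author: document |> Floki.find(".author") |> Floki.text(),
        price: document |> Floki.find(".price") |> Floki.text()
      }

      %Crawly.ParsedItem{items: [item], requests: []}
    else
      # Main page: extract title/category and follow the detail links.
      item = %{
        title: document |> Floki.find("h1.title") |> Floki.text(),
        category: document |> Floki.find(".category") |> Floki.text()
      }

      requests =
        document
        |> Floki.find("a.detail-link")
        |> Floki.attribute("href")
        |> Enum.map(&(URI.merge(response.request_url, &1) |> to_string()))
        |> Enum.map(&Crawly.Utils.request_from_url/1)

      %Crawly.ParsedItem{items: [item], requests: requests}
    end
  end

  # Hypothetical helper: decide the page type from the URL.
  defp detail_page?(url), do: String.contains?(url, "/detail/")
end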


ziyouchutuwenwu commented on June 9, 2024

For the Python part, my demo code looks like this:
[image: screenshot of the Python demo code]


oltarasenko commented on June 9, 2024

So... do you have different items on different pages, or the same data just structured differently?


ziyouchutuwenwu commented on June 9, 2024

Yes, basically I have different data structures on different pages, but from the sample code I can't figure out how to write mine.
It would be appreciated if there were some examples that could help me.


oltarasenko commented on June 9, 2024

Sorry, I still don't understand whether it's one of these two:

  1. Same item which can be extracted with other selectors
  2. Two different items


Ziinc commented on June 9, 2024

Sorry @ziyouchutuwenwu, I only just saw this; I must have missed the ping.

Parsers are meant for commonly used logic that you want to reuse across spiders. A parser is simply a Pipeline module, with the result of each parser being passed to the next. The opts third positional argument lets you provide spider-specific configuration to your parser.

For example, on site 1 you want to extract all links matching an h1 selector, filter them with some site-specific filter function, and build requests from all the extracted links:

# spider 1
parsers: [
  {MyCustomRequestParser, [selector: ".h1", filter: &my_filter_function/1]}
]

Then, in spider 2, which crawls site 2, we only want h2 tags, without any filtering:

# spider 2
parsers: [
  {MyCustomRequestParser, [selector: ".h2"]}
]

Then your MyCustomRequestParser.run/3 contains the logic required to select and build the requests.
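For completeness, a rough sketch of what such a parser module could look like. The exact contract of run/3 for parsers (how the parsed item and the fetched response are passed in) may differ from what is assumed here, so treat the argument shapes and field names as assumptions and check the Crawly parser docs.

# Sketch only: assumes the parsed item comes in as the first argument and
# the fetched response is available in the state map.
defmodule MyCustomRequestParser do
  @behaviour Crawly.Pipeline

  @impl Crawly.Pipeline
  def run(parsed_item, %{response: response} = state, opts \\ []) do
    # Spider-specific configuration from the opts keyword list.
    selector = Keyword.get(opts, :selector, "a")
    filter = Keyword.get(opts, :filter, fn _url -> true end)

    {:ok, document} = Floki.parse_document(response.body)

    # Select links, apply the optional filter, and build requests from them.
    requests =
      document
      |> Floki.find(selector)
      |> Floki.attribute("href")
      |> Enum.filter(filter)
      |> Enum.map(&Crawly.Utils.request_from_url/1)

    {%{parsed_item | requests: parsed_item.requests ++ requests}, state}
  end
end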

