adriankumpf / findmeaflat
Get notified of new listings on popular German real estate portals.
Hi,
when I try to run the bot, I get the error
Caught Error: TypeError: Cannot read property 'split' of undefined
at normalize (~/findmeaflat/lib/sources/kleinanzeigen.js:7:21)
at Array.map (<anonymous>)
at FlatFinder.run (~/findmeaflat/lib/flatfinder.js:22:8)
at runMicrotasks (<anonymous>)
at processTicksAndRejections (node:internal/process/task_queues:94:5)
at async Promise.all (index 3) {
[stack]: "TypeError: Cannot read property 'split' of undefined\n" +
' at normalize (~/findmeaflat/lib/sources/kleinanzeigen.js:7:21)\n' +
' at Array.map (<anonymous>)\n' +
' at FlatFinder.run (~/findmeaflat/lib/flatfinder.js:22:8)\n' +
' at runMicrotasks (<anonymous>)\n' +
' at processTicksAndRejections (node:internal/process/task_queues:94:5)\n' +
' at async Promise.all (index 3)',
[message]: "Cannot read property 'split' of undefined"
However, if I replace the line
const address = o.address.split('\n')[4].trim()
in lib/sources/kleinanzeigen.js with
const address = o.address
the bot works fine.
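A more defensive variant of that line could look like the sketch below. It assumes `o.address` is normally a multi-line string whose fifth line holds the address (as the original `split('\n')[4]` suggests), but may also be missing or shorter. `extractAddress` is a hypothetical helper name, not something that exists in the project.

```javascript
// Hypothetical helper, not part of findmeaflat.
// Returns a trimmed address without throwing when the field is missing.
function extractAddress(rawAddress) {
  // The scraper may not find the field at all, so guard against non-strings.
  if (typeof rawAddress !== 'string') return ''

  const lines = rawAddress.split('\n')
  // Use the fifth line when it exists (the original assumption),
  // otherwise fall back to the whole value.
  const candidate = lines.length > 4 ? lines[4] : rawAddress
  return candidate.trim()
}
```

Inside `normalize` this would be called as `const address = extractAddress(o.address)`.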
Cheers!
Unfortunately, while working on PR #22, I ran into blocking by ImmoScout. It seems they have stepped up their crawler detection.
Any way to resolve this? Would https://www.npmjs.com/package/captcha-solver help?
Is there a way to respect Telegram's message limits in your bot? They are described in https://core.telegram.org/bots/faq#my-bot-is-hitting-limits-how-do-i-avoid-this:
When sending messages inside a particular chat, avoid sending more than one message per second. We may allow short bursts that go over this limit, but eventually you'll begin receiving 429 errors. If you're sending bulk notifications to multiple users, the API will not allow more than 30 messages per second or so. Consider spreading out notifications over large intervals of 8–12 hours for best results. Also note that your bot will not be able to send more than 20 messages per minute to the same group.
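One way to stay under the per-chat limit quoted above is to push every outgoing message through a small queue that enforces a minimum gap between sends. The following is only a sketch: `send` stands for whatever function the bot already uses to deliver a message and is assumed to return a promise, and `createThrottledSender` and `computeDelay` are hypothetical names. It does not retry on 429 responses.

```javascript
// Returns how many milliseconds to wait before the next send is allowed.
function computeDelay(lastSentAt, now, minIntervalMs) {
  if (lastSentAt === null) return 0
  return Math.max(0, lastSentAt + minIntervalMs - now)
}

// Wraps `send` so that calls run one after another, at least
// `minIntervalMs` apart. 1000 ms matches "one message per second";
// for a group chat, 3000 ms keeps the rate at 20 messages per minute.
function createThrottledSender(send, minIntervalMs = 1000) {
  let lastSentAt = null
  let queue = Promise.resolve()

  return function throttledSend(message) {
    const result = queue.then(async () => {
      const delay = computeDelay(lastSentAt, Date.now(), minIntervalMs)
      if (delay > 0) {
        await new Promise((resolve) => setTimeout(resolve, delay))
      }
      lastSentAt = Date.now()
      return send(message)
    })
    // Keep the chain usable even if one send fails.
    queue = result.catch(() => {})
    return result
  }
}
```

A caller would create the wrapper once, for example `const notify = createThrottledSender(sendToTelegram, 3000)`, where `sendToTelegram` is a placeholder for the existing send function, and then call `notify(text)` for each new listing.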
I need to restart the Docker container multiple times a day to make it pick up new offers. Is there something I could do to troubleshoot this behavior? I would prefer not to have to restart it...
Hi there, I am just starting with JavaScript and tried to understand how the crawler works, so I tried to write a HOWOGE crawler.
But I am really not sure how to start. Maybe you could help me with that?
Here is an example link:
https://www.howoge.de/wohnungen-gewerbe/wohnungssuche.html?tx_howsite_json_list%5Bpage%5D=1&tx_howsite_json_list%5Blimit%5D=12&tx_howsite_json_list%5Blang%5D=&tx_howsite_json_list%5Bkiez%5D%5B%5D=Marzahn&tx_howsite_json_list%5Bkiez%5D%5B%5D=99&tx_howsite_json_list%5Bkiez%5D%5B%5D=Buch&tx_howsite_json_list%5Bkiez%5D%5B%5D=Alt-Hohensch%C3%B6nhausen&tx_howsite_json_list%5Bkiez%5D%5B%5D=Neu-Hohensch%C3%B6nhausen&tx_howsite_json_list%5Bkiez%5D%5B%5D=Fennpfuhl&tx_howsite_json_list%5Bkiez%5D%5B%5D=Alt-Lichtenberg&tx_howsite_json_list%5Bkiez%5D%5B%5D=Friedrichsfelde&tx_howsite_json_list%5Bkiez%5D%5B%5D=Karlshorst&tx_howsite_json_list%5Bkiez%5D%5B%5D=Treptow-K%C3%B6penick&tx_howsite_json_list%5Bkiez%5D%5B%5D=Pankow&tx_howsite_json_list%5Brent%5D=900&tx_howsite_json_list%5Barea%5D=70&tx_howsite_json_list%5Brooms%5D=2&tx_howsite_json_list%5Bwbs%5D=all-offers
I then created a howoge.js in the sources folder with this howoge object:
const howoge = {
  name: 'howoge',
  enabled,
  url: !enabled || config.providers.howoge.url,
  crawlContainer: '#immoobject-list',
  crawlFields: {
    id: '.aditem@data-adid | int',
    price: '.div:nth-child(1) div div.content div.row div:nth-child(1) div div:nth-child(1) div.attributes-content.color-secondary | removeNewline | trim',
    size: '.div:nth-child(1) div div.content div.row div:nth-child(1) div div:nth-child(2) div.attributes-content | removeNewline | trim',
    title: '.div:nth-child(1) div div.content div.notice | removeNewline | trim',
    link: '.div:nth-child(1) div div.content div.address a@href | removeNewline | trim',
    description: '.div:nth-child(1) div div.content div.notice | removeNewline | trim',
    address: '.div:nth-child(2) div div.content div.address a | removeNewline | trim',
    rooms: '.div:nth-child(2) div div.content div.row div:nth-child(1) div div.wrap-xs.d-md-none div div.attributes-content | removeNewline | trim',
  },
  paginate: 'div:nth-child(6) div div div:nth-child(2) div:nth-child(3) div ul li.pagination--page-next a@href',
  normalize: normalize,
  filter: applyBlacklist,
}
I now don't really know how to find the correct selectors for the properties. My approach was to use "Copy selector" in Chrome.
So, for example, I came up with this selector in Chrome for 'price':
#immoobject-list > div:nth-child(4) > div > div.content > div.row > div:nth-child(1) > div > div:nth-child(1) > div.attributes-content.color-secondary
and converted it to the form you use in the other crawlers:
'.div:nth-child(1) div div.content div.row div:nth-child(1) div div:nth-child(1) div.attributes-content.color-secondary | removeNewline | trim',
So here is what I don't understand:
Thank you so much in advance for your help, and sorry for asking such dumb questions - I am just starting to learn JavaScript...
Cheers
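The conversion described above (from Chrome's "Copy selector" output to a selector relative to one list item) can be sketched as a small helper. This is an illustration under assumptions: it presumes the crawler evaluates each field selector relative to a single listing element, and `toRelativeSelector` is a hypothetical name. Note also that in CSS `.div` matches elements with the class `div`, whereas plain `div` matches `<div>` elements, so a selector copied from Chrome should keep `div` without the leading dot.

```javascript
// Hypothetical helper: strips the part of a copied selector that
// identifies one specific list item, leaving a selector relative to it.
function toRelativeSelector(copiedSelector, itemSelector) {
  const copied = copiedSelector.trim()
  const prefix = itemSelector.trim()
  if (!copied.startsWith(prefix)) {
    throw new Error('Copied selector does not start with the item selector')
  }
  return copied
    .slice(prefix.length)
    .replace(/^\s*>\s*/, '') // drop the leading child combinator
    .replace(/\s*>\s*/g, ' ') // turn remaining child combinators into descendant ones
    .trim()
}

const copiedPrice =
  '#immoobject-list > div:nth-child(4) > div > div.content > div.row > ' +
  'div:nth-child(1) > div > div:nth-child(1) > div.attributes-content.color-secondary'

const relativePrice = toRelativeSelector(copiedPrice, '#immoobject-list > div:nth-child(4)')
```

Here `relativePrice` no longer contains the `#immoobject-list > div:nth-child(4)` part, which only pointed at the fourth listing on the page. Whether the remaining path matches every listing still has to be checked against the actual HOWOGE markup.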
When I configure a search for eBay Kleinanzeigen, the scraper fails with
TypeError: Cannot read property 'join' of undefined
at Object.isOneOf (/app/lib/utils.js:2:42)
at applyBlacklist (/app/lib/sources/kleinanzeigen.js:13:32)
at Array.filter (<anonymous>)
at FlatFinder.run (/app/lib/flatfinder.js:21:8)
at process._tickCallback (internal/process/next_tick.js:68:7)
If I add these lines to utils.js, it seems to work:
function isOneOf(word, arr) {
  if (arr === undefined) {
    return true
  }
I tried to debug the cause of this behavior but was not able to find it. Can you reproduce it?
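The stack trace only shows that `join` is called on the second argument of `isOneOf`, so the error most likely means no blacklist array was passed in. The sketch below is a hypothetical reimplementation, not the project's actual utils.js. It treats a missing or empty list as "no match", which differs from the workaround above (returning `true` would report every word as blacklisted, if the result is used that way).

```javascript
// Escapes characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical version of isOneOf: true if `word` contains any entry
// of `arr` as a whole word (case-insensitive).
function isOneOf(word, arr) {
  // Assumption: a missing or empty list means nothing can match.
  if (!Array.isArray(arr) || arr.length === 0) return false

  const pattern = new RegExp(`\\b(${arr.map(escapeRegExp).join('|')})\\b`, 'i')
  return pattern.test(String(word))
}
```

With this guard, a config without a blacklist entry would let all listings through instead of crashing.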