Use Scrapy with a list of proxies generated from proxynova.com
The first run will generate the list of proxies from http://proxynova.com and store it in the cache.
Each proxy is then checked individually, and those that time out or cannot be reached are removed from the list.
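The check-and-remove step can be pictured roughly as follows. This is a minimal sketch, not the code in proxies.py: the function names `check_proxy` and `filter_working`, the test URL, the 5-second timeout, and the `host:port` proxy format are all assumptions made for illustration.

```python
import http.client
import urllib.request


def check_proxy(proxy, test_url="http://example.com", timeout=5.0):
    """Return True if test_url can be fetched through the proxy in time.

    `proxy` is assumed to be a "host:port" string (hypothetical format).
    """
    handler = urllib.request.ProxyHandler({"http": "http://" + proxy})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers timeouts, refused connections and malformed responses.
        return False


def filter_working(proxies, check=check_proxy):
    """Keep only the proxies for which `check` returns True."""
    return [proxy for proxy in proxies if check(proxy)]
```

Passing the check as a parameter keeps the filtering logic testable without network access, for example `filter_working(candidates, check=my_fake_check)`.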
Example:
./run_example.sh
To regenerate the proxy list, run: python proxies.py
In settings.py add the following line: DOWNLOADER_MIDDLEWARES = { 'scrapy_proxynova.middleware.HttpProxyMiddleware': 543 }
Set these options in settings.py:
- PROXY_SERVER_LIST_CACHE_FILE: file in which the proxy list is stored. Default: proxies.txt
- PROXY_BYPASS_PERCENT: probability that a request uses a direct connection instead of a proxy
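Put together, a settings.py fragment might look like the sketch below. The value 10 for PROXY_BYPASS_PERCENT is purely illustrative, and the reading of it as a number from 0 to 100 is an assumption based on the option's name. The helper `should_bypass_proxy` is hypothetical and not part of the package; it only illustrates how such a percentage could be turned into a per-request decision.

```python
import random

# Settings named in this README.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_proxynova.middleware.HttpProxyMiddleware': 543,
}
PROXY_SERVER_LIST_CACHE_FILE = 'proxies.txt'  # documented default
PROXY_BYPASS_PERCENT = 10  # illustrative value, assumed to range from 0 to 100


def should_bypass_proxy(bypass_percent, rng=random):
    """Hypothetical helper: decide whether one request skips the proxy.

    random() returns a float in [0, 1), so 0 never bypasses
    and 100 always bypasses.
    """
    return rng.random() * 100 < bypass_percent
```

Under this reading, a value of 10 would send roughly one request in ten directly and the rest through a proxy.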