What does this script do?
It takes a list of cities and job skills as input, searches Craigslist computer job listings, emails you matching jobs (post title, link, and associated skills), and stores the results in SQLite. Duplicate jobs are neither emailed nor added to the database.
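The duplicate check described above can be sketched as follows. This is a minimal illustration, not the repo's actual code: the `jobs` table columns and the `save_if_new` helper are assumptions for the example.

```python
import sqlite3

def save_if_new(conn, title, url, skills):
    """Insert a job only if its URL is not already stored; return True if inserted."""
    cur = conn.execute("SELECT 1 FROM jobs WHERE url = ?", (url,))
    if cur.fetchone():
        return False  # duplicate: skip both the email and the insert
    conn.execute(
        "INSERT INTO jobs (title, url, skills) VALUES (?, ?, ?)",
        (title, url, skills),
    )
    conn.commit()
    return True

# Hypothetical schema; the real one is created by database.py.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (title TEXT, url TEXT, skills TEXT)")
print(save_if_new(conn, "Python Dev", "https://example.org/1", "python, sql"))  # True
print(save_if_new(conn, "Python Dev", "https://example.org/1", "python, sql"))  # False
```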
Setup
- Clone:
git clone [email protected]:mjhea0/ultimate-craigslist-scraper.git
- Navigate to the directory:
cd ultimate-craigslist-scraper
- Create and activate a virtualenv (recent virtualenv versions isolate from system packages by default, so the --no-site-packages flag can be dropped):
virtualenv env
source env/bin/activate
- Install the requirements:
pip install -r requirements.txt
- Setup the database:
python database.py
- Update your settings.py file:
- EMAIL_USER, EMAIL_PASSWORD, TO_EMAIL, SMTP_SERVER, SMTP_PORT
- MY_SKILLS_LIST
- CITIES_LIST
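The setting names above come from the project; the values below are placeholder examples only (Gmail's SMTP host and port are shown as one common choice, and the skills/cities are illustrative):

```python
# settings.py -- example values; substitute your own
EMAIL_USER = "you@example.com"
EMAIL_PASSWORD = "your-app-password"
TO_EMAIL = "you@example.com"
SMTP_SERVER = "smtp.gmail.com"  # assumption: any SMTP server should work
SMTP_PORT = 587
MY_SKILLS_LIST = ["python", "django", "javascript"]
CITIES_LIST = ["denver", "boulder"]
```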
- Navigate to the Scrapy project root:
cd craigslist_jobs
- Run the scraper:
scrapy crawl gigs
- Set up a cron job to run the scraper once per week (here, Saturdays at 8:05 AM):
5 8 * * 6 cd /path/to/craigslist_jobs && scrapy crawl gigs
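Since cron runs with a minimal environment, one option is to point the crontab entry at a small wrapper script instead of an inline command. The script below is a hypothetical sketch; the path and the virtualenv location are assumptions you would adjust:

```shell
#!/bin/sh
# run_scraper.sh -- hypothetical cron wrapper
# Enter the Scrapy project root (adjust this path), or abort.
cd /path/to/ultimate-craigslist-scraper/craigslist_jobs || exit 1
# Activate the project's virtualenv so scrapy and its deps are on PATH.
. ../env/bin/activate
scrapy crawl gigs
```

The crontab entry then becomes: 5 8 * * 6 /path/to/run_scraper.sh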
To do
- Search other categories besides computer jobs
- Option to dump results to CSV
- Proxy server support
- Web app