seykotron / scraplinks

A project showing how to scrape a website to get all the links in a specific section

License: GNU General Public License v3.0

Python 100.00%

scraplinks's Introduction

scraplinks

First, install the dependencies listed in requirements.txt.

Run the following command in your Python 3 virtual environment:

pip install -r requirements.txt

Next, decide which site you want to scrape. In this case I will scrape www.maxmovil.com, a site specializing in mobile phones in Spain.

This URL will be the base:

I will collect the URL of each phone's detail page, along with its image, price and name, and show them in a pandas DataFrame. I use pandas because it makes exporting to Excel, CSV and other formats very easy.
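As a minimal sketch of that last step, the snippet below builds a pandas DataFrame from already-scraped rows and exports it to CSV. The rows, column names and relative paths are hypothetical placeholders, not data taken from the site.

```python
import pandas as pd

# Hypothetical rows; in the real script these would come from the scraper.
rows = [
    {"name": "Example Phone 1", "price": "199,00 €",
     "url": "/example-phone-1", "image": "/media/example-phone-1.jpg"},
    {"name": "Example Phone 2", "price": "349,00 €",
     "url": "/example-phone-2", "image": "/media/example-phone-2.jpg"},
]

# One row per phone, with the columns in a fixed order.
df = pd.DataFrame(rows, columns=["name", "price", "url", "image"])

# With no path argument, to_csv returns the CSV as a string.
# index=False leaves out the row-number column.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file name, for example `df.to_csv("phones.csv", index=False)`, writes the file to disk instead. `df.to_excel(...)` works in a similar way but needs an Excel engine such as openpyxl to be installed.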

Next I want to find where the URL of each link lives in the page. To do that I open the "Inspect Element" tool in Google Chrome and search the markup (Command+F on Mac, Ctrl+F on Windows/Linux). In this particular case I search for the price, because the brand of the phone could also appear in select boxes.

The inspector shows that the anchor is wrapped in a div with the CSS class "item-area", so we need to find its child elements: the anchor, the image, the title and the price.

This scenario will teach us how to:

  • Get an attribute of an HTML element
  • Get the text inside an HTML element
  • Navigate to a child of an element and get its text
  • Get the URL of an image on the website
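The four operations above can be sketched with Python's standard-library html.parser, chosen here only so the example runs without extra dependencies; the project's requirements.txt may rely on a different parsing library. The sample markup and the "product-name" and "price" class names are hypothetical; only the "item-area" class comes from the page described above.

```python
from html.parser import HTMLParser

# Hypothetical markup modeled on the description above; the real page may differ.
SAMPLE_HTML = """
<div class="item-area">
  <a href="/example-phone-1">
    <img src="/media/example-phone-1.jpg" alt="Example Phone 1">
  </a>
  <h2 class="product-name">Example Phone 1</h2>
  <span class="price">199,00 €</span>
</div>
<div class="item-area">
  <a href="/example-phone-2">
    <img src="/media/example-phone-2.jpg" alt="Example Phone 2">
  </a>
  <h2 class="product-name">Example Phone 2</h2>
  <span class="price">349,00 €</span>
</div>
"""


class ItemAreaParser(HTMLParser):
    # Hypothetical CSS classes mapped to the output field they fill.
    TEXT_FIELDS = {"product-name": "name", "price": "price"}

    def __init__(self):
        super().__init__()
        self.items = []
        self._item = None       # dict for the item-area currently open
        self._div_depth = 0     # nesting depth of divs inside that item
        self._field = None      # text field currently being collected
        self._field_tag = None  # tag that opened that text field

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._item is None:
            if tag == "div" and "item-area" in classes:
                self._item = {"url": None, "image": None, "name": "", "price": ""}
                self._div_depth = 1
            return
        if tag == "div":
            self._div_depth += 1
        elif tag == "a" and self._item["url"] is None:
            self._item["url"] = attrs.get("href")    # attribute of an element
        elif tag == "img" and self._item["image"] is None:
            self._item["image"] = attrs.get("src")   # URL of an image
        for css_class, field in self.TEXT_FIELDS.items():
            if css_class in classes:
                self._field, self._field_tag = field, tag

    def handle_data(self, data):
        if self._item is not None and self._field:
            self._item[self._field] += data          # text inside a child element

    def handle_endtag(self, tag):
        if self._item is None:
            return
        if tag == self._field_tag:
            self._field = self._field_tag = None
        if tag == "div":
            self._div_depth -= 1
            if self._div_depth == 0:                 # the item-area div just closed
                self._item["name"] = self._item["name"].strip()
                self._item["price"] = self._item["price"].strip()
                self.items.append(self._item)
                self._item = None


def parse_items(html):
    parser = ItemAreaParser()
    parser.feed(html)
    parser.close()
    return parser.items


for item in parse_items(SAMPLE_HTML):
    print(item)
```

Each resulting dictionary holds the link, image URL, name and price of one item, so the list can be passed straight to a pandas DataFrame.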

Once you complete the "tutorial" you will be able to do this yourself and earn some extra cash scraping websites for fun and profit!

Thank you!
