Scraper

Scraping web pages as a way to learn more about Elixir, Erlang, tools, CSV, HTTP, and CSS.

... edit mix.exs and add the deps (HTTPotion, Floki, CSV), then fetch them:
mix deps.get
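
A sketch of what the deps entry in mix.exs might look like, assuming the Hex package names httpotion, floki, and csv. The original notes do not state versions, so the requirements below are placeholders that accept any release; newer releases of these libraries may behave differently from the calls shown in these notes.

```elixir
# Fragment of mix.exs: only the deps function is shown.
defp deps do
  [
    # ">= 0.0.0" accepts any release; replace with versions you have verified.
    {:httpotion, ">= 0.0.0"},
    {:floki, ">= 0.0.0"},
    {:csv, ">= 0.0.0"}
  ]
end
```
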
... then test in iex:
iex -S mix
response = HTTPotion.get "http://cleesmith.github.io/"
HTTPotion.Response.success?(response)
response.headers
response.body

# the default timeout is 5000 ms, but can be changed:
response = HTTPotion.get "http://cleesmith.github.io/", [timeout: 10_000]

parsed = Floki.parse(response.body)
ttl = Floki.find(parsed, "title")
i ttl
... Floki.find returns a list []:
  [{"title", [], ["Thoughts about health, human nature, programming, GoLang, Ruby, Python, C, and Java"]}]
... get the first element of the list, i.e. the head:
ttt = hd ttl
i ttt
... returns a tuple {}:
  {"title", [], ["Thoughts about health, human nature, programming, GoLang, Ruby, Python, C, and Java"]}
tl = elem(ttt, 2)
i tl
... returns a list []:
  ["Thoughts about health, human nature, programming, GoLang, Ruby, Python, C, and Java"]
tls = hd tl
i tls
... returns a string:
  "Thoughts about health, human nature, programming, GoLang, Ruby, Python, C, and Java"
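
The steps above can also be collapsed into a single expression. The sketch below uses a literal copy of the Floki.find result shown earlier, so it runs without Floki; the names title_a and title_b are illustrative only.

```elixir
# Literal copy of the list that Floki.find returned above.
ttl = [
  {"title", [],
   ["Thoughts about health, human nature, programming, GoLang, Ruby, Python, C, and Java"]}
]

# The same steps as above, chained: head of the list, third tuple element, head again.
title_a = ttl |> hd() |> elem(2) |> hd()

# Alternatively, destructure with a single pattern match.
# This raises MatchError unless the list holds exactly one title node with one text child.
[{"title", _attrs, [title_b]}] = ttl

IO.inspect(title_a)
IO.inspect(title_a == title_b)
```

Floki also provides Floki.text/1, which returns the text content of the matched nodes as a single string.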


... how to get the meta description (Floki.attribute returns a list of strings):
meta_description = response.body |> Floki.find("meta[name='description']") |> Floki.attribute("content")
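
Because Floki.attribute returns a list, a page without a meta description yields an empty list. A minimal sketch of taking the first value with a fallback; the lists below are hypothetical stand-ins for real Floki.attribute results, so the snippet runs without Floki.

```elixir
# Stand-ins for the list Floki.attribute/2 returns (hypothetical values).
found = ["Some page description"]
missing = []

# List.first/1 returns nil for an empty list, so fall back to an empty string.
description = List.first(found) || ""
fallback = List.first(missing) || ""

IO.inspect(description)
IO.inspect(fallback)
```
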

... reading a tab-separated CSV file:
File.stream!("cls.csv") |>
CSV.decode(separator: ?\t) |>
Enum.map(fn row ->
  Enum.map(row, &String.upcase/1)
end)

File.stream!("cls.csv") |>
CSV.decode(separator: ?\t) |>
Enum.each(&IO.puts/1)
... or equivalently: Enum.each(&IO.puts(&1))
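
To try the upcase transform without the CSV library or a cls.csv file, the sketch below does a naive tab split with String.split. This is a stand-in technique, not the CSV library's parser: it does not handle quoted fields or escaped tabs. The sample data is hypothetical.

```elixir
# Hypothetical in-memory stand-in for the contents of cls.csv.
data = "name\tcity\nann\tparis\n"

rows =
  data
  # One string per line, dropping the empty string after the final newline.
  |> String.split("\n", trim: true)
  # Naive field split on tabs (no quoting support).
  |> Enum.map(&String.split(&1, "\t"))
  # Same per-row transform as in the notes above.
  |> Enum.map(fn row -> Enum.map(row, &String.upcase/1) end)

IO.inspect(rows)
# prints [["NAME", "CITY"], ["ANN", "PARIS"]]
```
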

... build a list of URLs (Enum.map already returns a list, so Enum.to_list is not needed):
1..100 |>
Enum.map(fn i -> "http://localhost/?#{i}" end)
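
The range-to-URL mapping can also be written as a for comprehension, which some readers find easier to scan. A minimal sketch; the variable name urls is illustrative.

```elixir
# Build one URL per integer in the range; the result is a plain list of strings.
urls = for i <- 1..100, do: "http://localhost/?#{i}"

IO.inspect(length(urls))
IO.inspect(List.first(urls))
IO.inspect(List.last(urls))
```
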

Contributors: cleesmith
