
Comments (8)

kota-kaihori commented on July 29, 2024

For now, I managed to fetch 10 days of weather for Shibuya!!

# Load the library for accessing URLs
require 'open-uri'
# Load the Nokogiri library
require 'nokogiri'

# URL to scrape
url = 'https://tenki.jp/forecast/3/16/4410/13113/'

charset = nil
# URI.open is used because Kernel#open no longer opens URLs in Ruby 3.0+
html = URI.open(url) do |f|
  charset = f.charset # get the character encoding
  f.read # read the HTML; the block's value is assigned to html
end

# Parse the HTML and build a document object
doc = Nokogiri::HTML.parse(html, nil, charset)

# Dates for today and tomorrow (not formatted yet)
doc.xpath('//h3[@class="left-style"]').each do |node|
  p node.inner_text
end

# Dates for days 3 through 10 (not formatted yet)
doc.xpath('//td[@class="cityday"]').each do |node|
  p node.inner_text
end

# Weather for today and tomorrow
doc.xpath('//div[@class="weather-icon"]').each do |node|
  p node.css('img').attribute('title').value
end

# Weather for days 3 through 10
doc.xpath('//td[@class="weather-icon"]').each do |node|
  p node.css('img').attribute('title').value
end

Result

"今日 07月10日(火)[先勝]"
"明日 07月11日(水)[友引]"
"天気ガイド"
"注目の情報"
"07月12日      \n      (木)\n      "
"07月13日      \n      (金)\n      "
"07月14日      \n      (土)\n      "
"07月15日      \n      (日)\n      "
"07月16日      \n      (月)\n      "
"07月17日      \n      (火)\n      "
"07月18日      \n      (水)\n      "
"07月19日      \n      (木)\n      "
"曇"
"曇"
"曇"
"曇"
"晴時々曇"
"晴時々曇"
"曇時々晴"
"曇時々晴"
"晴"
"晴"

from test-project.
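
The output in the first comment still contains two unrelated headings ("天気ガイド", "注目の情報") and date cells padded with spaces and newlines. As a minimal sketch of one way to tidy this up, assuming the strings look exactly like the ones printed there, the snippet below filters the headings and strips the whitespace. `clean_date_text` is a hypothetical helper, not something from the original script.

```ruby
# Hypothetical helper (not part of the original script): strip the padding
# and newlines that inner_text leaves inside the date cells.
def clean_date_text(text)
  text.gsub(/\s+/, '')
end

# Sample strings copied from the output in the first comment.
headings = ["今日 07月10日(火)[先勝]", "明日 07月11日(水)[友引]", "天気ガイド", "注目の情報"]
cells = ["07月12日      \n      (木)\n      ", "07月13日      \n      (金)\n      "]

# Keep only the headings that actually contain a date.
date_headings = headings.grep(/\d+月\d+日/)
cleaned_cells = cells.map { |cell| clean_date_text(cell) }

puts date_headings
puts cleaned_cells
```

Filtering by pattern rather than by position means the result does not depend on how many extra `h3` elements the page happens to contain.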

kota-kaihori commented on July 29, 2024

https://morizyun.github.io/blog/ruby-nokogiri-scraping-tutorial/#7
http://inobo52.hatenablog.com/entry/2014/09/04/Ruby%E3%81%A7HTML%E8%A7%A3%E6%9E%90%E3%81%8C%E8%B6%85%E4%BD%99%E8%A3%95%E3%81%AA%E3%82%93%E3%81%A7%E3%81%99
Looks like this can be done.


inaohiro commented on July 29, 2024

The idea is to write a scraping function that ends by returning something like the following.

The data inside is made up.

def scraping
  # Fetch the data from tenki.jp and extract only the information we want

  {
    data: [
      {
        date: "2018/07/10",
        weather: "晴れ",
        temperature: {
          max: "36度",
          min: "30度"
        }
      },
      {
        date: "2018/07/11",
        weather: "晴れのち曇り",
        temperature: {
          max: "32度",
          min: "27度"
        }
      }
    ]
  }
end
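
As a rough sketch of how such a return value could be assembled, assuming the scraper has already collected parallel arrays of dates, weather strings, and temperatures, something like the following might work. `build_forecast` and its argument names are hypothetical and only illustrate the shape proposed above; the sample values are the placeholder ones from that proposal.

```ruby
# Hypothetical helper: the argument names and the use of parallel arrays
# are assumptions, not part of the original proposal.
def build_forecast(dates, weathers, max_temps, min_temps)
  data = dates.zip(weathers, max_temps, min_temps).map do |date, weather, max, min|
    { date: date, weather: weather, temperature: { max: max, min: min } }
  end
  { data: data }
end

# Placeholder values, same as in the proposal.
forecast = build_forecast(
  ["2018/07/10", "2018/07/11"],
  ["晴れ", "晴れのち曇り"],
  ["36度", "32度"],
  ["30度", "27度"]
)

p forecast[:data].first
```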


inaohiro commented on July 29, 2024

Next, we need to decide which information to show on screen (weather, temperature, laundry index, and so on).


inaohiro commented on July 29, 2024

I think we have settled on how to do the web scraping, so shall we close this soon?
The follow-up (implementation) is in #3.


kota-kaihori commented on July 29, 2024

Got as far as formatting.

# Load the library for accessing URLs
require 'open-uri'
# Load the Nokogiri library
require 'nokogiri'

# Date formatting: from characters n..m of s, keep only digits and date kanji
def shaping(s, n, m)
  s2 = s.split("")
  s3 = s2[n..m]
  s3.select! { |item| item =~ /^[0-9日月火水木金土]/ }
  p s3 unless s3.empty?
end

def scrape
  # URL to scrape
  url = 'https://tenki.jp/forecast/3/16/4410/13113/'

  charset = nil
  # URI.open is used because Kernel#open no longer opens URLs in Ruby 3.0+
  html = URI.open(url) do |f|
    charset = f.charset # get the character encoding
    f.read # read the HTML; the block's value is assigned to html
  end

  # Parse the HTML and build a document object
  doc = Nokogiri::HTML.parse(html, nil, charset)

  # Dates for today and tomorrow
  doc.xpath('//h3[@class="left-style"]').each do |node|
    shaping(node.inner_text, 3, 11)
  end
  # Dates for days 3 through 10
  doc.xpath('//td[@class="cityday"]').each do |node|
    shaping(node.inner_text, 0, 20)
  end

  # Weather for today and tomorrow
  doc.xpath('//div[@class="weather-icon"]').each do |node|
    p node.css('img').attribute('title').value
  end

  # Weather for days 3 through 10
  doc.xpath('//td[@class="weather-icon"]').each do |node|
    p node.css('img').attribute('title').value
  end
end

scrape

Result

["0", "7", "月", "1", "7", "日", "火"]
["0", "7", "月", "1", "8", "日", "水"]
["0", "7", "月", "1", "9", "日", "木"]
["0", "7", "月", "2", "0", "日", "金"]
["0", "7", "月", "2", "1", "日", "土"]
["0", "7", "月", "2", "2", "日", "日"]
["0", "7", "月", "2", "3", "日", "月"]
["0", "7", "月", "2", "4", "日", "火"]
["0", "7", "月", "2", "5", "日", "水"]
["0", "7", "月", "2", "6", "日", "木"]
"晴"
"曇のち晴"
"晴のち曇"
"曇時々晴"
"晴時々曇"
"晴時々曇"
"晴時々曇"
"晴時々曇"
"晴"
"曇一時雨"

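
The formatted dates above come out as arrays of single characters, which are awkward to compare or sort. One possible alternative, sketched here under the assumption that the scraped text always contains a "MM月DD日" pattern, is to pull the month and day out with a regular expression and build a `Date`. `parse_forecast_date` is a hypothetical helper, and the year is passed in because it does not appear in the scraped text.

```ruby
require 'date'

# Hypothetical helper: extract month and day with a regex instead of slicing
# by character position. Assumption: the forecast does not cross a year
# boundary, so a single year can be supplied by the caller.
def parse_forecast_date(text, year)
  match = text.match(/(\d{1,2})月\s*(\d{1,2})日/)
  return nil unless match

  Date.new(year, match[1].to_i, match[2].to_i)
end

# Sample strings taken from the first comment's output.
today = parse_forecast_date("今日 07月10日(火)[先勝]", 2018)
later = parse_forecast_date("07月12日      \n      (木)\n      ", 2018)
other = parse_forecast_date("天気ガイド", 2018)

puts today  # prints 2018-07-10
puts later  # prints 2018-07-12
p other     # prints nil
```

Returning `nil` for headings without a date makes it easy to drop them with `compact` before storing the results.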

kota-kaihori commented on July 29, 2024

Added precipitation probability as well.

# Load the library for accessing URLs
require 'open-uri'
# Load the Nokogiri library
require 'nokogiri'

# Date formatting: from characters n..m of s, keep only digits and date kanji
def shaping(s, n, m)
  s2 = s.split("")
  s3 = s2[n..m]
  # select is used instead of select!, which returns nil when nothing is removed
  s3.select { |item| item =~ /^[0-9日月火水木金土]/ }
end

def scrape
  # URL to scrape
  url = 'https://tenki.jp/forecast/3/16/4410/13113/'

  charset = nil
  # URI.open is used because Kernel#open no longer opens URLs in Ruby 3.0+
  html = URI.open(url) do |f|
    charset = f.charset # get the character encoding
    f.read # read the HTML; the block's value is assigned to html
  end

  # Parse the HTML and build a document object
  doc = Nokogiri::HTML.parse(html, nil, charset)

  date = []
  weather = []
  rainprobability = []

  # Dates for today and tomorrow
  doc.xpath('//h3[@class="left-style"]').each do |node|
    s2 = shaping(node.inner_text, 3, 11)
    date.push(s2) unless s2.empty?
  end
  # Dates for days 3 through 10
  doc.xpath('//td[@class="cityday"]').each do |node|
    s2 = shaping(node.inner_text, 0, 20)
    date.push(s2) unless s2.empty?
  end

  # Weather for today and tomorrow
  doc.xpath('//div[@class="weather-icon"]').each do |node|
    weather.push(node.css('img').attribute('title').value)
  end

  # Weather for days 3 through 10
  doc.xpath('//td[@class="weather-icon"]').each do |node|
    weather.push(node.css('img').attribute('title').value)
  end

  # Precipitation probability for today and tomorrow (maximum value)
  doc.xpath('//tr[@class="rain-probability"]').each do |node|
    s2 = node.inner_text.split("\n      ")
    s4 = s2[2..5].map { |item| item.delete("  ") }
    s4.delete("---") # drop time slots that have no value
    s4 = s4.map { |item| item.delete("%").to_i }
    rainprobability.push(s4.max)
  end

  # Precipitation probability for days 3 through 10
  doc.xpath('//p[@class="precip"]').each do |node|
    s = node.inner_text
    # NOTE: the result of to_i is discarded, so s is pushed as a String (e.g. "10")
    s.delete!("%").to_i
    rainprobability.push(s)
  end

  # Build the hash: date => [weather, precipitation probability]
  hash = {}
  10.times do |i|
    hash[date[i]] = [weather[i], rainprobability[i]]
  end
  p hash
end

scrape

Result

{["0", "7", "月", "1", "8", "日", "水"]=>["晴", 0], ["0", "7", "月", "1", "9", "日", "木"]=>["曇時々晴", 0], ["0", "7", "月", "2", "0", "日", "金"]=>["晴", "10"], ["0", "7", "月", "2", "1", "日", "土"]=>["晴のち曇", "30"], ["0", "7", "月", "2", "2", "日", "日"]=>["晴時々曇", "30"], ["0", "7", "月", "2", "3", "日", "月"]=>["晴時々曇", "30"], ["0", "7", "月", "2", "4", "日", "火"]=>["晴時々曇", "20"], ["0", "7", "月", "2", "5", "日", "水"]=>["晴一時雨", "60"], ["0", "7", "月", "2", "6", "日", "木"]=>["晴一時雨", "60"], ["0", "7", "月", "2", "7", "日", "金"]=>["晴一時雨", "60"]}
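
The result above mixes integers (`0`) for the first two days with strings (`"10"`, `"30"`) for the rest, because the `to_i` call in the last loop does not change the string that gets pushed. A minimal sketch of one way to make the values uniform, assuming each cell's text looks like `"10%"`, is shown below. `to_percent` is a hypothetical helper, and the sample dates are written as joined strings purely for illustration.

```ruby
# Hypothetical helper: String#delete returns a new string, so to_i is applied
# to the value that is actually returned and used.
def to_percent(text)
  text.delete('%').to_i
end

# Sample values modeled on the result above (assumed cell text: "10%", "30%").
dates = ["07月20日(金)", "07月21日(土)"]
weather = ["晴", "晴のち曇"]
rain = ["10%", "30%"].map { |text| to_percent(text) }

# Pair each date with [weather, probability] without indexing by hand.
forecast = dates.zip(weather, rain).map { |date, w, r| [date, [w, r]] }.to_h

p forecast
```

Using `zip` also avoids hard-coding the number of days, so the hash simply follows however many dates were scraped.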


inaohiro commented on July 29, 2024

The approach is settled and we can produce output now, so closing this for the time being.

