
Comments (10)

Tails commented on May 18, 2024 4

I will sometime this week.

from kimuraframework.

seliverstov-maxim commented on May 18, 2024 1

This works for me (a setup for development):
Dockerfile

FROM ruby:2.5.3-stretch
RUN gem install kimurai
RUN apt-get update && apt-get install -q -y git unzip lsof wget tar openssl xvfb chromium \
                                        firefox-esr libsqlite3-dev sqlite3 mysql-client default-libmysqlclient-dev

RUN cd /tmp && \
    wget https://chromedriver.storage.googleapis.com/2.39/chromedriver_linux64.zip && \
    unzip chromedriver_linux64.zip -d /usr/local/bin && \
    rm -f chromedriver_linux64.zip

RUN cd /tmp && \
    wget https://github.com/mozilla/geckodriver/releases/download/v0.21.0/geckodriver-v0.21.0-linux64.tar.gz && \
    tar -xvzf geckodriver-v0.21.0-linux64.tar.gz -C /usr/local/bin && \
    rm -f geckodriver-v0.21.0-linux64.tar.gz

RUN apt-get install -q -y chrpath libxft-dev libfreetype6 libfreetype6-dev libfontconfig1 libfontconfig1-dev && \
    cd /tmp && \
    wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2 && \
    tar -xvjf phantomjs-2.1.1-linux-x86_64.tar.bz2 && \
    mv phantomjs-2.1.1-linux-x86_64 /usr/local/lib && \
    ln -s /usr/local/lib/phantomjs-2.1.1-linux-x86_64/bin/phantomjs /usr/local/bin && \
    rm -f phantomjs-2.1.1-linux-x86_64.tar.bz2

RUN mkdir -p /app

ADD Gemfile /app

RUN cd /app && bundle install

Gemfile

source 'https://rubygems.org' do
  gem 'kimurai'
  gem 'byebug'
end

Build

docker build . -t simple-kimurai 

Run (this opens a container with the environment installed and the current directory mounted, for development)

docker run --rm -it -v ${PWD}:/app -w /app simple-kimurai bash
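Once inside the container, it may help to confirm that the driver binaries the Dockerfile unpacks into /usr/local/bin are actually on the PATH before running a spider. The following is only a sketch: the helper names `which` and `missing_tools` and the `DRIVERS` list are made up for this example and are not part of Kimurai.

```ruby
# Hypothetical smoke check (not part of Kimurai): verify that the driver
# binaries installed by the Dockerfile above can be found on the PATH.

DEFAULT_DIRS = ENV.fetch('PATH', '').split(File::PATH_SEPARATOR).freeze

# Names taken from the archives the Dockerfile downloads.
DRIVERS = %w[chromedriver geckodriver phantomjs].freeze

# Returns the full path of the first executable file called `name`
# found in `dirs`, or nil when none of the directories contain it.
def which(name, dirs = DEFAULT_DIRS)
  dirs.map { |dir| File.join(dir, name) }
      .find { |path| File.file?(path) && File.executable?(path) }
end

# Returns the subset of `names` that cannot be found in `dirs`.
def missing_tools(names, dirs = DEFAULT_DIRS)
  names.reject { |name| which(name, dirs) }
end

# Example use inside the container:
#   missing = missing_tools(DRIVERS)
#   warn "Missing drivers: #{missing.join(', ')}" unless missing.empty?
```

An empty result from `missing_tools(DRIVERS)` only shows that the files exist and are executable; it does not check that the driver versions match the installed browsers.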


seliverstov-maxim commented on May 18, 2024 1

It would be great if the owner created an official Docker image.


iwoogy commented on May 18, 2024 1

I have put together an updated version of the Docker configuration.

https://github.com/iwoogy/kimurai-docker-example

Hope it helps.


vifreefly commented on May 18, 2024

@Tails, would you be interested in making a PR for this?


patrykk21 commented on May 18, 2024

How do you use this?


seliverstov-maxim commented on May 18, 2024

IMHO a Docker image would be enough.


thanhtoan1196 commented on May 18, 2024

@seliverstov-maxim The Dockerfile is great, but it crashes when running with multiple threads:

I, [2021-05-07 08:17:08 +0000#1693] [C: 47304296299360]  INFO -- MySpider: Info: visits: requests: 7, responses: 6
D, [2021-05-07 08:17:08 +0000#1693] [C: 47304296299360] DEBUG -- MySpider: Browser: driver.current_memory: 3837
I, [2021-05-07 08:17:08 +0000#1693] [C: 47304296299360]  INFO -- MySpider: Browser: driver selenium_chrome has been destroyed
#<Thread:0x0000560bc78df6c0@/usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:299 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
	19: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:305:in `block (2 levels) in in_parallel'
	18: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:305:in `each'
	17: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:313:in `block (3 levels) in in_parallel'
	16: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:204:in `request_to'
	15: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:204:in `public_send'
	14: from a.rb:33:in `try_parse'
	13: from a.rb:52:in `parse_question_page'
	12: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/capybara_ext/session.rb:21:in `visit'
	11: from /usr/local/bundle/gems/capybara-3.35.3/lib/capybara/session.rb:278:in `visit'
	10: from /usr/local/bundle/gems/capybara-3.35.3/lib/capybara/selenium/driver.rb:104:in `visit'
	 9: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/navigation.rb:32:in `to'
	 8: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/oss/bridge.rb:52:in `get'
	 7: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/oss/bridge.rb:587:in `execute'
	 6: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/bridge.rb:167:in `execute'
	 5: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:64:in `call'
	 4: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/default.rb:114:in `request'
	 3: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:88:in `create_response'
	 2: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:88:in `new'
	 1: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/response.rb:34:in `initialize'
/usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/response.rb:72:in `assert_ok': unknown error: session deleted because of page crash (Selenium::WebDriver::Error::UnknownError)
from unknown error: cannot determine loading status
from tab crashed
  (Session info: headless chrome=73.0.3683.75)
  (Driver info: chromedriver=2.39.562737 (dba483cee6a5f15e2e2d73df16968ab10b38a2bf),platform=Linux 5.10.25-linuxkit x86_64)
I, [2021-05-07 08:17:08 +0000#1693] [M: 47304283293120]  INFO -- MySpider: Browser: driver selenium_chrome has been destroyed
F, [2021-05-07 08:17:08 +0000#1693] [M: 47304283293120] FATAL -- MySpider: Spider: stopped: {:spider_name=>"MySpider", :status=>:failed, :error=>"#<Selenium::WebDriver::Error::UnknownError: unknown error: session deleted because of page crash\nfrom unknown error: cannot determine loading status\nfrom tab crashed\n  (Session info: headless chrome=73.0.3683.75)\n  (Driver info: chromedriver=2.39.562737 (dba483cee6a5f15e2e2d73df16968ab10b38a2bf),platform=Linux 5.10.25-linuxkit x86_64)>", :environment=>"development", :start_time=>2021-05-07 08:16:42 +0000, :stop_time=>2021-05-07 08:17:08 +0000, :running_time=>"25s", :visits=>{:requests=>7, :responses=>6}, :items=>{:sent=>0, :processed=>0}, :events=>{:requests_errors=>{}, :drop_items_errors=>{}, :custom=>{}}}
I, [2021-05-07 08:17:08 +0000#1693] [C: 47304296275900]  INFO -- MySpider: Browser: driver selenium_chrome has been destroyed
I, [2021-05-07 08:17:08 +0000#1693] [C: 47304296321600]  INFO -- MySpider: Browser: driver selenium_chrome has been destroyed
I, [2021-05-07 08:17:08 +0000#1693] [C: 47304296845720]  INFO -- MySpider: Browser: driver selenium_chrome has been destroyed
Traceback (most recent call last):
	19: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:305:in `block (2 levels) in in_parallel'
	18: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:305:in `each'
	17: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:313:in `block (3 levels) in in_parallel'
	16: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:204:in `request_to'
	15: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:204:in `public_send'
	14: from a.rb:33:in `try_parse'
	13: from a.rb:52:in `parse_question_page'
	12: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/capybara_ext/session.rb:21:in `visit'
	11: from /usr/local/bundle/gems/capybara-3.35.3/lib/capybara/session.rb:278:in `visit'
	10: from /usr/local/bundle/gems/capybara-3.35.3/lib/capybara/selenium/driver.rb:104:in `visit'
	 9: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/navigation.rb:32:in `to'
	 8: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/oss/bridge.rb:52:in `get'
	 7: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/oss/bridge.rb:587:in `execute'
	 6: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/bridge.rb:167:in `execute'
	 5: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:64:in `call'
	 4: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/default.rb:114:in `request'
	 3: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:88:in `create_response'
	 2: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:88:in `new'
	 1: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/response.rb:34:in `initialize'
/usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/response.rb:72:in `assert_ok': unknown error: session deleted because of page crash (Selenium::WebDriver::Error::UnknownError)
from unknown error: cannot determine loading status
from tab crashed
  (Session info: headless chrome=73.0.3683.75)
  (Driver info: chromedriver=2.39.562737 (dba483cee6a5f15e2e2d73df16968ab10b38a2bf),platform=Linux 5.10.25-linuxkit x86_64)
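The "Session info" and "Driver info" lines at the end of the error carry the browser and driver versions, which are worth comparing against the ChromeDriver release notes when debugging crashes like this one. As a rough sketch, they can be pulled out of a log with a couple of regular expressions; `driver_versions` and `VERSION_PATTERNS` are hypothetical names for this example, and the patterns assume the message format shown in the log above.

```ruby
# Hypothetical helper: extract the Chrome and chromedriver versions from a
# Selenium error message in the format shown in the log above.

VERSION_PATTERNS = {
  chrome: /headless chrome=([\d.]+)/,
  chromedriver: /chromedriver=([\d.]+)/
}.freeze

# Returns a hash such as { chrome: "...", chromedriver: "..." } that
# contains only the versions actually present in `log_text`.
def driver_versions(log_text)
  VERSION_PATTERNS.each_with_object({}) do |(name, pattern), found|
    match = log_text.match(pattern)
    found[name] = match[1] if match
  end
end

# Example use:
#   driver_versions(File.read('spider.log'))  # 'spider.log' is a placeholder path
```

A log without a "Driver info" line simply yields a hash with no `:chromedriver` key.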


hjhart commented on May 18, 2024

I'm having the same issues with multithreading inside of a docker container. Code works great on my Mac OS X box.

::WebDriver::Error::UnknownError: unknown error: session deleted because of page crash\nfrom unknown error: cannot determine loading status\nfrom tab crashed\n  (Session info: headless chrome=86.0.4240.111)>", :environment=>"development", :start_time=>2021-07-25 18:06:00.6242447 +0000, :stop_time=>2021-07-25 18:06:18.1101284 +0000, :running_time=>"17s", :visits=>{:requests=>2, :responses=>1}, :items=>{:sent=>0, :processed=>0}, :events=>{:requests_errors=>{}, :drop_items_errors=>{}, :custom=>{}}}
/usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/response.rb:72:in `assert_ok': unknown error: session deleted because of page crash (Selenium::WebDriver::Error::UnknownError)
from unknown error: cannot determine loading status
from tab crashed
  (Session info: headless chrome=86.0.4240.111)
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/response.rb:34:in `initialize'
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:88:in `new'
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:88:in `create_response'
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/default.rb:114:in `request'
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:64:in `call'
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/bridge.rb:167:in `execute'
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/w3c/bridge.rb:567:in `execute'
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/w3c/bridge.rb:59:in `get'
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/navigation.rb:32:in `to'
        from /usr/local/bundle/gems/capybara-3.35.3/lib/capybara/selenium/driver.rb:104:in `visit'
        from /usr/local/bundle/gems/capybara-3.35.3/lib/capybara/session.rb:278:in `visit'
        from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/capybara_ext/session.rb:21:in `visit'
        from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:201:in `request_to'
        from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:313:in `block (3 levels) in in_parallel'
        from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:305:in `each'
        from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:305:in `block (2 levels) in in_parallel'

@thanhtoan1196 did you figure out a workaround?


tellodaniel commented on May 18, 2024

@hjhart @thanhtoan1196 In my case I can't modify certain configurations of my Docker container, so I added the following Chrome flag: --disable-dev-shm-usage, and everything worked like a charm. The downside is that Chrome now uses the /tmp folder instead of /dev/shm, so your spider will probably be slower.

Problem is described here: https://stackoverflow.com/questions/53902507/unknown-error-session-deleted-because-of-page-crash-from-unknown-error-cannot
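Docker gives a container a 64 MB /dev/shm by default, which Chrome can exhaust when several browser instances run in parallel. As a minimal sketch of the workaround, the flag could be added only when the shared-memory mount is small. Everything named here (`shm_size_kb`, `chrome_args`, `MIN_SHM_KB`, the 512 MB threshold and the base flags) is an assumption made for illustration. How the resulting arguments reach Chrome is left out, because the thread does not say how the flag was wired into Kimurai.

```ruby
# Hypothetical helper: choose Chrome flags depending on the size of /dev/shm.

# Assumed threshold; tune it for your own workload.
MIN_SHM_KB = 512 * 1024

# Parses the output of `df -Pk /dev/shm` and returns the total size in KB,
# or nil when the output cannot be parsed.
def shm_size_kb(df_output)
  data_line = df_output.lines[1] # line 0 is the column header
  return nil if data_line.nil?

  size_field = data_line.split[1].to_s
  size_field.match?(/\A\d+\z/) ? size_field.to_i : nil
end

# Returns the base flags, plus --disable-dev-shm-usage when /dev/shm is
# smaller than the threshold or its size is unknown.
def chrome_args(df_output, base_args: %w[--headless --no-sandbox])
  size = shm_size_kb(df_output)
  return base_args if size && size >= MIN_SHM_KB

  base_args + ['--disable-dev-shm-usage']
end

# Example use inside the container:
#   puts chrome_args(`df -Pk /dev/shm`).join(' ')
```

If the container configuration can be changed, starting it with a larger shared-memory size via `docker run --shm-size` avoids the slowdown that comes from falling back to /tmp.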

