google_dl's People

Contributors

dependabot[bot], juanj, kenorb, lu43n

google_dl's Issues

google_dl doesn't do anything.

The script no longer downloads files (it used to work some time ago).

Example:

$ google_dl -v http://example.com
$ google_dl "foo" -f pdf

I expect the script to download the files as before.
The issue may be in the xgoogle library; if so, please send a PR to mycognitive/xgoogle.

UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' in position 48: ordinal not in range(128)

This command:

$ google_dl -x -r -s http://www.example.com/foo/bar/ "foo"
Downloading 'b'%0d%0a\xc3\xb6m_bar.pdf'' from 'b'http://www.example.com/%250d%250a-foo%C3%B6m_bar.pdf'' into . ...
Traceback (most recent call last):
  File "~/google_dl", line 171, in <module>
    print("File '%s' already exists, skipping." % (path))
UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' in position 48: ordinal not in range(128)

fails with an encoding error.

UnicodeEncodeError: 'ascii' codec can't encode character

Error:

Traceback (most recent call last):
  File "google_dl", line 169, in <module>
    print("Downloading '%s' from '%s' into %s..." % (filename, url, dirname))
UnicodeEncodeError: 'ascii' codec can't encode character '\u0142' in position 22: ordinal not in range(128)
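
Both tracebacks above fail inside print(), which suggests that stdout is using the ASCII codec (setting PYTHONIOENCODING=utf-8 in the environment is one known way to change that). As a minimal sketch of a workaround inside the script, assuming the messages only need to be readable rather than exact, the text could be escaped before printing. The helper name ascii_safe is hypothetical and not part of google_dl:

```python
import io


def ascii_safe(text, encoding="ascii"):
    """Return text with characters the target encoding cannot represent
    replaced by backslash escapes (hypothetical helper, not in google_dl)."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# Simulate an ASCII-only stdout, like the one in the tracebacks above.
buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding="ascii")

path = "f\xf6\u0142.pdf"  # contains both characters from the reports
print("File '%s' already exists, skipping." % ascii_safe(path), file=out)
out.flush()

# The message was written without raising UnicodeEncodeError.
print(buf.getvalue().decode("ascii"))
```

Escaping keeps the script running on any terminal; switching stdout to UTF-8 would instead preserve the original characters.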

ModuleNotFoundError: No module named 'xgoogle.search'

Hi,
I followed the instructions in the README, but the search module is not found.

$ ./google_dl.py
Traceback (most recent call last):
  File "./google_dl.py", line 11, in <module>
    from xgoogle.search import GoogleSearch, SearchError
ModuleNotFoundError: No module named 'xgoogle.search'

xgoogle is installed:

$ pip list
Package        Version
-------------- -------
beautifulsoup4 4.4.1
chardet        2.3.0
colorama       0.3.2
html5lib       0.999
nltk           3.0.5
pip            19.2.3
requests       2.4.3
setuptools     41.2.0
six            1.10.0
urllib3        1.9.1
wheel          0.24.0
xgoogle        1.4

search.py is present in the xgoogle directory:

$ ls xgoogle/xgoogle/
BeautifulSoup.py   browser.py         realtime.py        sponsoredlinks.py
__init__.py        googlesets.py      search.py          translate.py
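
The report does not show which copy of xgoogle the interpreter actually resolves. A small diagnostic sketch, assuming the cause might be a different interpreter or a different package location than the one pip lists, is to ask the import system where a module would be loaded from. The function name locate is hypothetical:

```python
import importlib.util
import sys


def locate(name):
    """Return the file a module would be loaded from, or None if it
    cannot be found (hypothetical diagnostic helper)."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package (e.g. "xgoogle") is itself missing.
        return None
    return None if spec is None else spec.origin


print("interpreter:", sys.executable)
for module in ("xgoogle", "xgoogle.search"):
    print(module, "->", locate(module))
```

Running this with the same interpreter that runs google_dl.py would show whether xgoogle resolves at all, and from which directory.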

http.client.BadStatusLine

Downloading 'b'foo.pdf'' from 'b'http://www.example.com/foo.pdf'' into ....
Traceback (most recent call last):
  File "google_dl", line 173, in <module>
    page.dlFile(url, path)
  File "google_dl", line 54, in dlFile
    with urllib.request.urlopen(request) as i, open(path, "wb") as o:
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 463, in open
    response = self._open(req, data)
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 481, in _open
    '_open', req)
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 441, in _call_chain
    result = func(*args)
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 1210, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 1185, in do_open
    r = h.getresponse()
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 1171, in getresponse
    response.begin()
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 321, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: ''

This may happen at random.
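
Since the failure is intermittent, one possible mitigation is to retry the download a few times before giving up. The following is only a sketch under that assumption; the retry helper, its parameters, and the backoff policy are hypothetical and not part of google_dl:

```python
import http.client
import time
import urllib.error


def retry(func, attempts=3, delay=1.0, sleep=time.sleep):
    """Call func() and retry on transient HTTP errors.

    Re-raises the last error once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except (http.client.BadStatusLine, urllib.error.URLError):
            if attempt == attempts:
                raise
            sleep(delay * attempt)  # simple linear backoff


# Demonstration with a stand-in that fails twice, then succeeds.
calls = []


def flaky_download():
    calls.append(1)
    if len(calls) < 3:
        raise http.client.BadStatusLine("")
    return "ok"


print(retry(flaky_download, delay=0))
```

In the script, the call at line 173 could be wrapped as retry(lambda: page.dlFile(url, path)).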

Doesn't return all the repeated results

$ google_dl -v -x -r "site:www.imdb.com/board"
site:www.imdb.com/board
Trying to download results from page #1  (results 1-47)
...

$ python3
>>> from xgoogle.search import GoogleSearch
>>> gs = GoogleSearch("site:www.imdb.com/board", repeat=True)

The tool downloads only 47 items, although Google reports 5,740 results. It should process all of them.

The fix should go either in this repo or in xgoogle, depending on where the problem is.

Support for omitted results

The following command downloads only one file:

google_dl -s mast.queensu.ca -f pdf ""

The expected result is to download all the files.

For example, googling site:mast.queensu.ca filetype:pdf also shows only one file, but when I click "repeat the search with the omitted results included", it shows about 6k other files.

The script should automatically follow the omitted-results link so that it can download all the files.

If the option can be parameterized, this method can be activated by specifying -r (to repeat the search with the omitted results included).
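
As a rough sketch of what -r could toggle: Google's "repeat the search with the omitted results included" link adds filter=0 to the search URL. How xgoogle builds its URLs is not shown here, so the function below and its parameter names are assumptions for illustration only:

```python
from urllib.parse import urlencode


def build_search_url(query, site=None, filetype=None,
                     include_omitted=False, start=0):
    """Build a Google search URL (hypothetical helper, not xgoogle's API)."""
    terms = [query] if query else []
    if site:
        terms.append("site:" + site)
    if filetype:
        terms.append("filetype:" + filetype)
    params = {"q": " ".join(terms), "start": start}
    if include_omitted:
        # filter=0 asks Google not to collapse similar results.
        params["filter"] = "0"
    return "https://www.google.com/search?" + urlencode(params)


# Equivalent of: google_dl -r -s mast.queensu.ca -f pdf ""
url = build_search_url("", site="mast.queensu.ca", filetype="pdf",
                       include_omitted=True)
print(url)
```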

Est. 2-4h

NameError: name 'name2codepoint' is not defined

Traceback (most recent call last):
  File "binfiles/google_dl", line 161, in <module>
    for results in page:
  File "binfiles/google_dl", line 79, in __next__
    results = self.gs.get_results()
  File "~/.python/xgoogle/search.py", line 201, in get_results
    results = self._extract_results(page)
  File "~/.python/xgoogle/search.py", line 288, in _extract_results
    eres = self._extract_result(result)
  File "~/.python/xgoogle/search.py", line 295, in _extract_result
    title, url = self._extract_title_url(result)
  File "~/.python/xgoogle/search.py", line 309, in _extract_title_url
    title = self._html_unescape(title)
  File "~/.python/xgoogle/search.py", line 369, in _html_unescape
    return re.sub(r'&([^;]+);', entity_replacer, s, re.U)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/re.py", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "~/.python/xgoogle/search.py", line 356, in entity_replacer
    if entity in name2codepoint:
NameError: name 'name2codepoint' is not defined

Example command:

google_dl -s http://www.marquette.edu/maqom/ -f pdf ""
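
In Python 3, name2codepoint lives in html.entities (it was in htmlentitydefs in Python 2), so the NameError suggests a missing or failed import in xgoogle/search.py. Note also that the re.sub call in the traceback passes re.U as the fourth positional argument, which is count, not flags. A minimal sketch of a corrected helper, loosely modelled on the function names in the traceback rather than copied from xgoogle:

```python
import html
import re
from html.entities import name2codepoint  # Python 3 location


def html_unescape(s):
    """Replace named HTML entities such as &amp; with their characters."""
    def entity_replacer(match):
        entity = match.group(1)
        if entity in name2codepoint:
            return chr(name2codepoint[entity])
        return match.group(0)  # leave unknown entities untouched

    # flags must be passed by keyword; the fourth positional arg is count.
    return re.sub(r'&([^;]+);', entity_replacer, s, flags=re.U)


print(html_unescape("Tom &amp; Jerry"))
# The standard library offers the same service (and handles numeric
# references as well) via html.unescape:
print(html.unescape("Tom &amp; Jerry"))
```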

ValueError: unknown url type

Traceback (most recent call last):
  File "~/binfiles/google_dl", line 165, in <module>
    path = get_path_via_url(url, args.dest, args.dirs)
  File "~/binfiles/google_dl", line 105, in get_path_via_url
    request = urllib.request.Request(url, method="HEAD")
  File "~/anaconda/lib/python3.6/urllib/request.py", line 329, in __init__
    self.full_url = url
  File "~/anaconda/lib/python3.6/urllib/request.py", line 355, in full_url
    self._parse()
  File "~/anaconda/lib/python3.6/urllib/request.py", line 384, in _parse
    raise ValueError("unknown url type: %r" % self.full_url)
ValueError: unknown url type: '/search?q=%C5%93+%C5%93+%C5%93+%C5%93+filetype:pdf&num=50&hl=en&prmd=ivns&tbm=isch&tbo=u&source=univ&sa=X&ved=0ahUKEwiRx83UpdTWAhWKLVAKHQ4rB1c49ABQsAAInwA'
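
The URL in the error is a relative link (/search?...) with no scheme or host, which urllib.request.Request rejects. One possible guard, sketched here under the assumption that such results should simply be skipped rather than downloaded, is to check the URL before building the request. The function name is_downloadable is hypothetical:

```python
from urllib.parse import urlsplit


def is_downloadable(url):
    """Return True only for absolute http(s) URLs with a host."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for candidate in ("http://www.example.com/foo.pdf",
                  "/search?q=foo+filetype:pdf&num=50"):
    print(candidate, "->", is_downloadable(candidate))
```

A check like this could run before the get_path_via_url call at line 165, so that relative results are skipped instead of raising ValueError.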
