
Issue at every first run about twarc (6 comments, closed)

docnow commented on July 20, 2024
Issue at every first run

from twarc.

Comments (6)

edsu commented on July 20, 2024

Are you sure you are running the latest code? requests_oauthlib is included in both requirements.txt and setup.py.

remagio commented on July 20, 2024

Sorry to say, @edsu, but now I get the same results on OS X as on the old Debian box.
I installed twarc from your Git repository on Monday. I repeated the installation anyway just now:

  • git pull (nothing new, since the repo has no updates);
  • python setup.py install (looks the same as on Monday, with no errors or unusual output).

I ran the same query as yesterday. Results:

  1. without --scrape I get older tweets than with --scrape;
  2. every run restarts from the beginning instead of from the last saved ID;
  3. file names are saved with '%23' instead of '#';
  4. the error from my previous comment is gone and the log is clean.

Here are the results of utils/summarize.py with --scrape:

%23moncler%20%23report-20141105104103.json
  start: 529573570809057281 [Tue Nov 04 09:59:12 +0000 2014]
  end:   529930957868908544 [Wed Nov 05 09:39:20 +0000 2014]
  total: 6653

%23moncler%20%23report-20141105105537.json
  start: 529573570809057281 [Tue Nov 04 09:59:12 +0000 2014]
  end:   529934176045502464 [Wed Nov 05 09:52:07 +0000 2014]
  total: 6655

and without --scrape:

%23moncler%20%23report-20141105104930.json
  start: 529013237497737216 [Sun Nov 02 20:52:39 +0000 2014]
  end:   529932988579328000 [Wed Nov 05 09:47:24 +0000 2014]
  total: 6260

%23moncler%20%23report-20141105113241.json
  start: 529013237497737216 [Sun Nov 02 20:52:39 +0000 2014]
  end:   529943588109815808 [Wed Nov 05 10:29:31 +0000 2014]
  total: 6280
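
Each summary above reports the earliest tweet, the latest tweet, and the tweet count in one file. The following is a minimal sketch of that kind of summary, not the actual utils/summarize.py: it assumes each file holds one tweet JSON object per line with the `id_str` and `created_at` fields, and the helper name `summarize` is hypothetical.

```python
import json
from typing import Iterable, Optional, Tuple


def summarize(lines: Iterable[str]) -> Optional[Tuple[int, str, int, str, int]]:
    """Return (start_id, start_time, end_id, end_time, total) for line-oriented tweet JSON.

    Hypothetical helper: assumes one tweet object per line with 'id_str' and 'created_at'.
    Returns None if no tweets were found.
    """
    start = end = None  # (id, created_at) of the smallest and largest tweet IDs seen
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        # id_str is the string form of the ID; Python ints hold it exactly.
        tid = int(tweet["id_str"])
        total += 1
        if start is None or tid < start[0]:
            start = (tid, tweet["created_at"])
        if end is None or tid > end[0]:
            end = (tid, tweet["created_at"])
    if start is None:
        return None
    return start[0], start[1], end[0], end[1], total
```

Usage would look like `with open(path) as f: print(summarize(f))`. Under this assumption, the largest ID in the previous file is the "last saved ID" that a rerun of the same query would be expected to resume from.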

edsu commented on July 20, 2024

I don't understand this ticket. I thought you opened it because you were getting an error about the missing requests module?

remagio commented on July 20, 2024

I opened it because the requirements had apparently all been installed properly from the beginning, so I reinstalled twarc anyway and got what I posted in the issue.
Now I checked requirements.txt again to understand what was going on, and then tested installing the requirements manually. I found that running "pip install" directly really did install the pytest package, as if "python setup.py install" had never installed it in the first place.

pip install pytest
Downloading/unpacking pytest
  Downloading pytest-2.6.4.tar.gz (512kB): 512kB downloaded
  Running setup.py (path:/private/var/folders/q4/ry4k5ymx2dvdhm8lqwd915nr0000gn/T/pip_build_remagio/pytest/setup.py) egg_info for package pytest

Downloading/unpacking py>=1.4.25 (from pytest)
  Downloading py-1.4.26.tar.gz (190kB): 190kB downloaded
  Running setup.py (path:/private/var/folders/q4/ry4k5ymx2dvdhm8lqwd915nr0000gn/T/pip_build_remagio/py/setup.py) egg_info for package py

Installing collected packages: pytest, py
  Running setup.py install for pytest

    Installing py.test-2.7 script to /Library/Frameworks/Python.framework/Versions/2.7/bin
    Installing py.test script to /Library/Frameworks/Python.framework/Versions/2.7/bin
  Running setup.py install for py
pip install python-dateutil
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages
Requirement already satisfied (use --upgrade to upgrade): six in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages (from python-dateutil)

pip install requests_oauthlib
Requirement already satisfied (use --upgrade to upgrade): requests-oauthlib in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages
Cleaning up…

I also ran pip install --upgrade anyway. The other requirements were OK.
Then I tried again to check whether that solved the reported issue. It didn't. The only change is in the results with --scrape, which now start on Nov 03 instead of Nov 04:

%23moncler%20%23report-20141105115907.json
  start: 529389017947582464 [Mon Nov 03 21:45:52 +0000 2014]
  end:   529949921546612736 [Wed Nov 05 10:54:41 +0000 2014]
  total: 6687

%23moncler%20%23report-20141105120627.json
  start: 529389017947582464 [Mon Nov 03 21:45:52 +0000 2014]
  end:   529949921546612736 [Wed Nov 05 10:54:41 +0000 2014]
  total: 6687

The main issue is that it re-downloads all the same tweets from the beginning instead of starting from the last saved ID.
It looks related to the file name '%23something' for the query "#something", as described above.
Testing again with "Keybase" instead of "#Keybase", it seems to work fine.
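
The '%23' in the file names is what URL percent-encoding produces for '#', and '%20' is an encoded space. The following is a minimal sketch of how that could break resuming. It assumes, hypothetically, that the output file is named from the percent-encoded query while the previous file is looked up by the raw query text. `saved_name` and `previous_files` are illustrative names, not twarc's code.

```python
import glob
import os
from urllib.parse import quote


def saved_name(query: str, stamp: str) -> str:
    # Hypothetical: the output file name is built from the percent-encoded query,
    # so '#' becomes '%23' and ' ' becomes '%20'.
    return "%s-%s.json" % (quote(query), stamp)


def previous_files(directory: str, query: str, encoded: bool) -> list:
    # Hypothetical lookup: find earlier files whose name starts with the query text,
    # either as typed (encoded=False) or percent-encoded (encoded=True).
    prefix = quote(query) if encoded else query
    pattern = os.path.join(glob.escape(directory), glob.escape(prefix) + "-*.json")
    return sorted(glob.glob(pattern))
```

Under these assumptions, a raw-text lookup for "#moncler #report" never matches "%23moncler%20%23report-….json", so every run would look like a first run. For a plain query such as "Keybase", `quote` leaves the text unchanged and both lookups agree. That would be consistent with the observation above, but it is only a hypothesis about the cause.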

edsu commented on July 20, 2024

I'm afraid I still don't understand your problem. Would removing the --scrape functionality help you?

remagio commented on July 20, 2024

It doesn't help; it doesn't matter whether I use --scrape or not. By solving this issue I simply ran back into a previously reported issue, the one that made me stop using the old Debian box.

  1. The initial issue was a missing "requests" module on a new OS X box. The error appears only on the first run of a query, not on later runs of the same query (subsequent runs don't get the error that the first run does).
  2. Following your suggestion to check the requirements anyway, I ran the installation again (python setup.py install) and checked the requirements. Apparently they were all satisfied. But when I tested all the requirements again directly, one by one (using pip install packagename), it looked like the standard setup had missed only the "pytest" package, not "requests". Looking back at the console, there were no errors or abnormal output during any of the setups.
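
The following sketch shows one quick way to see which of the packages discussed in this thread are importable by the interpreter that runs twarc, without reinstalling anything. The pip-name-to-module mapping is an assumption: python-dateutil is imported as `dateutil` and requests-oauthlib as `requests_oauthlib`. Printing `sys.executable` can help spot the case where `pip` and `python setup.py install` target different Python installations. That is one possible explanation for the behaviour above, not a confirmed diagnosis.

```python
import importlib.util
import sys

# pip package name -> importable module name (assumed mapping for this thread's packages)
MODULES = {
    "requests": "requests",
    "requests-oauthlib": "requests_oauthlib",
    "python-dateutil": "dateutil",
    "pytest": "pytest",
}


def check_modules(modules: dict) -> dict:
    """Report, for each pip package, whether its module can be found by this interpreter."""
    return {pkg: importlib.util.find_spec(mod) is not None for pkg, mod in modules.items()}


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for pkg, ok in check_modules(MODULES).items():
        print("%-20s %s" % (pkg, "ok" if ok else "MISSING"))
```

`find_spec` only locates the module without importing it, so the check has no side effects beyond reading the import path.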

So I solved the initial issue about "requests", but a new issue appeared:
twarc started naming the JSON files with "%23" instead of "#", which it was still using a few minutes earlier, before step 2. I think this is the cause of the new issue: on every execution of the same query in the same path, twarc doesn't properly read the previous JSON file to find the last saved ID. Instead it writes a new JSON file as if it couldn't read the previous file name, as if it were always a first run.