
ssc-oscar / oscar.py


Python interface for OSCAR data

License: GNU General Public License v3.0

Python 0.61% Makefile 3.11% Dockerfile 0.01% C 93.61% Cython 2.66%

oscar.py's People

Contributors

actions-user, atutko2, audrism, cbogart, dkennard3, gaokai320, kaygau, px1624, user2589, zol0

oscar.py's Issues

ImportError: No module named clickhouse_driver

I'm trying to use oscar.py on da2 or da4. After cloning oscar.py, I ran
easy_install --user --upgrade oscar.
When I try to run anything that imports from oscar, I get the following error message:
Traceback (most recent call last):
  File "simple.py", line 1, in <module>
    from oscar import Blob
  File "/home/dreid6/oscar.py/oscar.py", line 6, in <module>
    import clickhouse_driver as clickhouse
ImportError: No module named clickhouse_driver

What is the recommended way of using oscar.py in one of the OSCAR servers?

Excuse me for posting an issue here but I don't know a better place to ask this question.

I'm a student from Peking University, under supervision of Prof. Minghui Zhou, and I plan to do some research on the World of Code database.

I think oscar.py is already installed on the server, but some of its dependencies are missing. For example:

bash-4.2$ python
Python 2.7.5 (default, Sep 12 2018, 05:31:16) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import oscar
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.linux-x86_64/egg/oscar.py", line 2, in <module>
ImportError: No module named lzf

However, I cannot install new Python packages using easy_install because the Python development environment is missing, and I cannot fix this without root access. I suspect I may be going about this the wrong way. Should I use some sort of virtual environment to install all the dependencies and run my script?
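Before deciding how to install anything, it may help to see which dependencies are actually missing from the interpreter in use. The following is a minimal sketch, not part of oscar.py: the helper name `missing_modules` is hypothetical, and the two module names are simply the ones that appear in the tracebacks reported in these issues.

```python
import importlib


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError (Python 3) is a subclass of ImportError.
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Module names taken from the tracebacks in these reports.
    print(missing_modules(["lzf", "clickhouse_driver"]))
```

Running this with both `python` and `python3` shows whether the two interpreters see different sets of installed packages.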

Incorrect URLs for non-GitHub repositories

Hello!

There is an issue with generating the URL for some non-GitHub repositories.

For example, take the project with URI gitlab.freedesktop.org_libinput_libinput. The following code...

print(oscar.Project("gitlab.freedesktop.org_libinput_libinput").url.decode())

Produces the following output...

https://github.com/gitlab.freedesktop.org/libinput_libinput

However, the project URL is https://gitlab.freedesktop.org/libinput/libinput.

This error can happen for any project whose host is not listed in URL_PREFIXES. Extending that list to cover every host seems unmanageable, as there are many non-GitHub repositories in WoC.
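One possible fallback, sketched here as an assumption rather than as oscar.py's actual logic, is to treat the first underscore-separated component as a host name whenever it contains a dot. The function name `guess_project_url` and the `default_host` parameter are hypothetical. The heuristic cannot resolve nested groups or owners whose names contain underscores, so it is only a best guess.

```python
def guess_project_url(uri, default_host="github.com"):
    """Heuristic URL guess: a dotted first component is taken as the host."""
    head, sep, rest = uri.partition("_")
    if sep and "." in head:
        # e.g. "gitlab.freedesktop.org_libinput_libinput"
        host = head
        owner, sep2, repo = rest.partition("_")
    else:
        # No dotted prefix: assume "<owner>_<repo>" on the default host.
        host = default_host
        owner, sep2, repo = head, sep, rest
    if not sep2:
        raise ValueError("cannot split %r into owner and repository" % uri)
    return "https://%s/%s/%s" % (host, owner, repo)


print(guess_project_url("gitlab.freedesktop.org_libinput_libinput"))
```

Only the first underscore after the host is treated as a separator, so repository names that contain underscores stay intact.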

Problems with import oscar

I connected to da4, which has oscar v2.2.1 preinstalled, but I am unable to import oscar in Python 3. The error message was attached as a screenshot:
[image]
I have tried v2.2.1, v2.2.0, and v2.1.0, and I have also run the command from the tutorial: "easy_install --user clickhouse-driver".
I also tried to build and install oscar per the installation instructions in oscar.py, using "python3 setup.py build_ext && python3 setup.py install --user".
None of these worked; I still cannot import oscar correctly.

problems with examples

I want to get the heads of a project and traverse the tree.
I was not able to get it to work. I backed up and just tried
getting some of the examples to work and had some trouble. I
don't know if I'm doing something wrong or if there is a problem
with oscar.py.

I tried this example from https://ssc-oscar.github.io/oscar.py/
>>> tree = Tree("d4ddbae978c9ec2dc3b7b3497c2086ecf7be7d9d")
>>> '.gitignore' in tree
True
When I run it, it returns False instead of True. (Using ~/lookup/showCnt tree,
I can see that it should return True, as shown in the example.)

This example is in the comments in oscar.py
c = Commit('e38126dbca6572912013621d2aa9e6f7c50f36bc')
print c.authored_at
When I run it, I get the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "oscar.py", line 832, in __getattr__
    self.header, self.full_message = self.data.split("\n\n", 1)
ValueError: need more than 1 value to unpack

(again, ~/lookup agrees with what the documentation says it should return).

Blob / File iterators not working with python 3

They seem to be different kinds of errors:

bklein3@da4 ~> python3 test_oscar_iterators.py
Traceback (most recent call last):
  File "test_oscar_iterators.py", line 3, in <module>
    print(f"first file: {next(File.all())}")
  File "oscar.pyx", line 597, in all
  File "oscar.pyx", line 592, in all_keys
  File "oscar.pyx", line 514, in oscar._get_tch
TypeError: expected bytes, str found

I'm not sure whether this one is specific to Python 3:

bklein3@da4 ~> python3 test_oscar_iterators.py
Traceback (most recent call last):
  File "test_oscar_iterators.py", line 3, in <module>
    print(f"first blob: {next(Blob.all())}")
  File "oscar.pyx", line 607, in all
KeyError: 'blob_sequential_idx'

Test script (test_oscar_iterators.py):

from oscar import Blob, File, Commit

print(f"first file: {next(File.all())}")
print(f"first blob: {next(Blob.all())}")
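The first traceback ("expected bytes, str found") is the usual symptom of a Python 3 `str` reaching code that expects `bytes`. A common remedy is to normalize such values at the boundary. The sketch below is illustrative only: `to_bytes` is a hypothetical helper, not a function from oscar.py, and where such a conversion belongs inside `oscar._get_tch` is not something this report establishes.

```python
def to_bytes(value, encoding="utf-8"):
    """Return `value` as bytes, encoding it first if it is a str."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


# A str path and a bytes path normalize to the same value.
print(to_bytes("/fast/All.sha1/sha1.blob_0.tch") == to_bytes(b"/fast/All.sha1/sha1.blob_0.tch"))
```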

The difference between .idx and sha1.tch

I am deploying WoC in Pengcheng, but I'm confused about the .idx and sha1.tch files (such as blob_0.idx and sha1.blob_0.tch). According to the paper published at MSR (World of Code: An Infrastructure for Mining the Universe of Open Source VCS Data), Section III.D, Data Storage, I think the sha1.tch files use a git object's SHA as the key and the object's offset in the .idx file as the value, while the .idx file records the object's offset and size in the .bin file. But after reading the code in oscar.py, it seems that it doesn't use the .idx file at all.
So what do the .idx and sha1.tch files each do? Thank you!

No keys found for path_template

I'm running oscar.py on da4 and get this warning when importing Author from oscar

What I did on terminal:

Python 2.7.5 (default, Apr  2 2020, 13:16:51) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from oscar import Author
oscar.py:63: UserWarning: No keys found for path_template /fast/All.sha1/sha1.blob_{key}.tch
  warnings.warn("No keys found for path_template " + path_template)
oscar.py:63: UserWarning: No keys found for path_template /fast/All.sha1/sha1.tree_{key}.tch
  warnings.warn("No keys found for path_template " + path_template)
oscar.py:63: UserWarning: No keys found for path_template /fast/All.sha1o/sha1.commit_{key}.tch
  warnings.warn("No keys found for path_template " + path_template)
oscar.py:63: UserWarning: No keys found for path_template /fast/All.sha1o/sha1.tree_{key}.tch
  warnings.warn("No keys found for path_template " + path_template)

How can this warning be prevented?
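If the goal is only to hide the message, the standard library's `warnings.filterwarnings` can ignore it, provided the filter is installed before the import. This is a sketch under that assumption; the helper name `silence_missing_key_warnings` is hypothetical. It hides the symptom only: the warning text suggests that no files matching those path templates were found, and that is not addressed here.

```python
import warnings


def silence_missing_key_warnings():
    """Ignore UserWarnings whose message starts with the text shown above."""
    warnings.filterwarnings(
        "ignore",
        message="No keys found for path_template",  # regex matched at message start
        category=UserWarning,
    )


silence_missing_key_warnings()
# The import from the report would go here, after the filter is installed:
# from oscar import Author
```

Other warnings, including other `UserWarning`s, are unaffected because the filter matches on the message prefix.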

Using `File('<filename>').commit_shas` results in an error due to missing .tch file "/da5_fast/f2cFullU.13.tch"

To find all commits that changed a given file (and then all projects that included that file at some point), I tried to follow the example from the oscar.py documentation: https://ssc-oscar.github.io/oscar.py/#oscar.File

from oscar import File
commits = File('minicms/templatetags/minicms_tags.py').commit_shas

Unfortunately, it does not work, and instead returns the following error:

OSError: Failed to close .tch "b'/da5_fast/f2cFullU.13.tch'": file not found
Exception ignored in: 'oscar.Hash.__del__'
OSError: Failed to close .tch "b'/da5_fast/f2cFullU.13.tch'": file not found
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "oscar.pyx", line 344, in oscar.cached_property.wrapper
  File "oscar.pyx", line 1578, in oscar.File.commit_shas
  File "oscar.pyx", line 574, in oscar._Base.read_tch
  File "oscar.pyx", line 523, in oscar._get_tch
  File "oscar.pyx", line 459, in oscar.Hash.__cinit__
OSError: Failed to open .tch file "b'/da5_fast/f2cFullU.13.tch'": file not found

In the discussion for woc-hack/mining-challenge-msr-2023#2 (where I originally created an issue for this problem) @audrism wrote:

f2c is available for version T only.
oscar.pyc only uses current version (U) which is absent.
getValues, on the other hand, checks if prior versions exist.

I propose that oscar use the same technique, that is, fall back to a prior version of the f2c file if the current version does not exist (possibly also emitting a warning).
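The proposed fallback could look roughly like the sketch below. It is not how getValues or oscar.py is implemented; it assumes that versions are single uppercase letters ordered alphabetically (T before U), and that the path can be written as a template with a `{ver}` placeholder. The function name `resolve_versioned_path` and the injectable `exists` parameter are hypothetical.

```python
import os
import string
import warnings


def resolve_versioned_path(template, current, exists=os.path.exists):
    """Return the newest existing path at or before version `current`.

    `template` must contain a `{ver}` placeholder, for example
    "/da5_fast/f2cFull{ver}.13.tch" (assumed layout).
    """
    letters = string.ascii_uppercase
    start = letters.index(current)
    # Try the current version first, then walk back through earlier letters.
    for ver in reversed(letters[: start + 1]):
        path = template.format(ver=ver)
        if exists(path):
            if ver != current:
                warnings.warn(
                    "version %s not found, falling back to %s: %s" % (current, ver, path)
                )
            return path
    raise OSError("no version of %s found at or before %s" % (template, current))
```

Passing `exists` as a parameter keeps the lookup testable without touching the file system.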

Parsing bug in oscar.pyx

New oscar.pyx changes may be triggering some parsing bugs. The following code, run on da4:

import oscar

for commit in oscar.Project("buttermilk-crypto_b2"):
    print(commit)

Produces a few correct responses, then a parse error:

0fb93e4f750d75f6c1ccaf1f1e43b5680b82b61f
12b14fb30e7483edaf87ca6f3c4f97d836ea801f
14e609896cbe0819cd9f80ddd6902a58d4cfda40
1aa03a789bdb2dda02a196946f44d63e422880fc
1c7d68857bfc439deb43875db253e97437c6f358
221e8b6571dad078a2c086b13b6e774b7156e519
224d1fa0f9b68801418d57e0e591b28aace1ca79
247e29b7c1bd0d3d6557c338249133be49a77398
27e5cb3d5e4e3f6d67353d5825d3174894a0238d
304bd7c5fd53ea52ff17a800b623e88ac823ecfb
356d25daa208e992d10045f8b19e9dcc4ef886e0
Traceback (most recent call last):
  File "demo_bug.py", line 3, in <module>
    for commit in oscar.Project("buttermilk-crypto_b2"):
  File "oscar.pyx", line 1355, in __iter__
  File "oscar.pyx", line 949, in oscar.Commit.__getattr__
  File "oscar.pyx", line 1056, in oscar.Commit._parse
ValueError: need more than 1 value to unpack

The problem appears to be in line 1056, in the _parse method of the Commit class. The split does not always yield two parts, and it appears that self.data may be an empty string in some cases:

    def _parse(self):
        self.header, self.full_message = self.data.split(b'\n\n', 1)
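The unpacking fails whenever the data contains no blank-line separator, which includes the empty-string case suspected above. A more defensive version might look like the following sketch. It is a standalone function with a hypothetical name (`split_commit`), not a patch to oscar.pyx, and raising on empty data rather than returning empty fields is a design assumption.

```python
def split_commit(data):
    """Split raw commit bytes into (header, full_message)."""
    if not data:
        # Report the real problem instead of an opaque unpacking error.
        raise ValueError("commit data is empty; the object may be missing from the store")
    # partition always returns three parts, even when the separator is absent.
    header, _sep, full_message = data.partition(b"\n\n")
    return header, full_message


print(split_commit(b"tree 0123\nauthor A\n\nInitial commit\n"))
```

Raising a descriptive error for empty data makes it easier to tell a missing object apart from a malformed one.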
