ssc-oscar / oscar.py
Python interface for OSCAR data
License: GNU General Public License v3.0
I'm trying to use oscar.py on da2 or da4. After cloning oscar.py, I ran
easy_install --user --upgrade oscar.
When I try to run anything that imports from oscar, I get the following error message:
Traceback (most recent call last):
File "simple.py", line 1, in <module>
from oscar import Blob
File "/home/dreid6/oscar.py/oscar.py", line 6, in <module>
import clickhouse_driver as clickhouse
ImportError: No module named clickhouse_driver
Excuse me for posting an issue here but I don't know a better place to ask this question.
I'm a student from Peking University, under supervision of Prof. Minghui Zhou, and I plan to do some research on the World of Code database.
I think oscar.py is already installed on the server, but some of its dependencies are missing. For example:
bash-4.2$ python
Python 2.7.5 (default, Sep 12 2018, 05:31:16)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import oscar
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "build/bdist.linux-x86_64/egg/oscar.py", line 2, in <module>
ImportError: No module named lzf
However, I cannot install new Python packages using easy_install because the Python development environment is missing, and I cannot fix this without root access. I wonder whether I have done all of this completely wrong. Should I use some sort of virtual environment to install all the dependencies and run my script?
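Before reinstalling anything, it may help to check which of the modules named in the two tracebacks above the current interpreter can actually import. This is a minimal sketch: the helper name missing_modules is hypothetical, and the module list comes from the error messages above, not from oscar.py's real dependency list.

```python
import importlib

def missing_modules(names):
    """Return the module names that the current interpreter fails to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

# Module names taken from the ImportError messages in the reports above.
print(missing_modules(["clickhouse_driver", "lzf"]))
```

Running the same check under both python and python3 shows whether a package was installed for one interpreter but not the other.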
Hello!
There is an issue with generating the URL for some non-GitHub repositories.
For example, take the project with URI gitlab.freedesktop.org_libinput_libinput. The following code...
print(oscar.Project("gitlab.freedesktop.org_libinput_libinput").url.decode())
Produces the following output...
https://github.com/gitlab.freedesktop.org/libinput_libinput
However, the project URL is https://gitlab.freedesktop.org/libinput/libinput.
This error can happen to any project whose host is not listed in URL_PREFIXES. Extending that list to cover every such host seems unmanageable, as there are many non-GitHub repositories in WoC.
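One possible alternative to maintaining a complete host list is a heuristic: treat the first URI segment as a hostname whenever it contains a dot. The sketch below is not oscar.py's implementation. KNOWN_HOSTS is a stand-in for URL_PREFIXES, guess_project_url is a hypothetical name, and the dot rule relies on the assumption that GitHub account names never contain dots.

```python
# Stand-in for oscar.py's URL_PREFIXES; the real table is not reproduced here.
KNOWN_HOSTS = {"bitbucket.org", "gitlab.com"}

def guess_project_url(uri):
    """Guess a repository URL from a WoC-style project URI (heuristic sketch)."""
    head, sep, rest = uri.partition("_")
    if not sep:
        raise ValueError("expected at least one '_' in %r" % uri)
    # Assumption: a dot in the first segment marks a hostname, since
    # GitHub account names are assumed not to contain dots.
    if head in KNOWN_HOSTS or "." in head:
        owner, _, repo = rest.partition("_")
        path = "%s/%s" % (owner, repo) if repo else owner
        return "https://%s/%s" % (head, path)
    # Otherwise fall back to GitHub: first segment is the owner.
    return "https://github.com/%s/%s" % (head, rest)

print(guess_project_url("gitlab.freedesktop.org_libinput_libinput"))
print(guess_project_url("notcake_gcad"))
```

The heuristic still cannot recover nested groups or owners whose names contain underscores, so a lookup table would remain necessary for those cases.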
I connected to da4, which has oscar v2.2.1 preinstalled, but I am unable to import oscar in Python 3. The error message is as follows:
I have tried v2.2.1, v2.2.0, and v2.1.0, and I have also run the command from the tutorial: "easy_install --user clickhouse-driver".
I also tried to build and install oscar as per the installation instructions in oscar.py, using "python3 setup.py build_ext && python3 setup.py install --user".
But none of these worked; I still cannot import oscar correctly.
I want to get the heads of a project and traverse the tree.
I was not able to get it to work. I backed up and just tried
getting some of the examples to work and had some trouble. I
don't know if I'm doing something wrong or if there is a problem
with oscar.py.
I tried this example from https://ssc-oscar.github.io/oscar.py/
>>> tree = Tree("d4ddbae978c9ec2dc3b7b3497c2086ecf7be7d9d")
>>> '.gitignore' in tree
True
When I run it, it returns False instead of True. (Using ~/lookup/showCnt tree,
I can see that it should return True, as shown in the example.)
This example is in the comments in oscar.py
c = Commit('e38126dbca6572912013621d2aa9e6f7c50f36bc')
print c.authored_at
when I run it, I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "oscar.py", line 832, in __getattr__
self.header, self.full_message = self.data.split("\n\n", 1)
ValueError: need more than 1 value to unpack
(again, ~/lookup agrees with what the documentation says it should return).
They seem to be different kinds of errors:
bklein3@da4 ~> python3 test_oscar_iterators.py
Traceback (most recent call last):
File "test_oscar_iterators.py", line 3, in <module>
print(f"first file: {next(File.all())}")
File "oscar.pyx", line 597, in all
File "oscar.pyx", line 592, in all_keys
File "oscar.pyx", line 514, in oscar._get_tch
TypeError: expected bytes, str found
Not sure if this one is Python 3 specific or not:
bklein3@da4 ~> python3 test_oscar_iterators.py
Traceback (most recent call last):
File "test_oscar_iterators.py", line 3, in <module>
print(f"first blob: {next(Blob.all())}")
File "oscar.pyx", line 607, in all
KeyError: 'blob_sequential_idx'
from oscar import Blob, File, Commit
print(f"first file: {next(File.all())}")
print(f"first blob: {next(Blob.all())}")
I ran cut -d\; -f4 /da4_data/data/All.blobs/commit_*.idx | wc -l
which reports 2034176272 commits.
According to https://bitbucket.org/swsc/overview/src/master/README.md, that is the number of commits for version R.
So where can I find version S?
When I try to run the command "Project('notcake_gcad').url" from the Python interpreter in my shell, I get the following error: "AttributeError: 'Project' object has no attribute 'url'". The expected output is "https://github.com/notcake/gcad".
I am deploying WoC in Pengcheng, but I'm confused about the .idx files and the sha1.tch files (such as blob_0.idx and sha1.blob_0.tch). According to the MSR paper (World of Code: An Infrastructure for Mining the Universe of Open Source VCS Data, Section III.D, Data Storage), I think the sha1.tch files use a git object's SHA as the key and the object's offset in the .idx file as the value. The .idx file records the object's offset and size in the .bin file. But after reading the code in oscar.py, it seems that it doesn't use the .idx files at all?
So what do the .idx and sha1.tch files actually do? Thank you!
I'm running oscar.py on da4 and get this warning when importing Author from oscar.
What I did in the terminal:
Python 2.7.5 (default, Apr 2 2020, 13:16:51)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from oscar import Author
oscar.py:63: UserWarning: No keys found for path_template /fast/All.sha1/sha1.blob_{key}.tch
warnings.warn("No keys found for path_template " + path_template)
oscar.py:63: UserWarning: No keys found for path_template /fast/All.sha1/sha1.tree_{key}.tch
warnings.warn("No keys found for path_template " + path_template)
oscar.py:63: UserWarning: No keys found for path_template /fast/All.sha1o/sha1.commit_{key}.tch
warnings.warn("No keys found for path_template " + path_template)
oscar.py:63: UserWarning: No keys found for path_template /fast/All.sha1o/sha1.tree_{key}.tch
warnings.warn("No keys found for path_template " + path_template)
How can this warning be prevented?
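If the goal is only to silence the message, Python's standard warnings filters can do that. The sketch below uses a stand-in function instead of the real import, so it runs without oscar installed. It hides the symptom only: it does not make the missing .tch files available, and the name fake_oscar_import is hypothetical.

```python
import warnings

def fake_oscar_import():
    """Stand-in for `from oscar import Author`; emits the same kind of warning."""
    warnings.warn("No keys found for path_template /fast/All.sha1/sha1.blob_{key}.tch")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning by default
    # `message` is a regex matched against the start of the warning text.
    warnings.filterwarnings(
        "ignore",
        message="No keys found for path_template",
        category=UserWarning,
    )
    fake_oscar_import()                    # suppressed by the filter above
    warnings.warn("an unrelated warning")  # still recorded

print([str(w.message) for w in caught])
```

In a real script, the filterwarnings call would go before the import that triggers the warning.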
For example, I can't find any toURL method in oscar.py: https://github.com/woc-hack/tutorial/blob/master/README.md#exercise-4b-get-the-url-of-a-projects-repository-using-the-oscarpy-projecttourl-function
Also, the Project class does not have the fork_commits shown here: https://github.com/woc-hack/tutorial/blob/master/README.md#activity-4-using-python-apis-from-oscarpy
Line 1243 in 359b9a0
p_name = p_name.replace('_', '/')
I think this will replace every '_' (including those in the repo name) with '/'.
I think it should be
p_name = p_name.replace('_', '/', 1)
to replace only the first one.
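The difference between the two calls can be seen on a made-up project name (the value below is hypothetical, chosen only because its repository part contains underscores):

```python
p_name = "someuser_repo_with_underscores"  # hypothetical project name

all_replaced = p_name.replace("_", "/")     # every underscore becomes a slash
first_only = p_name.replace("_", "/", 1)    # only the first underscore is replaced

print(all_replaced)
print(first_only)
```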
When trying to find all commits that changed a given file (and then all projects that included the given file at some point), I tried to follow the example from the oscar.py documentation: https://ssc-oscar.github.io/oscar.py/#oscar.File
from oscar import File
commits = File('minicms/templatetags/minicms_tags.py').commit_shas
Unfortunately, it does not work, and instead returns the following error:
OSError: Failed to close .tch "b'/da5_fast/f2cFullU.13.tch'": file not found
Exception ignored in: 'oscar.Hash.__del__'
OSError: Failed to close .tch "b'/da5_fast/f2cFullU.13.tch'": file not found
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "oscar.pyx", line 344, in oscar.cached_property.wrapper
File "oscar.pyx", line 1578, in oscar.File.commit_shas
File "oscar.pyx", line 574, in oscar._Base.read_tch
File "oscar.pyx", line 523, in oscar._get_tch
File "oscar.pyx", line 459, in oscar.Hash.__cinit__
OSError: Failed to open .tch file "b'/da5_fast/f2cFullU.13.tch'": file not found
In the discussion for woc-hack/mining-challenge-msr-2023#2 (where I originally created an issue for this problem) @audrism wrote:
f2c is available for version T only.
oscar.pyc only uses current version (U) which is absent.
getValues, on the other hand, checks if prior versions exist.
I propose that oscar use the same technique, that is, use prior versions of the f2c file if the current version does not exist (possibly also giving a warning).
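A rough sketch of what such a fallback could look like is given here. It is not how oscar or getValues is implemented. The function name resolve_versioned_path, the {ver} placeholder, and the assumption that versions are single uppercase letters in alphabetical order are all illustrative.

```python
import os
import string
import warnings

def resolve_versioned_path(template, current_version, oldest_version="A",
                           exists=os.path.exists):
    """Return the newest existing path, walking back from current_version.

    `template` contains a `{ver}` placeholder; versions are assumed to be
    single uppercase letters ordered alphabetically.
    """
    letters = string.ascii_uppercase
    start = letters.index(current_version)
    stop = letters.index(oldest_version)
    for i in range(start, stop - 1, -1):
        candidate = template.format(ver=letters[i])
        if exists(candidate):
            if letters[i] != current_version:
                warnings.warn("version %s not found, falling back to %s: %s"
                              % (current_version, letters[i], candidate))
            return candidate
    raise FileNotFoundError("no version of %s found" % template)

# Usage with a fake existence check, so the sketch runs without the data files.
path = resolve_versioned_path(
    "/da5_fast/f2cFull{ver}.13.tch", "U",
    exists=lambda p: "FullT" in p,
)
print(path)
```

The injectable exists argument is there only so the logic can be exercised without access to the servers.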
I am using the master branch of this repo.
>>> from oscar import Author
>>> Author('"Albert Krawczyk" <[email protected]>').commit_shas
()
>>> Author('Audris Mockus <[email protected]>').commit_shas
()
New oscar.pyx changes may be triggering some parsing bugs. The following code, run on da4:
import oscar
for commit in oscar.Project("buttermilk-crypto_b2"):
    print(commit)
Produces a few correct responses, then a parse error:
0fb93e4f750d75f6c1ccaf1f1e43b5680b82b61f
12b14fb30e7483edaf87ca6f3c4f97d836ea801f
14e609896cbe0819cd9f80ddd6902a58d4cfda40
1aa03a789bdb2dda02a196946f44d63e422880fc
1c7d68857bfc439deb43875db253e97437c6f358
221e8b6571dad078a2c086b13b6e774b7156e519
224d1fa0f9b68801418d57e0e591b28aace1ca79
247e29b7c1bd0d3d6557c338249133be49a77398
27e5cb3d5e4e3f6d67353d5825d3174894a0238d
304bd7c5fd53ea52ff17a800b623e88ac823ecfb
356d25daa208e992d10045f8b19e9dcc4ef886e0
Traceback (most recent call last):
File "demo_bug.py", line 3, in <module>
for commit in oscar.Project("buttermilk-crypto_b2"):
File "oscar.pyx", line 1355, in __iter__
File "oscar.pyx", line 949, in oscar.Commit.__getattr__
File "oscar.pyx", line 1056, in oscar.Commit._parse
ValueError: need more than 1 value to unpack
The problem appears to be in line 1056 of the Commit class; this sometimes doesn't work, and it appears that maybe self.data is returning an empty string in some cases:
def _parse(self):
    self.header, self.full_message = self.data.split(b'\n\n', 1)
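The unpacking fails whenever the data contains no blank line, which includes the empty-string case suspected above. One tolerant variant is sketched below as a standalone function with the hypothetical name split_commit. It only avoids the exception and does not explain why the data would be empty in the first place.

```python
def split_commit(data):
    """Split raw commit bytes into (header, message).

    bytes.partition never raises: if the blank-line separator is absent,
    the whole input is returned as the header and the message is empty.
    """
    header, _, message = data.partition(b"\n\n")
    return header, message

print(split_commit(b"tree abc\nauthor x\n\nFix bug\n\nDetails"))
print(split_commit(b""))
```

Whether an empty result should be passed on silently or reported as a missing commit is a design decision left open here.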