rpm-software-management / createrepo_c

C implementation of the createrepo.

Home Page: http://rpm-software-management.github.io/createrepo_c

License: GNU General Public License v2.0

Python 12.60% CMake 0.99% C 85.67% Shell 0.74%

createrepo_c's Introduction

createrepo_c

C implementation of createrepo

Run createrepo -h for usage syntax.

Devel tips

Building

Package build requires - Pkg name in Fedora/Ubuntu:

From your checkout dir:

mkdir build
cd build/
cmake ..
make

To build the documentation, from the build/ directory:

make doc

Note: To build with debugging symbols, use (from the build/ directory):

cmake -DCMAKE_BUILD_TYPE:STRING=DEBUG .. && make

Building from an rpm checkout

E.g. when you want to try weak and rich dependencies.

cmake .. && make

Note: The RPM must be built in that directory

Commands I am using for building the RPM:

cd /home/tmlcoch/git/rpm
CPPFLAGS='-I/usr/include/nss3/ -I/usr/include/nspr4/' ./autogen.sh --rpmconfigure --with-vendor=redhat --with-external-db --with-lua --with-selinux --with-cap --with-acl --enable-python
make clean && make

Other build options

-DENABLE_LEGACY_WEAKDEPS=ON

Enable legacy SUSE/Mageia/Mandriva weakdeps support (Default: ON)

-DENABLE_THREADED_XZ_ENCODER=ON

Threaded XZ encoding (Default: OFF)

Note: This option is disabled by default because createrepo_c already parallelizes many tasks (including compression). Enabling it only adds extra threads at the XZ library level, which causes thread bloat and, for most use cases, brings no performance boost. On regular hardware (e.g. 4 cores or fewer) this option may even degrade performance.

-DENABLE_DRPM=ON

Enable DeltaRPM support using drpm library (Default: ON)

Adds support for creating DeltaRPMs and incorporating them into the repository.

-DWITH_ZCHUNK=ON

Build with zchunk support (Default: ON)

-DWITH_LIBMODULEMD=ON

Build with libmodulemd support (Default: ON)

Adds support for working with repos containing Fedora Modularity metadata.

Build tarball

utils/make_tarball.sh [git revision]

Without git revision specified HEAD is used.

Build Python package

To create a binary "wheel" distribution, use:

python setup.py bdist_wheel

To create a source distribution, use:

python setup.py sdist

Installing a source distribution requires the person installing the package to have all of the build dependencies on their system, since the code is compiled during installation. Binary distributions are pre-compiled, but they are likely not portable between substantially different systems, e.g. Fedora and Ubuntu.

Note: if you are building a bdist or installing the sdist on a system with an older version of Pip, you may need to install the scikit-build Python package first.

To install either of these packages, use:

pip install dist/{{ package name }}

To create an "editable" install of createrepo_c, use:

python setup.py develop

Note: To recompile the libraries and binaries, you must re-run this command.

Build RPM package

Modify createrepo_c.spec and run:

utils/make_rpm.sh

Note: Current .spec for Fedora rawhide

Testing

Run all unit tests from your createrepo_c checkout dir

Build the C tests and run the C and Python tests:

make tests && make test

Note: For a verbose output of testing use: make ARGS="-V" test

Run only C unittests (from your checkout dir):

build/tests/run_tests.sh

Note: The C tests have to be built first (by make tests)!

Run only Python unittests (from your checkout dir):

PYTHONPATH=`readlink -f ./build/src/python/` python3 -m unittest discover -bs tests/python/

Note: When compiling createrepo_c without libmodulemd support add WITH_LIBMODULEMD=OFF

Links

Bugzilla

Important notes

In original createrepo sha is a nickname for the sha1 checksum. Createrepo_c mimics this behaviour.
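The aliasing can be illustrated with a minimal Python sketch. It is not createrepo_c code: the `ALIASES` table and both helper functions are hypothetical names chosen for illustration, and only `hashlib` from the standard library is used.

```python
import hashlib

# Hypothetical alias table: "sha" is treated as a nickname for "sha1".
ALIASES = {"sha": "sha1"}


def resolve_checksum_name(name):
    """Map a user-supplied checksum name to a hashlib algorithm name."""
    canonical = name.lower()
    return ALIASES.get(canonical, canonical)


def checksum(data, name):
    """Return the hex digest of data using the (possibly aliased) algorithm."""
    return hashlib.new(resolve_checksum_name(name), data).hexdigest()


# "sha" and "sha1" select the same algorithm, so the digests match.
assert checksum(b"example", "sha") == checksum(b"example", "sha1")
```

Resolving the nickname in one place keeps the rest of the code working with canonical algorithm names only.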

Contribution

Here's the most direct way to get your work merged into the project.

  1. Fork the project

  2. Clone down your fork

  3. Implement your feature or bug fix and commit changes

  4. If the change fixes a bug at Red Hat bugzilla, or if it is important to the end user, add the following block to the commit message:

    = changelog =
    msg:           message to be included in the changelog
    type:          one of: bugfix/enhancement/security (this field is required when message is present)
    resolves:      URLs to bugs or issues resolved by this commit (can be specified multiple times)
    related:       URLs to any related bugs or issues (can be specified multiple times)
    
    • For example::

      = changelog =
      msg: Enhance error handling when locating repositories
      type: bugfix
      resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1762697
      
    • For your convenience, you can also use git commit template by running the following command in the top-level directory of this project:

      git config commit.template ./.git-commit-template
      
  5. In a separate commit, add your name and email into the authors file as a reward for your generosity

  6. Push the branch to your fork

  7. Send a pull request for your branch
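To make the changelog block format from step 4 concrete, here is a minimal Python sketch that extracts the fields from a commit message. It is not the project's own tooling: the function name is hypothetical, and it assumes that a blank line ends the block and that unknown fields are ignored.

```python
VALID_TYPES = {"bugfix", "enhancement", "security"}


def parse_changelog_block(commit_message):
    """Extract the '= changelog =' block from a commit message.

    'msg' and 'type' are single values; 'resolves' and 'related' may be
    specified multiple times, so they are collected into lists.
    """
    result = {"msg": None, "type": None, "resolves": [], "related": []}
    in_block = False
    for raw_line in commit_message.splitlines():
        line = raw_line.strip()
        if line == "= changelog =":
            in_block = True
            continue
        if not in_block:
            continue
        if not line:
            break  # assumption: a blank line ends the block
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or key not in result:
            continue  # assumption: lines that are not known fields are skipped
        if isinstance(result[key], list):
            result[key].append(value)
        else:
            result[key] = value
    # 'type' is required when 'msg' is present.
    if result["msg"] is not None and result["type"] not in VALID_TYPES:
        raise ValueError("'type' must be bugfix, enhancement or security")
    return result


EXAMPLE_MESSAGE = """Improve repository lookup

= changelog =
msg: Enhance error handling when locating repositories
type: bugfix
resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1762697
"""

fields = parse_changelog_block(EXAMPLE_MESSAGE)
print(fields["type"], fields["resolves"])
```

Splitting on the first colon only keeps URLs in `resolves` and `related` intact.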


Differences in behavior between createrepo_c and createrepo

Checksums after update

Use case:

  • Repodata in repo/ has checksum xxx
  • Params: --update --checksum=yyy repo/

createrepo_c result:

  • All package checksums are recalculated into yyy

original createrepo result:

  • Only new and changed packages have yyy checksums; other packages still have xxx checksums

Skip symlinks param

Use case:

  • Some packages in repo/ are symlinks
  • Params: --skip-symlinks repo/

createrepo_c result:

  • Symlinked packages are ignored

original createrepo result:

Base path from update-md-path repo

Use case:

  • Somebody else's repo exists somewhere
  • The items in that repo have a base path set to http://foo.com/
  • We want to create metadata for our repo
  • Some packages in our repo are same as packages in somebody else's repo
  • We want to speed up creation of our repodata with combo --update and --update-md-path=somebody_else's_repo
  • Params: --update --update-md-path=ftp://somebody.else/repo our_repo/

createrepo_c results:

  • All our packages have no base path set (if we don't set --baseurl explicitly)

original createrepo result:

Crippled paths in filelists.xml after update

Use case:

  • A repo with old metadata exists
  • We want to update metadata
  • Params: --update repo/

createrepo_c results:

  • All is fine

original createrepo result:

--update leaves behind some old repodata files

Use case:

  • A repo with repodata created with --simple-md-filenames exists
  • We want to update repodata to have checksums in filenames
  • Params: --update repo/

createrepo_c results:

  • All repodata contains checksum in the name

original createrepo result:

Mergerepo_c

Default merge method

  • Original mergerepo included even packages with the same NVR by default
  • Mergerepo_c can be configured by --method option to specify how repositories should be merged.
  • Additionally, it's possible to use the --all option to replicate the original mergerepo behavior.

Modifyrepo_c

Modifyrepo_c is compatible with the classical Modifyrepo, except for some of its misbehaviour:

  • TODO: Report bugs and add reference here

Batch file

When several modifications need to be made to a repository (repomd.xml), a batch file can be used.

The batch file is specific to Modifyrepo_c. It is not supported by the classical Modifyrepo, at least not yet.

Example

# Add:
#   [<path/to/file>]
#   <options>

# Metadata that use a bunch of config options
[some/path/comps.xml]
type=group
compress=true
compress-type=gz
unique-md-filenames=true
checksum=sha256
new-name=group.xml

# Metadata that use default settings
[some/path/bar.xml]

# Remove:
#   [<metadata name>]
#   remove=true

[updateinfo]
remove=true

Supported options

| Option name | Description | Supported value(s) | Default |
|---|---|---|---|
| path | Path to the file. When specified, it overrides the path given in the group name (the name between the [] brackets) | Any string | Group name (string between '[' and ']') |
| type | Type of the metadata | Any string | Based on filename |
| remove | Remove specified file/type from repodata | true or false | false |
| compress | Compress the new metadata before adding it to repo | true or false | true |
| compress-type | Compression format to use | gz, bz2, xz | gz |
| checksum | Checksum type to use | md5, sha, sha1, sha224, sha256, sha384, sha512 | sha256 |
| unique-md-filenames | Include the file's checksum in the filename | true or false | true |
| new-name | New name for the file. If compress is true, the compression suffix is appended. If unique-md-filenames is true, the checksum is prepended. | Any string | Original source filename |

Notes

  • Lines beginning with a '#' and blank lines are considered comments.
  • If remove=true is used, no other config options should be used
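The batch file layout is close enough to INI syntax that Python's standard `configparser` can read it. The following is a minimal sketch under that assumption, not how Modifyrepo_c itself parses the file. `parse_batch` and `final_name` are hypothetical helpers; the defaults come from the table above, and the `-` and `.` separators in `final_name` are assumptions about how the checksum and suffix are joined to the name.

```python
import configparser
import os

# Defaults from the "Supported options" table, kept as strings until converted.
DEFAULTS = {
    "remove": "false",
    "compress": "true",
    "compress-type": "gz",
    "checksum": "sha256",
    "unique-md-filenames": "true",
}
BOOLEAN_OPTIONS = ("remove", "compress", "unique-md-filenames")


def parse_batch(text):
    """Return one dict of options per [section], with defaults filled in."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    tasks = []
    for section in parser.sections():
        task = dict(DEFAULTS)
        task.update(parser[section])
        for key in BOOLEAN_OPTIONS:
            task[key] = task[key].strip().lower() == "true"
        # 'path' defaults to the group name, 'new-name' to the source filename.
        task.setdefault("path", section)
        task.setdefault("new-name", os.path.basename(task["path"]))
        tasks.append(task)
    return tasks


def final_name(task, checksum_hex):
    """Apply the new-name rules: append suffix, then prepend checksum."""
    name = task["new-name"]
    if task["compress"]:
        name += "." + task["compress-type"]  # assumed "." separator
    if task["unique-md-filenames"]:
        name = checksum_hex + "-" + name  # assumed "-" separator
    return name


EXAMPLE_BATCH = """
# Metadata that use a bunch of config options
[some/path/comps.xml]
type=group
compress=true
compress-type=gz
unique-md-filenames=true
checksum=sha256
new-name=group.xml

# Metadata that use default settings
[some/path/bar.xml]

[updateinfo]
remove=true
"""

for task in parse_batch(EXAMPLE_BATCH):
    print(task["path"], task["remove"], task["new-name"])
```

A section without options, such as `[some/path/bar.xml]`, simply receives every default.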

createrepo_c's People

Contributors

adelton, aplanas, conan-kudo, cottsay, dcantrell, dirkmueller, dralley, inknos, j-mracek, jan-kolarik, jcfr, jdieter, jrohel, kangie, kontura, kraj, lmacken, m-blaha, m0ses, marmarek, mikhirev, noelbk, pkratoch, ppentchev, ppisar, praiskup, puiterwijk, ralphbean, sgallagher, tojaj

createrepo_c's Issues

Provide a high level API for repository creation

I just want to create a repository from files in a given repository. I don't need any customizations. It would be nice if there was a function that can do that (just like if I execute createrepo) so that I don't need to duplicate createrepo_c/examples/python/simple_createrepo.py. Please don't forget to document the exceptions that can be raised.

support gzip --rsyncable for repo metadata (and other .gz files)

gzip now accepts the --rsyncable option. This option is accepted in all modes, but has effect only when compressing: it makes the resulting output more amenable to efficient use of rsync. For example, when a large input file gets a small change, a gzip --rsyncable image of that file will remain largely unchanged, too. Without --rsyncable, even a tiny change in the input could result in a totally different gzip-compressed output file.

http://savannah.gnu.org/forum/forum.php?forum_id=8495

Please use --rsyncable when compressing package metadata using gzip (provided it's recent enough). This will help mirrors pulling compose updates. Thank you.

Simplify the usage of createrepo_c.RepomdRecord

It would save me few letters and one variable name if I could create the createrepo_c.RepomdRecord using a single function :)

E.g. it could be:

  • createrepo_c.RepomdRecord(typename, filename, createrepo_c.SHA256) and it will create a "filled" record
  • or similarly createrepo_c.RepomdRecord.create_filled(typename, filename, createrepo_c.SHA256)
  • or even createrepo_c.RepomdRecord.filled_from_file(xmlfile, createrepo_c.SHA256)
  • or even repomd.set_record_auto(typename, filename, createrepo_c.SHA256)
  • or even repomd.set_filerecord(xmlfile, createrepo_c.SHA256)

(it overlaps with #19)

Undeclared EOK, DRPM_SOURCE_NEVR & DRPM_SEQUENCE

I am building createrepo_c with drpm, but it looks like some declarations are missing:

[ 1%] Building C object src/CMakeFiles/libcreaterepo_c.dir/deltarpms.c.o
/home/edsiper/deps/createrepo_c/src/deltarpms.c: In function ‘cr_deltapackage_from_drpm_base’:
/home/edsiper/deps/createrepo_c/src/deltarpms.c:146: warning: passing argument 1 of ‘drpm_read’ from incompatible pointer type
/home/edsiper/deps/drpm/drpm.h:71: note: expected ‘struct drpm *’ but argument is of type ‘const char *’
/home/edsiper/deps/createrepo_c/src/deltarpms.c:146: warning: passing argument 2 of ‘drpm_read’ from incompatible pointer type
/home/edsiper/deps/drpm/drpm.h:71: note: expected ‘const char *’ but argument is of type ‘struct drpm *

/home/edsiper/deps/createrepo_c/src/deltarpms.c:146: error: ‘EOK’ undeclared (first use in this function)
/home/edsiper/deps/createrepo_c/src/deltarpms.c:146: error: (Each undeclared identifier is reported only once
/home/edsiper/deps/createrepo_c/src/deltarpms.c:146: error: for each function it appears in.)
/home/edsiper/deps/createrepo_c/src/deltarpms.c:152: error: ‘DRPM_SOURCE_NEVR’ undeclared (first use in this function)
/home/edsiper/deps/createrepo_c/src/deltarpms.c:161: error: ‘DRPM_SEQUENCE’ undeclared (first use in this function)
/home/edsiper/deps/createrepo_c/src/deltarpms.c: In function ‘cr_deltarpms_scan_targetdir’:
/home/edsiper/deps/createrepo_c/src/deltarpms.c:617: warning: implicit declaration of function ‘g_queue_free_full’
/home/edsiper/deps/createrepo_c/src/deltarpms.c: In function ‘walk_drpmsdir’:
/home/edsiper/deps/createrepo_c/src/deltarpms.c:706: warning: implicit declaration of function ‘g_slist_free_full’
/home/edsiper/deps/createrepo_c/src/deltarpms.c: In function ‘cr_prestodelta_thread’:
/home/edsiper/deps/createrepo_c/src/deltarpms.c:781: warning: ignoring return value of ‘g_slist_append’, declared with attribute warn_unused_result
/home/edsiper/deps/createrepo_c/src/deltarpms.c: In function ‘cr_deltarpms_generate_prestodelta_file’:
/home/edsiper/deps/createrepo_c/src/deltarpms.c:865: warning: assignment discards qualifiers from pointer target type
make[2]: *** [src/CMakeFiles/libcreaterepo_c.dir/deltarpms.c.o] Error 1
make[1]: *** [src/CMakeFiles/libcreaterepo_c.dir/all] Error 2

any help is appreciated

createrepo_c filters requires from a package in metadata even when it provides a different version of it

It appears that createrepo_c filters Requires based on whether the package in question also Provides the same capability. However, the problem is that it does not take into account the version associated with the Provides when filtering Requires. Thus, when a package provides an older version of a capability but requires a newer version, the capability is stripped from the Requires, which breaks the install.

This has occurred with the mutter package with GNOME 3.22 in Mageia Cauldron (development for Mageia 6).

Reference: https://bugs.mageia.org/show_bug.cgi?id=19509
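The difference between name-only and version-aware filtering can be shown with a small Python model. This is purely illustrative and not createrepo_c's implementation: the capability names are made up, versions are simplified to integer tuples (real RPM version comparison is more involved), and both filter functions are hypothetical.

```python
import operator

OPS = {
    ">=": operator.ge,
    ">": operator.gt,
    "=": operator.eq,
    "<=": operator.le,
    "<": operator.lt,
}


def filter_by_name(requires, provides):
    """Drop every requirement whose name the package itself provides."""
    provided_names = {name for name, _version in provides}
    return [req for req in requires if req[0] not in provided_names]


def filter_by_name_and_version(requires, provides):
    """Drop a requirement only if a provided version actually satisfies it."""

    def satisfied(req):
        name, op, version = req
        return any(
            pname == name and OPS[op](pversion, version)
            for pname, pversion in provides
        )

    return [req for req in requires if not satisfied(req)]


# Hypothetical package: provides libfoo 1.0 but requires libfoo >= 2.0.
provides = [("libfoo", (1, 0))]
requires = [("libfoo", ">=", (2, 0)), ("libbar", ">=", (1, 0))]

print(filter_by_name(requires, provides))
print(filter_by_name_and_version(requires, provides))
```

With name-only filtering the `libfoo >= 2.0` requirement disappears even though the package's own `libfoo 1.0` cannot satisfy it; the version-aware variant keeps it.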

Simplify the usage of the file types when creating a repository

In my case (I don't create the sqlite files), it's much easier/nicer to use createrepo_c.XmlFile than createrepo_c.PrimaryXmlFile and friends. However:

  • to create an instance of createrepo_c.XmlFile, I need a type constant (e.g. createrepo_c.XMLFILE_PRIMARY)
  • I need the name of the type (e.g. primary) as well to compose the file name (e.g. path/primary.xml.gz) - (Well, I don't need it. The name can be arbitrary. But it's a common practice to use these names.)
  • I need the name of the type to refer to the file in a createrepo_c.RepomdRecord instance

I can think of multiple solutions:

  • there can be a mapping from type constants to their names and/or the opposite mapping
  • there can be a factory function that creates file instances for given type constant (optionally with the default file name)
  • there can be a "builder" that has an add_pkg method and a dump method (it will create all the needed files in a given directory)
  • there can be a function that transforms a metadata file into a createrepo_c.RepomdRecord
  • the createrepo_c.Repomd class can have a method that does the same as set_record but accepts metadata files
  • the createrepo_c.XmlFile.__init__ can accept type names
  • the createrepo_c.RepomdRecord.__init__ can accept the type constants

but I don't insist on any of them.

Please make proper releases like you do with librepo

To make things easier on people packaging createrepo_c, please make proper releases of createrepo_c like you do for librepo. It makes it much easier to manage from the packager's point of view, since we can point directly to a GitHub release URL for the tarball source.

repoclosure_c ?

I appreciate the performance improvement of createrepo_c, and was wondering if repoclosure_c was on the horizon?

retain-old-md doesn't seem to work

I am using "createrepo_c --update --retain-old-md 4 ." and the number of files in repodata keeps growing after each run. Does the --retain-old-md option work for anyone? Am I missing something?

Document the repository format [lowprio]

Is it true that createrepo_c is intended to replace createrepo? If yes, can you please document the repository format (the content of the directory and what purpose each file serves) so that we can use the same terms to document e.g. hawkey.Repo?
I mean something like http://createrepo.baseurl.org/ but ideally without the "FIXME" note :)
Or do you think that hawkey is rather the library that should define the format?

RFE: Multi-threaded (de)compression of metadata

While Mageia was working on implementing usage of createrepo_c to regularly generate rpm-md data, it was discovered that it was much slower on compressing xz metadata than gzip. Looking through the code on compression, it doesn't appear that you're using multi-threaded compression.

As of xz 5.2.0, the multi-threaded API is now stable and can be used in production. The documentation for this was incorrect in 5.2.0 (indicating only decompression support) but that was fixed in 5.2.2.

You may want to implement threaded (de)compression for bzip2 and gzip too.

See the Mageia bug report for more information.

Elaborate on the usage of createrepo_c.XmlFile.set_num_of_pkgs

The example createrepo_c/examples/python/simple_createrepo.py calls createrepo_c.XmlFile.set_num_of_pkgs. It seems that my code works the same even if I just add the packages without setting the number. Could you please extend the documentation wrt why should we call createrepo_c.XmlFile.set_num_of_pkgs?

Missing link to glib2 ?

Looks like the project is missing some linking on CMake:

Linking C shared library libcreaterepo_c.so
[ 54%] Built target libcreaterepo_c
[ 56%] Building C object src/CMakeFiles/createrepo_c.dir/createrepo_c.c.o
[ 57%] Building C object src/CMakeFiles/createrepo_c.dir/cmd_parser.c.o
Linking C executable createrepo_c
libcreaterepo_c.so.0.6.1: undefined reference to `g_queue_free_full'
libcreaterepo_c.so.0.6.1: undefined reference to `g_slist_free_full'

[RFE] Provide a Python 3 API

Currently createrepo_c offers a Python 2 API, but it does not offer a Python 3 API to go with it.

Please add a Python 3 API to createrepo_c for those who want to have their programs use Python 3 instead of Python 2.

SIGSEGV when reading updateinfo

I'm reading hundreds of updateinfo.xml files in 10 threads in Python.

I end up with SIGSEGV, Segmentation fault.
Relevant part of backtrace:

#0  0x00007fffef7ef9f7 in get_datetime () from /usr/lib64/python2.7/site-packages/createrepo_c/_createrepo_c.so
#1  0x00007ffff7a5682e in getset_get () from /lib64/libpython2.7.so.1.0

It's not 100% reproducible, looks like it's caused by a race condition.
I'll try to get a reliable reproducer or more data.

Publish the Python API documentation

Could you please publish the documentation somewhere? It could be either somewhere on Internet or in form of a software package. So far, it's not readable even in the source form so I have to build it locally from sources.

ANSI escape sequences are not handled

If the RPM spec erroneously contains ANSI escape sequences (I have only tested this in %description), then those bytes are passed into the repo metadata when createrepo_c is called.

This becomes a problem when some parsers (at least that from python's sqlite module), try to parse this and they crash and burn.

createrepo on the other hand avoids this by stripping out the ANSI code.

This was tested using the createrepo_c-0.2.1-1 RPM

Add option to createrepo_c to only update certain packages

I have a repo with many thousands of packages in it. The repos are updated by a scheduled job that knows exactly which packages have been updated/added/deleted, but as far as I'm aware there's no way to tell createrepo_c --update to only bother looking for changes in those packages. As far as I'm aware, --pkglist and/or --includepkg with the --update option won't do this, as I believe they'll remove the packages that don't match. Maybe the --keep-all-metadata option?

Either that isn't currently possible, or the man pages could use some clarifying.

Support different checksum type for RPMs and XML files

In Spacewalk, repos can be configured using a different checksum method for the XML file entries in repomd.xml and the RPM entries in primary.xml. (Referred to as "checksum type" and "RPM checksum type".)

Currently it's not possible to make createrepo_c generate repos in this way since it only takes a single --checksum argument.

It would be useful if createrepo_c supported setting both checksum types independently so that it can be used to generate repos configured in this way.

Include metadata md5 checksum in repomd.xml schema for repo metadata files

I'd like to find out if the schema for repomd.xml can be modified to include an additional entry in:

<!ELEMENT data (location | checksum | timestamp | open-checksum | open-size | size | database_version)+>
seen here:
https://github.com/rpm-software-management/yum/blob/master/docs/repomd.dtd

which could hold an additional MD5 checksum for the repo metadata files.

Would this cause any issues with existing clients if so, or would clients ignore "extraneous" elements that they're not using?

mergerepo_c and may be broken own workflow ?

I have ci system that build packages and upload artifacts to openstack swift.
So for example i have pkg-example that produces example-0.0.1.rpm and example-libs-0.0.1.rpm inside /repo/x86_64/

each step runs on docker container that does not have persistent storage and after container dies data removed...

  1. i generate rpmmd via createrepo_c inside /repo/repodata
  2. upload /repo to openstack swift to public/staging/example/ (so i can add this via repo and test package on all my systems without affecting other test nodes
  3. if testing on some nodes are fine, i do mergerepo_c with --repo url that points to public/staging and public/staging/example with output to /repo and upload /repo to public/staging/ (so on this step all staging nodes see the package and i can test on more nodes)
  4. if testing on all staging nodes are fine i do upload resulted rpms on public/release/example
    and do mergerepo_c with --repo pointed to public/release/ and public/release/example.
    So all production nodes sees that new package.

My problem: mergerepo_c sees that public/release has an older package version, and the resulting metadata contains only this old version, which is absent from public/release/example/.
I can use the --all switch, but then the metadata contains all package versions, including ones that are already stale and may be cleaned up.

I can rsync all the time packages to use createrepo_c --update because it expensive (i think)
How can i deal with my workflow or how can i change it...?

P.S. I know about mounting openstack swift container via fuse, but i'm try to avoid it because in this case i need privileged docker container (to access to /dev/fuse)

Add RPMS to repository, without having the entire repository locally.

I'd like to be able to host a repository on a storage service like s3 or google storage, and be able to add RPMs without having to download the entire repository.

I can approximate this by running 'createrepo' locally with a baseurl matching the remote repository, and then running mergerepo on the remote repository and the local one. But this adds a baseurl to all the packages, rather than having none.

Implement the context manager interface in createrepo_c.XmlFile

It would be nice if I could use:

with createrepo_c.XmlFile(filename, filetype, compression, None) as xmlfile:
    xmlfile.set_num_of_pkgs(len(packages))
    for package in packages:
        xmlfile.add_pkg(package)

instead of

xmlfile = createrepo_c.XmlFile(filename, filetype, compression, None)
try:
    ...
    xmlfile.set_num_of_pkgs(len(packages))
    for package in packages:
        xmlfile.add_pkg(package)
finally:
    xmlfile.close()
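Until such an interface exists, `contextlib.closing` from the Python standard library gives the same guarantee for any object with a `close()` method. The sketch below uses `FakeXmlFile`, a hypothetical stand-in that only mimics the three methods used above, so that it runs without createrepo_c installed; it assumes the real object's `close()` can be called the same way.

```python
import contextlib


class FakeXmlFile:
    """Hypothetical stand-in for createrepo_c.XmlFile (not the real class)."""

    def __init__(self):
        self.num_of_pkgs = None
        self.packages = []
        self.closed = False

    def set_num_of_pkgs(self, count):
        self.num_of_pkgs = count

    def add_pkg(self, package):
        self.packages.append(package)

    def close(self):
        self.closed = True


def write_packages(xmlfile, packages):
    """Add all packages; close() is called even if an exception is raised."""
    with contextlib.closing(xmlfile) as f:
        f.set_num_of_pkgs(len(packages))
        for package in packages:
            f.add_pkg(package)


xmlfile = FakeXmlFile()
write_packages(xmlfile, ["pkg-a", "pkg-b"])
print(xmlfile.closed)
```

This replaces the try/finally block without requiring any change to the wrapped class.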

Please support generation of AppStream metadata automatically

A lot of people ship free and nonfree code in addon yum repos for Fedora. They add the rpms to a directory, run createrepo_c and then tell the world about their awesome new repo. Some even get selected by the Fedora workstation group to be included by default in Fedora. The new users fire up gnome-software or apper and search for the awesome new tool, but nothing is found. I normally have to point them at https://blogs.gnome.org/hughsie/2016/04/27/3rd-party-fedora-repositories-and-appstream/ and get them to update their release tooling.

Could we move to a model where createrepo_c automatically generates the AppStream metadata (either default on, or default off) by calling the appstream-builder executable if it is installed? The other alternative is I write a patch for createrepo_c to use libappstream-builder.so, but that has some deps that you might find unpalatable. I'm open for ideas and am willing to write patches if you agree if this is something you'd permit me to do.

Thanks!

Improve the documentation of createrepo_c.RepomdRecord.fill

The example createrepo_c/examples/python/simple_createrepo.py calls createrepo_c.RepomdRecord.fill with createrepo_c.SHA256. It would be nice if you could document this method parameter and explain why it is a good idea to call it. It seems that createrepo segfaults if the method is called without any argument.
Would it be possible to pass the checksum type to the constructor? It would save me a few letters and one variable name :)
(filed a separate report #28)

It also seems that it may raise some exceptions. Could it be documented, please?

Double free on corrupt other.xml (and probably filelists.xml)

Certain types of corrupt XML files can result in memory corruption when loaded.
Specifically, this can happen if a package element is opened and XML parsing stops before it is closed, in other.xml (and probably in filelists.xml).

This affects createrepo_c when running with the --update argument but can also be demonstrated by cr.Metadata(use_single_chunk=True).locate_and_load_xml(...) from python.

Version: createrepo_c-0.7.4-1.fc20 (also confirmed in a3aaa03 )
Testcase: createrepo_c-crashbug.tar.gz (sha256: 9b8ffdb64b2359fb07e00dcbeaead68d167d8eecfd58e1531d5afd265a02a692)

$ curl -LO https://www.dropbox.com/s/xfzszf712ppyy8c/createrepo_c-crashbug.tar.gz
$ tar -xzf createrepo_c-crashbug.tar.gz 
$ ./createrepo_c-crashbug/run 
+ exec valgrind --tool=memcheck python ./crashbug.py
==1689== Memcheck, a memory error detector
==1689== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==1689== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==1689== Command: python ./crashbug.py
==1689== 

(process:1689): C_CREATEREPOLIB-CRITICAL **: cr_xml_parser_generic: parsing error 'badrepo/repodata/3edce8edbbffd8ccf7c569cc36e81c638b84d778348307c4dd20692269230510-other.xml.gz': no element found

(process:1689): C_CREATEREPOLIB-CRITICAL **: cr_metadata_load_xml: Error encountered while parsing
Metadata parsing failed (as expected).
Traceback (most recent call last):
  File "./crashbug.py", line 17, in <module>
    main()
  File "./crashbug.py", line 10, in main
    md.locate_and_load_xml(REPO_PATH)
createrepo_c.CreaterepoCError: Error encountered while parsing:other.xml parsing: Parse error 'badrepo/repodata/3edce8edbbffd8ccf7c569cc36e81c638b84d778348307c4dd20692269230510-other.xml.gz' at line: 16 (no element found)
==1689== Invalid read of size 8
==1689==    at 0x3E4B669FCE: g_string_chunk_free (in /usr/lib64/libglib-2.0.so.0.3800.2)
==1689==    by 0xBF8B55F: cr_metadata_free (in /usr/lib64/libcreaterepo_c.so.0.7.4)
==1689==    by 0xBD6BA51: ??? (in /usr/lib64/python2.7/site-packages/createrepo_c/_createrepo_cmodule.so)
==1689==    by 0x3D6286DB01: ??? (in /usr/lib64/libpython2.7.so.1.0)
==1689==    by 0x3D62904F2A: ??? (in /usr/lib64/libpython2.7.so.1.0)
==1689==    by 0x3D62904F3A: ??? (in /usr/lib64/libpython2.7.so.1.0)
==1689==    by 0x3D6287EA2E: ??? (in /usr/lib64/libpython2.7.so.1.0)
==1689==    by 0x3D628803EF: ??? (in /usr/lib64/libpython2.7.so.1.0)
==1689==    by 0x3D628828E7: PyDict_SetItemString (in /usr/lib64/libpython2.7.so.1.0)
==1689==    by 0x3D628F17A2: PyImport_Cleanup (in /usr/lib64/libpython2.7.so.1.0)
==1689==    by 0x3D628FD36D: Py_Finalize (in /usr/lib64/libpython2.7.so.1.0)
==1689==    by 0x3D6290E584: Py_Main (in /usr/lib64/libpython2.7.so.1.0)
==1689==  Address 0xbd31678 is 8 bytes inside a block of size 40 free'd
==1689==    at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==1689==    by 0x3E4B64EF7E: g_free (in /usr/lib64/libglib-2.0.so.0.3800.2)
==1689==    by 0xBF8FDED: cr_package_free (in /usr/lib64/libcreaterepo_c.so.0.7.4)
==1689==    by 0x3E4B638372: ??? (in /usr/lib64/libglib-2.0.so.0.3800.2)
==1689==    by 0x3E4B6390B0: g_hash_table_remove_all (in /usr/lib64/libglib-2.0.so.0.3800.2)
==1689==    by 0x3E4B63911D: g_hash_table_destroy (in /usr/lib64/libglib-2.0.so.0.3800.2)
==1689==    by 0xBF8B9C0: cr_metadata_load_xml (in /usr/lib64/libcreaterepo_c.so.0.7.4)
==1689==    by 0xBF8BCCB: cr_metadata_locate_and_load_xml (in /usr/lib64/libcreaterepo_c.so.0.7.4)
==1689==    by 0xBD6BFF3: ??? (in /usr/lib64/python2.7/site-packages/createrepo_c/_createrepo_cmodule.so)
==1689==    by 0x3D628E0BC3: PyEval_EvalFrameEx (in /usr/lib64/libpython2.7.so.1.0)
==1689==    by 0x3D628E097F: PyEval_EvalFrameEx (in /usr/lib64/libpython2.7.so.1.0)
==1689==    by 0x3D628E21DC: PyEval_EvalCodeEx (in /usr/lib64/libpython2.7.so.1.0)
==1689== 
==1689== Invalid read of size 8
==1689==    at 0x3E4B669FE0: g_string_chunk_free (in /usr/lib64/libglib-2.0.so.0.3800.2)
==1689==    by 0xBF8B55F: cr_metadata_free (in /usr/lib64/libcreaterepo_c.so.0.7.4)
==1689==    by 0xBD6BA51: ??? (in /usr/lib64/python2.7/site-packages/createrepo_c/_createrepo_cmodule.so)
==1689==    by 0x3D6286DB01: ??? (in /usr/lib64/libpython2.7.so.1.0)
==1689==    by 0x3D62904F2A: ??? (in /usr/lib64/libpython2.7.so.1.0)
==1689==    by 0x3D62904F3A: ??? (in /usr/lib64/libpython2.7.so.1.0)
==1689==    by 0x3D6287EA2E: ??? (in /usr/lib64/libpython2.7.so.1.0)
==1689==    by 0x3D628803EF: ??? (in /usr/lib64/libpython2.7.so.1.0)
==1689==    by 0x3D628828E7: PyDict_SetItemString (in /usr/lib64/libpython2.7.so.1.0)
==1689==    by 0x3D628F17A2: PyImport_Cleanup (in /usr/lib64/libpython2.7.so.1.0)
==1689==    by 0x3D628FD36D: Py_Finalize (in /usr/lib64/libpython2.7.so.1.0)
==1689==    by 0x3D6290E584: Py_Main (in /usr/lib64/libpython2.7.so.1.0)
==1689==  Address 0xbd60f60 is 0 bytes inside a block of size 16 free'd
==1689==    at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==1689==    by 0x3E4B64EF7E: g_free (in /usr/lib64/libglib-2.0.so.0.3800.2)
==1689==    by 0x3E4B665A64: g_slice_free_chain_with_offset (in /usr/lib64/libglib-2.0.so.0.3800.2)
==1689==    by 0x3E4B669FF9: g_string_chunk_free (in /usr/lib64/libglib-2.0.so.0.3800.2)
==1689==    by 0xBF8FDED: cr_package_free (in /usr/lib64/libcreaterepo_c.so.0.7.4)
==1689==    by 0x3E4B638372: ??? (in /usr/lib64/libglib-2.0.so.0.3800.2)
==1689==    by 0x3E4B6390B0: g_hash_table_remove_all (in /usr/lib64/libglib-2.0.so.0.3800.2)
==1689==    by 0x3E4B63911D: g_hash_table_destroy (in /usr/lib64/libglib-2.0.so.0.3800.2)
==1689==    by 0xBF8B9C0: cr_metadata_load_xml (in /usr/lib64/libcreaterepo_c.so.0.7.4)
==1689==    by 0xBF8BCCB: cr_metadata_locate_and_load_xml (in /usr/lib64/libcreaterepo_c.so.0.7.4)
==1689==    by 0xBD6BFF3: ??? (in /usr/lib64/python2.7/site-packages/createrepo_c/_createrepo_cmodule.so)
==1689==    by 0x3D628E0BC3: PyEval_EvalFrameEx (in /usr/lib64/libpython2.7.so.1.0)
==1689== 
(... many more)

A theory about the root cause:

When running in single chunk mode, there is one GStringChunk which belongs to the cr_Metadata. It's temporarily assigned/unassigned to/from cr_Package objects in newpkgcb/pkgcb in load_metadata.c.

If XML parsing stops in the middle of a package element, pkgcb is not called for the last cr_Package, so the chunk remains assigned to that object.

The chunk is then freed twice, once when the hash table of packages is destroyed (end of cr_metadata_load_xml function) and again when the cr_Metadata object is destroyed.
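
The ownership hand-off described in this theory can be modelled in a few lines of C. This is only a sketch of the suspected pattern, not the actual code in load_metadata.c: `Chunk`, `Package`, `Metadata` and every function name are hypothetical stand-ins for GStringChunk, cr_Package, cr_Metadata and the newpkgcb/pkgcb callbacks, and the chunk counts frees instead of releasing memory so that the double free can be observed safely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GStringChunk, cr_Package and cr_Metadata. */
typedef struct { int free_count; } Chunk;   /* counts frees instead of freeing */
typedef struct { Chunk *chunk; } Package;
typedef struct { Chunk *chunk; } Metadata;

static void chunk_free(Chunk *c) { if (c) c->free_count++; }

/* A package frees whatever chunk it still holds when it is destroyed. */
static void package_free(Package *p) { chunk_free(p->chunk); p->chunk = NULL; }

/* Stand-in for newpkgcb: lend the shared chunk to the package being parsed. */
static void new_pkg(Metadata *md, Package *p) { p->chunk = md->chunk; }

/* Stand-in for pkgcb: the package is complete, so take the chunk back. */
static void pkg_done(Metadata *md, Package *p) { (void)md; p->chunk = NULL; }

/* Simulate one load and return how many times the shared chunk was freed. */
int simulate_load(int parse_aborted, int detach_on_error)
{
    Chunk chunk = {0};
    Metadata md = { &chunk };
    Package pkg = { NULL };

    new_pkg(&md, &pkg);
    if (!parse_aborted)
        pkg_done(&md, &pkg);   /* normal path: chunk is unassigned again */
    else if (detach_on_error)
        pkg.chunk = NULL;      /* one possible fix: detach before cleanup */

    package_free(&pkg);        /* hash table of packages is destroyed */
    chunk_free(md.chunk);      /* metadata object is destroyed */
    return chunk.free_count;
}
```

A return value of 2 corresponds to the double free in the valgrind output above. Detaching the chunk from the unfinished package on the error path is one way to get back to a single free, assuming the theory is right.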

Residual .repodata directory

I deployed createrepo_c on Copr, and after a few days I found several residual .repodata directories.
I do not have a reproducer for this, but it does happen.

It would be much better if createrepo_c named that directory .repodata-$PID
and only gave a warning if another .repodata-* directory exists.

This way users still get the same information, while at massive scale (as on Copr) downgrading the critical error to a warning would allow better continuity of service.
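
A minimal sketch of the naming scheme proposed here, assuming the temporary directory lives directly inside the repository directory. Both function names are hypothetical and are not part of createrepo_c; a caller would pass `(long) getpid()` as the PID.

```c
#include <stdio.h>
#include <string.h>

/* Build "<repo>/.repodata-<pid>/" into buf.
 * Returns 0 on success, -1 if the result did not fit. */
int temp_repodata_path(char *buf, size_t len, const char *repo, long pid)
{
    int n = snprintf(buf, len, "%s/.repodata-%ld/", repo, pid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Return 1 if a directory entry name looks like a leftover from another
 * run (".repodata-<something>" that is not our own), 0 otherwise. */
int is_stale_repodata(const char *name, long own_pid)
{
    static const char prefix[] = ".repodata-";
    char own[64];

    if (strncmp(name, prefix, sizeof prefix - 1) != 0)
        return 0;
    snprintf(own, sizeof own, ".repodata-%ld", own_pid);
    return strcmp(name, own) != 0;
}
```

With a check like `is_stale_repodata`, a leftover directory from a crashed run could be reported as a warning rather than treated as a fatal error.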

Please write proper commit messages

Hi, while perusing the commit log of this repository, one thing instantly struck me: the lack of actual commit messages.

Sure, there's a one-line summary (sometimes too long a line), but where's the rest? It's not easy to follow what's going on.

Here's what a good commit message should generally look like.
https://github.com/torvalds/subsurface/blob/master/README#L87

If you want more real-world examples of generally good commit messages, just browse the Linux kernel's log.

Cheers,
Andrew

RFE: Option to automatically sign repodata during metadata creation/manipulation

Package managers such as dnf and zypper have the ability to verify signatures of metadata if it is signed. In fact, for zypper, this is the default behavior and it complains when the repodata isn't signed.

However, how to do this isn't that well-known, and it would make sense to incorporate the functionality into the createrepo_c suite of tools.

Exit status should not be 0 on failure

When there is an issue opening a file (for example, a broken symlink), the error is reported, but the exit status still indicates success.

The original createrepo returns failure if an error occurs opening a file. While I really, really want to use createrepo_c because of the amazing speed, until this is fixed I have no choice but to use the older, slower, createrepo.
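
The requested behaviour could look roughly like the sketch below: count the files that cannot be opened and turn any failure into a non-zero exit status. The function names are hypothetical and do not describe how createrepo_c is structured internally.

```c
#include <stdio.h>
#include <stdlib.h>

/* Try to open every path, report each failure on stderr,
 * and return the number of files that could not be opened. */
int count_unreadable(const char *const *paths, size_t n)
{
    int failures = 0;

    for (size_t i = 0; i < n; i++) {
        FILE *f = fopen(paths[i], "rb");
        if (f == NULL) {
            perror(paths[i]);   /* the error is still reported */
            failures++;
            continue;
        }
        fclose(f);
    }
    return failures;
}

/* Map the failure count to a process exit status. */
int exit_status_for(int failures)
{
    return failures > 0 ? EXIT_FAILURE : EXIT_SUCCESS;
}
```

A `main` function would end with `return exit_status_for(count_unreadable(paths, n));` so that scripts calling the tool can detect the failure.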

src/misc.c:421: 2 * resource leak ?

createrepo_c/src/misc.c:421]: (error) Resource leak: orig

$ fgrep -n orig ../BUILD/createrepo_c/src/misc.c
373: cleanup_file_fclose FILE *orig = NULL;
387: if ((orig = fopen(src, "rb")) == NULL) {
405: while ((readed = fread(buf, 1, BUFFER_SIZE, orig)) > 0) {
406: if (readed != BUFFER_SIZE && ferror(orig)) {

createrepo_c/src/misc.c:421]: (error) Resource leak: new

$ fgrep -n new ../BUILD/createrepo_c/src/misc.c
374: cleanup_file_fclose FILE *new = NULL;
396: if ((new = fopen(dst, "wb")) == NULL) {
412: if (fwrite(buf, 1, readed, new) != readed) {
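
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the `cleanup_file_fclose` prefix on both declarations may be a macro around the GCC/Clang `cleanup` variable attribute, in which case the files are closed automatically when the variables go out of scope and a static analyser that does not understand the attribute would report a leak. The sketch below shows how that attribute behaves; `AUTO_FCLOSE`, `close_file` and the counter are hypothetical names, and the code needs GCC or Clang.

```c
#include <stdio.h>

static int g_cleanup_calls = 0;

/* Cleanup handler: receives a pointer to the variable leaving scope. */
static void close_file(FILE **fp)
{
    g_cleanup_calls++;
    if (*fp != NULL)
        fclose(*fp);
}

/* Hypothetical macro in the spirit of cleanup_file_fclose. */
#define AUTO_FCLOSE __attribute__((cleanup(close_file)))

/* Open a temporary file in an inner scope and return how many times
 * the cleanup handler ran when that scope ended. */
int cleanup_calls_after_scope(void)
{
    int before = g_cleanup_calls;
    {
        AUTO_FCLOSE FILE *f = tmpfile();
        if (f != NULL)
            fputs("data", f);
    }   /* close_file(&f) runs here, with no explicit fclose */
    return g_cleanup_calls - before;
}
```

If the macro in misc.c works this way, the two reports at line 421 would be false positives.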
