ryppl / boost2git

Conversion to Git for Boost

Home Page: http://jenkins.boost.org/job/Boost2Git

Emacs Lisp 1.64% Shell 0.26% Python 1.26% C++ 96.83%

boost2git's Introduction

Boost2Git

This project converts an SVN repository into multiple Git repositories, optionally registering each repository as a submodule in some other Git repository. It started out as KDE's svn2git tool, but has been almost completely rewritten, to the point where very little of the original code remains.

There were many reasons for our initial deviations from svn2git, but the heart of the original program was still there until we discovered it was producing nonsense results. When we evaluated the core logic, it became clear that the svn2git approach was insufficiently general to correctly handle our branch and directory mapping structure. Our rewrite requires C++11.

In the rewrite, we dropped several features of svn2git that aren't needed for Boost, most notably incremental conversions. The dropped features could be brought back without too much difficulty, but unless someone else takes over maintenance of this project, they are unlikely to get addressed. The issue tracker is our record of what can or should still be done.

For any substantially large SVN-to-Git conversion + modularization job, if you start with today's technology, some amount of coding will be necessary. Because it is quite general and fairly clean, Boost2Git is probably a good starting point.

At the time of this writing, Boost is being continuously converted into these Git repositories.

boost2git's People

Contributors

dabrahams, purplekarrot, harinath, uqs, nsams, thiagomacieira, modax, nicolas17, tnyblom, andersk, danieljames, jmsantamaria, lastique, hartwork, jobermayr, mclap, sdoerner

boost2git's Issues

Performance improvement

The current algorithm for finding a matching rule is quite slow. It loops linearly through all match rules and checks a) the minimum revision, b) the maximum revision, and c) the prefix. The first rule that matches is chosen, so the order of the rules matters.

Searching for the longest matching prefix instead of the first one could improve both performance (e.g. by using a radix tree) and usability: repositories.txt could list repositories in any order, whereas currently graph_parallel must come before graph.
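
A minimal sketch of the longest-prefix idea, under some loud assumptions: the `Rule` tuple, `build_index`, `find_rule`, and the sample paths are all hypothetical, prefixes are assumed to end on path component boundaries, and a dictionary lookup per parent directory stands in for a radix tree. It is not Boost2Git's actual matcher.

```python
from collections import namedtuple

# Hypothetical, simplified rule: a path prefix, an inclusive revision range,
# and the name of the target repository.
Rule = namedtuple("Rule", "prefix min_rev max_rev repo")


def build_index(rules):
    """Group rules by prefix so that lookup does not depend on file order."""
    index = {}
    for rule in rules:
        index.setdefault(rule.prefix.rstrip("/"), []).append(rule)
    return index


def find_rule(index, path, rev):
    """Return the rule with the longest matching prefix that is valid at rev."""
    candidate = path.rstrip("/")
    while candidate:
        for rule in index.get(candidate, []):
            if rule.min_rev <= rev <= rule.max_rev:
                return rule
        # Drop the last path component and try the parent directory.
        candidate = candidate.rpartition("/")[0]
    return None


rules = [
    Rule("/trunk/libs/graph", 1, 99999, "graph"),
    Rule("/trunk/libs/graph_parallel", 1, 99999, "graph_parallel"),
]
index = build_index(rules)
print(find_rule(index, "/trunk/libs/graph_parallel/src/tag.cpp", 500).repo)
print(find_rule(index, "/trunk/libs/graph/src/read.cpp", 500).repo)
```

Because the lookup walks from the full path towards the root, the cost depends on path depth rather than on the number of rules, and listing graph before graph_parallel no longer changes the result.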

Document rule DSL

We can't ask people to submit rule edits if they don't understand how the rules work.

Do some cleanup on the ruleset

For example, there are lots of branch rules in common_branches that really are specific to a given repository. It would greatly simplify thinking about how the rules work if these were localized to specific repository sections. If we wrote some post-processing code to dump information about branch rules that are only matched in a single repository, that would be easy to automate.

Incremental conversion breaks .gitmodules

It seems the information about existing submodules is lost when doing an incremental Boost2Git conversion.

Workaround: Run a full conversion.
Solution: Read the .gitmodules file on startup.
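
The proposed solution could start from something like the sketch below, which reads the text of a .gitmodules file into a dictionary. The function name `parse_gitmodules` and the sample entry are hypothetical, and the parser only handles the simple layout that git itself writes (one `[submodule "name"]` header followed by `key = value` lines); quoted or escaped values are not covered.

```python
import re

SECTION = re.compile(r'^\[submodule "(?P<name>[^"]+)"\]$')
SETTING = re.compile(r'^(?P<key>\w+)\s*=\s*(?P<value>.*)$')


def parse_gitmodules(text):
    """Return {submodule name: {key: value}} for the text of a .gitmodules file."""
    modules = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        section = SECTION.match(line)
        if section:
            current = modules.setdefault(section.group("name"), {})
            continue
        setting = SETTING.match(line)
        if setting and current is not None:
            current[setting.group("key")] = setting.group("value")
    return modules


# Hypothetical sample entry, for illustration only.
sample = '[submodule "libs/graph"]\n\tpath = libs/graph\n\turl = ../graph.git\n'
known = parse_gitmodules(sample)
print(known["libs/graph"]["path"])
```

An incremental run could seed its list of registered submodules from the returned dictionary before processing new revisions.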

Explain unmatched source directories

I'm seeing these in the log, which I'm at a loss to explain:

++ WARNING: SVN reports a "copy from" @38875 from /trunk@38874 but no matching rules found! Ignoring copy, treating as a modification
++ WARNING: SVN reports a "copy from" @38875 from /trunk@38874 but no matching rules found! Ignoring copy, treating as a modification
++ WARNING: SVN reports a "copy from" @39706 from /tags/Version_1_34_1/boost/boost/algorithm@39705 but no matching rules found! Ignoring copy, treating as a modification

(etc)

Seems like there must be a bug somewhere.

Comment more of the code more thoroughly

Unless this codebase is going to die with the Boost transition to Git, it will need many more very clear comments so that someone else can pick it up and use it later.

Deal with executable files

Right now we only create regular (non-executable) files; executables need to be written with a different "mode" string to git-fast-import.
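
As a rough illustration: git-fast-import's filemodify command takes the mode as its first argument, with 100644 for regular files and 100755 for executables, and SVN marks executables with the svn:executable property. The helper names `file_mode` and `filemodify` and the sample paths are hypothetical, and the sketch assumes the blob was already sent under a mark.

```python
def file_mode(svn_props):
    """Pick the git-fast-import mode for a regular file from its SVN properties."""
    # SVN marks executables by the mere presence of svn:executable.
    return "100755" if "svn:executable" in svn_props else "100644"


def filemodify(path, mark, svn_props):
    """Format a filemodify command that refers to a previously sent blob mark."""
    return "M {} :{} {}\n".format(file_mode(svn_props), mark, path)


# Hypothetical paths and marks, for illustration only.
print(filemodify("tools/build/bootstrap.sh", 1, {"svn:executable": "*"}), end="")
print(filemodify("boost/config.hpp", 2, {}), end="")
```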

Lost submodules

bdbb694 was supposed to address a problem that apparently persists. Assigning this to Daniel so that he can provide a complete description and testcase.

Translate symlinks

I don't think we have any in Boost's SVN, but they need a different "mode" string in the git-fast-import input.
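
A sketch of what the translation might look like, assuming the caller has already checked for the svn:special property and that the file content has SVN's usual `link <target>` form with no trailing newline. Git stores a symlink as a blob holding the target path, written with mode 120000. The function name `symlink_command` and the sample path are hypothetical.

```python
def symlink_command(path, svn_content):
    """Turn SVN special-file content into an inline filemodify, or return None."""
    kind, _, target = svn_content.partition(b" ")
    if kind != b"link" or not target:
        return None  # not a symlink representation we recognize
    # The byte count in the data command must match the target exactly.
    header = "M 120000 inline {}\ndata {}\n".format(path, len(target))
    return header.encode("utf-8") + target + b"\n"


# Hypothetical symlink, for illustration only.
print(symlink_command("doc/latest", b"link ../release/doc").decode("utf-8"), end="")
```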

Make it incremental

The original svn2git codebase could restart the conversion process where it left off, for cases when SVN has merely received new commits. This is not hard to implement, but it is not necessarily required for Boost.

Put information about line numbers into ruleset

Useful diagnostics from svn2git would be a whole lot more useful if the ruleset contained line number information about the participating rules. I tried to implement this myself but couldn't figure it out. Assigning to @purpleKarrot since he's the one with the Spirit chops.

Optimize ancestry computation

Instead of traversing the SVN filesystem, do it by examining SVN subtree rules (patrie has a method for this). Not sure whether this is actually a bottleneck, so maybe profile first

Re-implement "recurse" action

The new DSL has no "recurse" action yet.
Either it should get one, or we provide an additional file listing all the directories that should be recursed into. It may also be helpful to limit the revision range in which this rule is considered.

Do a lot more asynchronously

There's no reason SVN change discovery can't go on in parallel with writing to git fast-import, or that more of the git fast-import work can't happen in parallel (we have a separate process for each repo, after all!). ASIO might be helpful here.
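
To make the idea concrete, here is a small producer/consumer sketch. It uses Python threads and a bounded queue as a stand-in for whatever ASIO-based design the project would choose, and every name in it (`discover`, `write`, `run_pipeline`, the fake commit strings) is hypothetical.

```python
import queue
import threading

DONE = object()  # sentinel that tells the writer to stop


def discover(revisions, out):
    """Producer: stands in for SVN change discovery."""
    for rev in revisions:
        out.put("commit for r{}".format(rev))
    out.put(DONE)


def write(inp, sink):
    """Consumer: stands in for feeding one git fast-import process."""
    while True:
        item = inp.get()
        if item is DONE:
            break
        sink.append(item)


def run_pipeline(revisions, max_pending=8):
    """Run discovery on the calling thread while a second thread writes."""
    # Bounded queue, so discovery cannot run arbitrarily far ahead of writing.
    pending = queue.Queue(maxsize=max_pending)
    written = []
    writer = threading.Thread(target=write, args=(pending, written))
    writer.start()
    discover(revisions, pending)
    writer.join()
    return written


print(run_pipeline(range(1, 4)))
```

With one queue and one writer per target repository, the same shape would let several fast-import processes be fed at once while commit order within each repository is preserved.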

Deal with branch deletion

When branches become empty in Git, it's usually because they have been deleted in SVN. We should leave a tag on the ref's final commit before becoming empty, and then delete the ref.

Recognize folders

Currently the output shows:

Revision 7623
++ WARNING: File '/branches/unlabeled-1.1.1.1.10/boost/boost/detail' not accounted for. Putting to fallback.
++ WARNING: File '/branches/unlabeled-1.1.1.1.10/boost/libs' not accounted for. Putting to fallback.

These two changes are actually folders. If a folder is not accounted for, Boost2Git automatically recurses into it. For some reason these two are not recognized as folders.

Address Warnings

Warnings in the svn2git output are usually ruleset errors. With very few exceptions, they look like they should be fixed.

Optimize merge discovery

Right now there's a pass across every file participating in an SVN directory copy. This is terribly inefficient and I'm certain it could be replaced by a use of patrie::svn_subtree_rules, perhaps along with some exploration to make sure that SVN actually contains a file under the path in question.

Review destination branch/tag names

Some examples of issues that should be considered:

  • We have lots of refs beginning with old-branches/. I believe that name was chosen originally by @jwiegley because either the branch had been deleted (in which case we can let svn2git handle that using its backups/ feature) or because it is not being actively developed. IIRC in that case he was directing the old branch at a tag, which we are not currently doing.
  • There are lots of duplicate ref name suffixes in SVN, some of which are being collapsed to the same tag in git. This mostly happens in the Boost.Build repo, where we have, e.g., /tags/jam/ and /tags/tools/jam/<whatever>. It seems likely that these tags were all simply moved/re-rooted... ah, yes, see r39733. In a case like that, the collapsing is fine, but we should try to be sure of it.

Mimic gitflow

  • map 'trunk' to 'develop'
  • make 'develop' the default branch
  • connect all tags to 'master'

Make sure empty commit elimination works with ancestry

The part of the process that resets a ref to its previous commit if the SHA1 hasn't changed needs to be checked for interactions with ancestry. We probably want to avoid creating a new commit just because a branch was created, but otherwise we want to make sure that ancestry isn't dropped.
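
One possible reading of that requirement, written as a decision function. This is only a sketch: `should_keep_commit` and its parameters are hypothetical names, and it assumes the converter knows the tree SHA1 of the new commit, the tree SHA1 of its first parent, and the list of additional (merge) parents.

```python
def should_keep_commit(new_tree, parent_tree, merge_parents):
    """Decide whether a commit adds information beyond its first parent."""
    if new_tree != parent_tree:
        return True   # content changed
    if merge_parents:
        return True   # tree unchanged, but dropping it would lose ancestry
    return False      # e.g. a plain branch creation: reuse the parent commit


print(should_keep_commit("t1", "t1", []))      # plain branch creation
print(should_keep_commit("t1", "t1", ["m1"]))  # unchanged tree, but a merge
```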

Make sure commits are connected

Looking at the tree view, there are lots of commits that are not connected to any branch. Probably some merge information is missing.

Remove redundant merge parents

I found the following ruby script:

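#!/usr/bin/ruby
# Parent filter for `git filter-branch`: reads "-p <sha> -p <sha> ..." on stdin
# and prints only the parents that are not reachable from another parent.
old_parents = (gets || '').chomp.gsub('-p ', ' ')

if old_parents.strip.empty?
  new_parents = []
else
  # `git show-branch --independent` keeps only the refs that cannot be
  # reached from any other ref in the list.
  new_parents = `git show-branch --independent #{old_parents}`.split
end

puts new_parents.map { |p| '-p ' + p }.join(' ')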

Here is a bash one-liner that might work.

git filter-branch --force --parent-filter 'read commit; test -z "$commit" || git show-branch --independent `echo -n "$commit" | sed -e "s/-p / /g"` | sed -e "s/.*/-p &/" | tr "\n" " "; echo'
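
Both snippets rely on `git show-branch --independent` to drop parents that are already reachable from another parent. The sketch below shows that reduction on a toy in-memory graph, without calling git; the `ancestors` and `independent` helpers and the commit names are hypothetical, and the order of the result may differ from what git prints.

```python
def ancestors(graph, start):
    """Return every commit reachable from start, excluding start itself."""
    seen = set()
    stack = list(graph.get(start, []))
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


def independent(graph, parents):
    """Keep only the parents that are not ancestors of another listed parent."""
    unique = list(dict.fromkeys(parents))  # drop exact duplicates, keep order
    redundant = set()
    for parent in unique:
        redundant |= ancestors(graph, parent)
    return [p for p in unique if p not in redundant]


# Toy history mapping each commit to its parents: a <- b <- c, and a <- d.
graph = {"a": [], "b": ["a"], "c": ["b"], "d": ["a"]}
print(independent(graph, ["c", "a", "d"]))
```

Here "a" is dropped because it is already an ancestor of both "c" and "d", which is exactly the kind of redundant merge parent the filter is meant to remove.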
