[geeklog-devel] Upgrade CVS to a distributed repository

Nick Andrew nick at nick-andrew.net
Wed Apr 16 13:14:31 EDT 2008


On Tue, Apr 15, 2008 at 09:37:57PM +0200, Dirk Haun wrote:
> SVN has two main advantages over CVS:
> 
> 1) You can rename / move files
> 2) Atomic commits

Also file (or tree) copy is a cheap operation, and SVN supports metadata
through named properties. For all these reasons SVN is a big improvement
over CVS. Oh, and SVN has one unique feature that all the distributed
SCMs lack: you can check out a subset of the tree (say, one directory and
everything underneath it) and work with that subset without checking out
the rest of the tree. In fact SVN needs this, because its tags and
branches are implemented as copies; if you couldn't check out and work
with just the trunk or just one branch, you'd be checking out every
branch and tag there ever was, with possibly hundreds of copies of your
files on your disk.
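To illustrate why copies are cheap and why checking out a subset matters, here is a toy model in Python. It is a hypothetical sketch, NOT Subversion's real storage format: directories are dicts, files are strings, revision numbers are ignored, and the helper names (`lookup`, `cheap_copy`, `write_file`, `checkout`) are made up for this example. A tag or branch is just a new name pointing at an existing subtree, and later changes copy only the directories on the modified path.

```python
# Hypothetical model, NOT Subversion's real storage format: directories are
# dicts, files are strings, and the whole repository is one nested dict.


def lookup(tree: dict, path: str):
    """Follow a slash-separated path from the repository root."""
    node = tree
    for part in path.strip("/").split("/"):
        node = node[part]
    return node


def cheap_copy(tree: dict, src: str, dst_parent: str, name: str) -> None:
    """Copy by adding a new name that points at the existing subtree.

    Nothing under `src` is duplicated, so the cost does not grow with the
    number of files in the subtree."""
    lookup(tree, dst_parent)[name] = lookup(tree, src)


def write_file(tree: dict, path: str, contents: str) -> None:
    """Change one file without disturbing other names that share its nodes.

    Each directory on the path is shallow-copied before it is modified
    (copy-on-write), so a tag that still points at the old nodes keeps
    seeing the old contents."""
    parts = path.strip("/").split("/")
    node = tree
    for part in parts[:-1]:
        node[part] = dict(node[part])
        node = node[part]
    node[parts[-1]] = contents


def checkout(tree: dict, path: str) -> dict:
    """Return just the subtree at `path`, e.g. trunk without any branches."""
    return lookup(tree, path)


repo = {
    "trunk": {"index.php": "<?php // front page",
              "lib": {"db.php": "<?php // original"}},
    "branches": {},
    "tags": {},
}
cheap_copy(repo, "trunk", "tags", "1.5.0")          # a tag is just a copy
cheap_copy(repo, "trunk", "branches", "feature-x")  # so is a branch
write_file(repo, "branches/feature-x/lib/db.php", "<?php // patched")

print(sorted(checkout(repo, "trunk")))              # ['index.php', 'lib']
print(lookup(repo, "tags/1.5.0") is repo["trunk"])  # True (shared, not duplicated)
print(lookup(repo, "tags/1.5.0/lib/db.php"))        # <?php // original
```

Without `checkout` of a single path, a working copy of the root would contain trunk plus every tag and branch, which is exactly the "hundreds of copies" problem described above.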

I have a lot of personal and professional experience with SVN, and Git
and Mercurial are light-years ahead of it. I have a repository with over
40,000 commits in it, migrated from RCS to CVS to SVN over the years.
There are over 100 branches. SVN manages this code reliably, but it is
slow, and merging changes from a branch into the trunk loses the
individual commits made on that branch. As a result it has become
difficult for us to track individual code changes and their commit
messages precisely.
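To make the lost-history point concrete, here is a toy commit graph in Python. It is a hypothetical model, not how SVN or Git actually store data: each commit records only its parent commits, "log" means every commit reachable from a head, and the commit IDs are made up. The SVN-style merge follows the behaviour described above (one new trunk revision with no link back to the branch), while the DVCS-style merge commit keeps the branch head as a second parent.

```python
# Hypothetical model: each commit ID maps to the tuple of its parent IDs.
commits: dict[str, tuple[str, ...]] = {}


def commit(cid: str, *parents: str) -> str:
    """Record a commit with the given parents and return its ID."""
    commits[cid] = parents
    return cid


def log(head: str) -> set[str]:
    """All commits reachable from `head` by following parent links."""
    seen, stack = set(), [head]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid])
    return seen


base = commit("base")
b1 = commit("b1", base)  # three commits made on a branch
b2 = commit("b2", b1)
b3 = commit("b3", b2)
t1 = commit("t1", base)  # meanwhile, one commit on trunk

# SVN-style merge as described above: the combined change lands on trunk
# as one new revision whose only parent is the previous trunk revision.
svn_merge = commit("svn-merge", t1)

# DVCS-style merge: the merge commit keeps the branch head as a second parent.
dvcs_merge = commit("dvcs-merge", t1, b3)

print(sorted(log(svn_merge)))   # ['base', 'svn-merge', 't1']
print(sorted(log(dvcs_merge)))  # ['b1', 'b2', 'b3', 'base', 'dvcs-merge', 't1']
```

In the first history the branch commits b1..b3 are not reachable from trunk at all, so their individual messages are gone from trunk's log; in the second they remain part of trunk's history.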

> We have in fact been discussing using a DVCS on and off for as far back
> as last year's Mentor Summit. I have just recently attended a
> presentation[1] about DVCS again, and in that session, the presenters
> and the audience discussed the benefits and problems of various systems.
> The conclusion was something along these lines:
> 
> - git can do pretty much everything, but it's a system by geeks for
> geeks, so it may not always be obvious

That's the way it seems to me. I don't know if Git has any graphical
frontend for Windows users, but Mercurial has TortoiseHg. For me, on
Linux, Mercurial is already very friendly at the command line.

> - bazaar has the potential of becoming everybody's first choice simply
> due to its backing (by Canonical and the Ubuntu community)

I haven't used bazaar, but the popularity of the other SCMs ensures that
there will be a choice for years to come. Conversion tools can migrate
repositories from one format to another, so whatever choice you make
doesn't even have to be final.

I am currently converting the above-mentioned 40,000-commit repository
from SVN to git, as a test. It takes only _one_ command. I expect the
resulting git repository will faithfully represent every single commit
from the SVN repository, but I doubt that even git-svn can work out at
which revisions the various branches were merged back into trunk,
because SVN itself doesn't keep that information.

> - mercurial may be the underdog here but may be worth a look as it has
> some advantages over the other two (and it also has backing by Sun now)

The stacked patch systems are add-ons for Git (StGit, Guilt, and the
standalone Quilt) but built into Mercurial (MQ). Git's SVN converter is
top-notch because you can also contribute commits back into SVN. I've
used Mercurial's converters for SVN and CVS. The LinuxTV team use
Mercurial for their codebase even though other kernel developers use
Git; they also publish a Git repository of their master repository
anyway.

> This is actually something that we should discuss with the Summer of
> Code in mind. Last year, we made the mistake of giving CVS access too
> late, which resulted in some merging pains later on.

If you're going to have several people working on the codebase at once,
then you need to make it easy for them, i.e. make it easy to sync up
with other people's repositories and to merge.

> With a DVCS, students could work on their local repository, pull from
> the main repository, and the eventual push upwards should be much easier
> since these systems are designed with branching and merging in mind.

Yes, quite. But for the 'eventual push upwards', consider that the
students can pull the latest changes from the master, do their own
merge, and then you can pull already-merged code from them. Repeat that
N times for N students and you will notice the merge is not a problem
for you at all, because the students take responsibility for it
themselves.
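The workflow above can be sketched with a toy commit graph in Python (hypothetical commit IDs and helper names, not any real tool's API). Once a student has merged the latest master into their own work, the master head is an ancestor of the student's head, so the maintainer's pull only has to move a pointer forward (a "fast-forward" in Git terms) instead of performing a merge.

```python
# Hypothetical model of the workflow: each commit maps to its parent IDs.
commits = {
    "m1": (),
    "m2": ("m1",),            # master moves on while the student works
    "s1": ("m1",),            # student's work, based on the older master
    "s2": ("s1",),
    "s-merge": ("s2", "m2"),  # student pulls master and merges it in
}


def is_ancestor(a: str, b: str) -> bool:
    """True if commit `a` is reachable from commit `b` via parent links."""
    stack, seen = [b], set()
    while stack:
        cid = stack.pop()
        if cid == a:
            return True
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid])
    return False


def pull(local_head: str, remote_head: str) -> str:
    """Describe what the maintainer's pull of `remote_head` has to do."""
    if is_ancestor(remote_head, local_head):
        return "already up to date"
    if is_ancestor(local_head, remote_head):
        return "fast-forward"  # just move the head; nothing to merge
    return "merge needed"


print(pull("m2", "s2"))       # merge needed: the student has not merged master
print(pull("m2", "s-merge"))  # fast-forward: the student already did the merge
```

The pull stays a fast-forward only if master has not moved since the student merged; after one student's work lands, the next student pulls the new master and merges again, which is the "repeat N times" step.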

Nick.


