
Migrate svn to git, lessons learned

There are many tutorials online about how to do this, so I won’t dwell on unnecessary details. This is rather a summary of my experience. I needed to move a few dozen projects from svn to git, where both central repositories were hosted internally on our own servers. The focus was on keeping as much history as possible after the migration, along with all the authors, tags and branches. For this purpose I used the ‘git svn’ utility that comes with the git installation. The problems encountered during the migration were mostly related to how well the SVN repo had been maintained during its life cycle. For some projects the migration was a breeze. It just worked. These projects were relatively new, had only a few years of history and a standard svn layout. For others, I had to fiddle around to get the history right, and in some cases to complete the migration at all (because of size).

The process itself was straightforward and went like this (yet another, but really compressed, tutorial 🙂):

1. First, get all the SVN authors (committers) into one file, under the names they used in the ‘svn realm’

This can be done like this:

svn log --quiet | awk '/^r/ {print $3}' | sort -u

Then create a plain text file called something like authors.txt. The output of the command above is simply the list of authors by their svn usernames. Authors in Git are less private, however: each one needs a full name and an email. So the file needs to be structured like this (note that lines must not have leading spaces!):

svnUsername1 = User One <user.one@email.com>
svnUsername2 = User Two <user.two@email.com>
...
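
Typing that file by hand gets tedious with many committers. As a rough sketch, the username list can be turned into a template in the right format and then corrected by hand. The function name make_authors_template and the example.com domain are placeholders made up for illustration, not part of git or svn:

```shell
# Read svn usernames (one per line) on stdin and print authors-file lines.
# The full name and email are only guessed from the username; fix them by hand.
make_authors_template() {
    domain="${1:-example.com}"   # placeholder mail domain (assumption)
    sort -u | while IFS= read -r user; do
        if [ -n "$user" ]; then
            printf '%s = %s <%s@%s>\n' "$user" "$user" "$user" "$domain"
        fi
    done
}

# Usage, from inside an svn working copy:
#   svn log --quiet | awk '/^r/ {print $3}' | make_authors_template example.com > authors.txt
```

Every generated line still needs a real name and a real address before the file is used.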

2. Now we need to clone the svn repo into a git repo, which can be done like this:

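git svn clone --stdlayout --authors-file=authors.txt --prefix svn/ https://url_to_repo/path_to_repo

The --stdlayout flag tells git svn to expect the standard trunk/branches/tags layout; without it (or explicit --trunk/--branches/--tags options) no svn branches or tags are imported.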


3. The last thing to do is to convert the svn branches to local git branches and the svn tags to git tags:

For branches:

# Create a local branch for every remote svn branch (everything outside svn/tags).
for branch in $(git branch -r | grep -v svn/tags | sed 's/ *svn\///');
do
git branch "$branch" "refs/remotes/svn/$branch"
done


and for tags:

# Turn every remote ref under svn/tags/ into an annotated git tag.
for tag in $(git branch -r | grep svn/tags | sed 's/ *svn\/tags\///');
do
git tag -a -m "Imported from SVN" "$tag" "refs/remotes/svn/tags/$tag"
done


Check that everything is in there:

git branch --all
git tag

Now you’ll see why we used a special prefix (svn/ above): all the svn remote branches/tags are placed under it, so they are easy to tell apart from your new git branches/tags, which are listed alongside them.

4. Create a new git project wherever you want to host it (GitHub/GitLab/somewhere else) and push to it. Don’t forget to push branches AND tags.

git remote add origin https://url_to_new_git_repo
git push origin --all
git push origin --tags

Done.

This is what I call the happy flow. Problems emerged when the svn repo was simply too big: too large a code base with many years of history and many tags and branches. What is many? In my case the perl utility (used by git svn) crashed on a project with ~500000 lines of code, ~100 tags and branches and ~7 years of history. It also ran for ~48h before it ultimately exited with an error. In this case we decided that we were ‘OK’ with ~2 years of history. Furthermore, the problem with these repos was that, at some point in time, they had contained projects other than the one being migrated (bad svn layout). Svn’s own database remembers all this, and the price to pay is that git svn will traverse all the possible history paths, including those of these other projects (which might be huge on their own). To avoid that, you can pass git svn clone a flag that limits the import to a specific revision range:

--revision some_desired_revision_number:HEAD
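
Put together, a clone limited to recent history could look like the sketch below. The function name clone_from_revision is made up for illustration; the options are the same ones used in step 2, plus --revision, and anything given after the URL is passed straight through to git svn clone:

```shell
# Clone only the history from a start revision up to HEAD.
# Usage: clone_from_revision START_REVISION SVN_URL [extra git-svn options...]
clone_from_revision() {
    start_rev="$1"
    svn_url="$2"
    shift 2
    # "$@" now holds only the optional extra options.
    git svn clone --authors-file=authors.txt --prefix svn/ \
        --revision "$start_rev:HEAD" "$@" "$svn_url"
}

# Example (the revision number is a placeholder):
#   clone_from_revision 12345 https://url_to_repo/path_to_repo
```

To find a suitable starting revision, svn accepts dates in curly braces as revisions, so something like svn log --quiet --limit 1 -r '{2015-01-01}:HEAD' https://url_to_repo/path_to_repo (the date is only an example) should print the first revision committed on or after that date.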

The second annoying problem was some long-forgotten committer to the svn repo that the first script missed. This causes the whole migration to halt, and after ~40h of execution that is no fun. Try to remember everybody you can think of and add them manually to the authors.txt file.
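
A cheap safeguard is to check, before starting the long clone, that every username svn reports has an entry in authors.txt. Below is a minimal sketch; missing_authors is a made-up name. Keep in mind that svn log only reports commits touching the path it is run against, so to catch everybody it is probably best pointed at the repository root:

```shell
# Print every svn username read from stdin that has no entry in the authors file.
# Usage: missing_authors AUTHORS_FILE
missing_authors() {
    authors_file="$1"
    # Left-hand sides of the "svnUsername = Full Name <email>" lines.
    known=$(sed 's/[[:space:]]*=.*$//' "$authors_file")
    sort -u | while IFS= read -r user; do
        [ -z "$user" ] && continue
        # Exact whole-line match, so "bob" does not match "bobby".
        printf '%s\n' "$known" | grep -Fxq -- "$user" || printf '%s\n' "$user"
    done
}

# Usage:
#   svn log --quiet https://url_to_repo | awk '/^r/ {print $3}' | missing_authors authors.txt
```

If this prints nothing, every committer svn reported is covered.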

The third and last problem was the history of moved projects. The main SVN repository had been around for a while and projects had been moved around a lot. The git svn utility could not properly traverse these changes, which left the trunk code without history beyond the last move. And by move I mean even a simple rename of the project’s parent folder; basically any change in the svn path. I managed to fix this later in git by checking out the latest known tag as a local branch and merging it into master with the ‘--strategy ours’ flag, keeping all the changes in master (former trunk). This way I got all the history up until that tag was released. More than good enough for our needs.
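
In commands, that fix could be sketched roughly as follows. The function and branch names are placeholders, and --allow-unrelated-histories is an assumption on my side: Git 2.9 and later refuse to merge two histories that share no common commit unless it is given, while older versions do not know the flag at all. Run it with master checked out:

```shell
# Attach the history behind an old tag to the current branch without
# changing any files on the current branch.
# Usage: merge_tag_history TAG
merge_tag_history() {
    tag="$1"                  # e.g. the last tag created before the svn move
    branch="history-$tag"     # hypothetical name for the helper branch
    git branch "$branch" "$tag" &&
    git merge --strategy ours --allow-unrelated-histories \
        -m "Attach history of $tag" "$branch"
}

# Example (the tag name is a placeholder):
#   git checkout master
#   merge_tag_history some_old_tag
```

Because the ours strategy keeps the tree of the current branch as-is, the merge only adds the tag’s commits to the history; the files in master stay untouched.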

All in all it was not too complicated, but corner cases do arise, and these are just some of the ones I was unlucky enough to run into.
