Wednesday, December 26, 2012

How I Migrated My Source Code from Subversion to Git

While at university in the eighties, I didn't learn much in the way of useful software development tactics or strategies. It was just a question of fulfilling the homework requirements and getting past the exams. I was an EE student and not a ComSci major, after all. When I started my first job out of college, which was as much about programming as it was engineering, I was introduced to version control in the form of the Polytron Version Control System, or PVCS. At that time, I worked in a two-person coding group. PVCS kept us from stepping on each other's work and allowed us to track our changes, using nothing more than the legendary MS-DOS command line.
 
When I moved on to my second job, the coding environment was more complex and chaotic. I was surprised that the version control strategy consisted of occasionally pkzipping the source files. The product itself was somewhat confused. I've worked on many projects since then, and it always seems that the better-run projects have their VCS strategies mapped out. The development team's understanding of how to best use a VCS seems to correlate with the quality of the system being built. If the developers use the VCS as a dumping ground, with little distinction between it and a group of folders on a file share, it is often true that the project itself will be a mess. Ditto if checking code in happens infrequently, as some sort of afterthought. Ditto for lack of integration testing, among other things. As time has gone on, it seems that more and more teams do avail themselves of a VCS and do learn the "ins-and-outs" of their chosen VCS. I believe that the large number of distributed, open-source projects that exist today has contributed to this positive development.
 
I've used a number of different VCS over the years. As I mentioned, PVCS was first. (I once interviewed for a job that entailed providing integration between PVCS and Visual Basic.) The second one was a version control system on SunOS called SCCS. IIRC, SCCS was extremely basic, with a pedigree that dates back to the seventies. I spent a couple of years working with SCCS and makefiles. I helped evaluate ClearCase for a few days. (We didn't adopt it. IIRC, it was pretty neat, but it took up too many resources on our HP minicomputer and we couldn't afford enough licenses for our development team.) I used Microsoft's Visual SourceSafe and TFS. I managed to mostly avoid using CVS.
 
My go-to VCS software for the better part of the last decade has been Subversion. I like Subversion for a few reasons. It is tiny and easy to install. Creating a repository is so easy that you can create a repository for a project that will only run for a day or two and throw the repository away afterwards. Subversion has built-in diff and merge tools, making it easy to see what you have changed between commits. It is easy to set up a server for Subversion. I used to run my own Subversion servers, on top of Apache on Linux, using old Sun and Dell workstations and dealing with firewalls, port forwarding and DDNS. Eventually, after a series of power failures at my apartment, I moved my Subversion repository to a freebie online service. Running my own server was just too much effort and not enough reward. No one was going to hire me to run their Subversion server for them, so there wasn't much use in practicing all the time.
 
So, why move away from Subversion? I have been contemplating such a move for a couple of years. One (weak) reason is that I'm not using my repository as much as I used to. My Subversion repository has 3043 commits, spaced out over seven years. I haven't been committing very much this year (the last change was in July) as I find myself doing less and less interesting stuff. In addition to fewer commits, I'm keeping less in my Subversion repository than I used to. I'm relying on Microsoft's SkyDrive and similar products for some of the stuff that I used to keep in my cloud-based Subversion repository. For example, I've moved my copies of the SysInternals utilities from my Subversion repository to SkyDrive.
 
A stronger reason is that the popularity of Subversion seems to be dwindling in favor of Mercurial and (particularly) Git. Git has a lot of weight behind it because it is used by the Linux kernel developers. Both systems are interesting in that they don't rely on a central point of control. Instead, everyone has a full copy of the repository and updated source code is passed around using a more egalitarian method. Each repository is more of a peer than a centralized point of control.
 
I'm not particularly worried about sharing my own stuff (it isn't groundbreaking or anything), but I think that knowing Git is more useful than knowing Subversion nowadays. I certainly see Git mentioned in more job want ads than Subversion. I don't think that you can really learn something without using it frequently (at least monthly, preferably daily), so it's time to chuck out Subversion and pick up Git. 
 
I'm not going to recite each detail of the process here, as there is plenty of information available online (you've probably already seen much of it) and no one has asked me for details. I'm only going to provide my highlights and observations.
 
Moving my repository from Subversion to Git was simple. Basically, Git supports direct importation of Subversion repositories via the "git svn clone" command. All you really need to do the import is the Git software. I installed the software, did a little prep work and ran the import process.
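For illustration, here is a minimal Python sketch of how such an import command could be assembled. It is not the exact command I ran. The URL, the target directory and the authors file name are hypothetical placeholders, and the helper function names are made up for this example. The only Git pieces it relies on are the "git svn clone" command and its --authors-file and --stdlayout options.

```python
import subprocess


def build_clone_command(svn_url, target_dir, authors_file=None, stdlayout=False):
    """Return the argument list for a `git svn clone` import.

    Options are placed before the URL and the target directory.
    """
    cmd = ["git", "svn", "clone"]
    if authors_file:
        # Maps Subversion user names to Git names and email addresses.
        cmd.append(f"--authors-file={authors_file}")
    if stdlayout:
        # Only appropriate when the repository uses trunk/branches/tags.
        cmd.append("--stdlayout")
    cmd.extend([svn_url, target_dir])
    return cmd


def run_import(cmd):
    """Run the import; raises CalledProcessError if Git reports a failure."""
    return subprocess.run(cmd, check=True)


# Hypothetical values, for illustration only. Nothing is executed here;
# the command is only printed so it can be reviewed first.
example = build_clone_command(
    "https://svn.example.com/myrepo",
    "myrepo-git",
    authors_file="authors.txt",
)
print(" ".join(example))
```

Calling run_import(example) would start the actual import, which is why the sketch only prints the command.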
 
(It also seems that you can use Git as a sort of front-end to a shared Subversion repository. This way, you can use the Git tools without forcing a migration away from Subversion. This is an interesting idea and it certainly might be useful on large projects. I'm an army of one here, with no other person's wishes to get in the way. I'd rather keep things simple, so I did not explore this feature.) 

Installing the Windows version of Git is as easy as downloading an installer from a web site and running it. (You don't need a full-blown Linux-like environment like Cygwin. I have used Cygwin before and it is great, particularly the X server, but if you don't need a lot of Linux "stuff" and you don't need bash, then Cygwin is overkill.) If you are on Windows, you want to make sure that you are running a recent enough build of Git. If not, the Subversion functionality may not be included. If you are on Linux, it is possible that you already have Git on your system. You may need to install additional packages such as git-core and git-svn. Exactly what packages must be installed and the exact commands to install those packages may vary by distribution, as it is with all things Linux.

The only pre-migration task to be done is to create a small text file that maps users from the old Subversion repository to email addresses in the new Git repository. This was simple for me (four old users each map to one common email address), but could be tricky in environments with many users, particularly if those users have moved on. The hardest part of creating the text file was making sure that I had an entry for each of the old users in the Subversion repository because I wasn't 100% sure how many users I had used in the last nine years. There are scripts available that can look through the history of the relevant Subversion repository to create a text file that functions as a starting point for you to edit.
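As a rough sketch of what such a script might do (this is not one of the published scripts, just an illustration in Python), the code below pulls the unique user names out of the output of "svn log --quiet" and writes one mapping line per user in the "svnuser = Name <email>" form that the authors file uses. The sample log text, the user names, the display name and the email address are all made up.

```python
import re

# Matches the revision header lines printed by `svn log --quiet`, e.g.
# r42 | alice | 2012-07-04 12:34:56 -0400 (Wed, 04 Jul 2012)
_HEADER = re.compile(r"^r\d+ \| (?P<author>[^|]+?) \| ")


def extract_authors(log_text):
    """Return the sorted, de-duplicated user names found in the log text."""
    authors = set()
    for line in log_text.splitlines():
        match = _HEADER.match(line)
        if match:
            authors.add(match.group("author"))
    return sorted(authors)


def authors_file_lines(authors, name, email):
    """Map every Subversion user to the same Git name and email address."""
    return [f"{author} = {name} <{email}>" for author in authors]


# Made-up sample standing in for real `svn log --quiet` output.
sample_log = """\
------------------------------------------------------------------------
r3 | bob | 2012-07-04 12:34:56 -0400 (Wed, 04 Jul 2012)
------------------------------------------------------------------------
r2 | alice | 2011-01-02 08:00:00 -0500 (Sun, 02 Jan 2011)
------------------------------------------------------------------------
r1 | bob | 2005-11-20 19:15:30 -0500 (Sun, 20 Nov 2005)
------------------------------------------------------------------------
"""

for line in authors_file_lines(
    extract_authors(sample_log), "Jane Doe", "jane@example.com"
):
    print(line)
```

The result is only a starting point. In an environment with many users, each line would still need to be edited by hand to give each person the right name and address.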
 
While actually running the import process, I ran into two real problems. The first problem was that the import process asks for user input at a certain point (it asks about the security fingerprint for the old Subversion repository) and it seems that the libraries that are used to accept that user input do not work inside of a PowerShell window. My workaround for that was to Ctrl-C out of the attempt, close the PowerShell window and then use an old-fashioned CMD console window to run the import process. That worked fine. So far, the problem hasn't come up again, and running Git in a PowerShell window has been OK.
 
The second problem cost me much more time. I used the --stdlayout command line switch because all of the examples I had seen did. The switch causes the import process to assume a certain, widely-used layout for the source code that exists in the Subversion repository. I did not follow this layout when I initially set up my Subversion repository and I never missed it, partly because my code is simple and partly because I haven't had the opportunity to bring others into the project. In short, with the switch, the import process looks for source files in certain locations. Since I did not set up those locations, the import process didn't find anything to import. The process simply ran for a while and then reported success without actually importing any code into my new Git repository. After spending some time swearing, I broke down and read the documentation on the command line switches. I realized that --stdlayout was trying to do something that didn't pertain to my repository. Removing the --stdlayout switch allowed the process to go forward.
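A quick check before the import would have saved that time. The Python sketch below is only an illustration, and it assumes you have already collected the top-level entries of the repository, for example from the output of "svn ls" against the repository root, where directories are listed with a trailing slash. The function name and the sample entries are hypothetical.

```python
def has_standard_layout(top_level_entries):
    """Return True when the repository root contains a trunk directory.

    --stdlayout expects trunk, branches and tags at the top level; trunk
    is the one that holds the main line of code, so it is the one checked.
    """
    names = {entry.strip().rstrip("/") for entry in top_level_entries}
    return "trunk" in names


# Made-up listings, in the style of `svn ls` output for a repository root.
conventional_root = ["branches/", "tags/", "trunk/"]
flat_root = ["projectA/", "utilities/", "notes.txt"]

for root in (conventional_root, flat_root):
    if has_standard_layout(root):
        print("trunk found: --stdlayout looks appropriate")
    else:
        print("no trunk found: leave --stdlayout off")
```

For a repository like mine, which never followed the conventional layout, the check comes back negative and the switch should be left off.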
 
As an exercise, I also ran through the same import using my Linux Mint Virtual Machine (VM) running on my ESXi 5.1 server. The results were pretty much the same. Obviously, I used bash and I didn't have the PowerShell input problem. For whatever reason, running the import in the VM was around twice as fast as running it "on the metal" on my Windows 7 laptop. This is surprising because my ESXi server is pretty slow, being a six-year-old HP workstation with a pair of lowly 5150 Xeons.
 
Now that I have my repository out of Subversion and in Git, all I have to do is get used to the Git workflow and command line syntax.
