Wednesday, December 26, 2012

How I Migrated My Source Code from Subversion to Git

While at University in the eighties, I didn't learn much in the way of useful software development tactics or strategies. It was just a question of fulfilling the homework requirements and getting past the exams. I was an EE student and not a ComSci major, after all. When I started my first job out of college, which was as much about programming as it was engineering, I was introduced to version control in the form of the PolyTron Version Control System, or PVCS. At that time, I worked in a two-person coding group. PVCS kept us from stepping on each other's work and allowed us to track our changes, using nothing more than the legendary MS-DOS command line.
When I moved on to my second job, the coding environment was more complex and chaotic. I was surprised that the version control strategy consisted of occasionally pkzipping the source files. The product itself was somewhat confused. I've worked on many projects since then, and it always seems that the better-run projects have their VCS strategies mapped out. The development team's understanding of how to best use a VCS seems to correlate to the quality of the system being built. If the developers use the VCS as a dumping ground, with little distinction between it and a group of folders on a file share, it is often true that the project itself will be a mess. Ditto if checking code in happens infrequently, as some sort of afterthought. Ditto for lack of integration testing, among other things. As time has gone on, it seems that more and more teams do avail themselves of VCS and do learn the "ins-and-outs" of their chosen VCS. I believe that the large number of distributed, open-source projects that exist today have contributed to this positive development.
I've used a number of different VCS over the years. As I mentioned, PVCS was first. (I once interviewed for a job that entailed providing integration between PVCS and Visual Basic.) The second one was a version control system on SunOS called SCCS. IIRC, SCCS was extremely basic, with a pedigree that dates back to the seventies. I spent a couple of years working with SCCS and makefiles. I helped evaluate ClearCase for a few days. (We didn't adopt it. IIRC, it was pretty neat, but it took up too many resources on our HP minicomputer and we couldn't afford enough licenses for our development team.) I used Microsoft's Visual SourceSafe and TFS. I managed to avoid using CVS for very much.
My go-to VCS software for the better part of the last decade has been Subversion. I like Subversion for a few reasons. It is tiny and easy to install. Creating a repository is so easy that you can create a repository for a project that will only run for a day or two and throw the repository away afterwards. Subversion has built-in diff and merge tools, making it easy to see what you have changed between commits. It is easy to set up a server for Subversion. I used to run my own Subversion servers on top of Apache on linux, using old Sun and Dell workstations and dealing with firewalls, port forwarding and DDNS. Eventually, after a series of power failures at my apartment, I moved my Subversion repository to a freebie online service. Running my own server was just too much effort and not enough reward. No one was going to hire me to run their Subversion server for them, so there wasn't much use in practicing all the time.
So, why move away from Subversion? I have been contemplating such a move for a couple of years. One (weak) reason is that I'm not using my repository as much as I used to. My Subversion repository has 3043 commits, spaced out over seven years. I haven't been committing very much this year (the last change was in July) as I find myself doing less and less interesting stuff. In addition to fewer commits, I'm keeping less in my Subversion repository than I used to. I'm relying on Microsoft's SkyDrive and similar products for some of the stuff that I used to keep in my cloud-based Subversion repository. For example, I've moved my copies of the SysInternals utilities from my Subversion repository to SkyDrive.
A stronger reason is that the popularity of Subversion seems to be dwindling in favor of Mercurial and (particularly) Git. Git has a lot of weight behind it because it is used by the linux kernel developers. Both systems are interesting in that they don't rely on a central point of control. Instead, everyone has a full copy of the repository, and updated source code is passed around in a more egalitarian, peer-to-peer fashion; each repository is a peer rather than a client of some central authority.
I'm not particularly worried about sharing my own stuff (it isn't groundbreaking or anything), but I think that knowing Git is more useful than knowing Subversion nowadays. I certainly see Git mentioned in more job want ads than Subversion. I don't think that you can really learn something without using it frequently (at least monthly, preferably daily), so it's time to chuck out Subversion and pick up Git. 
I'm not going to recite each detail of the process here, as there is plenty of information available online (you've probably already seen much of it) and no one has asked me for details. I'm only going to provide my highlights and observations.
Moving my repository from Subversion to Git was simple. Basically, Git supports direct importation of Subversion repositories via the "git svn clone" command. All you really need to do the import is the Git software. I installed the software, did a little prep work and ran the import process.
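As a sketch, the whole import boils down to very few commands. The repository URL and directory names below are placeholders (not my real ones); `--authors-file` and `--stdlayout` are standard `git svn` options:

```shell
# One-time import of a Subversion repository into a new Git repository.
# URL and names here are hypothetical; substitute your own.

# Pull every revision across, mapping old Subversion user names to
# Git-style "Name <email>" identities via a small text file.
git svn clone https://svn.example.com/repo \
    --authors-file=authors.txt my-new-repo

# Add --stdlayout ONLY if the repository uses the conventional
# trunk/branches/tags layout; if it does not, the import can quietly
# find nothing to bring over.

# The new repository stands alone; sanity-check that history arrived.
cd my-new-repo
git log --oneline | tail -5
```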
(It also seems that you can use Git as a sort of front-end to a shared Subversion repository. This way, you can use the Git tools without forcing a migration away from Subversion. This is an interesting idea and it certainly might be useful on large projects. I'm an army of one here, with no other person's wishes to get in the way. I'd rather keep things simple, so I did not explore this feature.) 

Installing the Windows version of Git is as easy as downloading an installer from a web site and running it. (You don't need a full-blown linux-like environment like Cygwin. I have used Cygwin before and Cygwin (particularly the X server) is great, but if you don't need a lot of linux "stuff" and you don't need bash, then Cygwin is overkill.) If you are on Windows, you want to make sure that you are running a recent enough build of Git. If not, the Subversion functionality may not be included. If you are on linux, it is possible that you already have Git on your system. You may need to install additional packages such as git-core and git-svn. Exactly what packages must be installed and the exact commands to install those packages may vary by distribution, as it is with all things linux.

The only pre-migration task to be done is to create a small text file that maps users from the old Subversion repository to email addresses in the new Git repository. This was simple for me (four old users each map to one common email address), but could be tricky in environments with many users, particularly if those users have moved on. The hardest part of creating the text file was making sure that I had an entry for each of the old users in the Subversion repository because I wasn't 100% sure how many users I had used in the last nine years. There are scripts available that can look through the history of the relevant Subversion repository to create a text file that functions as a starting point for you to edit.
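For illustration, here is a sketch of one of those scripts: it scrapes committer names out of `svn log --quiet` output and emits a skeleton mapping file to edit by hand. The sample log text and the email domain are made up; against a real repository you would feed it `svn log --quiet <repo-url>` instead.

```shell
# Hypothetical sample of `svn log --quiet` output, standing in for the
# real thing. Each revision line looks like: rNNN | user | date ...
cat > svn-log-sample.txt <<'EOF'
------------------------------------------------------------------------
r3043 | dave | 2012-07-15 10:22:01 -0400 (Sun, 15 Jul 2012)
------------------------------------------------------------------------
r3042 | dave-laptop | 2012-07-14 09:00:00 -0400 (Sat, 14 Jul 2012)
------------------------------------------------------------------------
r1 | admin | 2005-01-02 12:00:00 -0500 (Sun, 02 Jan 2005)
------------------------------------------------------------------------
EOF

# Grab the second pipe-delimited field (the committer) from each
# revision line, de-duplicate, and emit "svnuser = Name <email>" stubs.
awk -F' [|] ' '/^r[0-9]+ /{print $2}' svn-log-sample.txt | sort -u |
  while read -r user; do
    printf '%s = %s <%s@example.com>\n' "$user" "$user" "$user"
  done > authors.txt

cat authors.txt
```

The resulting authors.txt still needs hand-editing (in my case, pointing all four old user names at one real email address), but it guarantees no committer gets missed.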
While actually running the import process, I ran into two real problems. The first was that the import process asks for user input at a certain point (it asks about the certificate fingerprint of the old Subversion server) and it seems that the libraries used to accept that input do not work inside a PowerShell window. My workaround was to Ctrl-C out of the attempt, close the PowerShell window and use an old-fashioned CMD console window to run the import process. That worked fine and, so far, the problem hasn't come up again; running Git in a PowerShell window has otherwise been OK.
The second problem cost me much more time. I used the --stdlayout command line switch because all of the examples I had seen did. The switch tells the import process to assume the widely-used trunk/branches/tags layout for the source code in the Subversion repository. I did not follow this layout when I initially set up my Subversion repository and I never missed it, partly because my code is simple and partly because I haven't had the opportunity to bring others into the project. In short, with the switch, the import process looks for source files in locations that I had never set up, so it didn't find anything to import. The process simply ran for a while and then reported success without actually importing any code into my new Git repository. After spending some time swearing, I broke down, read the documentation on the command line switches, and realized that --stdlayout was trying to do something that didn't pertain to my repository. Removing the --stdlayout switch allowed the process to go forward.
As an exercise, I also ran through the same import using my Linux Mint Virtual Machine (VM) running on my ESXi 5.1 server. The results were pretty much the same. Obviously, I used bash and I didn't have the PowerShell input problem. For whatever reason, running the import in the VM was around twice as fast as running it "on the metal" on my Windows 7 laptop. This is surprising because my ESXi server is pretty slow, being a six-year-old HP workstation with a pair of lowly Xeon 5150s.
Now that I have my repository out of Subversion and in Git, all I have to do is get used to the Git workflow and command line syntax.

Thursday, December 13, 2012

Notes from PSSUG: SQL Server on Windows Azure and Visual Studio Debugging & Source Control

This blog entry consists of my notes and thoughts from the PSSUG meeting at Microsoft's offices in Malvern on 2012/12/12. Any misunderstandings are mine. The evening was broken up into three parts.

First, Kevin Howell presented "Visual Studio 2012: A Complete IDE (Debugging and Source Control)". He covered some of the enhancements since Visual Studio 2010. He did a quick demo of server-side debugging of a stored procedure, and pointed out a few configuration changes that you must make to your solution before you can use this feature.
Kevin particularly likes the schema comparison feature, now part of SQL Server Data Tools, which I think debuted back in Visual Studio 2005 or 2008 as a downloadable add-on. He points out that you need the Premium or Ultimate edition of Visual Studio 2012 to use the schema comparison feature. The comparison feature does seem greatly improved, even over Visual Studio 2010, and is light years beyond what was available five years ago. To me, the VS interface still looks very busy, and it seems hard to know where to look for particular things.
The solution/localdb duality also seems a little confusing. I presume that a developer can make changes to the solution without updating the database running on localdb. When you then compare the localdb database to the "production" database running on a server somewhere, you will not see those changes. Microsoft does not seem to believe in shared database development environments.

One of the things that Kevin did was use the "Clean" menu pick before a "Build". I'd never noticed Clean before, even though I've used SSDT's predecessors to reverse engineer scores of databases. If you are having trouble with odd build errors, you might want to give Clean a try.

With the new Visual Studio comes a new version of TFS. (I keep meaning to switch from Subversion to Git.) It seems that you need to be running SQL Server 2008 R2 to host TFS 2012 and that you need to have Full Text Search (FTS) installed. (You need FTS to use TFS. Is there no coordination between acronym teams at Microsoft?)
Second, there was some time spent on PSSUG business.
  • The election of board members was completed.
  • Upcoming PSSUG and PASS events were detailed. Most importantly, as far as I am concerned, SQL Saturday #200 will be held in June 2013. There is some talk of having more sessions than last year and perhaps some sessions that could be longer and more detailed than the one-hour sessions that have been the rule in the past. In any event, it should be a great day as many people will have much practical experience with SQL Server 2012 by then. IIRC, the last SQL Saturday held here filled up a month or more in advance. You can register now.
  • The Microsoft Technology Center (MTC) in Philly has created a web site to provide MTC news and videos shot during PSSUG presentations. (Microsoft's offices are really in Malvern, but I'll let that slide.)
  • This might only be news to me, but SQL PASS will be held in Charlotte next year. Perhaps I can scare up the budget for Charlotte; it might even be within driving distance of Ashdar World Headquarters here in Downingtown. Seattle is awesome but it's a big drain for a small business.

Third, Rodger Doherty of Microsoft presented on "SQL Server on a Windows Azure Virtual Machine". He points out that SQL Server 2008, SQL Server 2008 R2 and SQL Server 2012 are supported. Since we are dealing with Virtual Machines (IaaS) and not SQL Azure (PaaS), I would guess that older versions of SQL Server would run. However, if you have functionality or performance problems, you should not expect to get help from Microsoft.

While Windows Azure provides some redundancy due to the way data disks are implemented, it does not provide a complete High Availability (HA) solution by itself. For true HA, you must use features provided by SQL Server. There is no support for Windows Failover Clusters, but database mirroring and AlwaysOn are supported. (The one quirk is that AlwaysOn listeners are not supported "yet". This might be more of a networking/VPN problem than a SQL Server problem.) The biggest HA hurdle to get over is latency, especially if you want to be in more than one region.
I'm 95% sure that log shipping still works, too. Microsoft doesn't seem to talk about log shipping very much anymore, but it occupies a warm spot in my heart. Log shipping is so non-fancy that I think it would be hard to break. If Microsoft's version has a problem, you can always write your own log shipping implementation in a day.
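Purely as a sketch of what "write your own" means (in shell, with made-up paths and file names), the core of a home-grown log shipper is just "copy any log backups the secondary hasn't seen yet"; a real implementation would also restore each file WITH NORECOVERY on the secondary and alert on gaps in the backup chain:

```shell
# Toy version of the "copy" half of do-it-yourself log shipping.
# Directory names and backup files are hypothetical stand-ins for the
# primary's backup share and the secondary's restore folder.
PRIMARY_BACKUPS=./primary_backups
SECONDARY_INBOX=./secondary_inbox
mkdir -p "$PRIMARY_BACKUPS" "$SECONDARY_INBOX"

# Simulate two transaction-log backups taken on the primary.
touch "$PRIMARY_BACKUPS/mydb_20121226_0100.trn" \
      "$PRIMARY_BACKUPS/mydb_20121226_0115.trn"

# Ship only files the secondary does not already have.
for f in "$PRIMARY_BACKUPS"/*.trn; do
  base=$(basename "$f")
  if [ ! -e "$SECONDARY_INBOX/$base" ]; then
    cp "$f" "$SECONDARY_INBOX/$base"
    echo "shipped $base"
  fi
done
```

Run it from a scheduled task every few minutes and the secondary stays a restore away from current, which is all log shipping ever was.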

Licensing, as always, demands some attention. Right now, you have two options. There is "pay by the hour" licensing, in which you select from a gallery of images. This works pretty much like AWS. These images are sysprepped, apparently just like you would prepare them for a local, multiple-install scenario. It's important to realize that, once set loose, the image will first go through the usual steps of a sysprepped image and may reboot several times before the VM is ready for use. This process could take as long as 10 or 15 minutes but it is still much faster than going through a requisition process for a new local physical server.

If you are on Software Assurance, you can opt for "bring your own" licensing. This means that you could possibly do a p2v of a machine, copy the VHD to Azure and start running it on Microsoft's equipment.

The big part that seems to be missing is something like AWS's "reserved instances", where you can get a discount from the hourly rate if you commit to a longer period of time. There doesn't seem to be support for MSDN license holders yet and TechNet members are definitely shut out, though I do see references to "free" account offers floating around every few months. With AWS giving away small VM instances and a price war looming or already in effect, it would be great for Microsoft to follow suit.

As far as running SQL inside of Windows Azure, one should not expect high-performing I/O. You should expect more latency to the disks than you would get in a local setup. It seems that performance is more in line with consumer-grade SATA drives than 15K RPM SAS drives or SSDs. While there is an option to enable write-caching on drives that you put SQL Server data on, you shouldn't do that. It can lead to corrupted databases. If you need speed, you should probably work to ensure that the memory allocation for the VM can hold the entire database.

Each VM has a "D:", which is a non-persistent drive that is intended for data that does not have to survive a reboot. You should consider putting tempdb on these D: disks. If you put tempdb on the "C:", it will count against your storage and you will pay for it. Since the "D:" comes up with nothing on it (not even directories), you need to do a little work to create the directory path and relevant ACLs before SQL Server starts and tries to build the tempdb database somewhere.

Rodger kept using the word "blade". This might be a colloquialism or it might mean that Windows Azure is actually running on blade servers. If so, this just underlines the fact that these systems are not meant for high-performance, bleeding-edge databases. During the Q&A session, he cautioned against moving very large data warehouse implementations to Windows Azure, though smaller data warehouse implementations may be fine.
Pricing is by "VM size". You have a small number of choices. This is similar to the way that AWS started. AWS has added options as time has gone on and demand has picked up. There is no customization of one of these sizes. You can't say "I want lots of cores and RAM, but not so much disk space".
Page compression is supported in Windows Azure VMs. You do still need to be running Enterprise Edition SQL Server, however. The CPU performance penalty seems to be something on the order of 5%. YMMV.

Rodger stressed that these servers are on the open Internet and that you should be careful about opening firewall ports and choosing passwords. If you are running SQL Server on the default port (1433), your SA account will get probed. It is suggested to use port mapping and unusual ports for common services like RDP or SQL Server. If you can avoid opening those ports to the world, that would be better. There is a VPN available, but it is only compatible with certain hardware VPN devices at this time. Such devices aren't necessarily expensive, costing perhaps a few hundred dollars US, but they would tie your access down to one physical location and that might not be easily workable if you have several remote DBAs or are a traveling demo person.
In any case, I would advocate (re-)reading any SQL Server security best practices document that you can find before you start creating servers.

Another caution is that this new virtualization environment isn't officially released yet. That means that Microsoft may reset the environment in some way, nuking your stuff. Unless you can tolerate a couple of days of downtime while you rebuild everything, putting production systems up on Windows Azure right now isn't a great idea. After Microsoft officially opens Windows Azure for business, they are aiming for 99.5% uptime for non-HA systems and 99.95% uptime for systems that use some sort of HA technology.
An Active Directory server should use the smallest instance, as AD simply doesn't need many resources. You can extend your existing AD infrastructure into Microsoft's cloud.
There is a cost reporting scheme. It doesn't seem to be as flexible as what AWS provides, but it may be simpler and allow you to jump into IaaS more quickly.
There is an Early Adoption Cook Book with much useful advice.
Performance troubleshooting of systems may be difficult since there is no access into performance counters from the "host" of the VM.
To sum up my impressions, Windows Azure looks like it could easily grow into an AWS competitor. They will need to make many improvements before they are a match for AWS. To me, the two most immediate needs are a larger variety of instances (with respect to cores, sizing, bandwidth, etc.) and some sort of reserved instance pricing. Microsoft has deep pockets and can ramp things up over a period of months or years, just as they have been fighting VMware with Hyper-V. I would only use Windows Azure for development or test systems first, and I would start with systems that require less performance.
That should be the last PSSUG meeting for 2012. See you next year.