The Problems of Development
Writing software brings a host of problems, especially when the
project grows beyond the limits of a single person, or extends in time beyond
a single revision. Three of the problems I will discuss here are:
- Dealing with multiple versions: allowing users and developers to report bugs
or changes against historical versions.
- Dealing with multiple developers: allowing several coders to work on the same
file at once, and have those changes all merged into one file.
- Tracing accountability: once you have more than one person, you often want to
know who did what, or maybe even just when you did something yourself.
Version control can answer all of these (and many more) questions. Before
I go into depth as to how, let me first talk a little about how version control
works. To begin, I list some terminology:
- SCM (Source Code Management): The category of software that includes version control. I (and many others) will use SCM and "version control" interchangeably, even though SCM is arguably a broader term than version control.
- RCS (Revision Control System): The father of SCM. One of the earliest SCM implementations, and arguably the first open source one.
- CVS (Concurrent Versions System): The poster child for version control. By far the most popular version control product.
- Repository: The central storage location for files under version control. Described in much more detail below.
- Workspace: The location of files “checked out” from a version control repository. Also described in detail below.
The general idea behind version control is to give the developer a place
to store her work as it progresses, while also allowing her instant recall of
any work from any point in time. Version control also extends to multi-developer
situations, allowing others to recall the code from any point in time (including now),
make changes, and have those changes merged with changes made by other people
at other times.
Version control is accomplished via the use of two structures (both listed
in our above terminology): workspaces, and repositories. Workspaces are where
you are working, and repositories are where that work is archived or stored.
Repositories hold information including dates, labels, branches, and version numbers, and they do so in a change-only format, keeping track of only the changes between individual revisions in order to save space. Repositories are often held on a central server and are also referred to as depots or roots.
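To make the change-only idea concrete, here is a minimal sketch in Python. It is an illustration only and not how any particular product stores its files: the names `make_delta`, `apply_delta` and `DeltaStore` are hypothetical, and the sketch assumes the first revision is kept whole with forward deltas after it (real systems differ in which revision they keep whole).

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Return the edit operations that turn `old` into `new`.

    Unchanged runs are recorded only as index ranges into `old`;
    only inserted or replaced lines are stored in full.
    """
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("add", new[j1:j2]))
        # "delete" needs no entry: those lines are simply never copied
    return delta


def apply_delta(old, delta):
    """Rebuild the newer revision from the older one plus a delta."""
    new = []
    for op in delta:
        if op[0] == "copy":
            new.extend(old[op[1]:op[2]])
        else:
            new.extend(op[1])
    return new


class DeltaStore:
    """Hypothetical store: revision 1 in full, later revisions as deltas."""

    def __init__(self, first_revision):
        self.base = list(first_revision)
        self.deltas = []

    def commit(self, lines):
        head = self.checkout(len(self.deltas) + 1)
        self.deltas.append(make_delta(head, list(lines)))
        return len(self.deltas) + 1  # number of the new revision

    def checkout(self, revision):
        lines = self.base
        for delta in self.deltas[:revision - 1]:
            lines = apply_delta(lines, delta)
        return list(lines)


store = DeltaStore(["line one", "line two", "line three"])
store.commit(["line one", "line 2", "line three", "line four"])
print(store.checkout(1))  # the original three lines
print(store.checkout(2))  # the four lines committed above
```

The trade-off in this sketch is the usual one: storage stays small, but checking out a late revision means replaying every delta before it.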
The Checkout/Commit Paradigm
Version control functions on a checkout/commit paradigm. All of your documents
are stored, or archived, in a repository. You check out copies of those documents,
work on them in your workspace, and then commit the changes you made
in your workspace back to the repository. Most version control products are
quite flexible and will allow you to check out sets of files based on date,
tag, branch or any number of other criteria.
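The checkout/commit cycle can be sketched as a toy, in-memory model. This is a simplified illustration under stated assumptions, not the interface of CVS or any other product: the `Repository` class and its methods are hypothetical, every commit stores a full snapshot, and only checkout by revision number or tag is modelled.

```python
class Repository:
    """Toy repository: each commit stores a full snapshot of the files."""

    def __init__(self):
        self.revisions = [{}]                  # revision 0 is the empty tree
        self.messages = ["initial empty tree"]
        self.tags = {}                         # tag name -> revision number

    def checkout(self, revision=None, tag=None):
        """Return a private copy (a workspace) of the requested revision."""
        if tag is not None:
            revision = self.tags[tag]
        elif revision is None:
            revision = len(self.revisions) - 1  # default: the newest revision
        return dict(self.revisions[revision])

    def commit(self, workspace, message):
        """Archive the workspace as a new revision and return its number."""
        self.revisions.append(dict(workspace))
        self.messages.append(message)
        return len(self.revisions) - 1

    def tag(self, name, revision):
        self.tags[name] = revision


repo = Repository()
work = repo.checkout()                          # start from the empty tree
work["hello.c"] = "int main(void) { return 0; }\n"
rev1 = repo.commit(work, "Add hello.c")
repo.tag("Version 1.0", rev1)

work["hello.c"] = "int main(void) { return 1; }\n"
rev2 = repo.commit(work, "Change exit status")

released = repo.checkout(tag="Version 1.0")     # the files exactly as tagged
```

Because `checkout` hands back a copy, edits in the workspace never touch the archive until `commit` is called, which is the property the paradigm depends on.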
Another common command in need of discussion is update. Update commands automatically bring your workspace in sync with a repository. Update can also merge non-conflicting changes made to the same file and can initiate conflict resolution. When changes do conflict, update, however it is implemented, asks the user how he or she wishes to resolve them. Conflict resolution as part of the update command is one of the most useful aspects of version control.
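The merge step of an update can be sketched as a three-way merge: compare the common ancestor with your copy and with the repository's copy, apply both sets of edits when they touch different regions, and report a conflict when they do not. The code below is a minimal sketch assuming line-based text files. The function names are hypothetical, the overlap rule is deliberately conservative (edits to adjacent lines count as a conflict), and it is not the algorithm any specific product uses.

```python
from difflib import SequenceMatcher


def changes(base, other):
    """Edits turning `base` into `other`, as (start, end, new_lines) on base."""
    matcher = SequenceMatcher(a=base, b=other, autojunk=False)
    return [(i1, i2, tuple(other[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def overlaps(a, b):
    """True if two edits touch the same (or an adjacent) region of base."""
    return a[0] <= b[1] and b[0] <= a[1]


def merge(base, mine, theirs):
    """Return (merged_lines, conflicts); merged_lines is None on conflict."""
    my_edits = changes(base, mine)
    their_edits = changes(base, theirs)
    conflicts = [(m, t) for m in my_edits for t in their_edits
                 if m != t and overlaps(m, t)]
    if conflicts:
        return None, conflicts       # the user has to decide
    merged, pos = [], 0
    # Identical edits made on both sides collapse into one via the set.
    for start, end, new_lines in sorted(set(my_edits) | set(their_edits)):
        merged.extend(base[pos:start])   # untouched lines before the edit
        merged.extend(new_lines)         # the edit itself
        pos = end
    merged.extend(base[pos:])
    return merged, []


base = ["a", "b", "c", "d", "e"]
merged, conflicts = merge(base,
                          ["A", "b", "c", "d", "e"],   # my change
                          ["a", "b", "c", "d", "E"])   # their change
print(merged)  # ['A', 'b', 'c', 'd', 'E']
```

When both sides change the same line differently, `merge` returns `None` together with the clashing edits, which is the point at which a real update command would ask the user what to do.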
A Little More About The Repository
Before you see a repository out in the wild, let me give you a little better
picture of what you're seeing. Files are normally stored associated with some
kind of internal versioning number. For CVS (a popular version control system)
these internal numbers are of the form ...
So files start with 1.1.1, or a branch 1.2.1
One very useful (but not often enough used) feature of repositories
is branches. Branches split one line of code off from another, the way a tree branches
off from the main trunk. Branching at release time is a very useful technique
used commonly in large-scale software development. It allows developers to
continue development on the main tree, while a subset of the staff
hones the branched code for release. Branching can also be useful for trying
out experimental code, or patches which may take an extended time
to work through. Branches can later be merged with other branches, or into
the main trunk.
Another feature worth discussing is tags. Tags are used to associate a group
of files inside the repository. For example, one could apply a "Version 1.0"
tag to all the files which were released with the 1.0 version of your app, then
later grab those exact files from the repository using the tag name "Version
1.0". (Branches actually work better for encapsulating releases, but this remains
a convenient example.)
More Advanced Version Control
Some form of the Unix "diff" command is also available in virtually all version control systems. For those not familiar with it, diff produces a line-by-line comparison of two text files, or of two file-system hierarchies. Diffing two file versions in CVS can be incredibly useful to the programmer for finding bugs, as it allows her to see exactly what changed between any two commits. If she knows when the bug first appeared, she can diff the revisions on either side of that point and know which set of changes must have caused the bug to surface.
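As a small illustration of what such a comparison looks like, the sketch below uses Python's standard `difflib` module to produce a unified diff of two revisions of a file. The file contents and the revision labels are made up for the example, and `diff_revisions` is a hypothetical helper, not a command from any version control product.

```python
import difflib


def diff_revisions(old_lines, new_lines, old_label, new_label):
    """Return a unified diff between two revisions as a list of lines."""
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_label, tofile=new_label,
                                     lineterm=""))


# Two hypothetical revisions of the same file; the bug arrives in the second.
working = ["def area(w, h):", "    return w * h"]
broken = ["def area(w, h):", "    return w + h"]

for line in diff_revisions(working, broken, "area.py 1.1", "area.py 1.2"):
    print(line)
```

Lines starting with `-` exist only in the older revision and lines starting with `+` only in the newer one, so the single changed line stands out immediately.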
Version control repositories also generally store some form of history information, and provide a "history" command (as CVS calls it) or an equivalent. This command can be used for tracking which commands were issued against the repository, and when.
Annotation is another form of reporting which version control systems normally
support. Using an annotate (sometimes blame) command one can get a line-by-line
history of a file, when it was changed/created, and by whom. File annotation
is also useful for tracking bugs and can be combined with reports from diff
or history to give a better idea of the status and history of your code base.
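One way an annotate report could be computed is to walk the revisions from oldest to newest and remember, for every surviving line, which revision first introduced it. The following is a minimal sketch under that assumption. The `annotate` function, the revision numbers and the author names are all hypothetical, and real tools may work quite differently.

```python
from difflib import SequenceMatcher


def annotate(history):
    """Attribute each line of the newest revision to a revision and author.

    `history` is a list of (revision, author, lines), oldest first.
    Returns a list of (revision, author, line) for the newest revision.
    """
    blame = []        # (revision, author) for each line of prev_lines
    prev_lines = []
    for revision, author, lines in history:
        new_blame = []
        matcher = SequenceMatcher(a=prev_lines, b=lines, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep whoever introduced them.
                new_blame.extend(blame[i1:i2])
            else:
                # Inserted or replaced lines belong to this revision.
                new_blame.extend((revision, author) for _ in range(j1, j2))
        blame, prev_lines = new_blame, lines
    return [(rev, who, line) for (rev, who), line in zip(blame, prev_lines)]


history = [
    ("1.1", "alice", ["x = 1", "y = 2"]),
    ("1.2", "bob", ["x = 1", "y = 3", "z = 4"]),
]
for rev, who, line in annotate(history):
    print(rev, who, line)
```

In this example the first line is still attributed to the first revision, while the two lines touched in the second revision are attributed to its author.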
Locking is one final feature of version control. Most SCM implementations provide a method by which a programmer can "lock" files she is working on, preventing anyone else from committing changes to those files before she is done. This can be a useful feature when working on files for which line-by-line differences are not entirely helpful (XML files, prose, etc.), or when one is working on extremely sensitive parts of the code. Another person can "watch" a file or set of files to be notified when locks are added to or removed from a file (or when any other changes are made).
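The bookkeeping behind locks and watches can be sketched with a small table. This is a toy model under stated assumptions: `LockTable`, `LockError` and the callback signature are hypothetical names for illustration, and real products add persistence, permissions and ways to break stale locks.

```python
class LockError(Exception):
    """Raised when a lock cannot be taken or released."""


class LockTable:
    """Toy model of per-file locks with watcher notification."""

    def __init__(self):
        self.locks = {}      # path -> user holding the lock
        self.watchers = {}   # path -> callbacks to notify

    def watch(self, path, callback):
        self.watchers.setdefault(path, []).append(callback)

    def _notify(self, path, event, user):
        for callback in self.watchers.get(path, []):
            callback(path, event, user)

    def lock(self, path, user):
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            raise LockError(f"{path} is locked by {holder}")
        self.locks[path] = user
        self._notify(path, "locked", user)

    def unlock(self, path, user):
        if self.locks.get(path) != user:
            raise LockError(f"{path} is not locked by {user}")
        del self.locks[path]
        self._notify(path, "unlocked", user)

    def may_commit(self, path, user):
        """A commit is allowed if the file is unlocked or locked by `user`."""
        return self.locks.get(path, user) == user


table = LockTable()
events = []
table.watch("manual.xml", lambda path, event, user: events.append((event, user)))
table.lock("manual.xml", "alice")
print(table.may_commit("manual.xml", "bob"))    # False
table.unlock("manual.xml", "alice")
print(table.may_commit("manual.xml", "bob"))    # True
```

A repository would consult `may_commit` before accepting a change, so that only the lock holder can commit to a locked file.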
The Solution That is Version Control
Let us now return to our discussion of solutions to common problems. The problems
we had hoped to address were:
- Dealing with multiple versions: allowing users and developers to report bugs
or changes against historical versions.
- Dealing with multiple developers: allowing several coders to work on the
same file at once, and have those changes all merged into one file.
- Tracing accountability: once you have more than one person, you often want
to know who did what, or maybe even just when you did something yourself.
Following our discussion above, the reader may already see the solutions, but
for clarity I will review them:
For the first, the repository acts as a historical record of the code as it was at any point. This solves the problem of finding the exact code of a certain release version against which to compare bugs, comments or revisions received at a later date. The developer need only perform a checkout operation, requesting a specific date, tag, branch or internal CVS revision number as appropriate. As necessary, the developer can make changes to the code, diagnose the bug, etc., and then merge those changes into the most current code (even years later -- as we all know, some bugs last that long). If you used branching, you can even make fixes on that specific release branch and release patches against that version (version 1.0.1, etc.).
For the second, a central repository model easily allows one repository with multiple workspaces. Under this model, each developer checks out their own copy (or multiple copies) of the source code in question. They make changes, bring in other developers' work using the merge and conflict resolution features of update discussed above, and then send their changes to the repository with a commit command.
Finally, accountability tracking can be done via any of the reporting methods discussed in the previous section. Many 3rd party tools also exist for managing SCM repositories, including Mozilla's Tinderbox and Bonsai. Those can be used to help "blame" the right people for bugs. Tools such as CVSweb turn annotated files into dynamically linked web pages, useful for getting a mental handle on your repository.
A couple of other problems not mentioned above, but still deserving mention, are those of web-site control and of "getting lost in one's own code." Web sites, particularly those edited by multiple people, often use version control. Version control is a solution not just for the developer, but also for the web artist and the writer, and it has uses in many domains. The problem of "getting lost in one's own code" is also answered by version control. Commonly, as individual projects get big or extend over long periods of time, it is easy to lose track of even code you wrote yourself. Using an SCM system from the beginning can help fight off this problem, as the developer then knows when things were written, and hopefully has logs from each commit she made to help decide what each new piece of code did (even if she didn't comment her code!).
The Tools of the Trade
We've discussed the problems facing developers, version control technology, and how version control can be an answer to some problems of development, but we have yet to mention the actual tools of version control. Below is a short discussion of a couple of tools, including my biased opinions.
To begin, we have CVS. CVS is the most popular, and one of the oldest, version
control systems. CVS started as a bunch of shell scripts written by Dick
Grune, posted to the newsgroup comp.sources.unix in December, 1986. Many of
the current CVS conflict resolution algorithms are derived from those scripts.
In April, 1989, Brian Berliner designed and coded CVS in C. Jeff Polk later
helped Brian with the design of the CVS module and vendor branch support. CVS
has since grown to be the de facto standard for version control, despite a raft
of design problems. CVS is open source and is maintained by developers at http://www.cvshome.org/.
CVS has some nice things going for it, but also its share of problems. I list both
below:
CVS - http://www.cvshome.org/
The Good:
- Extremely Popular
- Free & Open Source
- Simple & Effective
- Ships on nearly every Linux/Unix OS
- Many 3rd party tools designed for CVS
- A good start to Version Control
- Long history
The Bad:
- Maintains NO file meta-data
- Serving was an afterthought
- Poor Branching
- Poor Permissions
- No “change sets”
- Non Atomic
- Security was an afterthought
- Slooooooower than molasses
- Single Repository
- No language support
- Only supports ASCII Text (not Unicode)
There are many, many other SCM solutions, and I will mention a few of them here, along with the respective merits and faults of Subversion and Perforce.
Subversion - http://subversion.tigris.org/
The Good:
- Free & Open Source
- Designed to serve
- Host of improvements on CVS
- Fast
- Partially CVS Compatible
The Bad:
- Apache dependent
- Only version 0.15 Beta
Perforce (p4) - http://www.perforce.com/
The Good:
- Free for Open Source projects
- High performance
- Great support
- Partially CVS Compatible
The Bad:
- Expensive
- Larger & more complex than CVS
- Single Repository
And 5 others worth mentioning:
Closing remarks
As most of you reading this are Mac developers, I should also mention that
Project Builder under OS X integrates directly with CVS version control and
was originally planned to (and may still) interface with other products. Using
CVS with Project Builder is still a bit of an art, but once you get used to
it, it is very easy. (Apple provides a guide.)
Using CVS from the command line is also extraordinarily easy if you are comfortable
at a command line. (Apple even provides a tutorial
and the manual.)
Those looking for more powerful version control solutions might first look to
Perforce (which only has a command line interface -- but by that point the developer
should be familiar with the command line anyway).
I hope this article has provided you with at least a brief overview of version control. I did not go into the technical details of using any one technology; I hoped mostly to convince at least the small developer that it is worth his or her time to learn a little about version control and to use it for both existing products and new products to come, big and small. I began using version control about 3 years into development and have not looked back since. It is a wonderful, necessary technology.
Eric Seidel, resident Mac geek and security freak, works as a remote intern for the BSD Technologies group of Apple Computer under FreeBSD guru Jordan Hubbard. Eric is the author of a number of popular Macintosh applications, including mod_rendezvous and OpenAG. When not programming or studying, he fills his free time teaching swing dancing and unicycling. Eric will graduate from Lawrence University with a BA in Mathematics this coming June with the intention of pursuing a PhD in Computer Science.