CollabNet
Submerged - CollabNet's Subversion Blog

Considerations when upgrading to Subversion 1.5

One of the most common questions whenever a new release of Subversion (or any product) comes out is what you need to take into account before upgrading.  I will try to cover some of those considerations in this post.

Subversion's compatibility guidelines guarantee that any 1.x client can talk to any 1.x server.  So you do not have to upgrade the client and server at the same time.  The 1.5 client can talk to 1.0-1.4 servers and any 1.0-1.4 clients can talk to a 1.5 server.

Those same guidelines say that you never have to reload a repository when upgrading the server.  You can upgrade your server from 1.0-1.4 to 1.5 and you do not need to do anything to your repositories.  The new server will silently make any updates that are needed, when they are needed.  In the case of 1.5, a new SQLite database file will be added to the repository to store indexes that speed up the new merge tracking function.

Even though you do not have to reload your repository, there are often small efficiencies you can gain by doing a dump/load of your repository.  In the case of this release, there is a new storage mechanism for fsfs repositories.  Instead of storing all of the revisions in a single folder, we now "shard" the repository by creating a sub-directory for every 1000 revisions.  This does not improve the performance of Subversion itself, but it might improve the performance of things like backup utilities or other tools you use to work with the file system on the server.  You do not have to dump/load to get this feature: there is a Python script you can run that will do the sharding for you.  This script will run relatively quickly, certainly orders of magnitude faster than a dump/load.
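
To make the sharded layout a little more concrete, here is a small sketch that works out which sub-directory a revision file would land in. It assumes the shard size of 1000 mentioned above and that fsfs keeps its revision files under db/revs; the function name is made up for illustration.

```shell
# Hypothetical helper: print the relative path of a revision file in a
# sharded fsfs repository, assuming 1000 revisions per sub-directory.
SHARD_SIZE=1000

rev_file_path() {
    rev="$1"
    # Integer division picks the shard: revisions 0-999 go in "0",
    # 1000-1999 in "1", and so on.
    shard=$(( $rev / $SHARD_SIZE ))
    echo "db/revs/$shard/$rev"
}
```

For example, `rev_file_path 1234` prints `db/revs/1/1234`, so a backup tool never has to list more than 1000 revision files in one directory.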

Some features of Subversion 1.5 will require that you have a 1.5 server and a 1.5 client to enjoy the full benefits of the feature.  For example, the merge tracking feature requires that both the client and server are running 1.5.  If one of them is running a lower version, then you get the merge functionality that existed in that version of Subversion.  If you want to take advantage of merge tracking you really need to update your server and all of the clients that are doing merges.  Doing a merge with an old client will work, but will not set any of the merge tracking information when the change is committed.  This simply means that someone using a 1.5 client later will attempt to merge the same changes again and get conflicts.  The svn merge --record-only command can be used to fix this after the fact.
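
As a rough sketch of that after-the-fact fix, the wrapper below records a revision as already merged without touching any files. The function name and the values in the example are placeholders of mine, and it needs to be run with a 1.5 client.

```shell
# Hypothetical wrapper: mark revision REV of SOURCE_URL as already merged
# into the working copy at WC_PATH. Only merge tracking information changes.
record_merged_revision() {
    rev="$1"; source_url="$2"; wc_path="$3"
    svn merge --record-only -c "$rev" "$source_url" "$wc_path"
}

# Example (placeholder values):
#   record_merged_revision 1234 URL_TO_BRANCH .
#   svn commit -m "Record r1234 as merged"
```

The recorded merge information still has to be committed, just like any other change to the working copy.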

When using the new sparse-checkouts functionality it really helps to have a 1.5 server.  The feature will work with an older server, but in most cases the server will still send the extra information (file contents) to the client and the client has to discard what it does not want.
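
For anyone who has not tried the feature yet, here is a minimal sketch of a sparse checkout using the --depth and --set-depth options of the 1.5 client. The function name and its arguments are placeholders for illustration.

```shell
# Hypothetical helper: check out only the top-level folder of URL into WC,
# then fill in a single sub-folder completely.
sparse_checkout() {
    url="$1"; wc_dir="$2"; subdir="$3"
    # Fetch the top-level directory itself, with none of its children.
    svn checkout --depth empty "$url" "$wc_dir" || return 1
    # Pull in one sub-folder and everything beneath it.
    svn update --set-depth infinity "$wc_dir/$subdir"
}

# Example (placeholder values):
#   sparse_checkout URL_TO_REPO/trunk wc docs
```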

There is an important client-side compatibility issue to take into account.  Some of the new features of 1.5, such as sparse-checkouts and changelists, necessitate a new working copy format.  This means that as soon as you perform an operation that needs to rewrite your working copy (checkout/update/switch/commit/propset etc.), the internal format of the working copy will be upgraded to the new format for Subversion 1.5.  Once this happens, you will no longer be able to use a 1.4 client with that same working copy.  The same sort of change occurred with the 1.3 to 1.4 upgrade process.  We know that this really impacted many users that were stuck using an older client, so we are including a Python script that you can run to downgrade a working copy back to the 1.4 format.  We will distribute this as an EXE for Windows users.  Just to make this perfectly clear, you can have a 1.4 client and a 1.5 client on the same system.  You just need to be careful not to mix their usage on the same working copy.  Also, pure-read operations (status/info/ls) do not update the working copy.  So simply viewing a working copy with a new version of TortoiseSVN or SCPlugin installed does not modify the working copy format.  Conversely, however, an old version of one of these tools will not be able to read a 1.5 working copy.
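
If you are not sure whether a working copy has already been upgraded, you can peek at its format number. The sketch below assumes that the first line of .svn/entries holds the format number and that 8 means the 1.4 format and 9 the 1.5 format; treat both as my assumptions and check them against your own working copies. The function names are made up.

```shell
# Hypothetical helper: print the format number of the working copy at the
# given path (defaults to the current directory).
wc_format() {
    wc_dir="${1:-.}"
    # Assumption: the first line of .svn/entries is the format number.
    head -n 1 "$wc_dir/.svn/entries"
}

# Map the format number to the release that introduced it.
# Assumption: 8 = 1.4 format, 9 = 1.5 format.
wc_format_release() {
    case "$(wc_format "$1")" in
        8) echo "1.4" ;;
        9) echo "1.5" ;;
        *) echo "unknown" ;;
    esac
}
```

This only reads a file, so it is safe to run no matter which client versions are installed.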

Hopefully this answers the questions about upgrading to Subversion 1.5.  It is going to be a great release and I certainly hope people choose to upgrade when it comes out.  If you have additional questions that I did not cover, please post them in the comments or on the Subversion Users forum on openCollabNet.

Posted by Mark Phippard | Permalink | Comments (0) | TrackBack (0)

Subversion 1.4.5 Released

Subversion 1.4.5 was released today.  You can download the updated CollabNet Subversion binaries immediately.

Subversion 1.4.5 contains a fix for a security exploit on Windows clients. This exploit was discovered and reported by researchers at the Colorado Research Institute for Security and Privacy.

The only change from Subversion 1.4.4 is the patch for this security exploit.  Since the exploit only affects Windows clients, we decided to only release CollabNet Subversion 1.4.5 packages for Windows. There is no point for someone who is already running 1.4.4 on any other operating system to update to 1.4.5.

I am not going to give a lot of details about the exploit; you can find more information at various security reporting sites, such as CVE.  I will say that it was a legitimate exposure that made it possible for the Subversion client to write files outside the normal working copy.  That being said, there are a couple of points to make:

  1. Creating the exploit requires commit access to the repository.  If you can trust the people who have write access to the repository, then you do not have too much to be concerned about. The keyword in that sentence is "trust". If you are checking out from a repository you cannot completely trust, such as on a public hosting service, then be careful and update to 1.4.5 first.
  2. While the exploit itself is pretty easy to produce, it is also pretty difficult to use it in a way that would cause harm.
  3. You can only create the exploit from a non-Windows platform.
  4. There is nothing terribly secretive about the exploit.  If you send commit emails, or even just browse your repository using svn ls, this exploit would stand out as not normal.

If you are running a Subversion client on Windows, whether the command line client or a graphical client such as TortoiseSVN or Subclipse, then you should definitely go ahead and install this version of Subversion.  I would recommend that users of earlier versions such as 1.3.2 or 1.2.3 also install this update immediately. The Subversion 1.4.5 client can talk to any 1.x version of the server, so there is no reason not to update your client (for compatibility: if you have the command line and a GUI client, update them both).

Subversion servers are not affected by this exploit.  That being said, a Windows server that uses the Subversion client in scripts would still be vulnerable and should be updated to 1.4.5.

Posted by Mark Phippard | Permalink | Comments (1) | TrackBack (0)

Mirroring Repositories with svnsync

Terminology

To best discuss svnsync without getting confused, we should establish some common terminology before going any further:

  • Master: The live read/write repository that will be mirrored via svnsync.
  • Mirror: The read-only repository that is synchronized with the master via svnsync.

Overview

svnsync is a utility that became part of the standard Subversion offering when 1.4 was released and is described as a program that "provides all the functionality required for maintaining a read-only mirror of a Subversion repository." While the purpose of svnsync is simple to understand from its documentation, why would maintaining a mirror repository be important in the enterprise? Every Subversion implementation is different, so there can be many reasons, but a couple of them are common:

  • Provide a backup repository: This can be beneficial for failover, soft-upgrade, etc.
  • Provide a simple read-only repository: Some people want a simple way to provide read-only access to a repository. With svnsync, you can easily achieve this without maintaining authorization files and such. (For example: To maintain a community access point to a repository while using a different repository for the actual developer actions.)

These are just a couple examples but should give you an idea as to what value svnsync can provide. (For a more detailed explanation, please refer to the "Repository Maintenance" chapter of the "Version Control with Subversion" book.) While I could jump right in to script suggestions and examples, doing so would be a shame. To really understand why we are doing what we are, we should really understand how svnsync works. I will be brief in my explanation and then we will go into example scripts and suggestions you can apply in your Subversion implementation.

Understanding svnsync

The way svnsync works is actually pretty simple: Take revisions from one repository and "replay" them to another. This means that the mirror repository plays by the same rules as the master repository. The user account performing the actions against the mirror repository must have write access to that mirror repository. The "secret sauce" that makes svnsync work is that Subversion keeps the metadata it needs to know what has to be synchronized in special revision properties on revision 0 of the mirror repository. That is it. That is how svnsync works, and although it is easy to understand, there are a few "rules" you need to be aware of to make svnsync work as designed. The following is a list of rules and/or best practices for using svnsync:

  • The synchronizing user needs read/write access to the complete mirror repository.
  • The synchronizing user needs to be able to modify certain revision properties.
  • The mirror repository needs to be read-only for all users except the synchronizing user.
  • Before you can synchronize a mirror repository with the master, the mirror repository needs to be at revision 0.
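
Two of these rules are easy to check from the command line. The sketch below tests that a mirror is still at revision 0 and prints the revision properties stored on revision 0; in a typical setup I would expect names like svn:sync-from-url, svn:sync-from-uuid and svn:sync-last-merged-rev there. The function names are made up for illustration.

```shell
# Hypothetical check: succeeds only if the repository at the given local
# path has no commits yet, which svnsync requires before initializing.
mirror_is_at_revision_zero() {
    [ "$(svnlook youngest "$1")" = "0" ]
}

# Print the bookkeeping revision properties kept on revision 0 of the
# mirror at the given URL.
show_sync_properties() {
    svn proplist --revprop -r 0 --verbose "$1"
}

# Example (placeholder values):
#   mirror_is_at_revision_zero MIRROR_REPO_PATH && echo "ready to initialize"
#   show_sync_properties URL_TO_MIRROR_REPO
```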

Now that we know what svnsync is and how it works and why it might be useful, let's learn how we can start synchronizing a mirror repository with our master repository using svnsync.

Implementing svnsync

The only real prerequisite for implementing svnsync is to have a repository that you want to mirror already created prior to starting this process. Once that is complete, you can follow the steps outlined below:

Step 1: Create Mirror Repository

svnadmin create MIRROR_REPO_PATH

Step 2: Make Mirror Repository Only Writable by Synchronizing User

To make the mirror repository only writable by the synchronizing user, which in our example will be "svnsync", we have a few options. One option is to use the authz functionality of Subversion with a default access rule like this:

[/]
* = r
svnsync = rw

The other option is to use the start-commit hook to check for the svnsync user. Here is an example, as a shell script:

#!/bin/sh

USER="$2"

if [ "$USER" = "svnsync" ]; then exit 0; fi

echo "Only the svnsync user may commit new revisions as this is a read-only, mirror repository." >&2
exit 1

Step 3: Make Mirror Repository Revision Properties Modifiable by Synchronizing User

To do this, we need to create a pre-revprop-change hook with something similar to the following example, as a shell script:

#!/bin/sh

USER="$3"

if [ "$USER" = "svnsync" ]; then exit 0; fi

echo "Only the svnsync user may change revision properties as this is a read-only, mirror repository." >&2

exit 1
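
The hook scripts from steps 2 and 3 only take effect once they sit in the hooks directory of the mirror repository under the exact hook name and are executable. Here is a minimal sketch of installing one, assuming a Unix-like server; the function name and the file names in the example are made up.

```shell
# Hypothetical helper: copy SCRIPT into REPO/hooks under HOOK_NAME and
# make it executable so Subversion will run it.
install_hook() {
    script="$1"; repo="$2"; hook_name="$3"
    cp "$script" "$repo/hooks/$hook_name" || return 1
    chmod +x "$repo/hooks/$hook_name"
}

# Example (placeholder values):
#   install_hook my-start-commit.sh MIRROR_REPO_PATH start-commit
#   install_hook my-revprop-check.sh MIRROR_REPO_PATH pre-revprop-change
```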

Step 4: Register Mirror Repository for Synchronization

Perform the following svnsync command on any system:

svnsync initialize URL_TO_MIRROR_REPO URL_TO_MASTER_REPO --username=svnsync --password=svnsyncpassword

If everything is configured properly, you should see some output like this:

Copied properties for revision 0.

Now that you have registered your mirror repository for synchronization with the master repository, we should go ahead and perform the initial synchronization so that the mirror and the master repository are synchronized.

Step 5: Perform Initial Synchronization

To make sure everything is ready and to perform the initial synchronization, on any system, perform the following:

svnsync synchronize URL_TO_MIRROR_REPO --username=svnsync --password=svnsyncpassword

If everything synchronized properly, you should see some output similar to this:

Committed revision 1.
Copied properties for revision 1.
Committed revision 2.
Copied properties for revision 2.
Committed revision 3.
Copied properties for revision 3.
…

Step 6: Automate Synchronization with post-commit Hook

With the initial synchronization out of the way, all that remains is to write a script, run either as a scheduled process or as a post-commit hook, that synchronizes your mirror repository with the master repository. I suggest the post-commit option as it gives you the best chance of having a mirror repository as up-to-date as possible.  Here is an example hook that might be used on the master repository to synchronize a mirror repository as part of the post-commit hook.  As a shell script:

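#!/bin/sh
# Example for synchronizing one repository from the post-commit hook
SVNSYNC=/usr/local/bin/svnsync
# Run in the background so the commit does not wait for the synchronization;
# output is discarded so the hook can return right away
$SVNSYNC synchronize URL_TO_MIRROR_REPO --username=svnsync --password=svnsyncpassword > /dev/null 2>&1 &

exit 0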

That is it. Once you have followed the steps outlined above, you should have a mirror repository that is kept up to date automatically when someone modifies the master repository. This also concludes our introduction to svnsync and how to implement it.
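
If you want some reassurance that the mirror is keeping up, one simple check is to compare the latest revision of both repositories. The sketch below parses the "Revision:" line of svn info, so it assumes English output from the client; the function names are made up.

```shell
# Hypothetical helper: print the latest revision number of the repository
# at the given URL (assumes English "Revision: N" output from svn info).
head_revision() {
    svn info "$1" | sed -n 's/^Revision: //p'
}

# Succeeds only if master and mirror report the same latest revision.
mirror_is_current() {
    master_rev=$(head_revision "$1")
    mirror_rev=$(head_revision "$2")
    [ -n "$master_rev" ] && [ "$master_rev" = "$mirror_rev" ]
}

# Example (placeholder values):
#   mirror_is_current URL_TO_MASTER_REPO URL_TO_MIRROR_REPO || echo "mirror is behind"
```

Because the hook runs svnsync in the background, the mirror can lag for a moment right after a commit, so a single mismatch is not necessarily a problem.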

Posted by Jeremy Whitlock | Permalink | Comments (19) | TrackBack (0)

Single Repository or Many?

My previous blog entry discussed the issue of repository layout. This entry will try to answer the question of whether you should have one repository per project or a single repository that houses all your projects. There is not going to be a single right answer to this question. Hopefully this post will help you understand the tradeoffs so you can make the right decision that suits your requirements. These are some of the advantages of the single repository approach.

  1. Simplified administration. One set of hooks to deploy. One repository to backup. etc.
  2. Branch/tag flexibility. With the code all in one repository it makes it easier to create a branch or tag involving multiple projects.
  3. Move code easily. Perhaps you want to take a section of code from one project and use it in another, or turn it into a library for several projects. It is easy to move the code within the same repository and retain the history of the code in the process.

Here are some of the drawbacks of the single repository approach, which are also the advantages of the multiple repository approach.

  1. Size. It might be easier to deal with many smaller repositories than one large one. For example, if you retire a project you can just archive the repository to media and remove it from the disk and free up the storage. Maybe you need to dump/load a repository for some reason, such as to take advantage of a new Subversion feature. This is easier to do and with less impact if it is a smaller repository. Even if you eventually want to do it to all of your repositories, it will have less impact to do them one at a time, assuming there is not a pressing need to do them all at once.
  2. Global revision number. Even though this should not be an issue, some people perceive it to be one and do not like to see the revision number advance on the repository and for inactive projects to have large gaps in their revision history.
  3. Access control. While Subversion's authz mechanism allows you to restrict access as needed to parts of the repository, it is still easier to do this at the repository level. If you have a project that only a select few individuals should access, this is easier to do with a single repository for that project.
  4. Administrative flexibility. If you have multiple repositories, then it is easier to implement different hook scripts based on the needs of the repository/projects. If you want uniform hook scripts, then a single repository might be better, but if each project wants its own commit email style then it is easier to have those projects in separate repositories.

This is just a sampling of the pros and cons of each approach. Hopefully it gives you something to go on to make a decision. I tend to prefer the one repository per project approach with the caveat that if I have many projects that are related to each other, I would put those all in the same repository. I also tend to break up repositories by group or team, although in reality this is just a variation of the project concept.

For example, I had one repository for the Documentation department to use for their projects. Of course, on-line help was often located in the same project as the application code, but the Documentation team also had other materials they produced and I gave them a repository for that. Likewise, the Marketing department had a repository to store the things they worked on, including the company web site. As was the case with the layout of the repository, this is really just a decision of what will work best for you. That said, it is a little more difficult to change your repository setup after the fact.

So, it is worth it to take some time to understand your requirements and which approach best suits them.

Posted by Mark Phippard | Permalink | Comments (3) | TrackBack (0)

Subversion Repository Layout

I see a lot of questions such as "What is the recommended repository layout?", "What does trunk mean?" or "What is the significance of trunk?" This post will try to answer those questions and more.

A Subversion repository implements the metaphor of a versioned filesystem. The repository is just a filesystem with folders and files. It so happens that modifications to this filesystem are versioned and there are implementation enhancements like "cheap" copies that make certain operations less expensive than they are in a traditional filesystem, but the repository itself still behaves like a filesystem: there are no special folders or names and Subversion itself has no knowledge of trunk or branches, they are just folders within the filesystem. It is up to you as the user to give those folders names and structure that are meaningful to you.

That said, there are several common layouts that have been adopted by the community as best practices and therefore one could think of these as recommendations. If your repository is accessible to the public, following these conventions might make it easier for users that have accessed other Subversion repositories to find what they are looking for.

There are two commonly used layouts:

trunk
branches
tags

This first layout is the best option for a repository that contains a single project or a set of projects that are tightly related to each other. This layout is useful because it is simple to branch or tag the entire project or a set of projects with a single command:

svn copy url://repos/trunk url://repos/tags/tagname -m "Create tagname"

This is probably the most commonly used repository layout and is used by many open source projects, like Subversion itself and Subclipse. This is the layout that most hosting sites like Tigris.org, SourceForge.net and Google Code follow as each project at these sites is given its own repository.

The next layout is the best option for a repository that contains unrelated or loosely related projects.

ProjectA
   trunk
   branches
   tags
ProjectB
   trunk
   branches
   tags

In this layout, each project receives a top-level folder and then the trunk/branches/tags folders are created beneath it. This is really the same layout as the first layout, it is just that instead of putting each project in its own repository, they are all in a single repository. The Apache Software Foundation uses this layout for their repository which contains all of their projects in one single repository.

With this layout, each project has its own branches and tags and it is easy to create them for the files in that project using one command, similar to the one previously shown:

svn copy url://repos/ProjectA/trunk url://repos/ProjectA/tags/tagname -m "Create tagname"

What you cannot easily do in this layout is create a branch or tag that contains files from both ProjectA and ProjectB.  You can still do it, but it requires multiple commands and you also have to decide if you are going to make a special folder for the branches and tags that involve multiple projects. If you are going to need to do this a lot, you might want to consider the first layout.
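
To show what "multiple commands" looks like in practice, here is a rough sketch that tags several projects together. It assumes you have decided on a top-level tags folder for the tags that span projects; that folder, the function name and the example values are my own choices, not something Subversion requires.

```shell
# Hypothetical helper: create REPO_URL/tags/TAG and copy the trunk of each
# named project into it. Each svn command is a separate commit.
tag_projects() {
    repo_url="$1"; tag="$2"; shift 2
    svn mkdir "$repo_url/tags/$tag" -m "Create shared tag folder $tag" || return 1
    for project in "$@"; do
        svn copy "$repo_url/$project/trunk" "$repo_url/tags/$tag/$project" \
            -m "Tag $project as $tag" || return 1
    done
}

# Example (placeholder values):
#   tag_projects url://repos release-1.0 ProjectA ProjectB
```

Since every command creates its own revision, the projects are not tagged at exactly the same revision unless you also pass a fixed revision to each copy.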

As for the names of the folders within the repository, again: they are just a convention. They have no special meaning to Subversion.

"trunk" is supposed to signify the main development line for the project. You could call this "main" or "mainline" or "production" or whatever you like.

"branches" is obviously supposed to be a place to create branches. People use branches for a lot of purposes. You might want to separate your release or maintenance branches from your feature branches or your customer modification branches etc. In this case, you could create a layer of folders beneath branches, or just create multiple branch folders at the top-level.

"tags" are not treated as special by Subversion either. They are a convention, perhaps enforced by hook script or authz rules, that indicate you are creating a point in time snapshot. Typically the difference between tags and branches is that the former are not modified once they are created. You might call your tag folders "releases" or "snapshots" or "baselines" or whatever term you prefer.

Remember, the significance of the name is for your benefit, not for Subversion. Finally, the architecture of Subversion, with its global revision number can often make the need for tags unnecessary. I do not think there is any reason to create tags just for the sake of creating them. If you find a need to recreate the software at a specific point in time, you can always do so by using svn log to determine the relevant revision number. Tags are best when there are "external" consumers of the repository. Maybe it is a QA/Release team that needs to perform builds, maybe it is an internal development team that wants to use releases of your code in another product, or maybe it is literally external users or customers who need to grab release snapshots from your repository. In these scenarios, creating tags is both a convenient way to be sure they get the right code, as well as a good communication mechanism to indicate the presence of release snapshots.
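
As a small illustration of getting by without a tag, the sketch below lists recent revisions and then exports the code as of one of them. The function names, the default of 20 log entries and the example values are placeholders of mine.

```shell
# Hypothetical helper: list the most recent revisions of URL so you can
# pick the one you need (20 entries unless a second argument is given).
recent_revisions() {
    svn log --quiet --limit "${2:-20}" "$1"
}

# Export an unversioned copy of URL exactly as it was at revision REV.
export_at_revision() {
    svn export -r "$2" "$1" "$3"
}

# Example (placeholder values):
#   recent_revisions url://repos/trunk
#   export_at_revision url://repos/trunk 1234 build-r1234
# A date also works as a revision, for example -r "{2008-01-15}".
```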

Hopefully this post clarifies some of these issues for you and makes it easier for you to understand how Subversion works.

I would like to finish by pointing out that a Subversion repository layout can be changed. You can always reorganize and restructure your layout after the fact. At worst, it just might create some short term pain as users adjust their working copies. It is not like you need to start over though. Just change names, move folders or do whatever to get the filesystem looking the way you want it.
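
A reorganization usually comes down to a server-side move followed by each user pointing their working copy at the new location. Here is a minimal sketch of both halves; the function names and example values are made up.

```shell
# Hypothetical helper: rename or move a folder directly in the repository.
# History is kept because this is a copy plus a delete in one commit.
rename_in_repository() {
    svn move "$1" "$2" -m "$3"
}

# Point an existing working copy at the folder's new URL.
repoint_working_copy() {
    svn switch "$1" "$2"
}

# Example (placeholder values):
#   rename_in_repository url://repos/main url://repos/trunk "Rename main to trunk"
#   repoint_working_copy url://repos/trunk path/to/working-copy
```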

Posted by Mark Phippard | Permalink | Comments (4) | TrackBack (0)

Subversion LDAP Authentication with Apache

More and more companies are using directory services for housing their user credentials and information.  Example directory services are Active Directory, eDirectory and OpenLDAP.  How does this relate to Subversion?  Well, in the enterprise deployments I've been involved with, most clients wanted to harness their existing directory services for their Subversion authentication.  This blog post will explain the simplicity of hooking up Apache to your directory service using mod_auth_ldap, giving you the ability to authenticate against your existing user data store.

As of now, the only way to utilize your directory service for authentication is by using Apache as your network layer.  This allows you to use any of the available authentication options to Apache for your Subversion authentication and with mod_auth_ldap, Apache can authenticate against your directory service for Subversion.

Before we get started modifying our Apache configuration file, let's look at the simplest Location directive possible for exposing a Subversion repository via Apache:

<Location /repos>
  # Enable Subversion
  DAV svn

  # Directory containing all repositories for this path
  SVNParentPath /absolute/path/to/directory/containing/your/repositories
</Location>

Now let's modify this to add mod_auth_ldap support for the authentication portion of the Location directive above:

<Location /repos>
  # Enable Subversion
  DAV svn

  # Directory containing all repositories for this path
  SVNParentPath /absolute/path/to/directory/containing/your/repositories

  # LDAP Authentication & Authorization is final; do not check other databases
  AuthLDAPAuthoritative on

  # Do basic password authentication in the clear
  AuthType Basic

  # The name of the protected area or "realm"
  AuthName "Your Subversion Repository"

  # Active Directory requires an authenticating DN to access records
  # This is the DN used to bind to the directory service
  # This is an Active Directory user account
  AuthLDAPBindDN "CN=someuser,CN=Users,DC=your,DC=domain"

  # This is the password for the AuthLDAPBindDN user in Active Directory
  AuthLDAPBindPassword somepassword

  # The LDAP query URL
  # Format: scheme://host:port/basedn?attribute?scope?filter
  # The URL below will search for all objects recursively below the basedn
  # and validate against the sAMAccountName attribute
  AuthLDAPURL "ldap://your.domain:389/DC=your,DC=domain?sAMAccountName?sub?(objectClass=*)"

  # Require authentication for this Location
  Require valid-user
</Location>

Use the in-line comments in the code above to better understand the Apache configuration directives for mod_auth_ldap.  With the above example (which you need to modify for your environment) you can have Apache authenticate your Subversion users against your Active Directory directory service.  The above will also work for other directory services but with minor modifications in the AuthLDAPURL.  For more information, you can consult the mod_auth_ldap documentation linked to in the first paragraph.  Although this post is short, I hope it adds value to those who read it.
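
Before pointing Apache at the directory, it can save time to run the same search by hand. The sketch below assumes the OpenLDAP ldapsearch client is installed and reuses the placeholder host, bind DN and base DN from the configuration above; the function name and the user name in the example are made up.

```shell
# Hypothetical check: search for one user the same way the AuthLDAPURL
# above would, prompting for the bind password instead of storing it.
ldap_lookup_user() {
    user="$1"
    ldapsearch -x \
        -H "ldap://your.domain:389" \
        -D "CN=someuser,CN=Users,DC=your,DC=domain" -W \
        -b "DC=your,DC=domain" -s sub \
        "(sAMAccountName=$user)" dn
}

# Example (placeholder user name):
#   ldap_lookup_user jdoe
```

If this returns exactly one entry for the user, the bind DN, base DN and attribute in the Apache configuration are probably right.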

Posted by Jeremy Whitlock | Permalink | Comments (24) | TrackBack (0)

  • ©2008 CollabNet Corporation