CollabNet
Submerged - CollabNet's Subversion Blog

Subversion 1.5 WebDAV Write-Thru Proxies

Yesterday at the Pre-SubConf Subversion Workshop in Munich, I presented a short bit about a new Subversion 1.5 feature — WebDAV write-thru proxy support. This feature allows for a Subversion server deployment based around Apache 2.2.x and mod_proxy in which there is a single master server and associated repository, and one or more slave servers which handle read operations while passing through (proxying) write operations to that single master server. In this deployment scenario, each slave server has its own copy of the master repository which is kept in sync by some process, typically one driven by hook scripts on the master server itself.

Why might someone want to use such an arrangement? Well, users are far more likely to be doing read operations (such as checkouts, updates, status checks, log history requests, diff calculations, etc.) than write operations (such as commit, revision property changes, and lock/unlock requests). You might wish to employ several servers to share the burden of handling those read requests (a load-balancing scenario). Or perhaps you have a worldwide organization (such as CollabNet's) with offices in, say, the United States, Eastern Europe, and India. If you deploy a single centralized server across the whole organization, you will almost certainly wind up favoring some users over others in terms of the performance of the network links between them and the Subversion server. WebDAV write-thru proxies would allow you to keep geographically local slave servers for each of your global regions, universally optimizing the performance of all of your users' read requests.

For the purposes of my presentation, I whipped up a simple WebDAV write-thru proxy scenario with a single slave server, using svnsync as the mechanism for propagating changes from the master server to the slave server. The following describes how you can do the same.

First, you'll need to have Apache HTTP Server on your master and slave servers, and on the slaves it must be version 2.2.0 or better with mod_proxy enabled. Now, you'll need to configure your master server to expose your repository. Assuming your Subversion repository is located at /opt/svn/project, you might do so by adding a block like this to its httpd.conf file:

<Location /svn/project>
   DAV svn
   SVNPath /opt/svn/project
</Location>

Your slave servers will each eventually have full replicas of the master server's repository (which we'll take care of in a bit), but at this point will need at least some httpd.conf magic to expose their replica (which, again, we'll assume lives at /opt/svn/project):

<Location /svn/project>
   DAV svn
   SVNPath /opt/svn/project
   SVNMasterURI http://IP-ADDR-OF-MASTER/svn/project/
</Location>

Notice that SVNMasterURI directive — that's the bit that's new to Subversion 1.5. It tells the slave server to proxy write-type operations to the master machine, and provides the URL at which the master machine's repository can be found.

Now, there's a second bit of httpd.conf configury we need on each slave. Remember that we're planning to use svnsync to push changes from the master to the slave servers. Well, svnsync does so using regular commits, and if we're proxying commits back to the master server, that won't work out so well. So we add another Location block which exposes the server's repository, but which does not proxy write operations through, and which allows commits only from the master server's IP address:

<Location /svn/project-proxy-sync>
   DAV svn
   SVNPath /opt/svn/project
   Order deny,allow
   Deny from all
   Allow from IP-ADDR-OF-MASTER
</Location>

Okay, let's talk about the actual repositories. As I mentioned, each slave server needs a replica of the master repository. And for our purposes, that replica needs to be initialized as a read-only mirror of the master repository by svnsync. (See this blog post for some basic information about using svnsync.) You'll run svnsync init from the master server, and for the sake of performance (which is critical in this scenario), you'll do so using file:/// access to the master. The cautious among you will simply use svnadmin create (or the equivalent in some third-party Subversion tool) to create a new repository on each slave, create for it a permissive pre-revprop-change hook, then use svnsync init http://IP-ADDR-OF-SLAVE/svn/project-proxy-sync file:///opt/svn/project (notice we're syncing via our special sync URL), and finally use svnsync sync http://IP-ADDR-OF-SLAVE/svn/project-proxy-sync to copy each revision from the master to the slave. Those of you who are more adventurous will find every way possible to shortcut this process, from creating just one slave's svnsync-ready mirror repository and literally copying that to every other slave, to even optimizing out that sync job cost by hand-editing the revision 0 properties on literal copies of the master repository. I'll leave such trickery as an exercise to the reader.
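
The cautious route described above can be sketched roughly as follows. This is only an illustration: the function names and the SVNSYNC override variable are my own assumptions for the example, not anything Subversion provides, and the URLs are the same placeholders used throughout this post. The first helper would run on a slave after svnadmin create; the second would run on the master.

```shell
#!/bin/sh
# Sketch only: helper names and the SVNSYNC override are illustrative
# assumptions, not part of Subversion.

SVNSYNC=${SVNSYNC:-svnsync}

# Run on each slave after "svnadmin create": install a pre-revprop-change
# hook that always succeeds, so svnsync is allowed to set revision properties.
install_permissive_revprop_hook() {
    repos=$1
    hook="$repos/hooks/pre-revprop-change"
    mkdir -p "$repos/hooks"
    printf '#!/bin/sh\nexit 0\n' > "$hook"
    chmod +x "$hook"
}

# Run on the master: mark the slave's repository as a mirror of the master,
# then copy every revision across via the special sync URL.
bootstrap_mirror() {
    slave_sync_url=$1     # e.g. http://IP-ADDR-OF-SLAVE/svn/project-proxy-sync
    master_repos_url=$2   # e.g. file:///opt/svn/project
    "$SVNSYNC" init "$slave_sync_url" "$master_repos_url" &&
    "$SVNSYNC" sync "$slave_sync_url"
}
```

On a slave you might call install_permissive_revprop_hook /opt/svn/project, and on the master bootstrap_mirror http://IP-ADDR-OF-SLAVE/svn/project-proxy-sync file:///opt/svn/project. A hook this permissive is tolerable here only because the sync Location accepts requests from the master's IP address alone.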

Let's see where we are. We have servers with Apache HTTP Server configured. We have repositories in place on each machine, where the slave repositories are svnsync-ready mirrors of the master. Now we need the automation bits which keep those mirrors in sync. We do this using the master repository's hook subsystem. If you plan to allow revision property changes, you'll need a permissive pre-revprop-change hook script on the master repository, and then also a post-revprop-change hook which tells svnsync to re-copy revision properties for a given revision when one of that revision's properties is changed:

#!/bin/sh
REVISION=${2}
# Launch (backgrounded) sync jobs for each slave server.
svnsync copy-revprops http://IP-ADDR-OF-SLAVE1/svn/project-proxy-sync ${REVISION} &
svnsync copy-revprops http://IP-ADDR-OF-SLAVE2/svn/project-proxy-sync ${REVISION} &
svnsync copy-revprops http://IP-ADDR-OF-SLAVE3/svn/project-proxy-sync ${REVISION} &
…

You'll also need a post-commit hook to transfer new revisions in full to the slaves:

#!/bin/sh
# Launch (backgrounded) sync jobs for each slave server.
svnsync sync http://IP-ADDR-OF-SLAVE1/svn/project-proxy-sync &
svnsync sync http://IP-ADDR-OF-SLAVE2/svn/project-proxy-sync &
svnsync sync http://IP-ADDR-OF-SLAVE3/svn/project-proxy-sync &
…

Why are we backgrounding our svnsync processes? As I mentioned, performance is critical here. Every second that our slaves remain out of sync with the master is a second in which the user who performed the commit or revision property change might try to then perform a read operation (like svn update) against that revision. If he does so, he'll get an error that indicates that the revision doesn't exist on the server, which while not fatal is at the very least quite confusing.
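
One caveat, raised in the comments on this post: a bare & may not be enough to release the committing client, because the backgrounded process still holds the hook's output streams open. A minimal sketch of a post-commit hook that loops over the slaves and detaches each job might look like this. The SLAVE_SYNC_URLS and SVNSYNC variables are assumptions made up for this example, not Subversion settings.

```shell
#!/bin/sh
# Sketch of a post-commit hook. SLAVE_SYNC_URLS and the SVNSYNC override
# are illustrative assumptions, not Subversion settings.

SVNSYNC=${SVNSYNC:-svnsync}
SLAVE_SYNC_URLS=${SLAVE_SYNC_URLS:-"http://IP-ADDR-OF-SLAVE1/svn/project-proxy-sync
http://IP-ADDR-OF-SLAVE2/svn/project-proxy-sync"}

sync_all_slaves() {
    # Intentionally unquoted: the list is split on whitespace, one URL per word.
    for url in $SLAVE_SYNC_URLS; do
        # Detach stdin, stdout, and stderr so the hook can return at once
        # instead of waiting for the sync job to finish.
        "$SVNSYNC" sync "$url" < /dev/null > /dev/null 2>&1 &
    done
}

# Subversion runs post-commit with the repository path and revision as
# arguments; only launch the sync jobs when invoked that way.
if [ "$#" -gt 0 ]; then
    sync_all_slaves
fi
```

The same loop would serve a post-revprop-change hook by replacing sync "$url" with copy-revprops "$url" "$2". Discarding the output hides failures, so in practice you might redirect to a log file instead of /dev/null.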

At this point, we could go live with our servers and things would mostly work. Users would check out working copies from one of the available slave servers. When they performed read operations against that repository, the server would field the requests from its replica of the master repository. When they did commits or changed revision properties, their slave server would hand off to the master server, which would do the real work and then propagate those changes back out to all the slaves. But there are some additional things you may want to configure before going live.

First, we never added any authentication or authorization stuff for our users. Interestingly, you'll need the authentication stuff to match across all servers (and need to rig up a way to keep those in sync), but you need only the read-authorization stuff on the slaves, and both the read- and write-authorization stuff on the master. (It's probably easiest just to keep all that stuff in sync across all the servers.)

Secondly, we didn't do anything to handle lock/unlock client requests. To do so properly requires implementing post-lock and post-unlock hook scripts on the master which in turn perform the lock/unlock operations on each slave as the user doing the locking/unlocking. This is complicated work. Fortunately, if you choose to omit it, lock enforcement in your deployment scenario should still work. It's just that lock queries (asking, "What's locked, and by whom?") will always turn up empty.

Finally, we didn't do anything to handle the problems that might occur if the link between the master and a slave server should go down at the wrong time. If the link drops during some client commit operation, that's okay: the commit will never finish on the master server, and the user will hear back from his/her slave server that something went wrong. At worst the commit will complete on the master but the user's client will never know about it. (This same thing can happen in a single-server setup if the link falls down while the server is trying to respond to the final commit MERGE request.) If the link drops during the svnsync phase after a commit, that slave server will continue to work, but might be out of sync until the next commit. You could implement a cron job on the master that occasionally syncs all the slave repositories to minimize that out-of-sync period. What about svnsync failing after a revision property change? That's more complex: you may need to implement some wrapper around that process that can reliably track success and failure and provide a retry mechanism. That's true also of something failing while trying to propagate lock/unlock status to the slave servers.
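
As a starting point for the retry wrapper mentioned above, here is a minimal sketch. The retry helper and its arguments are hypothetical, invented for this example. It only re-runs a command a fixed number of times and reports when it gives up; it does not persistently track which revisions still need attention.

```shell
#!/bin/sh
# Sketch only: the retry helper and its arguments are illustrative assumptions.

# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS tries have been made.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        "$@" && return 0
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}
```

In a post-revprop-change hook this might be used as retry 5 30 svnsync copy-revprops http://IP-ADDR-OF-SLAVE1/svn/project-proxy-sync "${REVISION}", still backgrounded and detached from the hook's output streams so the client isn't kept waiting.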

As you can see, the state of the art is currently not such that you flip a switch and suddenly wind up with a one-master-to-many-slaves repository replication deployment scenario. It's tricky business, fraught with opportunities to make mistakes and to leave edge-cases uncovered. But this new feature in Subversion 1.5 provides the fundamental requirements if you're willing to see the complexities through to completion.

C. Michael Pilato

About the Author

C. Michael (Mike) Pilato has been on the Subversion project as a committer since 2000. Mike is one of the co-authors of “Version Control with Subversion” and he is on the board of the non-profit Subversion Corporation.
Categories: Subversion Server

Comments

This is going to be very useful, thanks. Is anyone using it with a sizeable repository yet? Any feedback about how it works - latency, unexpected problems etc.

Matt Doar | October 17, 2007 at 09:29 AM

Thanks, Mike. The one problem that I have is that backgrounding the svnsync in the post-commit hook does not keep the client from waiting until the svnsync operation has completed. The client sits at "Transmitting file data" until the svnsync is done. I call a script from post-commit that is backgrounded and that script does the svnsync command which is also backgrounded. Am I missing something?

Mark Keisler | October 17, 2007 at 01:08 PM

Matt - 1.5 is not released yet, so while there is at least one person that has been using this in a patched version of 1.4, it has not been used widely enough yet to have that information.

Mark - you have to do some trickery to get hooks to return immediately and let the process run in the background. See this thread from the mailing list which has an example:

http://svn.haxx.se/users/archive-2006-08/0925.shtml

Mark Phippard | October 17, 2007 at 02:12 PM

Right, the & is not enough; you also need to close stdout and stderr like:

svnsync copy-revprops http://IP-ADDR-OF-SLAVE1/svn/project-proxy-sync ${REVISION} > /dev/null 2>&1 &

or with something like this:

LOG=$(mktemp /tmp/pre-revprop-change.XXXXXX)
exec > ${LOG}
exec 2>&1
[hook body]
mail -es "REVPROPCHANGE $*" somewhere < ${LOG}
rm -f ${LOG}

Eric Gillespie | October 17, 2007 at 08:13 PM

Mike,

The article says Apache 2.2.x, but I don't see any ifdefs in mod_dav_svn.c limiting it to that version. Will this work with Apache 2.0.x? If not, what will happen when somebody tries to use it on 2.0.x?

Thanks,
Blair

Blair Zajac | October 20, 2007 at 08:59 AM

Good question. I recall when Justin committed the feature that it required some fixes in mod_proxy and it was only done in 2.2. It seems like we ought to reject the directive if it is used in 2.0.x. You should probably file an issue for this though, not enter it here.

Mark Phippard | October 20, 2007 at 10:56 AM

SVN 1.5 will be very useful for the scenario I am having. Can we use master slave using VPN connection, because I don't have leased lines and servers are sitting in different countries? Also how frequently slave gets updated?

Prakhar | January 22, 2008 at 08:46 PM

Prakhar:

Subversion defers matters of network traffic routing to lower level processes. If your master and slave machines share a VPN connection, and you can transmit HTTP/WebDAV traffic back and forth across that connection, then I know of no reason why you couldn't establish a WebDAV proxy deployment using that VPN.

As for how frequently the slave gets updated, that's entirely up to you. I would recommend having the master's post-commit and post-revprop-change hook scripts fire off synchronizations, and *also* have a cron job that runs every so many hours to do the same (just in case the most recent hook-fired sync job goes awry and there's not been more commit activity on the master).

C. Michael Pilato | January 23, 2008 at 06:40 AM

Mike

We are planning to upgrade from 1.4 to 1.5 as soon as 1.5 is released for general usage. I went through your article and decided to try out 1.5 on a set of Windows sandboxes. First up, the current set of Windows binaries does not seem to have svnsync.exe. I was therefore unable to test out the whole set up. Will you be able to check this and let me know where I can download a reliable version of svnsync.exe from?

Next, I wanted to validate a few things with you. Our teams predominantly consist of developers (as I guess most software teams would). We encourage our developers to check-in at least once a day and if possible multiple times a day to prevent their local copies from getting out of sync with the repository (as long as the code compiles locally). This practice means that most people on our teams (developers) perform a write (commit) almost as frequently as they perform reads (checkouts and updates). I personally commit at least four to five times a day but update only once a day. As of now this is just a train of thoughts I have. I am going to verify this from Apache logs on our current system. However, do you feel that given our scenario, using write-through proxy set up will really help too much?

We are geographically distributed across seven countries and many different offices. Right now what hits our teams the most is network latency rather than anything else. I am not sure if the write-through proxy feature will really take away the problems associated with network latency (especially since half the operations are write operations anyway). What are your thoughts on this?

Manish Baxi | February 12, 2008 at 03:19 AM

Actually, one more question. Instead of using SVNPath, can we use SVNParentPath in configuration files? I tried using SVNParentPath but no proxying seems to be happening, i.e. commits made to the slave are made in the slave repository only.

Per repository configuration will be quite painful.

Manish Baxi | February 12, 2008 at 03:23 AM

Manish, I believe you can use SVNParentPath so long as your repository locations on the master and slave end with the same basename, and your SVNMasterURI configuration doesn't carry that basename. I think the proxy code is going to use SVNParentPath + /basename to find the repository on the slave, and SVNMasterURI + /basename to address the repository on the master. (There's a mailing list post which corroborates this at: http://svn.haxx.se/users/archive-2008-02/0326.shtml)

As for whether or not this whole setup is wise in your situation, I really can't say. Maybe something more complex, like a proxy setup with a follow-the-sun type of rotating master would work out better (read "lots of custom coding" or "check out WANDisco"). Or maybe you could go the multi-master route using geo-local read-write repositories which themselves are svk mirrors of The One Repository. Everybody's needs and environment are different -- you may have to just try some of these things out to see what works best for you and your teams.

C. Michael Pilato | February 12, 2008 at 05:14 AM

Mike,

We are in the middle of converting to SVN along with many other changes which has caused some headaches for our team. The main issue we are having at this time is our Master Repo's IP has recently been changed (due to conversion to virtual machines), and now our SVNSYNC obviously does not work. Could you recommend anything other than blowing away our current mirrors and recreating those?

Thanks for any help

Craig Menke | March 18, 2008 at 07:57 AM

Craig, svnsync stores revision properties on the mirror repository which record the source repository URL and UUID, as well as some state information. (This is why you need to provide the source URL only at 'svnsync init' time, not during the actual synchronization phases later.) Here's an example from a mirror I keep on my box:

$ svn plist -v --revprop -r0 file:///usr/local/svn/subversion | grep svn:sync
svn:sync-from-uuid : 612f8ebc-c883-4be0-9ee0-a4e9ef946e3a
svn:sync-last-merged-rev : 29943
svn:sync-from-url : http://svn.collab.net/repos/svn
$

You can modify the svn:sync-from-url property's value to match the new location of the repository you are mirroring using the "propedit" subcommand:

$ svn pedit -r0 --revprop svn:sync-from-url MIRROR-REPOS-URL

Hope that helps!

C. Michael Pilato | March 18, 2008 at 08:23 AM

Mike,

We are working on a Windows 2003 server, and when we run this command it seems as though we are connecting to the repository but cannot change the property. Here is a copy of our command we are running:

svn pedit -r15 --revprop svn:sync-from-url https://10.x.xx.xx/svn/npers

Once we run this command we are prompted for credentials. We get past all that (logging in it seems), but then a text editor pops up (file name: svn-prop.tmp) and I am not sure if I am supposed to enter the new IP of the Master or what.

Thanks again for any help!

Craig Menke | March 18, 2008 at 09:07 AM

Sorry, Craig. I meant to point out that these properties live only on revision 0. So try the command again as:

svn pedit -r0 --revprop svn:sync-from-url https://10.x.xx.xx/svn/npers

Your text editor should pop up with a file populated with the current value of that property. Change the value, save the file, and exit the editor.

C. Michael Pilato | March 18, 2008 at 01:26 PM

Mike,

Thank you!! It worked fine once we realized that we needed to truly use the path to the MIRROR and NOT the MASTER.

Craig Menke | March 19, 2008 at 07:25 AM
