Umlaut Deployment

From Code4Lib

Revision as of 15:18, 14 March 2008


So you have Umlaut running by executing "./script/server" in the Umlaut directory; it launches on port 3000 by default, and you connect to port 3000. This is fine for initial confirmation that your setup works, and for development, but how do you actually deploy it?

It turns out there are several possible deploy environments for a Rails application. There is not necessarily one standard or best one at the moment; different people use different ones in different circumstances.

Jonathan Rochkind at Hopkins uses mongrel, mongrel_cluster, and Apache mod_proxy and mod_proxy_balancer on a unix system for his deploy environment. He went down this road because it was what the Agile Web Development with Rails book recommended. Georgia Tech is currently using a different deploy environment for its Umlaut version 1, but mongrel_cluster/Apache is the only currently demonstrated way to deploy the current Umlaut version. We would definitely not be optimistic about running Umlaut on Windows.

Since jrochkind is writing this documentation, he can only tell you how to do it the way he did. You do need a version of Apache that includes mod_proxy_balancer (Apache 2.2 or later), but if you have one, jrochkind is fairly happy with the solution.

These two pages from the mongrel website, Apache Best Practice Deployment and Using Mongrel Cluster, are pretty good how-tos for mongrel. But we will also take you through it here, with specific directions, Umlaut recommendations, and pitfalls we ran into.

Umlaut Deployment with Mongrel and Apache

There are basically two parts to getting Umlaut (or any Rails app) deployed in this setup. First is getting your Rails app running, and second is configuring Apache to connect to it properly.

There are a few decisions to make. Run just one instance of Umlaut, or run multiple load balancing instances? Because of the nature of the way Umlaut works, we strongly recommend running multiple Umlaut instances regardless of how little traffic you expect. Even in a low-traffic environment, the fact that Umlaut can take several seconds to respond to a request means that multiple instances are a good idea to keep Umlaut from seeming even slower than it is. We're completely guessing, but 3 is probably a pretty good number for just about any Umlaut site, from low to high traffic.

Also think about what unix account you want to run Umlaut under (we recommend creating a special low-privilege account), and whether your Umlaut URLs can be the base URLs for a host (i.e., findit.library.jhu.edu points directly to Umlaut) or whether you use a 'prefix' (i.e., findit.library.jhu.edu/some/path/findit). Using Apache virtual hosts and mounting Umlaut at the base is typical, but the prefix can work too.

Setting up mongrel_cluster

First you've got to install mongrel and mongrel_cluster:


sudo gem install mongrel
sudo gem install mongrel_cluster

We recommend you make sure you have mongrel >= 1.1.4 and mongrel_cluster >= 1.0.5, and that you do NOT have previous versions installed. When I had mongrel_cluster 1.0.3 installed alongside, it was being used, even though it shouldn't have been, and its bugs were affecting me.
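
One way to check this before going further is to compare the installed versions against the minimums above. A minimal sketch, assuming GNU sort with its -V (version sort) option is available; version_ge is a helper name made up for this example, not part of gem or mongrel.

```shell
#!/bin/sh
# version_ge A B: succeeds when version A is greater than or equal to B.
# Assumes GNU sort, which provides -V (version sort).
version_ge() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

# Example checks against the recommended minimums:
version_ge 1.1.4 1.1.4 && echo "mongrel 1.1.4 is new enough"
version_ge 1.0.3 1.0.5 || echo "mongrel_cluster 1.0.3 is too old"
```

To see what is actually installed, "gem list mongrel" lists the installed versions of both gems, and "sudo gem uninstall mongrel_cluster --version 1.0.3" removes one specific old version.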


The point of mongrel_cluster is to save configuration information for multiple mongrel instances in one configuration file, so you can start, stop, or restart them all with one command, without having to remember that config information each time (and possibly getting it wrong or mistyping it).

By default, mongrel_cluster keeps that configuration file in a Rails app's config/mongrel_cluster.yml. You could do that with Umlaut, but we like to keep local config files in $Umlaut/config/umlaut_config instead (see Umlaut Local Configuration Architecture), so we recommend putting it there. You can use the mongrel_rails command to write this config for you (see Using Mongrel Cluster; make sure to use the -C argument to put the config file in umlaut_config, if that's what you want), but here we'll just give you our actual mongrel_cluster.yml config, annotated. (You are certainly allowed to write it by hand.)


# Unix account to run your processes as:
user: umlaut

# Unix group to run processes as:
group: umlaut 

# Install dir of Umlaut you want to run from:
cwd: /data/web/findit/Umlaut 
log_file: log/mongrel.log # Leave like this. 

# Start port for your instances. Any high port will do. Does NOT need
# to be open through firewall externally. 
port: 8000 
environment: production # Leave like this
address: 127.0.0.1 # Leave like this 
pid_file: tmp/pids/mongrel.pid # Leave like this

# How many instances to run. port: 8000 with servers:3 means you'll
# have a server on 8000, 8001, and 8002. 
servers: 3

# Only if  you want to start at web path other than base / :
prefix: /findit       # for instance. Start with slash, and don't end with one.
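
The relationship between the port and servers settings can be spelled out with a small sketch. cluster_ports is a hypothetical helper written for this example; it is not part of mongrel_cluster.

```shell
#!/bin/sh
# cluster_ports START COUNT: print the ports a cluster would listen on,
# given the "port" and "servers" values from mongrel_cluster.yml.
cluster_ports() {
  start=$1
  count=$2
  i=0
  ports=""
  while [ "$i" -lt "$count" ]; do
    ports="$ports $((start + i))"
    i=$((i + 1))
  done
  # Strip the leading space before printing.
  echo "${ports# }"
}

cluster_ports 8000 3   # prints: 8000 8001 8002
```

These are the same ports that the Apache balancer configuration has to list, so changing servers here means changing the balancer members to match.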

Now you can start all three of these mongrel instances by executing:

sudo mongrel_rails cluster::start -C $Umlaut/config/umlaut_config/mongrel_cluster.yml

The 'sudo' is necessary because we've told mongrel_cluster to start the apps as user 'umlaut'; you need to be root before you can start a process as another user. cluster::stop, cluster::restart, and cluster::status are available in the same way.

We're still not sure exactly how many mongrels are necessary to handle an Umlaut installation of a given size.

See below to automate the startup of these processes on boot.

If you are choosing to start as a particular unix account, make sure your install dir can be read by that account! The log and tmp dirs need to be writeable too. The easiest thing to do is just "sudo chgrp -R umlaut $Umlaut" (or whatever other group you are choosing) on your entire $Umlaut installation. Note that the parent directory (and all of its parents) needs to have "x" permission for the user/group too.
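
The parent-directory requirement is the easiest one to miss. Here is a rough sketch that walks up from the install dir and prints each ancestor's permissions, so a missing "x" stands out. list_ancestors is a hypothetical helper, and /data/web/findit/Umlaut is just the example install dir from the config above.

```shell
#!/bin/sh
# list_ancestors PATH: print PATH and each of its parent directories,
# one per line, ending with / for an absolute path.
list_ancestors() {
  dir=$1
  while [ "$dir" != "/" ] && [ "$dir" != "." ]; do
    echo "$dir"
    dir=$(dirname "$dir")
  done
  echo "$dir"
}

# Show the permissions of every directory on the way to the install dir.
# Directories that do not exist on this machine are skipped.
for d in $(list_ancestors /data/web/findit/Umlaut); do
  [ -d "$d" ] && ls -ld "$d"
done
```

If the log and tmp dirs turn out not to be group-writeable, "sudo chmod -R g+w $Umlaut/log $Umlaut/tmp" is one way to fix that.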

Apache Setup

Now we set up Apache using mod_proxy to 'reverse proxy' to our mongrel instances, with clustered load balancing. Make sure you have mod_proxy and mod_proxy_balancer installed and configured. Now, in your Apache conf, probably in the specific virtual host you want to use for Umlaut:

# Very important, make sure you aren't inadvertently making an open proxy with mod_proxy
ProxyRequests off 

# Set up the mod_proxy balancer, with our three instances running on 8000-8002
# Note: Do not put trailing / on these
<Proxy balancer://umlaut_cluster>
BalancerMember http://127.0.0.1:8000
BalancerMember http://127.0.0.1:8001
BalancerMember http://127.0.0.1:8002
</Proxy>

# Now set up the ProxyPass directives to reverse proxy to that cluster
# Note: DO put trailing / on these.

ProxyPass / balancer://umlaut_cluster/ 
ProxyPassReverse / balancer://umlaut_cluster/ 
ProxyPreserveHost on

# Or, if you were using a prefix, these would look like, eg:
# ProxyPass /findit balancer://umlaut_cluster/findit/
# ProxyPassReverse /findit balancer://umlaut_cluster/findit/
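
Before restarting Apache it can help to confirm the proxy modules are actually loaded. The sketch below assumes your Apache supports "apachectl -M" (which lists loaded modules) and that the listing uses names like proxy_module. missing_modules is a made-up helper for this example. It also checks for mod_proxy_http, which mod_proxy needs in order to talk to http:// backends such as the mongrels.

```shell
#!/bin/sh
# missing_modules: read a module listing on stdin (in the format printed by
# "apachectl -M") and print each required module that is absent from it.
missing_modules() {
  listing=$(cat)
  for mod in proxy_module proxy_http_module proxy_balancer_module; do
    printf '%s\n' "$listing" | grep -qw "$mod" || echo "$mod"
  done
}

# Sample listing for illustration; on a real server, pipe in the output of
#   apachectl -M 2>&1
printf ' proxy_module (shared)\n proxy_balancer_module (shared)\n' | missing_modules
```

With the sample listing this reports proxy_http_module as missing. Once the modules are in place, "apachectl configtest" checks the syntax of the configuration before you restart.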

Dealing with bad query strings: More Apache Setup

Mongrel refuses to accept a malformed query string. EBSCOHost, however, insists on sending them: for example, query strings with unescaped greater-than or less-than chars in them. We take care of this by putting directives in the Apache config to rewrite these bad URLs into properly escaped URLs. The Apache mod_rewrite external map function is most convenient to use here, and a program to serve as an external map is included with Umlaut. The following Apache directives will take care of rewriting bad URLs. As always, $Umlaut stands for your Umlaut install dir.

  # We want to re-write URLs with 'bad' < and > chars in the query
  # string (eg from EBSCO) to escape them.
  RewriteEngine on
  RewriteMap query_escape prg:$Umlaut/distribution/script/rewrite_map.pl
  RewriteLock /var/lock/subsys/apache.rewrite.lock
  RewriteCond %{QUERY_STRING} ^(.*[\>\<].*)$
  RewriteRule ^(.*)$ $1?${query_escape:%1} [R,L,NE]

Note: Due to a bug in Apache, ampersand chars in query string end up 'double escaped' when put through the map. We have code in a before filter in application_controller to take care of this.
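
For readers unfamiliar with external rewrite maps: Apache starts the program once, writes each lookup key to its standard input as a single line, and reads one line back as the result. The sketch below illustrates that protocol with a stand-in that escapes only < and >. It is not the rewrite_map.pl shipped with Umlaut, whose actual behaviour may differ, and escape_query is a name invented here.

```shell
#!/bin/sh
# escape_query: for every line read on stdin, print the same line with
# "<" replaced by %3C and ">" replaced by %3E. One output line per input
# line, which is what an external RewriteMap program has to produce.
escape_query() {
  while IFS= read -r line; do
    printf '%s\n' "$line" | sed -e 's/</%3C/g' -e 's/>/%3E/g'
  done
}

printf '%s\n' 'title=a<b>c&issn=1234-5678' | escape_query
```

Feeding a sample query string through it by hand, as in the last line, is also a quick way to try out any map program outside of Apache.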

Start at Boot?

Follow the directions at Using Mongrel Cluster, which are basically:

sudo mkdir /etc/mongrel_cluster
sudo ln -s $Umlaut/config/umlaut_config/mongrel_cluster.yml /etc/mongrel_cluster/umlaut.yml
sudo cp /path/to/mongrel_cluster_gem/resources/mongrel_cluster /etc/init.d/
sudo chmod +x /etc/init.d/mongrel_cluster

Now your cluster will start at boot, and you can also start, stop, or restart it (and any other clusters you link into /etc/mongrel_cluster) with:

sudo /etc/init.d/mongrel_cluster {start|stop|restart}


NOTE/WARNING

There is a problem in mongrel_cluster that will prevent mongrels from starting up again if your machine (or your mongrels) dies ungracefully, leaving stale pid files. See http://www.ruby-forum.com/topic/105849
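
To see whether you are in this situation, you can look for pid files whose process no longer exists. This is a minimal sketch, assuming the pid files are plain files ending in .pid that contain just a process id; stale_pid_files is a hypothetical helper, not something mongrel_cluster provides.

```shell
#!/bin/sh
# stale_pid_files DIR: print every *.pid file in DIR whose recorded process
# is no longer running.
stale_pid_files() {
  for f in "$1"/*.pid; do
    [ -f "$f" ] || continue
    pid=$(cat "$f")
    # kill -0 sends no signal; it only tests whether the process exists.
    # It also fails if we are not allowed to signal the process, so run
    # this as root or as the account that owns the mongrels.
    kill -0 "$pid" 2>/dev/null || echo "$f"
  done
}

# Self-contained demonstration using a scratch directory.
demo=$(mktemp -d)
echo $$ > "$demo/live.pid"        # this shell: still running
true & dead=$!; wait "$dead"      # a process that has already exited
echo "$dead" > "$demo/dead.pid"
stale_pid_files "$demo"           # reports only dead.pid
rm -rf "$demo"
```

On a real install the directory to pass would be $Umlaut/tmp/pids, per the pid_file setting in mongrel_cluster.yml.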

My better fix: Edit the /etc/init.d/mongrel_cluster bash script you installed above. Change line:

mongrel_cluster_ctl start -c $CONF_DIR

to:

mongrel_cluster_ctl start -c $CONF_DIR --clean

Note the addition of the --clean argument.
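
If you would rather script that edit than make it by hand, something along these lines should work. It is a sketch that assumes the start line in your copy of the init script looks exactly like the one quoted above; add_clean_flag is a name invented for this example.

```shell
#!/bin/sh
# add_clean_flag FILE: append --clean to the "mongrel_cluster_ctl start"
# line in FILE, unless it is already there. The result is written back with
# cat so the file keeps its existing permissions (including the +x bit).
add_clean_flag() {
  tmp=$(mktemp)
  sed 's/^\([[:space:]]*mongrel_cluster_ctl start -c \$CONF_DIR\)[[:space:]]*$/\1 --clean/' "$1" > "$tmp" &&
    cat "$tmp" > "$1"
  rm -f "$tmp"
}

# Demonstration on a scratch file rather than the real init script.
script=$(mktemp)
printf '    mongrel_cluster_ctl start -c $CONF_DIR\n' > "$script"
add_clean_flag "$script"
cat "$script"
rm -f "$script"
```

Because the pattern only matches a line that ends right after $CONF_DIR, running it a second time leaves the file unchanged. Against the real /etc/init.d/mongrel_cluster it would need to be run from a root shell, ideally after keeping a backup copy.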

I know this works with mongrel 1.1.4 and mongrel_cluster 1.0.5. An earlier mongrel_cluster did not respect the --clean argument properly, and I found that having a simultaneous install of the earlier mongrel_cluster for some reason caused it to be used instead of the later one. gem isn't supposed to work that way, but best make sure you have no mongrel_cluster earlier than 1.0.5 installed.

SFX Configuration

Institute Feature

This only matters if you use the SFX institute feature. Umlaut sends a req.ip=[client ip] param, which SFX is supposed to use to treat the request as if it came from that IP, not Umlaut's IP. That works if the req.ip matches an SFX institute. But if it does not match any institute, you would want SFX to treat the request as matching no institute. Instead, it consults the actual Umlaut server IP and connects THAT to an institute. This is bad.

As a workaround, define an institute in SFX that is listed first alphabetically (e.g., "aaa_umlaut_server") and that matches the Umlaut server's IP address(es). Now if req.ip doesn't match anything, SFX will decide the request matches the "aaa_umlaut_server" institute, which won't affect anything and will be treated just like a non-local address, instead of matching on the Umlaut server address, which might match a wrong institute.

This bug has been reported to Ex Libris.