30 July, 2012

SCGI in Apache HTTP Server and in Lighttpd

Brief description

SCGI is one of several protocols for communication between an HTTP server and your application. This is a very useful concept, especially in advanced architectures where the actual application (aka the application server) runs as a separate, standalone process alongside the HTTP server. This process can even run on different hardware, allowing the application to scale easily.

An HTTP server (like Apache HTTP Server or Lighttpd) is responsible for serving static content and dispatching requests to application servers. The HTTP server thus acts as the front end of the application, the application server sits in the middle, and the database (if any) is at the back end. Note that the HTTP server typically acts as a client of the application server.

You can choose among several protocols for connecting an application server to an HTTP server. A few well-known examples are FastCGI, WSCI and SCGI. There are plenty of others, but I will discuss mainly SCGI here. SCGI is modeled after the CGI protocol, which, however, works on a slightly different premise: the HTTP server executes a standalone application for each request, and the output of this application is sent back to the client as the response. So there is no dedicated application server running permanently. This setup has a different performance profile than SCGI, but it is still useful in some cases.

SCGI introduces a communication link (TCP/IP or a UNIX socket) between the HTTP server and the application server and uses it to transmit the request and the response, structured very much like a CGI request and response. The 'S' in SCGI stands for 'simple', and the name is well chosen: the protocol really is very simple, and its description fits on two pages. Yet it is still very powerful, with practically no compromises, and it can also be very fast. On the other hand, the specification has gaps, or places where it simply refers to the CGI standard. Unfortunately, because of this, the available implementations are not 100% compatible, which can be a source of surprises. For completeness, I should add that FastCGI shares this concept.

SCGI implementations

mod_scgi for Apache HTTP Server

This is a mature module for Apache HTTP Server (2.0+). Development seems to have stopped a while ago, but the module is very stable and performs very well. It works on all major platforms (including Windows; with a little Googling you can find recent Win32 binaries and skip the compilation stage, which is usually complicated on Windows). Compilation on Linux and Mac OS X is smooth (although sometimes you have to play some nasty games with autoconf and automake). This implementation supports only TCP/IP; there is no support for UNIX sockets.

mod_proxy_scgi in Apache HTTP Server

The SCGI proxy module was added to Apache HTTP Server 2.2 about a year ago, providing a built-in alternative to the external mod_scgi module described above. This looks quite promising: the module is part of the Apache HTTP Server source code and is usually distributed with Apache HTTP Server itself, so you don't have to compile anything. On the other hand, its behavior deviates from mod_scgi in some respects, which can make porting an application complicated.

mod_scgi in Lighttpd

Lighttpd also supports SCGI, offering a faster and lighter alternative to the rather large Apache project. Unfortunately, there are a few differences in how the protocol is interpreted, which make this alternative quite troublesome, especially if you want some configuration flexibility and "HTTP server platform independence".

Other implementations

There are other SCGI implementations in various HTTP servers, for example SCGI in nginx, but I haven't tested them so far.

Typical web application mount points

When speaking about SCGI, an important term is the "mount point" of the SCGI protocol (and thus of the application server). This is the location, or logical path, in your web site for which the application server is responsible for serving content. All requests for this location, or for any path that starts with it, are handed over via SCGI to the application server, which is then responsible for providing the response. The concept is similar to mounts in a Unix file system.
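The mount-point rule can be illustrated with a small Python sketch. It assumes a hypothetical table mapping mount points to backend addresses and, as with Unix file system mounts, lets the longest matching mount point win; matching is done on whole path segments. The function name, the table and the addresses are made up for illustration; real HTTP servers express this in their own configuration and their exact matching rules may differ.

```python
from typing import Optional


def find_mount(path: str, mounts: dict[str, str]) -> Optional[str]:
    """Return the backend responsible for `path`, or None for static content.

    `mounts` maps a mount point (e.g. '/node') to a backend address.
    The longest matching mount point wins, and matching is done on whole
    path segments, so '/node' covers '/node/edit' but not '/nodes'.
    """
    best: Optional[str] = None
    for mount in mounts:
        prefix = mount.rstrip("/")  # '/' becomes '', which matches every path
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best.rstrip("/")):
                best = mount
    return mounts[best] if best is not None else None


# Hypothetical mount table: two application servers on one domain.
MOUNTS = {"/node": "127.0.0.1:4000", "/node/admin": "127.0.0.1:4001"}

if __name__ == "__main__":
    for p in ("/node/edit", "/node/admin/users", "/nodes", "/index.html"):
        print(p, "->", find_mount(p, MOUNTS))
```

Paths that match no mount point return None, i.e. they stay with the HTTP server as static content.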

Depending on the position of the mount point (or on a postfix), you can implement various schemes that are then reflected in the URLs of your web application. These URLs are usually the ones displayed in the browser's address bar, so they should be "user friendly"; SEO also calls for 'nice' URLs in the application.

Here are a few typical ones:

SCGI mount on the root

In this scheme, your application server is mounted on '/' (the root) of your web server. Static content is served from a different (possibly virtual) host or from a dedicated sub-location (e.g. /__static).
This scheme is a little more complicated to configure, but it gives the best results for a single application that owns the whole site: URLs are clean and understandable, without any prefixes or postfixes.

SCGI mount on a logical location in the web application structure

The application server is mounted on a location such as '/node', on top of the static content in the document root served by the HTTP server. This scheme tends to produce slightly less attractive URLs than mounting on the root, since there is always a prefix in the URL, but it can be useful when you have several applications (or application entry points) on the same domain.

SCGI invoked based on postfix

This scheme is known from e.g. PHP: the application server serves only requests whose path ends with a given postfix (e.g. '.php'). As there is usually no file associated with an SCGI request, this scheme makes less sense in this context, and application URLs have to contain the postfix, making them ugly and obscure.

Modern applications require good support for the first two schemes; the last one is more or less a historical legacy and should gradually disappear.
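The three schemes differ only in which request paths the HTTP server hands over to the application server. The following Python sketch expresses each rule as a predicate, taking the descriptions above literally; the default values '/__static', '/node' and '.php' are the examples used above, while the function names are hypothetical. Real servers express these rules in configuration rather than code, and details such as trailing-slash handling vary between them.

```python
def root_mount(path: str, static_prefix: str = "/__static") -> bool:
    """Scheme 1: the application owns '/', except a dedicated static sub-location."""
    is_static = path == static_prefix or path.startswith(static_prefix + "/")
    return not is_static


def location_mount(path: str, mount: str = "/node") -> bool:
    """Scheme 2: the application is mounted on a logical location such as '/node'."""
    return path == mount or path.startswith(mount + "/")


def postfix_mount(path: str, postfix: str = ".php") -> bool:
    """Scheme 3: the application handles only paths ending with a postfix."""
    return path.endswith(postfix)


if __name__ == "__main__":
    for p in ("/articles/42", "/__static/logo.png", "/node/edit", "/index.php"):
        print(f"{p:20} root={root_mount(p)} location={location_mount(p)} "
              f"postfix={postfix_mount(p)}")
```

Every path that a predicate rejects is left to the HTTP server, typically as a static file from the document root.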

SCGI request in detail

When a request hits the HTTP server at a location that belongs to a particular SCGI mount or postfix, the server prepares an SCGI request and passes it through the SCGI protocol to the application server for processing. This request consists of a header and a body. The header is very similar to the HTTP header, containing the same values prefixed with 'HTTP_' (e.g. HTTP_USER_AGENT), plus a series of CGI 'environment variables' carrying important dispatching data such as REQUEST_METHOD or CONTENT_LENGTH.
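On the wire, the SCGI header is sent as a netstring: the decimal length, a colon, the NUL-separated name/value pairs and a comma, followed directly by the body. The SCGI specification requires CONTENT_LENGTH to be the first header and an SCGI header with the value 1 to be present. The Python sketch below shows how a front end might assemble such a request; the function name is hypothetical, header values are assumed to be UTF-8 text, and it is not code taken from any of the connectors discussed above.

```python
def encode_scgi_request(headers: dict[str, str], body: bytes = b"") -> bytes:
    """Build an SCGI request: a netstring of NUL-separated headers, then the body.

    CONTENT_LENGTH (always first) and SCGI=1 are filled in here, so any
    values the caller passes for them are ignored.
    """
    items = [("CONTENT_LENGTH", str(len(body))), ("SCGI", "1")]
    items += [(k, v) for k, v in headers.items()
              if k not in ("CONTENT_LENGTH", "SCGI")]
    # Each header is encoded as "name\0value\0" (UTF-8 assumed for this sketch).
    payload = b"".join(k.encode("utf-8") + b"\0" + v.encode("utf-8") + b"\0"
                       for k, v in items)
    # Netstring framing "<decimal length>:<payload>," followed by the raw body.
    return str(len(payload)).encode("ascii") + b":" + payload + b"," + body


if __name__ == "__main__":
    print(encode_scgi_request({"REQUEST_METHOD": "POST",
                               "REQUEST_URI": "/deepthought"},
                              b"What is the answer to life?"))
```

With the sample request from the SCGI specification (a POST to /deepthought with the body "What is the answer to life?"), the header netstring is 70 bytes long and the request starts with `70:CONTENT_LENGTH`.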

The body (if any) has the same content as in the HTTP request. The response produced by the application server for the HTTP server (and indirectly for the client browser) is also plain HTTP, and in most cases it is simply forwarded to the client.
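On the application server side, the same framing is undone: read the netstring, split it into name/value pairs, then take CONTENT_LENGTH bytes of body. This Python sketch assumes the complete request has already been read into memory (a real server would read from the socket incrementally) and that header values are UTF-8; the function name is hypothetical.

```python
def decode_scgi_request(data: bytes) -> tuple[dict[str, str], bytes]:
    """Split a complete, in-memory SCGI request into (headers, body)."""
    length_str, sep, rest = data.partition(b":")
    if not sep or not length_str.isdigit():
        raise ValueError("malformed netstring length")
    length = int(length_str)
    payload, terminator, body = rest[:length], rest[length:length + 1], rest[length + 1:]
    if terminator != b",":
        raise ValueError("netstring not terminated by ','")
    # The payload is name\0value\0name\0value\0...; drop the empty trailing element.
    fields = payload.split(b"\0")[:-1]
    if len(fields) % 2:
        raise ValueError("odd number of header fields")
    headers = {fields[i].decode("utf-8"): fields[i + 1].decode("utf-8")
               for i in range(0, len(fields), 2)}
    if "CONTENT_LENGTH" not in headers:
        raise ValueError("missing CONTENT_LENGTH")
    content_length = int(headers["CONTENT_LENGTH"])
    if len(body) < content_length:
        raise ValueError("truncated body")
    return headers, body[:content_length]
```

Dispatching would then typically look at REQUEST_METHOD and the path variables discussed next.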

For proper dispatching, the application server usually needs to know the location of the request: it is used to determine which functionality of the application server should be launched and/or what content should be sent back to the client in the response. This is comparable to serving static files.
The application server receives this information in a few variables (coming from the CGI standard):
  • PATH_INFO
  • SCRIPT_NAME
  • SCRIPT_FILENAME
  • ... and a few others
The interpretation of each variable's content differs greatly from implementation to implementation, and this causes major trouble when migrating an application between these implementations.

Here is one example:

Assuming the application is mounted on '/' (the document root) and the request URL is "http://eiclocal/0p/1p/2p", we can get the following results (the actual result depends on the exact configuration of the HTTP server and its SCGI connector):

mod_scgi (Apache)

PATH_INFO: /0p/1p/2p
SCRIPT_NAME: ''
SCRIPT_FILENAME: not present


mod_proxy_scgi (Apache)

PATH_INFO: /0p/1p/2p
SCRIPT_NAME: ''
SCRIPT_FILENAME: proxy:scgi://127.0.0.1/0p/1p/2p

mod_scgi (Lighttpd)

PATH_INFO: /1p/2p
SCRIPT_NAME: /0p
SCRIPT_FILENAME: /Users/.../.../.../0p


You can see that the first two connectors (both Apache) are largely consistent, but the situation in Lighttpd is quite problematic. Even though the mount point is defined as '/', SCRIPT_NAME is reported as '/0p' (the same as if the mount point were '/0p'). This is weird, and it effectively rules out using SCGI with Lighttpd if you need portability and compatibility (e.g. you are the author of a web application server framework, or you just want your application to run behind different HTTP front ends).
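One possible workaround, offered here as an untested sketch rather than a fix provided by any of the connectors, is to stop relying on how the connector splits the path: join SCRIPT_NAME and PATH_INFO, then strip a mount point that the application learns from its own configuration. The environments below are the values from the example above (SCRIPT_FILENAME is omitted because it is not needed); the function name app_path is hypothetical. The trade-off is that the application now needs to know its own mount point.

```python
def app_path(environ: dict[str, str], mount: str) -> str:
    """Path of the request relative to a mount point known to the application.

    Joins SCRIPT_NAME and PATH_INFO (whichever are present) and strips the
    mount point, instead of trusting how the connector split the two.
    """
    full = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
    mount = mount.rstrip("/")  # a root mount '/' becomes '' and strips nothing
    if mount and (full == mount or full.startswith(mount + "/")):
        full = full[len(mount):]
    return full or "/"


# Variables reported for http://eiclocal/0p/1p/2p with the application on '/'.
APACHE_MOD_SCGI = {"PATH_INFO": "/0p/1p/2p", "SCRIPT_NAME": ""}
APACHE_MOD_PROXY_SCGI = {"PATH_INFO": "/0p/1p/2p", "SCRIPT_NAME": ""}
LIGHTTPD_MOD_SCGI = {"PATH_INFO": "/1p/2p", "SCRIPT_NAME": "/0p"}

for env in (APACHE_MOD_SCGI, APACHE_MOD_PROXY_SCGI, LIGHTTPD_MOD_SCGI):
    print(app_path(env, "/"))  # "/0p/1p/2p" for all three connectors
```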

Having been an SCGI user for more than 5 years now, I tend to prefer Apache's interpretation of PATH_INFO. It is perfectly logical and works in every case (including the postfix scheme). Unfortunately, this is not standardized, and it has already started to jeopardize the SCGI protocol.

A few other interesting topics

There are a few more things that are important for proper and correct use of the SCGI protocol, including static file serving through SCGI and local redirects. I may write another blog post about them if there is interest.