Basic Resource Management in Apache

The APR Pools are a fundamental building block of APR and Apache, and are the basis for all resource management. They serve to allocate memory, either directly (in a malloc-like manner) or indirectly (e.g. in string manipulation), and, crucially, ensure the memory is freed at the appropriate time. But they extend much further, to ensure that other resources such as files or mutexes can be allocated and will always be properly cleaned up. They can even deal with resources managed opaquely by third-party libraries.

In this article, we introduce the APR pools, and describe how to use them to ensure all your dynamically-allocated resources are properly managed and have exactly the lifetime they need.

The Problem of Resource Management

The basic problem of resource management is of course well-known to programmers. When you allocate a resource, you should ensure it is released again when you've finished with it. For example:


	char* buf = malloc(n) ;
	... check buf is non null ...
	... do something with buf ...
	free(buf) ;

or

	FILE* f = fopen(path, "r") ;
	... check f is non null ...
	... read from f ....
	fclose(f) ;

Clearly, failure to free buf or to close f is a bug, and in the context of a long-lasting program such as Apache it would have serious consequences up to and including bringing the entire system down. So we need to get it right!

In trivial cases as presented above, this is straightforward. But in a more complex case with multiple error paths, and even the scope of a resource being uncertain at the time it is allocated, it becomes a serious problem to ensure cleanup takes place in every execution path. So we need a better way to manage resources.

The Constructor/Destructor model

One method of resource management is exemplified by the C++ concept of objects having a constructor and destructor. A method adopted by many (but not all) C++ programmers is to make the destructor responsible for cleanup of all resources allocated by the object. This approach works well provided all dynamic resources are clearly made the responsibility of an object. But, as with the simple C approach, it requires a good deal of care and attention to detail, for example where resources are conditionally allocated, or shared between many different objects, and is vulnerable to programming bugs.

The Garbage Collection Model

A high-level method of resource management, typified by Lisp and Java, is garbage collection. This has the advantage of taking the problem right away from the programmer and transferring it to the language itself, so the danger of crippling programming errors is removed altogether. As against that, it is a substantial overhead even where it isn't necessary, and it deprives the programmer of useful levels of control, such as the ability to control the lifetime of a resource. It also requires that all program components - including third-party libraries - are built on the same system, which is clearly not possible in an open system written in C.

The APR Pools

The APR pools provide an alternative model for resource management. Like garbage collection, they liberate the programmer from the complexities of dealing with cleanups in all possible cases. But they offer several additional advantages, including full control over the lifetime of resources, and the ability to manage heterogeneous resources.

The basic concept is that whenever you allocate a resource that requires cleanup, you register it with a pool. The pool then takes responsibility for the cleanup, which will happen when the pool itself is cleaned. That means that the problem is reduced to one of allocating and cleaning up a single resource: the pool itself. And since the Apache pools are managed by the server itself, the complexity is removed from applications programming. All the programmer has to do is select the appropriate pool for the required lifetime of a resource.
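To make the idea concrete, here is a minimal sketch of the mechanism in standard C. It is a toy model, not APR's implementation: the names toy_pool, toy_pool_register, toy_pool_destroy and count_cleanup are all hypothetical, and the real pools do considerably more (memory allocation, sub-pools and so on). It only illustrates how registering cleanups reduces the problem to destroying one object.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model with hypothetical names; this is not APR's implementation. */
typedef void (*toy_cleanup_fn)(void *data);

typedef struct toy_cleanup {
    void *data;                 /* the resource to clean up */
    toy_cleanup_fn fn;          /* how to clean it up */
    struct toy_cleanup *next;
} toy_cleanup;

typedef struct toy_pool {
    toy_cleanup *cleanups;      /* most recently registered first */
} toy_pool;

toy_pool *toy_pool_create(void)
{
    toy_pool *pool = malloc(sizeof(*pool));
    if (pool != NULL)
        pool->cleanups = NULL;
    return pool;
}

/* Hand a resource to the pool. Returns 0 on success, -1 if out of memory. */
int toy_pool_register(toy_pool *pool, void *data, toy_cleanup_fn fn)
{
    toy_cleanup *c;
    assert(pool != NULL && fn != NULL);
    c = malloc(sizeof(*c));
    if (c == NULL)
        return -1;
    c->data = data;
    c->fn = fn;
    c->next = pool->cleanups;
    pool->cleanups = c;
    return 0;
}

/* Run every registered cleanup, newest first, then free the pool itself.
   Returns the number of cleanups that were run. */
int toy_pool_destroy(toy_pool *pool)
{
    int count = 0;
    assert(pool != NULL);
    while (pool->cleanups != NULL) {
        toy_cleanup *c = pool->cleanups;
        pool->cleanups = c->next;
        c->fn(c->data);
        free(c);
        count++;
    }
    free(pool);
    return count;
}

/* Sample cleanup for demonstration: counts its calls in an int. */
void count_cleanup(void *data)
{
    ++*(int *)data;
}
```

A caller would register each resource as it is acquired, for instance toy_pool_register(pool, buf, free), and then call toy_pool_destroy(pool) exactly once. Every error path that reaches the destroy call cleans up everything, in reverse order of registration.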

Basic Memory Management

The most basic usage of pools is for memory management. Instead of


	mytype* myvar = malloc(sizeof(mytype)) ;
	/* make sure it gets freed later in every possible execution path */

we use

	mytype* myvar = apr_palloc(pool, sizeof(mytype)) ;

and the pool automatically takes responsibility for freeing it, regardless of what may happen in the meantime.

This takes many forms in APR and Apache, where memory is allocated within another function. Examples are string manipulation functions and logging, where we gain the immediate benefit of being able to use constructs like the APR version of sprintf() without having to know the size of a string in advance:


	char* result = apr_psprintf(pool, fmt, ...) ;

APR also provides higher-level abstractions of pool memory, for example in the buckets used to pass data down the filter chain. But we'll keep those for another article.
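The following standard-C sketch shows one way a pool-backed sprintf can work without the caller knowing the string size in advance. It is an illustration under stated assumptions: toy_mpool, toy_palloc, toy_psprintf and toy_mpool_clear are hypothetical names, and the two-pass measure-then-format technique is a simplification chosen for clarity, not how apr_psprintf is implemented.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy model with hypothetical names; this is not APR's implementation. */
typedef union toy_block {
    union toy_block *next;      /* link to the previously allocated block */
    max_align_t align;          /* C11: keeps the payload suitably aligned */
} toy_block;

typedef struct toy_mpool {
    toy_block *blocks;
} toy_mpool;

/* malloc-like allocation that the pool remembers for later release. */
void *toy_palloc(toy_mpool *pool, size_t size)
{
    toy_block *b;
    assert(pool != NULL);
    b = malloc(sizeof(toy_block) + size);
    if (b == NULL)
        return NULL;
    b->next = pool->blocks;
    pool->blocks = b;
    return b + 1;               /* payload starts right after the header */
}

/* sprintf-like: measure the output, then allocate exactly that much. */
char *toy_psprintf(toy_mpool *pool, const char *fmt, ...)
{
    va_list ap, ap2;
    char *out = NULL;
    int len;

    va_start(ap, fmt);
    va_copy(ap2, ap);
    len = vsnprintf(NULL, 0, fmt, ap);              /* pass 1: length only */
    va_end(ap);
    if (len >= 0 && (out = toy_palloc(pool, (size_t)len + 1)) != NULL)
        vsnprintf(out, (size_t)len + 1, fmt, ap2);  /* pass 2: format */
    va_end(ap2);
    return out;
}

/* Release everything allocated from the pool; returns the block count. */
int toy_mpool_clear(toy_mpool *pool)
{
    int count = 0;
    assert(pool != NULL);
    while (pool->blocks != NULL) {
        toy_block *b = pool->blocks;
        pool->blocks = b->next;
        free(b);
        count++;
    }
    return count;
}
```

The caller never frees the returned string; a single toy_mpool_clear call releases every allocation at once, which is the property that makes the pool-based string functions convenient.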

Generalised Memory Management

APR provides builtin functions for managing memory, and a few other basic resources such as files, sockets, and mutexes. But there is no requirement to use these. An alternative is to use native allocation functions, and explicitly register a cleanup with the pool:


	mytype* myvar = malloc(sizeof(mytype)) ;
	apr_pool_cleanup_register(pool, myvar, free, apr_pool_cleanup_null) ;

or

	FILE* f = fopen(filename, "r") ;
	apr_pool_cleanup_register(pool, f, fclose, apr_pool_cleanup_null) ;

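will delegate responsibility for cleanup to the pool, so that no further action from the programmer is required. Strictly speaking, a cleanup must have the signature apr_status_t (*)(void*), so functions such as free and fclose need a thin wrapper (or a cast) in real code; the examples in this section omit it for brevity. Also bear in mind that native functions may be less portable than APR equivalents.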

This method generalises to resources opaque to Apache and APR. For example, to open a database connection and ensure it is closed after use:


	MYSQL* sql = NULL ;
	sql = mysql_init(sql) ;
	if ( sql == NULL ) { log error and return failure ; }
	apr_pool_cleanup_register(pool, sql, mysql_close,
		apr_pool_cleanup_null) ;

	sql = mysql_real_connect(sql, host, user, pass, dbname, port, sock, 0) ;
	if ( sql == NULL ) { log error and return failure ; }

Note that APR provides an altogether better method for managing database connections - this will be the subject of another article.

As a second example, consider XML processing:


	xmlDocPtr doc = xmlReadFile(filename, NULL, 0) ;
	apr_pool_cleanup_register(pool, doc, xmlFreeDoc,
		apr_pool_cleanup_null) ;

	/* now do things with doc, that may allocate further memory
	   managed by the XML library but will be cleaned by xmlFreeDoc
	*/

Integrating C++ destructor-cleanup code provides yet another example:
Suppose we have:


	class myclass {
	public:
  	  virtual ~myclass() { do cleanup ; }
  	  // ....
	} ;

We define a C wrapper:

	apr_status_t myclassCleanup(void* ptr) { delete (myclass*)ptr ; return APR_SUCCESS ; }

and register it with the pool when we allocate myclass:

	myclass* myobj = new myclass(...) ;
	apr_pool_cleanup_register(pool, (void*)myobj, myclassCleanup,
		apr_pool_cleanup_null) ;

	// now we've hooked our existing resource management from C++
	// into apache and never need to delete myobj

Implicit and Explicit Cleanup

Now, suppose we want to free our resource explicitly before the end of the request, for example because we're doing something memory-intensive but have objects we can free. We may want to do everything according to normal scoping rules, and just use pool-based cleanup as a fallback to deal with error paths. But since we registered the cleanup, it will still run when the pool is cleaned, typically leading to a double-free and a segfault.

Another pool function apr_pool_cleanup_kill is provided to deal with this situation. When we run the explicit cleanup, we unregister the cleanup from the pool. Or we can be a little more clever about it. Here's the outline of a C++ class that manages itself based on a pool, regardless of whether it is explicitly deleted or not:


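	class poolclass {
	private:
	  apr_pool_t* pool ;
	  // cleanup with the signature APR expects; objects must be created with new
	  static apr_status_t cleanup(void* ptr) {
	    delete (poolclass*)ptr ;
	    return APR_SUCCESS ;
	  }
	public:
	  poolclass(apr_pool_t* p) : pool(p) {
	    apr_pool_cleanup_register(pool, (void*)this,
		cleanup, apr_pool_cleanup_null) ;
	  }
	  virtual ~poolclass() {
	    apr_pool_cleanup_kill(pool, (void*)this, cleanup) ;
	  }
	} ;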

If you use C++ with Apache (or APR), you can derive any class from poolclass. Most APR functions do something equivalent to this, using register and kill whenever resources are allocated or cleaned up.
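The register-and-kill pattern can also be sketched in plain C. This is a toy model under stated assumptions: toy_register, toy_kill, toy_run, toy_clear and count_cleanup are hypothetical names and not APR functions. toy_kill plays the role of apr_pool_cleanup_kill, and toy_run is similar in spirit to apr_pool_cleanup_run, which runs a cleanup immediately and unregisters it.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model with hypothetical names; this is not APR's implementation. */
typedef void (*toy_cleanup_fn)(void *data);

typedef struct toy_cleanup {
    void *data;
    toy_cleanup_fn fn;
    struct toy_cleanup *next;
} toy_cleanup;

typedef struct toy_pool {
    toy_cleanup *cleanups;
} toy_pool;

/* Register a cleanup. Returns 0 on success, -1 if out of memory. */
int toy_register(toy_pool *pool, void *data, toy_cleanup_fn fn)
{
    toy_cleanup *c;
    assert(pool != NULL && fn != NULL);
    c = malloc(sizeof(*c));
    if (c == NULL)
        return -1;
    c->data = data;
    c->fn = fn;
    c->next = pool->cleanups;
    pool->cleanups = c;
    return 0;
}

/* Remove a matching registration without running it.
   Returns 1 if one was found and removed, 0 otherwise. */
int toy_kill(toy_pool *pool, void *data, toy_cleanup_fn fn)
{
    toy_cleanup **ref = &pool->cleanups;
    while (*ref != NULL) {
        toy_cleanup *c = *ref;
        if (c->data == data && c->fn == fn) {
            *ref = c->next;
            free(c);
            return 1;
        }
        ref = &c->next;
    }
    return 0;
}

/* Explicit early release: unregister first, then clean up exactly once. */
void toy_run(toy_pool *pool, void *data, toy_cleanup_fn fn)
{
    toy_kill(pool, data, fn);
    fn(data);
}

/* Run whatever is still registered; returns how many cleanups ran. */
int toy_clear(toy_pool *pool)
{
    int count = 0;
    while (pool->cleanups != NULL) {
        toy_cleanup *c = pool->cleanups;
        pool->cleanups = c->next;
        c->fn(c->data);
        free(c);
        count++;
    }
    return count;
}

/* Sample cleanup for demonstration: counts its calls in an int. */
void count_cleanup(void *data)
{
    ++*(int *)data;
}
```

Because toy_run removes the registration before invoking the cleanup, a later toy_clear cannot run it a second time. The resource is released once whether the explicit path or the fallback path is taken.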

Resource Lifetime

When we allocate resources on a pool, we ensure they get cleaned up at some point. But when? We need to ensure the cleanup happens at the right time: neither while the resource is still in use, nor long after it is no longer required.

The Apache Pools

Fortunately, Apache makes this easy for us, by providing different pools for different types of resource. These pools are associated with relevant structures of the httpd, and have the lifetime of the corresponding struct. There are four general-purpose pools always available in Apache:

the request pool, with the lifetime of an HTTP request.
the connection pool, with the lifetime of a TCP connection.
the process pool, with the lifetime of a server process.
the configuration pool, with the lifetime of the current configuration.

The first three are associated with the relevant Apache structs, and accessed as request->pool, connection->pool and process->pool respectively. The fourth, process->pconf, is also associated with the process, but differs from the process pool because it is cleared whenever Apache re-reads its configuration.

The process pool is suitable for long-lived resources, such as those which are initialised at server startup, or those cached for re-use over multiple requests. The request pool is suitable for transient resources used to process a single request.

A third general-purpose pool is the connection pool, which has the lifetime of a connection, spanning one or more requests. This is useful for transient resources that cannot be associated with a request: most notably in a connection-level filter, where the request_rec structure is undefined.

In addition to these, special-purpose pools are created for other purposes including configuration and logging, or may be created privately by modules for their own use.

Using Pools in Apache: Processing a Request

All the request processing hooks take the form


int my_func(request_rec* r) {
  /* implement the request processing hook here */
}

This puts the request pool r->pool at your disposal. As discussed above, the request pool is appropriate for the vast majority of operations involved in processing a request. That's what you pass to Apache and APR functions that need a pool argument, as well as your own.

The process pool is available as r->server->process->pool for operations that need to allocate long-lived resources; for example, caching a resource that should be computed once and subsequently re-used in other requests. The connection pool is r->connection->pool.

Using Pools in Apache: Initialisation and Configuration

The internals of Apache's initialisation are complex. But as far as modules are concerned, it can normally be treated as simple: you are just setting up your configuration, and everything is permanent. Apache makes that easy: most of the relevant hooks have prototypes that pass you the relevant pool as their first argument:

Configuration handlers

static const char* my_cfg(cmd_parms* cmd, void* cfg, /* args */ ). Use the configuration pool, cmd->pool, to give your configuration the lifetime of the directive.

Pre- and Post-config

These hooks are unusual in having several pools passed: static int my_pre_config(apr_pool_t* pool, apr_pool_t* plog, apr_pool_t* ptemp). The post-config hook additionally receives a server_rec* as a fourth argument. For most purposes, just use the first pool, but if your function uses pools for temporary resources within itself, use ptemp.

Child init

static void my_child_init(apr_pool_t* pool, server_rec* s). Again, the pool is the first argument.

Using Pools in Apache: Other Cases

Most Apache modules involve the initialisation and request-processing we have discussed. But there are two other cases to deal with:

Connection Functions

The pre_connection and process_connection connection-level hooks pass a conn_rec as first argument, and are directly analogous to request functions as far as pool resources are concerned. The create_connection connection-initialisation hook passes the pool as its first argument: any module implementing it takes responsibility for setting up the connection.

Filter Functions

Filter functions receive an ap_filter_t as their first argument. This ambiguously contains both a request_rec and a conn_rec as members, regardless of whether it is a request-level or a connection-level filter. Request-level filters (those declared as AP_FTYPE_RESOURCE or AP_FTYPE_CONTENT_SET) should normally use the request pool. Connection-level filters will get a junk pointer in f->r and must use the connection pool. This can be a gotcha for the unwary!
