
Tuesday, November 20, 2007

Tomcat: The Definitive Guide



Defining Version 5.0

Tomcat 5.0 is, of course, the first major refactoring of Tomcat since version 4.1. The major version number of Tomcat is incremented each time the Java Servlet and JSP specifications have a new final release; hence Tomcat 5 corresponds with the final release of the Servlet 2.4 and JSP 2.0 specifications.
All new development will now occur on the Tomcat 5.0 source code branch, or on one of its future sub-branches. The Tomcat committers could decide to apply new code with new features to Tomcat 4.1's source, but that's only likely to be a backport of features that are already implemented in Tomcat 5. If critical security flaws are fixed in Tomcat 5 that also apply to Tomcat 4.1's code, for example, those will probably all be backported.
Tomcat 5.0 Goals

Tomcat 5.0 looks great so far. It includes support for the Java Servlet 2.4 and the JSP 2.0 specifications, fixes many performance issues, and adds many new features and other improvements. Below we'll look at the two overall goals of Tomcat 5.0, which were implemented as of version 5.0.16.

Unifying the Two Tomcat Communities

An important goal for Tomcat 5 was to reunify the Tomcat communities under one codebase. A split had occurred in the Tomcat community between the Tomcat 3 branch proponents and the Tomcat Catalina developers. They each worked on their own branches and didn't collaborate much. Eventually, the Jakarta Project Management Committee made the decision that Tomcat Catalina was ready to replace Tomcat 3, and it was dubbed Tomcat 4. But because Tomcat 4 was a different implementation by a mostly different team of developers, migration from Tomcat 3 to Tomcat 4 was difficult.
Despite the committee's sanctioning of Tomcat 4, the community remained split, among developers and users alike. Each branch of Tomcat separately developed various features and performance enhancements. Several of these developments solved some deep problems in elegant ways, but depending on which Tomcat version a user used, only a portion of these solutions were available.
About the time that developers began working on Tomcat 4.1, both Tomcat development teams agreed to work together on the Tomcat connectors, which included the Java side of the JK2 connector, the HTTP connector (a pure-Java web server), as well as other connectors. A connector is a Tomcat core component that channels servlet container requests into Tomcat.
The consolidation of the Tomcat 3 and Tomcat 4 connector code signaled a larger step toward reunification, as it led to collaboration on other core performance issues, as well as to the development of solutions that are independent of the servlet container version, one of the last hurdles to unification. Tomcat 3 had been implemented as a small servlet container core that could be configured to load and use plug-ins called modules. Proponents of this design believe that keeping the core modular makes it both easier to maintain and more flexible. Tomcat 4's Catalina core was implemented modularly, but didn't offer the feature of generic servlet container modules.
Tomcat 5 includes Tomcat 4.1's core, and it adds the feature of version 3's generic servlet container modules. The developers still have some work to do in this area before it's clear what can and should be built as a module, and how it's generally done, but this work has led to the developer community working together again on all aspects of Tomcat.

Supporting the Java Servlet 2.4 and JSP 2.0 Specifications
The release of Tomcat 5 corresponds with the final release of the Servlet 2.4 and JSP 2.0 specifications, so now let's take a brief look at what's new in each spec, and what impact, if any, there will be on Tomcat 5 development.
The Java Servlet 2.4 Specification
The Servlet 2.4 Specification is only a slight evolution of the Servlet 2.3 Specification, so the API and semantics are almost completely backwards-compatible. With very few exceptions, Servlet 2.3 web applications should work fine in a Servlet 2.4 container.
This is great news for those currently using Tomcat 4 who wish to upgrade to Tomcat 5 -- you probably don't need to modify your web applications. Keep in mind, though, that your server.xml config file isn't part of your web application; it's a Tomcat-specific file, and as such may need some modifications before you can use it with Tomcat 5. Since the syntax of Tomcat 4's server.xml file is almost the same as that of Tomcat 5's, this shouldn't be difficult. I recommend that you start fresh with a Tomcat 5 server.xml file and migrate only the XML elements that you added to Tomcat 4's server.xml file. One way to see what you modified is to diff your modified Tomcat 4 server.xml file against the untouched distribution version. When you add those XML elements, look at the Tomcat 5 Configuration Reference pages and verify that each of the element's attributes exists in Tomcat 5. This is especially important for the Host and Context elements, since they have each been modified in Tomcat 5.
The deepest impact that the Servlet 2.4 specification has on Tomcat 5 is the integration of XML Schema. In previous versions of the servlet specification, the deployment descriptors for servlet web applications were defined and validated using an XML DTD. This worked fine, but as it turned out, DTDs aren't quite modular or flexible enough for other technologies to be able to leverage servlet web apps as a framework. To achieve this level of flexibility and modularity, those involved in revising the Servlet 2.4 Specification and other J2EE 1.4 specifications decided to base the deployment descriptor definition and validation on the newer XML Schema, while maintaining backwards-compatibility with the older servlet 2.3 (and lower) DTDs.
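To make the DTD-versus-Schema point a little more concrete, here is a minimal sketch of schema-based validation using the JDK's javax.xml.validation API. Everything about the schema is an assumption made for illustration: the toy-app, name, and timeout-minutes elements are hypothetical and are not the real Servlet 2.4 deployment descriptor schema. The sketch only shows the kind of typed check (a positive integer) that a schema can express.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical illustration: a toy schema, NOT the real Servlet 2.4 web-app schema.
class ToySchemaCheck {
    static final String TOY_SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='toy-app'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='name' type='xs:string'/>"
      + "<xs:element name='timeout-minutes' type='xs:positiveInteger'/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    // Returns true if the document validates against the toy schema.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema =
                factory.newSchema(new StreamSource(new StringReader(TOY_SCHEMA)));
            // With no ErrorHandler set, validate() throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or violates the schema
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for in-memory input
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<toy-app><name>demo</name><timeout-minutes>30</timeout-minutes></toy-app>"));
        System.out.println(isValid(
            "<toy-app><name>demo</name><timeout-minutes>soon</timeout-minutes></toy-app>"));
    }
}
```

The second document is well-formed XML but fails validation because "soon" is not a positive integer.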
The remaining changes to the Servlet API have been small, amounting to clarifications, bug fixes, and omission fixes. This is good news because it shows that the Servlet API is mature. Tomcat 5 fully implements all these new API changes and additions.
The JavaServer Pages 2.0 Specification
This new version of JSP is quite a bit larger due to its many new features, but aims to be backward-compatible with JSP 1.2. The most important addition to JSP 2.0 is the inclusion of the JavaServer Pages Expression Language (EL). Because EL is now part of the JSP container, it is more useful: it can be used even in the middle of template text, as opposed to just within certain custom tags. For Tomcat developers, the main thing to keep in mind with this new addition is that EL is now also part of Tomcat 5, which makes version 5 more featureful and easier to use than Tomcat 4 for developing detailed web applications.
Another new feature that JSP 2.0 includes is "tag files", which are JSP tags that are implemented as JSP fragment files and contain JSP content. Tag files make it easy to develop such things as modular page components, including XML content, which can be included in JSP pages. It's a feature, therefore, that's available only in Tomcat 5, offering web app developers something that Tomcat 4 does not offer.
Also included in JSP 2.0 are improvements to the handling of XML content, which allow developers to write their dynamic XML content as JSP content. The specification has numerous clarifications since version 1.2, and the file extensions .jspx and .tagx have been added. This feature isn't exclusive to Tomcat 5--other servlet container implementations are free to implement JSP 2.0--but this is a feature that Tomcat 4 does not have because it only implements JSP 1.2.

Tomcat 5.0 Features

Now it's time to get to Tomcat 5's major new features.
Performance Refactoring to Reduce Garbage Creation
The first enhancement to look at is Tomcat 5.0's reduced garbage creation. Tomcat 5 has been carefully profiled and optimized so that it produces less object garbage for the Java VM's garbage collector to clean up.
A typical problem with busy Java servers is that the server software constantly instantiates new objects, and when they're not needed anymore (typically, when a request ends), the objects are not reused but instead thrown out as garbage. The garbage collector must then find and reap all such objects to reclaim the memory they occupy. This takes time and CPU cycles to do, and in the meantime the whole JVM may be paused so that the garbage collector can finish its work. This means the requests currently in process must simply wait until the garbage collector is done, which makes the whole server slow down a little. Usually, this isn't a big problem because the garbage collector is pretty efficient at collecting garbage objects. But in some cases there is so much garbage being produced that it can't keep up, and eventually the amount of free memory gets low because there is a backlog of garbage objects using lots of memory. Or sometimes a web app creates very large objects that take the garbage collector longer to finalize and destroy, so the amount of free memory is lost in large chunks but isn't being replenished quite as fast.
Tomcat 5.0 has had many garbage-reduction (read: performance enhancement) changes since Tomcat 4.1. Tomcat 5.0's single most important garbage-reduction refactoring was the new request URI mapper. After some optimization profiling, Tomcat 4.1's request pipeline was found to create excess garbage while mapping a Connector's requests to the proper Container. For Tomcat 5, a whole new mapper was implemented that generates little or no garbage (lots of object recycling is going on in there), and thus Tomcat 5.0's request pipeline performs noticeably better than that of Tomcat 4.1. This also lowers the overall memory usage compared to Tomcat 4.1, which helps to prevent OutOfMemoryError conditions in the web apps it runs, and helps Tomcat 5 to scale higher vertically (i.e., higher scalability on a single machine).
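To illustrate what object recycling means in practice, here is a minimal sketch. It is not Tomcat's mapper code; the RecycledRequest class and its methods are hypothetical names invented for this example. The idea is simply that one object per worker is allocated up front and reset for each request, instead of a new object being created and discarded every time.

```java
// Hypothetical sketch of object recycling; not Tomcat's actual mapper code.
class RecycledRequest {
    // Storage that lives as long as the worker does and is reused per request.
    private final StringBuilder uri = new StringBuilder(256);
    private int contextEnd;

    // Purge the previous request's state without releasing the buffer.
    void recycle() {
        uri.setLength(0);
        contextEnd = 0;
    }

    // Load a new request URI into the existing buffer and locate the end of
    // the first path segment, e.g. "/shop" in "/shop/cart".
    void parse(CharSequence rawUri) {
        recycle();
        uri.append(rawUri);
        contextEnd = uri.length();
        for (int i = 1; i < uri.length(); i++) {
            if (uri.charAt(i) == '/') {
                contextEnd = i;
                break;
            }
        }
    }

    // Compare the first path segment in place, without creating a substring.
    boolean contextEquals(String candidate) {
        if (candidate.length() != contextEnd) {
            return false;
        }
        for (int i = 0; i < contextEnd; i++) {
            if (uri.charAt(i) != candidate.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}

class RecyclingDemo {
    public static void main(String[] args) {
        RecycledRequest request = new RecycledRequest(); // allocated once
        String[] incoming = {"/shop/cart", "/blog/2007/11", "/shop"};
        for (String raw : incoming) {
            request.parse(raw); // same object reused for every request
            System.out.println(raw + " is in /shop: " + request.contextEquals("/shop"));
        }
    }
}
```

Once the buffer has grown to fit the longest URI seen, handling a request this way allocates nothing new, which is the property the paragraph above describes.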
Tomcat 4's configuration system offered some modularity in the form of deployment context fragments. These were XML configuration files that contained a single Context element and everything nested within it. If the deployer found one in the CATALINA_HOME/webapps directory, it would deploy that context (web app) the same as if it had been configured in Tomcat's server.xml file. This was helpful because changes to server.xml are not reread until Tomcat is restarted, while context XML fragment files can be reloaded at any time. But the administrator didn't have any fine-grained way to control which Host they were deployed into, or which Engine (for those who have multiple Engines configured -- probably not many).
To add better scoping control of the context fragments, and to consolidate Tomcat's configuration files in the CATALINA_HOME/conf directory, Tomcat 5 now supports placing the context fragments in a CATALINA_HOME/conf/[Enginename]/[Hostname]/ directory. For example, if your Engine is named Catalina, and you have a Host named www.example.com, then you can place context XML fragments into the CATALINA_HOME/conf/Catalina/www.example.com/ directory. If you have multiple Hosts, each of them has its own directory, separating its config files from those of other Hosts. Reloading the context fragments in Tomcat 5 works the same as in Tomcat 4; they're just in a different file system location.
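The directory layout above is easy to compute. Here is a small sketch that builds the fragment directory for a given Engine and Host. The ContextFragmentPath class and the /opt/tomcat location are assumptions for illustration only; Tomcat itself does not expose a helper like this.

```java
import java.io.File;

// Hypothetical helper; shows only the directory convention described above.
class ContextFragmentPath {
    // Builds CATALINA_HOME/conf/[Enginename]/[Hostname]/
    static File fragmentDir(File catalinaHome, String engineName, String hostName) {
        File conf = new File(catalinaHome, "conf");
        File engineDir = new File(conf, engineName);
        return new File(engineDir, hostName);
    }

    public static void main(String[] args) {
        // /opt/tomcat is an assumed install location.
        File dir = fragmentDir(new File("/opt/tomcat"), "Catalina", "www.example.com");
        System.out.println(dir.getPath());
    }
}
```

Each Host gets its own leaf directory, so fragments for different virtual hosts never share a directory.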
Many of the deployment attributes have been renamed, or have had their behavior changed in Tomcat 5. Mainly this was to rename ambiguously named attributes, but it was also to make the deployment and redeployment behavior a bit more desirable. For example, the Tomcat 4 attribute named liveDeploy has been renamed autoDeploy in Tomcat 5, and the Tomcat 4 attribute named autoDeploy has been renamed deployOnStartup in Tomcat 5. In short, Tomcat 4's Host configuration elements in server.xml are not compatible with those of Tomcat 5. Compare the Tomcat 4.1 Host configuration reference page with the Tomcat 5.0 Host configuration reference page in order to migrate your configs.
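The two renames mentioned above overlap: autoDeploy is both an old name and a new one. That makes a naive search-and-replace risky, because renaming liveDeploy to autoDeploy first and then renaming autoDeploy to deployOnStartup would mangle both settings. Here is a minimal sketch of a single-pass rename that avoids the collision. The HostAttributeMigration class is hypothetical, and it covers only the two renames named in this article; attributes whose behavior changed still need to be checked by hand against the reference pages.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper covering only the two renames named in the article.
class HostAttributeMigration {
    // Maps one Tomcat 4 attribute name to its Tomcat 5 name.
    static String rename(String tomcat4Name) {
        if ("liveDeploy".equals(tomcat4Name)) {
            return "autoDeploy";
        }
        if ("autoDeploy".equals(tomcat4Name)) {
            return "deployOnStartup";
        }
        return tomcat4Name; // everything else passes through unchanged
    }

    // Builds a new map in one pass, so a renamed key is never renamed twice.
    static Map<String, String> migrate(Map<String, String> tomcat4Attrs) {
        Map<String, String> tomcat5Attrs = new LinkedHashMap<String, String>();
        for (Map.Entry<String, String> entry : tomcat4Attrs.entrySet()) {
            tomcat5Attrs.put(rename(entry.getKey()), entry.getValue());
        }
        return tomcat5Attrs;
    }

    public static void main(String[] args) {
        Map<String, String> old = new LinkedHashMap<String, String>();
        old.put("name", "localhost");
        old.put("autoDeploy", "true");
        old.put("liveDeploy", "false");
        System.out.println(migrate(old));
    }
}
```

Reading from the old map and writing to a fresh one is the design choice that matters here: each value ends up under exactly one new name.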

Standalone Deployer

As of Tomcat 5, a new standalone web app deployer is included. By standalone I mean that it isn't bundled as part of Tomcat but is a separate package released alongside Tomcat. This deployer is actually just an Apache Ant build file that uses some custom Ant tasks to do things to web applications, like:
• Compile the JSPs
• Validate the deployment descriptor using XML schema validation
• Deploy it into Tomcat
• Redeploy a new version
• Undeploy a web app
• Reload a web app without redeploying it
• Start a web application
• Stop a web application
• List the context paths of all currently installed web applications of a specified virtual host
• List the available global JNDI resources in Tomcat
• List the available roles and their descriptions
But not all of these tasks are used in the deployer's build file. Here's output from the build file showing the public targets that it offers:
$ ant -projecthelp
Buildfile: build.xml

Main targets:

 clean     Removes build directory
 compile   Compile web application
 deploy    Deploy web application
 reload    Reload web application
 start     Start web application
 stop      Stop web application
 undeploy  Undeploy web application

Default target: compile
Almost all of the custom Ant tasks that come with the deployer were already included in Tomcat 4.1, but they weren't packaged as a component separate from Tomcat. Mainly, this deployer serves as a rough but functional example of how to instrument an Ant build system to perform these functions.
Better Support for JMX
Tomcat 5.0 has substantially better JMX instrumentation than Tomcat 4.1. How much better? I did a direct comparison between the two versions by counting all of the MBeans, attributes of the MBeans, and operations (method calls) exposed by the MBeans in each version of Tomcat. Here are the numbers as of this writing:
Number of items exposed through JMX

Tomcat Version   MBeans   Attributes   Operations
4.1.30           52       282          79
5.0.18           68       391          148

As you can see, Tomcat 5.0 adds a significant number of attributes and operations that are exposed via JMX. Basically all of Tomcat's internal objects are exposed as MBeans in both versions, so both do a great job of exposing enough objects as MBeans. For the purpose of monitoring, the number of attributes is the most important: Tomcat 5.0 exposes 109 more of them than Tomcat 4.1. For the purposes of management/administration and dynamic configurability, operations are the most important. Tomcat 5.0 really shines here, exposing 69 more operations -- almost double that of Tomcat 4.1.
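For readers who want to produce a similar tally, here is a minimal sketch using the standard javax.management API. It is an assumption about how such a count could be made, not a description of how the numbers above were gathered. It also inspects the JVM's own platform MBeanServer (available since Java 5), so run standalone it counts the JVM's MBeans rather than Tomcat's; to count Tomcat's MBeans the same loop would have to run inside Tomcat or against a remote connection. The MBeanCensus class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import javax.management.InstanceNotFoundException;
import javax.management.JMException;
import javax.management.MBeanInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper; counts what one MBeanServer exposes.
class MBeanCensus {
    // Returns {mbeans, attributes, operations}.
    static int[] count(MBeanServer server) {
        int mbeans = 0;
        int attributes = 0;
        int operations = 0;
        // A null name and null query match every registered MBean.
        for (ObjectName name : server.queryNames(null, null)) {
            try {
                MBeanInfo info = server.getMBeanInfo(name);
                mbeans++;
                attributes += info.getAttributes().length;
                operations += info.getOperations().length;
            } catch (InstanceNotFoundException e) {
                // The MBean was unregistered after the query; skip it.
            } catch (JMException e) {
                throw new IllegalStateException("Cannot inspect " + name, e);
            }
        }
        return new int[] {mbeans, attributes, operations};
    }

    public static void main(String[] args) {
        int[] totals = count(ManagementFactory.getPlatformMBeanServer());
        System.out.println("MBeans=" + totals[0]
            + " Attributes=" + totals[1]
            + " Operations=" + totals[2]);
    }
}
```

The totals depend on the JVM version and on what is registered at the time, so the output will differ from machine to machine.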
If you download a Tomcat 4.1 binary release, it comes bundled with the open source MX4J version 1.1.1 implementation of the JMX 1.1 Specification. This version of JMX does not include the JMX Remote API (JSR 160), since the JMX Remote API only works with JMX 1.2 implementations. This means that there is no standard network protocol for managing and monitoring Tomcat 4.1 remotely. The Tomcat 5.0 binary release comes bundled with Sun's JMX 1.2 reference implementation, along with Sun's JMX Remote API 1.0 reference implementation (although it could just as easily be bundled with the MX4J 2.0 implementation of JMX 1.2 and the JMX Remote API 1.0 -- it's still in beta but it works great).
As it turns out, both of these versions of Tomcat can be compiled against a JMX 1.2 implementation, and both can run with one. However, if you're not building your own Tomcat (as you'll see in the next section, it's easier to do with Tomcat 5), then Tomcat 4 doesn't come with JMX 1.2 and Tomcat 5 does. Even so, Tomcat 5 doesn't yet offer any way of turning on a JMX Remote API connector server. That's coming soon for Tomcat 5, but as of this writing it isn't included yet.
Improved Tomcat Build System
Tomcat 5's build system is quite a bit more automated than that of Tomcat 4. It has also been cleaned up in many ways.
Anyone who has tried to build Tomcat 4 can tell you that it isn't easy to do -- you have to download and carefully install a swarm of jar files from many different web sites, pull down Tomcat 4's source code either via a source snapshot archive or from CVS, set some paths in a properties file, and then try to build it. Most people do all that only to have the build fail due to either missing jar files or improper jar file versions (despite properly following directions from Tomcat 4's BUILDING.txt file). Tomcat's dependencies changed frequently in Tomcat 4; the BUILDING.txt file wasn't always kept up-to-date with the exact version numbers that were necessary; and the build expected to find each jar at an exact path, yet those paths followed no consistent pattern. The Tomcat committers did quite a bit of work to keep everything up-to-date and building cleanly, but assembling a working build environment from scratch was still mainly a manual, error-prone process.
Tomcat 5's build system, on the other hand, has been automated so that it can assemble its own build environment, all except for installing the JDK and Apache Ant (the build uses Ant 1.6). Just download the top-level build file and invoke the default target and away it goes! It will do all the pulling and installing of the jars and Tomcat 5 source code, configure the build, and then start building Tomcat 5. Of course, you may want to make your own custom-build properties file if you have special build-configuration needs. If you have no special needs, then why not just use a binary release? See this page for info on customizing the build. Regardless, the build is so automated it feels like it's on autopilot.

Session Clustering Code as a Module

Tomcat 4 neither implemented nor included any servlet session-clustering, except for some old, mainly broken code that was once a valiant attempt at implementing session-clustering via IP multicast. A new session-clustering implementation was written for Tomcat, and worked with Tomcat 4.1 as an add-on feature, but was never bundled with Tomcat 4.1.
Tomcat 5 includes this newer session-clustering implementation as a module. The default server.xml configuration file contains some text about how to configure it, but it is turned off by default as most people only run one instance of Tomcat and do not use session-clustering. But it's included in Tomcat 5 for those who want it.
This session-clustering implementation worked for Tomcat 4, but has been significantly refactored for Tomcat 5. Tomcat 5.0.16 included lots of improvements over Tomcat 4, but then many more improvements were made after 5.0.16, and I recommend using only version 5.0.18 or higher if you want to use session-clustering! Many important performance enhancements and serious bug fixes have been applied since version 5.0.16.
For background information about how this kind of session-clustering works, you might want to see our book Tomcat: The Definitive Guide (O'Reilly). In Chapter 10, Tomcat Clustering, we go over many details about distributed web applications and how servlet session-clustering works. Also, see the Cluster How-to for information specific to this session-clustering implementation.
This clustering module is the first (and so far the only) Tomcat 5 module included in the distribution. It will be interesting to see what other modules get added to Tomcat 5 in the future.
Increased JSP Tag Library Efficiency via Tag Pooling and Tag Plug-ins
Since JSP is popular, and JSP custom tag libraries are becoming popular, Tomcat needs to keep these features efficient and scalable by implementing ways to speed things up behind the scenes in the implementation of the JSP engine. To avoid unnecessary object instantiation and garbage creation, which are costly in terms of time and memory, two techniques are being used in Tomcat 5 to speed things up: tag pooling and tag plug-ins.
Tag pooling is much like servlet pooling, or any other object pooling, for that matter. The idea is that you may have many request threads at once that each need the same kind of object, and that object does not need to be stateful across more than one request. In that case, the server can instantiate a pool of them that it can draw from instead of instantiating new ones all the time. When a request is done with the object, it gets returned to the pool after any request state is first purged.
Tomcat 5's JSP engine, Jasper 2, can pool JSP tag objects this way, which speeds things up nicely. Tomcat 4.1 also had an early implementation of tag pooling, but the implementation in Tomcat 5 is more mature.
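Here is a minimal sketch of the pooling idea described above. It is not Jasper 2's implementation: GreetingTag is a hypothetical stand-in for a JSP tag handler, and TagPool is an invented name. What it shows is the borrow, use, purge, and return cycle.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a JSP tag handler; not one of Jasper 2's classes.
class GreetingTag {
    String name; // per-request state

    String render() {
        return "Hello, " + name;
    }

    // Purge request state so the next borrower starts clean.
    void release() {
        name = null;
    }
}

// Hypothetical bounded pool of tag handlers shared by request threads.
class TagPool {
    private final Deque<GreetingTag> idle = new ArrayDeque<GreetingTag>();
    private final int capacity;

    TagPool(int capacity) {
        this.capacity = capacity;
    }

    // Hand out an idle handler if there is one; otherwise create a new one.
    synchronized GreetingTag acquire() {
        GreetingTag tag = idle.pollFirst();
        return tag != null ? tag : new GreetingTag();
    }

    // Clear the handler's state, then keep it unless the pool is already full.
    synchronized void release(GreetingTag tag) {
        tag.release();
        if (idle.size() < capacity) {
            idle.addFirst(tag);
        }
    }
}

class TagPoolDemo {
    public static void main(String[] args) {
        TagPool pool = new TagPool(4);
        GreetingTag first = pool.acquire();
        first.name = "Tomcat";
        System.out.println(first.render());
        pool.release(first);

        GreetingTag second = pool.acquire();
        System.out.println("same object reused: " + (first == second));
    }
}
```

Purging state on release rather than on acquire means a pooled handler never holds a reference to a finished request's data while it sits idle.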

Another feature of Jasper 2 is "tag plug-ins," not to be confused with tag files (discussed earlier in this article). Tag plug-ins are not a JCP standard, but instead something specific to Tomcat 5's JSP implementation. In short, it is a modular way of optimizing the Jasper 2 JSP engine to render a custom JSP tag library as fast as possible. Probably the best explanation about it that I could find on the Net (other than the source code) is the original tomcat-dev mailing list thread where the concept was first introduced.
Tomcat 4 does not implement tag plug-ins. Tomcat 5.0 is the first to do so. This means that Tomcat 5 should be faster at rendering some JSP pages than Tomcat 4, especially when one or more tag libraries being used is accompanied by tag plug-ins.

The Balancer Web App

Also new in Tomcat 5.0 is the Balancer web app, a rules-based load balancer implemented as a web application. Balancer is useful for anyone who doesn't already own a hardware load balancer (or doesn't want to spend the money to buy one), and who needs to load balance across more than one Tomcat instance.
There are many ways to load balance HTTP requests, and not all methods are right for your site. But Balancer is a free, pure-Java implementation of a software load balancer that uses infrastructure that Tomcat users already have running: Tomcat. It doesn't do low-level TCP NAT or TCP tunneling, but instead load balances via HTTP redirects.
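To show what load balancing via HTTP redirects amounts to, here is a minimal sketch of a single round-robin rule. It is not the Balancer web app's code or API: the RoundRobinRedirector class is hypothetical, and the backend host names are made up. The value it returns is what a balancer would place in the Location header of an HTTP 302 response.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of one redirect rule; not the Balancer web app's API.
class RoundRobinRedirector {
    private final List<String> backends;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinRedirector(List<String> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("at least one backend is required");
        }
        this.backends = new ArrayList<String>(backends);
    }

    // Picks the next backend in turn and returns the absolute URL that the
    // client should be redirected to.
    String redirectTarget(String pathAndQuery) {
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), backends.size());
        return backends.get(index) + pathAndQuery;
    }

    public static void main(String[] args) {
        // The host names below are assumptions for illustration.
        RoundRobinRedirector redirector = new RoundRobinRedirector(Arrays.asList(
            "http://tomcat1.example.com:8080",
            "http://tomcat2.example.com:8080"));
        for (int i = 0; i < 3; i++) {
            System.out.println(redirector.redirectTarget("/shop/cart?id=7"));
        }
    }
}
```

Because the client follows the redirect itself, subsequent requests go straight to the chosen instance and the balancer never sits in the data path, which is the main difference from NAT or tunneling approaches.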
In Tomcat 5.0.16, Balancer is a new implementation and lacks many features people will want. And it's probably not a good solution for load-balancing requests for very high-traffic web sites. But the Balancer web app is useful for some people as it existed in Tomcat 5.0.16, and is included in part so that it is possible to install and configure a 100 percent Java cluster using only what comes with Tomcat 5.0. With the inclusion of the Balancer web app, Tomcat alone now implements a clustered servlet and JSP container, implemented in pure Java. See the Balancer How-to for details about how to use it.

Conclusion

Tomcat 5.0 contains many substantial updates and improvements over Tomcat 4.1. Many of the underlying technologies that Tomcat builds upon have been updated, enabling Tomcat 5 to offer a wider range of solutions and features to the administrator and developer. This, combined with many performance enhancements and a smaller memory footprint during heavy loads, means that Tomcat 5 does a better job with the same web apps than Tomcat 4 does. Tomcat 5 is also more manageable, more easily monitored, and easier to build. Tomcat 5.0 is production-ready now. The Tomcat community tested many releases of Tomcat 5 before it was voted stable late last year. I recommend you consider upgrading to Tomcat 5.0.
