Sunday Apr 25, 2010

Apache mod_substitute

When building my new Apache Roller system, I had quite a few challenges with the i-am.ws domain, served from a web-server in the DMZ and Roller on a Tomcat server in the backend. The mod_proxy Apache module takes care of most of it, but hardcoded URLs in the html code need to be translated as well from the internal IP to the external domain name.

A month went by and I found the solution in stumbling on the "mod_substitute" and "mod_sed" modules. With those, the proxy can fix on the fly the hard-coded URLs generated by Roller. In this scenario, substitute "" for "http://www.i-am.ws/" and you're all set. The only thing to be aware of is that you must be running a pretty recent version of Apache (2.2.7 for "mod_substitute" and even 2.3 for "mod_sed").

Now that sounds like a done deal, but it took me couple of weekends to figure out a nasty snag with "mod_substitute" and the like. I thought it was caused by mod_proxy and mod_substitute clashing with each other. So I thought I was clever :-) and moved mod_substitute to the back-end. I installed another Apache on a different port (8642) that would do a local proxy to Tomcat (on 8080). It didn't fix the problem, but by doing I discovered that when I retrieved the web-page with "wget" (one of my favorite hacking tools) instead of Firefox, string substitution worked fine. Now my suspision moved to issues with HTTP 1.0 vs. 1.1. Again, that wasn't the case.

Getting desperate, I decided to write my own proxy server. Grabbing some old bits and pieces of code still lying around :-), that wasn't too tough. At first, things went fine and then the same thing happened again. But with my own code – not running in the background – I had a much better handle on debugging. Checking the HTTP headers of the request, I suddenly noticed that Firefox sends to the server an "Accept-Encoding: gzip, deflate" http header record (BTW, same with IE). And yes, the server replied with all html code nicely compressed. Which suddenly explained why all this time mod-substitute couldn't do anything with it. It also explained why wget, not sending that header, was working fine.

For now, I'm sticking with my home-grown proxy, but the proper solution is I guess to use mod_rewrite to get rid of those compression HTTP headers. And then mod_substitute can do what it's supposed to do.