[Setup] How to Configure CacheFu for Correct PURGE Request Format when Using "Inside-Out" Virtual Hosting?

Dan Knezek dknezek at hotmail.com
Wed Jan 30 02:08:41 UTC 2008


BIG PICTURE/SUMMARY:
====================
Need to find a way to have the correct/complete PURGE requests sent to a 
Varnish front end cache via CacheFu when using Apache virtual hosting 
and Zope Virtual Host Monster (VHM) "Inside-Out" virtual hosting syntax 
in the Apache RewriteRule statements.


ENVIRONMENT SNAPSHOT:
=====================

Zope 2.10.5-final
Python 2.4.4
Plone 3.0.5
Archetypes 1.5.5
CacheFu 1.1.1 (SVN/UNRELEASED Rev. 56454)

Red Hat Enterprise Linux AS 4

Setup is a caching/HTTP-server machine in front (running Varnish and 
Apache), then several backend machines running ZEO clients, then a 
single ZEO Storage Server:


     Varnish 1.1.2 (port 80) &
        Apache 2.2 (port 81)
                |
                |
                |
          ZEO Client(s)
                |
                |
                |
        ZEO Storage Server


Let's say we're running Apache/Varnish on a server called 
cache.mycompany.com (IP address 10.5.54.156) and the site we're hosting 
is http://www.mycompany.com/foo, which maps to a Plone site called 
"PloneSite" under the ZODB mount point "mount1" (i.e., 
/mount1/PloneSite) on a ZEO client instance running on port 8080 on a 
machine called zopebackend1.mycompany.com (IP address 10.5.54.155). That 
is, http://www.mycompany.com/foo maps to 
http://zopebackend1.mycompany.com:8080/mount1/PloneSite.

Apache's httpd.conf has rewrite rules such as the following within the 
www.mycompany.com <VirtualHost> section:

   <VirtualHost 10.5.54.156>
   ServerName www.mycompany.com

   .
   .
   .

   RewriteRule  ^/foo(.*) http://zopebackend1.mycompany.com:8080/VirtualHostBase/http/www.mycompany.com:80/mount1/PloneSite/VirtualHostRoot/_vh_foo$1 [L,P]

   .
   .
   .

   </VirtualHost>

Note the _vh_foo at the end of the RewriteRule target; i.e., we are 
using the "Inside-Out" virtual hosting syntax of Virtual Host Monster.
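
To make the mapping concrete, here is a minimal sketch in standalone Python 3 (not Zope or VHM code) of how the public URL and the ZODB path relate to the rewrite target above. The function name split_vhm_path is hypothetical, and the logic is a simplified reading of the VirtualHostBase / VirtualHostRoot / _vh_ convention, not VHM's actual implementation.

```python
def split_vhm_path(vhm_path):
    """Split a VHM rewrite target path into (public_url, physical_path).

    Simplified illustration: expects one VirtualHostBase and one
    VirtualHostRoot marker, with optional _vh_ segments right after the root.
    """
    parts = [p for p in vhm_path.split("/") if p]
    if len(parts) < 4 or parts[0] != "VirtualHostBase":
        raise ValueError("path must start with /VirtualHostBase/<scheme>/<host>")
    scheme, host = parts[1], parts[2]
    root = parts.index("VirtualHostRoot")  # raises ValueError if missing
    site_path = parts[3:root]              # e.g. ['mount1', 'PloneSite']
    rest = parts[root + 1:]

    # Leading _vh_ segments exist only in the public URL, not in the ZODB.
    public_prefix = []
    while rest and rest[0].startswith("_vh_"):
        public_prefix.append(rest.pop(0)[len("_vh_"):])

    public_url = "%s://%s/%s" % (scheme, host, "/".join(public_prefix + rest))
    physical_path = "/" + "/".join(site_path + rest)
    return public_url, physical_path


target = ("/VirtualHostBase/http/www.mycompany.com:80/mount1/PloneSite"
          "/VirtualHostRoot/_vh_foo/office/party-photos/view")
public_url, physical_path = split_vhm_path(target)
print(public_url)     # http://www.mycompany.com:80/foo/office/party-photos/view
print(physical_path)  # /mount1/PloneSite/office/party-photos/view
```

The "foo" segment appears only in the public URL; the physical path inside the ZODB never contains it.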

CacheFu is enabled on the Plone site, and left at default settings 
except for the following:

   Proxy Cache Purge Configuration: Simple Purge (squid/varnish in front)
   Site Domains: http://www.mycompany.com:80


STEPS TO REPRODUCE THE PROBLEM:
===============================
Set up an environment similar to the one described above (Varnish cache 
and Apache on same server, Varnish listening on port 80, Apache on port 
81; backend ZEO client servers that Apache passes non-cache-hit requests 
to; a ZEO storage server). Could probably use a single Zope instance 
instead of ZEO, too...the main thing here is what Varnish caches, the 
virtual hosting on Apache, and the use of VHM "Inside-Out" virtual 
hosting syntax in the Apache RewriteRule statement.

Start up the site and make sure you can get to it and its contents via 
the virtual host, and verify that Varnish is caching that site. Then 
change some items on the Plone site in order to force a "PURGE" request 
to be sent via CacheFu on the Plone site.
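
For testing by hand, a PURGE request can also be sent to the cache directly instead of waiting for CacheFu to issue one. The following is a minimal sketch in standalone Python 3, assuming the Varnish configuration accepts PURGE from the sending machine; the helper names purge_request_parts and send_purge are made up for illustration and are not part of CacheFu.

```python
from http.client import HTTPConnection
from urllib.parse import urlsplit


def purge_request_parts(url):
    """Return (host_header, request_path) for a PURGE of the given URL."""
    split = urlsplit(url)
    path = split.path or "/"
    if split.query:
        path += "?" + split.query
    return split.netloc, path


def send_purge(proxy_host, proxy_port, url, timeout=5):
    """Send one PURGE request to the cache and return the HTTP status code."""
    host_header, path = purge_request_parts(url)
    conn = HTTPConnection(proxy_host, proxy_port, timeout=timeout)
    try:
        # The Host header carries the public site name, while the TCP
        # connection goes to the cache machine itself.
        conn.request("PURGE", path, headers={"Host": host_header})
        response = conn.getresponse()
        response.read()
        return response.status
    finally:
        conn.close()


# Example call (needs a reachable cache, so it is left commented out):
# status = send_purge("10.5.54.156", 80, "http://www.mycompany.com/foo/news")
```

Sending one PURGE with "/foo" in the path and one without makes it possible to compare how Varnish responds to each form.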


EXPECTED RESULT:
================
I believe Varnish caches objects under the URL it sees coming in. E.g., 
if the client request/GET is http://www.mycompany.com/foo/news, Varnish 
will cache /foo/news as the object. It may also include the scheme 
("http") and hostname; I'm not positive.

For example, say at one point a GET request is done and it logs the 
following as an entry in the Varnish NCSA format log file:

   67.164.214.168 - - [22/Jan/2008:11:45:29 -1000] "GET http://www.mycompany.com/foo/office/party-photos/view HTTP/1.1" 404 322 "http://www.mycompany.com/foo/office/index.html" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)"

Based on this, I would expect the "PURGE" requests sent to Varnish by 
the ZEO client instance (caused by CacheFu) when the Plone site content 
changes to have the same basic URL, e.g., to be something like the 
following (as an entry in the Varnish NCSA format log):

   10.5.54.155 - - [22/Jan/2008:12:05:13 -1000] "PURGE http://www.mycompany.com/foo/office/party-photos/view HTTP/1.1" 404 425 "-" "-"



ACTUAL RESULT:
==============
The actual "PURGE" request sent looks like the following:


   10.5.54.155 - - [22/Jan/2008:12:05:13 -1000] "PURGE http://www.mycompany.com/office/party-photos/view HTTP/1.1" 404 425 "-" "-"

I.e., it is missing the "foo" in between the "www.mycompany.com/" and 
"office/party-photos/view". It appears that, because of this, Varnish 
does not purge the object from the cache because the URL does not match 
what was originally cached.

I think the PURGE request is missing the "foo" because the actual path 
below the real Plone site ("/mount1/PloneSite") is 
"office/party-photos/view"; i.e., the full backend URL would be 
"http://zopebackend1.mycompany.com:8080/mount1/PloneSite/office/party-photos/view". 
The missing "foo" portion comes from the virtual hosting, not from the 
actual Plone site path.
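
The suspected mismatch can be illustrated with a minimal sketch in standalone Python 3. It assumes (this is a guess about the behavior, not something read from the CacheFu source) that the purge URL is built by joining the configured site domain with the object's path relative to the Plone site root; naive_purge_url is a hypothetical name.

```python
def naive_purge_url(site_domain, physical_path, site_root):
    """Join the site domain with the path relative to the Plone site root."""
    site_root = site_root.rstrip("/")
    if not physical_path.startswith(site_root + "/"):
        raise ValueError("object is not inside the site root")
    relative = physical_path[len(site_root):]  # keeps the leading slash
    return site_domain.rstrip("/") + relative


cached_url = "http://www.mycompany.com/foo/office/party-photos/view"
purge_url = naive_purge_url(
    "http://www.mycompany.com",
    "/mount1/PloneSite/office/party-photos/view",
    "/mount1/PloneSite",
)
print(purge_url)                # http://www.mycompany.com/office/party-photos/view
print(purge_url == cached_url)  # False
```

Nothing in these inputs carries the "/foo" prefix, so a URL built this way cannot match the one Varnish cached.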


NOTES/QUESTIONS/THOUGHTS:
=========================
If there's an easy way to set the settings of CacheFu to do what I want, 
I guess I've missed it. I've tried using the "Purge with Custom URLs" 
purge configuration too, but no luck (although I haven't tried modifying 
the rewritePurgeUrls.py script...it'd take me a bit to figure out what 
to change).
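
As an illustration of the kind of change that might be needed, here is a minimal sketch in standalone Python 3 of a function that inserts the inside-out prefix into each purge URL. The name prefixed_purge_urls, its parameters, and its return value are assumptions made for this example; they are not the actual interface of rewritePurgeUrls.py, which would need to be checked before adapting anything like this.

```python
def prefixed_purge_urls(relative_urls, domains, prefix="/foo"):
    """Build absolute purge URLs with a virtual-hosting prefix inserted.

    relative_urls: paths relative to the Plone site root
    domains:       entries as in "Site Domains", e.g. http://www.mycompany.com:80
    prefix:        the public path segment added by the _vh_ rewrite
    """
    cleaned = prefix.strip("/")
    prefix_part = ("/" + cleaned) if cleaned else ""
    urls = []
    for domain in domains:
        base = domain.rstrip("/")
        for rel in relative_urls:
            urls.append(base + prefix_part + "/" + rel.lstrip("/"))
    return urls


print(prefixed_purge_urls(
    ["office/party-photos/view"],
    ["http://www.mycompany.com:80"],
))
# ['http://www.mycompany.com:80/foo/office/party-photos/view']
```

Passing an empty prefix leaves the URLs as plain domain plus relative path, so sites without inside-out hosting would be unaffected.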

If I try to put something after the port number within a "Site Domains" 
section entry, it gets truncated when I hit "Save". E.g., if I put the 
following in "Site Domains":

   http://www.mycompany.com:80/foo

it gets truncated to

   http://www.mycompany.com:80

(i.e., the "/foo" is taken off) when I hit "Save".

I tried this with both "Simple Purge" and "Purge with Custom URLs" 
(although I only thought it might work for the custom URLs option), 
hoping I could force CacheFu to add the "/foo" to its PURGE requests.

Actually, if the functionality isn't there, maybe the following can be a 
request for a future version of CacheFu (or the current 1.1.1 release 
due Feb. 3rd, if it's not too hard to add--perhaps I should submit an 
issue/suggestion via the CacheFu development area?):

   Allow the addition of something after the port number within the Site 
Domains section of CacheFu configs, so that if inside-out virtual 
hosting is used, the correct PURGE URL is sent to the front-end cache.

Another suggestion along those lines would be the following:

   There may be more than one cache/HTTP-server machine (for redundancy 
and load-balancing)--so there would be a need to send the 
correct/complete PURGE request to more than one cache proxy...but 
perhaps that's taken care of by the "Proxy Cache Domains" setting in 
CacheFu?



Thanks ahead of time for any insight and help anyone can give me on this 
issue.


Dan K.
dknezek at hotmail.com




