Muffinresearch Labs by Stuart Colville

Avoiding the use of .htaccess for performance

Posted in Code, Hosting, performance on 7th April 2008, 2:16 pm by Stuart Colville

.htaccess files are often used because they allow quick changes to the Apache web server configuration and don’t require Apache to be restarted. However, this flexibility comes with a number of performance implications, and as a result they should be avoided unless you have absolutely no other way to put configuration changes into effect, e.g. on shared hosting, where you are unlikely to have access to httpd.conf.

The .htaccess performance hit

If .htaccess files are allowed (through the AllowOverride directive), Apache looks for them on every request. It also has to check every directory above the one the requested file lives in, because directives in a deeper .htaccess file take precedence over those found higher up the tree. For example, if you request a file in /foo/bar, Apache has to look for /.htaccess, /foo/.htaccess and /foo/bar/.htaccess, and it does so even if none of those files exist! Whenever a .htaccess file is found it also has to be parsed, and don’t forget that all of this happens on every request!
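To make the lookup concrete, here is a minimal Python sketch that lists the paths Apache would try for a file served from a given directory. It assumes AllowOverride is enabled all the way up from the filesystem root; the function name htaccess_candidates is hypothetical and not part of Apache. The paths come back root first, which is the order in which Apache applies them, so a deeper file overrides a shallower one.

```python
from pathlib import PurePosixPath


def htaccess_candidates(directory: str, access_file: str = ".htaccess") -> list[str]:
    """List every access file Apache would try for a request served from
    `directory`, assuming AllowOverride is enabled for the whole tree.

    Paths are returned root first: Apache applies them in this order,
    so directives in deeper files override those found higher up.
    """
    path = PurePosixPath(directory)
    # path.parents runs nearest-first (/foo, /), so reverse it to start at /.
    ancestors = list(path.parents)[::-1]
    return [str(d / access_file) for d in ancestors + [path]]


# Each request for a file in /foo/bar costs three lookups, found or not.
for candidate in htaccess_candidates("/foo/bar"):
    print(candidate)
```

The access_file parameter stands in for Apache’s AccessFileName directive, whose default is .htaccess. A file five directories deep, such as one in /var/www/site/blog, already means five lookups per request.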

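As there’s nothing in a .htaccess file that can’t be achieved in your httpd.conf (or in files included by httpd.conf), it makes sense to move existing .htaccess rules into your Apache configuration using a Directory or Location directive as appropriate. Once the rules have moved, set AllowOverride None for those directories so that Apache stops looking for .htaccess files there altogether.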
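Moving the rules is mostly a matter of wrapping each file’s contents in a Directory section for the directory it came from. The sketch below assumes you have already read the file’s text; to_directory_block is a hypothetical helper, and its output should be reviewed before it goes anywhere near httpd.conf.

```python
import os
import textwrap


def to_directory_block(htaccess_path: str, rules: str) -> str:
    """Wrap the rules from one .htaccess file in a <Directory> section
    for the directory that contained the file."""
    directory = os.path.dirname(os.path.abspath(htaccess_path))
    # Indent each non-blank line of the rules for readability.
    body = textwrap.indent(rules.strip(), "    ")
    return f'<Directory "{directory}">\n{body}\n</Directory>\n'


# Hypothetical rules taken from /var/www/site/blog/.htaccess.
rules = "Options -Indexes\nDirectoryIndex index.php\n"
print(to_directory_block("/var/www/site/blog/.htaccess", rules))
```

Because os.path.abspath resolves relative paths against the current working directory, run it from the same directory you searched from if you pass in relative paths such as ./blog/.htaccess.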

Restarting apache without dropping users

Another reason .htaccess files are seen as a convenience is that changes to them don’t require Apache to be restarted. However, if you restart Apache with “apachectl graceful”, the parent process re-reads the configuration files and each child process is restarted only once it has finished serving its current request, so no end users are suddenly dropped.

Finding all of the .htaccess files on your system

Simply drop into your terminal, change to the directory where your site files are located and run:

find . -name '*.htaccess' -print

This lists the location of every .htaccess file, so you can go through them one by one and move their rules into httpd.conf or a file included by httpd.conf. The quotes around the pattern stop the shell from expanding it before find sees it.
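If you would rather feed the list into another script, the same search can be sketched in Python. The function names are hypothetical, and the default pattern only roughly mirrors the find command: it also matches names such as old.htaccess. The matching logic is kept separate from os.walk so it can be checked without touching the disk.

```python
import fnmatch
import os


def htaccess_files(walk_entries, pattern: str = "*.htaccess") -> list[str]:
    """Collect matching files from os.walk()-style (dirpath, dirnames, filenames)
    tuples, roughly what `find . -name '*.htaccess' -print` reports."""
    matches = []
    for dirpath, _dirnames, filenames in walk_entries:
        # fnmatch lets '*' match a leading dot, so '.htaccess' itself matches.
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def find_htaccess(root: str = ".") -> list[str]:
    """Walk the site tree under `root` and list every access file found."""
    return htaccess_files(os.walk(root))
```

Calling find_htaccess("/var/www/site") (a placeholder path) returns the sorted paths; a root that does not exist simply yields an empty list, because os.walk ignores errors by default.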

  • http://www.kulor.com James Broad

    Great summary, the only problem is usually with shared hosting plans where you may not have access/ability to modify the httpd.conf.

  • http://muffinresearch.co.uk Stuart Colville

    @James: Absolutely! I’ve updated the post to make that point a touch clearer.

  • http://blog.dscpl.com.au Graham Dumpleton

    You talk about .htaccess files being a performance problem, but have you actually done proper benchmarking on it with a real world web application?

    Take as an example a simple hello world application implemented in Python on top of WSGI and hosted using mod_wsgi. If you compare raw requests per second between an application configured in main Apache configuration, to one configured in .htaccess file of document root, you may see approximately five percent drop in throughput for the single .htaccess file case.

    This may seem a lot to those aiming to get maximum performance out of their server, but you really have to put it in context. For a simple hello world program the extra time taken up in a single request due to use of .htaccess files may end up being something like five percent of half a millisecond.

    If you now load up a large Python web application such as Django talking to a back end database, whereas you might achieve 2000 requests per second with the hello world application, you may be lucky to achieve 200 requests per second (and probably a lot less) with Django. The extra overhead of using .htaccess files in that case becomes microseconds within a request taking tens of milliseconds, so basically the overhead of the .htaccess files just gets swallowed up in the greater overheads of Django itself and the database.

    Thus, that .htaccess files are bad in some way due to performance is a bit of a myth in some respects. It may have some relevance if you are serving only static files in a very high load site, but if you are running a large dynamic web application, it is generally not an issue.

  • http://muffinresearch.co.uk Stuart Colville

    @Graham: You make some good points here, and the point about taking these things in context is particularly valid. I think the overall point I’m making is that avoiding .htaccess when there’s no real need for it is the important one, but I can absolutely see there are situations where other bottlenecks will obliterate any benefit of using conf over .htaccess.

    To really understand the performance differences, this needs to be looked at in further detail. For example, what is the difference between enabling .htaccess globally and enabling it only for specific directories? How much slower is a request if the .htaccess file is packed with configuration? And what hit is taken when you have several levels of .htaccess files overriding each other, since the precedence has to be worked out to apply the rules correctly? I guess I failed in that regard by not providing a more qualified, data-based analysis for this post!

    All in all I guess it’s fair to say worrying about this is not worth it. However, I also think it’s fair to say that avoiding .htaccess removes any performance hit caused by the reasons listed above and therefore is a good practice especially for production usage.

  • http://blog.dscpl.com.au Graham Dumpleton

    Definitely concur in respect of production setup for a large volume site if you really want to squeeze out the most performance possible, but then if your site is that big you would be looking at separate media servers and other techniques as well to improve performance.

    Frankly, for most of the people out there with their grand plans of having the next super duper site, they will never ever see enough traffic for it to be an issue though. :-)

  • Pingback: 20 .htaccess Hacks You Probably Didn’t Know About | DevMoose

  • http://www.sandyhillottawa.com Roch

    Do you know of any way, using .htaccess, to stop someone from using your domain to point to their own website on the same server? Ex: they use YOURDOMAIN.com to promote their PHISHING WEBSITE.COM by sending users to this simple URL: YOURDOMAIN.COM/~phishing/file.html

    Any help would be greatly appreciated. Thanks.

  • http://janakspen.blogspot.com Janak

    Liked the original post, and also the comments about the overhead being too little in many real world applications. Few points I would like to make:

    1) Many times, such optimizations CAN be useful, maybe not for a site getting 100K visitors per day and having a dedicated server with 8GB RAM etc, but for sites having 1K visitors a day and running on a cheap low end 64MB VPS costing $1 a month.

    2) Disallowing .htaccess also makes for cleaner maintenance. One doesn’t have to worry about overrides, .htaccess files in multiple folders etc.

    3) Consider the case of mass virtual hosting. Let’s say you want to make some change to the .htaccess rules. Rather than changing all the .htaccess files in all folders of all websites, it’s easier to change them in one Apache config file.

    Overall – this is a useful tip, HOWEVER, if you already have a decent setup with .htaccess files, you NEED NOT bother too much to change it unless you really are hard pressed for server performance.

  • Paul

    What Graham fails to see is that such minor performance improvements are vital for anyone who wants to be #1. In real-life situations, Graham, every busy server goes through periodic peaks throughout the day. When the server hits these peaks, everything can slow to a crawl. A few hundred microseconds may not sound like much to you, but under such peaks resources become critical, and the cost goes far beyond a few dozen microseconds.

    Such performance improvements can mean the difference of the server not becoming overloaded for an appreciable percentage of the time.

  • Paul

    In my last post, microseconds should be milliseconds. File systems can’t cache an entire website, and the .htaccess file isn’t going to be cached, so the disk head has to move. That takes a good amount of time per .htaccess file load, and there can be 4 or more loads per page request, especially on blog sites.

    This idea of moving .htaccess to httpd.conf is noticeable during peak moments throughout the day on busy websites.
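Graham Dumpleton’s back-of-the-envelope argument in the comments above can be checked with a little arithmetic. The sketch assumes, as his argument does, that the absolute .htaccess overhead stays the same when the application gets heavier; the 10 ms Django request time is an assumed value at the low end of his “tens of milliseconds”, not a measurement. Five percent of 0.5 ms is 0.025 ms (25 µs), which is a quarter of a percent of a 10 ms request.

```python
def overhead_share(base_ms: float, slowdown: float, app_ms: float) -> float:
    """Fraction of an application request taken up by .htaccess processing.

    base_ms:  time for a trivial request without .htaccess files
    slowdown: relative cost of the .htaccess lookups (0.05 for 5%)
    app_ms:   time for a real application request
    """
    overhead_ms = base_ms * slowdown  # absolute extra time per request
    return overhead_ms / app_ms


# Graham's figures: 5% of a 0.5 ms hello-world request, measured against a
# Django request assumed to take 10 ms.
print(f"{overhead_share(0.5, 0.05, 10.0):.2%} of the request")
```

Paul’s objection amounts to saying that under load the absolute overhead is no longer fixed (disk seeks instead of cached lookups), which is exactly the assumption this calculation holds constant.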

GNU screen: open tab in current working directory

A nice trick for having screen open a new tab in the same directory as the one you’re currently in. To use it, add the following to your .screenrc:

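# Open new window in current dir.
# C-a c and C-a C-c type "screen -X chdir $PWD;screen" and Enter (^M) into the
# current shell; \$ stops screen expanding $PWD when it reads this file.
bind c stuff "screen -X chdir \$PWD;screen^M"
bind ^c stuff "screen -X chdir \$PWD;screen^M"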

Hat tip: mteckert on SuperUser.com

Ubuntu: add-apt-repository: command not found

On a minimal Ubuntu install you may find that the add-apt-repository command (useful for adding PPAs and other repositories) is missing. If so, simply run:

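sudo apt-get install python-software-properties

On later Ubuntu releases the command is provided by the software-properties-common package instead.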

© Copyright 2004-12 Stuart Colville, all rights reserved. May contain traces of Muffin. Powered by WordPress. Hosting by Slicehost.com