Muffinresearch Labs by Stuart Colville

PHP: Multiple DNS Queries Using fopen

Posted in Code on 12th August 2009, 6:03 pm

Whilst working on some inherited PHP code that used fopen, I noticed an interesting comment in the PHP manual pointing out that fopen performs a DNS lookup on every request. Take the following code as an example:

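<?php
// Requires allow_url_fopen to be enabled in php.ini.
$handle = fopen("http://muffinresearch.co.uk/robots.txt", "r");
if ($handle === false) {
    die("Could not open URL");
}
$contents = stream_get_contents($handle); // PHP 5+ only
fclose($handle);
echo $contents;
?>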

Capturing traffic with Wireshark while calling that script three times, I saw three DNS lookups, because fopen doesn’t make use of any DNS lookup cache:

Wireshark dialogue showing 3 DNS queries for muffinresearch.co.uk

This isn’t only a problem for fopen; I found the same behaviour with file_get_contents.

The comment in the manual suggests using gethostbyname, which uses the DNS cache, and then passing the resulting IP address to fopen in place of the hostname. However, this approach fails as soon as you try to fetch something served from a name-based virtual host. Several sites on the same server share one IP address, so if you contact the server by IP address alone it has no hostname to match against and simply serves content from the default virtual host (typically the conf which happens to be first alphabetically).
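One possible workaround for the virtual-host problem is to keep connecting by IP address but send the original hostname in a Host header through a stream context. This is a sketch under assumptions, not something the manual comment proposes: the helper name build_ip_request is hypothetical, and it only covers plain HTTP URLs.

```php
<?php
// Hypothetical helper (not from the PHP manual): rewrite a plain-HTTP URL to
// use an already-resolved IP address, and build stream-context options that
// send the original hostname in the Host header so that name-based virtual
// hosts can still be matched by the server.
function build_ip_request($url, $ip)
{
    $parts = parse_url($url);
    $host  = $parts['host'];
    $port  = isset($parts['port']) ? ':' . $parts['port'] : '';
    $path  = isset($parts['path']) ? $parts['path'] : '/';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }

    // URL that connects straight to the IP address.
    $ipUrl = $parts['scheme'] . '://' . $ip . $port . $path;

    // Options for stream_context_create(); the Host header names the real site.
    $options = array(
        'http' => array('header' => "Host: " . $host . $port . "\r\n"),
    );

    return array($ipUrl, $options);
}
```

To use it, resolve the name once with gethostbyname("muffinresearch.co.uk"), pass the result to build_ip_request along with the original URL, and then call fopen($ipUrl, "r", false, stream_context_create($options)).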

A Solution

The cURL library (php5-curl is the package you’ll need on Ubuntu) is the preferred way of fetching content with PHP, mainly because it gives you far greater control over requests.

The other big benefit of using cURL is that it makes use of the DNS cache, so we can save a DNS lookup on repeated calls to the same hostname:

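<?php
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "http://muffinresearch.co.uk/robots.txt");
    // Return the response as a string instead of printing it directly.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $output = curl_exec($ch);
    if ($output === false) {
        // curl_exec returns false on failure; report why.
        echo "cURL error: " . curl_error($ch);
    } else {
        echo $output;
    }
    curl_close($ch);
?>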

Running that script three times now results in only one DNS request (I manually cleared the DNS cache with sudo /etc/init.d/networking restart first).

Wireshark dialogue showing 1 DNS query for muffinresearch.co.uk
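As an illustration of the extra control cURL gives you over requests, here is a minimal sketch that sets timeouts, follows redirects and reports errors. The function names default_curl_options and fetch_url are hypothetical, and the timeout values are arbitrary examples rather than recommendations.

```php
<?php
// Hypothetical helper: build a set of cURL options for a simple GET request.
// The values are illustrative only.
function default_curl_options($url)
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 5,    // seconds to wait for the connection
        CURLOPT_TIMEOUT        => 10,   // seconds allowed for the whole transfer
    );
}

// Fetch a URL and return its body, or false after logging the cURL error.
function fetch_url($url)
{
    $ch = curl_init();
    curl_setopt_array($ch, default_curl_options($url));
    $body = curl_exec($ch);
    if ($body === false) {
        error_log("cURL error " . curl_errno($ch) . ": " . curl_error($ch));
    }
    curl_close($ch);
    return $body;
}
```

Calling fetch_url("http://muffinresearch.co.uk/robots.txt") would then return the file contents as a string, or false if the request failed.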

Conclusion

If you’re using PHP to fetch data from the web, cURL is a much more powerful solution than relying on fopen or file_get_contents. Not only that, if you’re frequently fetching a lot of data from the same hosts, your scripts will run faster as a result of making only the minimum number of DNS requests.
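Within a single script, one way to keep lookups to a minimum is to reuse one cURL handle for several requests, since libcurl keeps its DNS cache and open connections on the handle. The following is a sketch under assumptions: the function name fetch_many is hypothetical, and the 300-second cache lifetime is an arbitrary example value.

```php
<?php
// Hypothetical helper: fetch several URLs with one reused cURL handle so that
// repeated requests to the same host can share cached DNS results.
function fetch_many(array $urls)
{
    $results = array();
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Keep resolved names for 5 minutes (an arbitrary, illustrative value).
    curl_setopt($ch, CURLOPT_DNS_CACHE_TIMEOUT, 300);

    foreach ($urls as $url) {
        curl_setopt($ch, CURLOPT_URL, $url);
        // Each entry is the response body, or false if that request failed.
        $results[$url] = curl_exec($ch);
    }

    curl_close($ch);
    return $results;
}
```

The result is an array keyed by URL, so a failed request shows up as a false value for that URL rather than stopping the whole batch.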

  • http://yoan.dosimple.ch/ Yoan

    Yet another good reason to use cURL; thanks for the demo.

  • http://muffinresearch.co.uk Stuart Colville

    @Yoan: Exactly!

  • http://squarism.com milkfilk

    nscd restart will also clear the DNS. Or “nscd -i hosts” (invalidate hosts). Sometimes it’s better in case networking drops your SSH session or does something else you didn’t mean to.

    Cool blog, was looking at your Puppet post.

  • http://blog.eood.cn Drupal developer

    First time to see the difference of the above methods.
    Thanks.

GNU screen: open tab in current working directory

A nice trick for having screen open a new tab in the same directory as the one you’re currently in. To use it, add the following to your .screenrc:

# Open new window in current dir.
bind c stuff "screen -X chdir \$PWD;screen^M"
bind ^c stuff "screen -X chdir \$PWD;screen^M"

Hat tip: mteckert on SuperUser.com

Ubuntu: add-apt-repository: command not found

If you’re using a minimal Ubuntu install and find that the ‘add-apt-repository’ command is missing (it’s useful for adding PPAs and other repositories), simply run:

sudo apt-get install python-software-properties

© Copyright 2004-12 Stuart Colville, all rights reserved.