Whilst working on some inherited PHP code that used fopen, I noticed an interesting comment in the PHP manual pointing out that fopen makes a DNS lookup on every request. Take the following code as an example:
<?php
$handle = fopen("http://muffinresearch.co.uk/robots.txt", "r");
$contents = stream_get_contents($handle); // PHP5+ only
echo $contents;
fclose($handle);
?>
Capturing traffic with Wireshark and calling that script three times, I saw three DNS lookups, because fopen doesn't make use of any DNS lookup cache.
This isn't only a problem for fopen; I found the same behaviour with file_get_contents.
The comment in the manual suggests using gethostbyname, which does use the DNS cache. You can then pass the resulting IP address to fopen in place of the hostname. However, as soon as you try to fetch something served from a name-based virtual host, this approach fails. Several sites on the same server share one IP address, so if you contact the server by IP address it will simply serve you content from the default virtual host, which is usually the first one the server loads (often the conf that happens to be first alphabetically).
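As a rough sketch of the workaround from the manual comment, the helper below swaps the hostname in a URL for whatever gethostbyname returns. The function name url_with_resolved_host and the optional $resolver parameter are hypothetical (they are not from the manual); the resolver is injectable only so the rewrite can be tried without touching DNS.

```php
<?php
// Hypothetical helper: replace the hostname in an absolute URL with the
// IP address returned by a resolver (gethostbyname by default).
function url_with_resolved_host($url, $resolver = 'gethostbyname')
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // not an absolute URL we can rewrite
    }

    // Note: gethostbyname() returns the unmodified hostname on failure.
    $ip = call_user_func($resolver, $parts['host']);

    $port  = isset($parts['port']) ? ':' . $parts['port'] : '';
    $path  = isset($parts['path']) ? $parts['path'] : '/';
    $query = isset($parts['query']) ? '?' . $parts['query'] : '';

    return $parts['scheme'] . '://' . $ip . $port . $path . $query;
}

// Usage (performs a real lookup and request, so left commented out):
// $handle = fopen(url_with_resolved_host("http://muffinresearch.co.uk/robots.txt"), "r");
```

Because the request now carries the IP address rather than the site's name, a server using name-based virtual hosts has no way to tell which site was meant, which is exactly the failure described above.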
A Solution
The cURL library (php5-curl is the package you'll need on Ubuntu) is the preferred way of fetching content with PHP, mainly because it gives you far greater control over requests.
The other big benefit of using cURL is that it makes use of the DNS cache, so we can save a DNS lookup on repeated calls to the same hostname:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://muffinresearch.co.uk/robots.txt");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
echo $output;
curl_close($ch);
?>
Running that script three times now results in only one DNS request (I manually cleared the DNS cache with sudo /etc/init.d/networking restart first).
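To give a flavour of the extra control cURL offers, here is a minimal sketch of a reusable fetch function with timeouts and basic error reporting. The name fetch_url and the 10 second default are assumptions made for illustration; the cURL options themselves are standard ones.

```php
<?php
// Hypothetical wrapper: fetch a URL with cURL and return the body as a
// string, or false if the request failed.
function fetch_url($url, $timeout = 10)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);      // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);  // seconds allowed for connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);         // seconds allowed for the whole request

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_errno()/curl_error() describe what went wrong (DNS failure, timeout, ...)
        error_log('cURL error ' . curl_errno($ch) . ': ' . curl_error($ch));
    }

    curl_close($ch);
    return $body;
}

// Usage (makes a real request, so left commented out):
// echo fetch_url("http://muffinresearch.co.uk/robots.txt");
```

Since the request is still made by hostname, name-based virtual hosts keep working, unlike the IP address workaround.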
Conclusion
If you're using PHP to fetch data from the web, cURL is a much more powerful solution than relying on fopen or file_get_contents. Not only that, but if you're frequently fetching a lot of data from the same hosts, your scripts will run faster as a result of making only the minimum number of DNS requests.
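If you'd like a rough check of the DNS cost without firing up Wireshark, cURL can report how long name resolution took for a request. The sketch below uses curl_getinfo with CURLINFO_NAMELOOKUP_TIME; the helper name name_lookup_seconds and the 10 second timeout are hypothetical, and the figure it returns is only an indication, not a substitute for a packet capture.

```php
<?php
// Hypothetical helper: fetch a URL and return the time (in seconds) that
// name resolution took, or false if the request failed.
function name_lookup_seconds($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't print the body
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);

    $ok = curl_exec($ch);

    // CURLINFO_NAMELOOKUP_TIME: seconds until name resolving was complete.
    $seconds = ($ok === false) ? false : curl_getinfo($ch, CURLINFO_NAMELOOKUP_TIME);

    curl_close($ch);
    return $seconds;
}

// Usage (makes a real request, so left commented out):
// var_dump(name_lookup_seconds("http://muffinresearch.co.uk/robots.txt"));
```

A lookup answered from a cache should generally show a noticeably smaller value than one that had to go out to a DNS server.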