Six useful shell commands for web developers | Comments (12)
Posted in Linux/Unix on 16th October 2006, 12:16 am by Stuart
The aim of this article is to give a basic introduction to six shell commands that are useful to any web developer who regularly works on Linux or Unix boxes. All of these commands are easy to use. If you aren’t familiar with any of them, I suggest that you test out some of the examples and learn the full capabilities of each command, as this is really just a basic taster.
sed
The sed command is used to filter text. It takes text as an input and processes it line by line according to the regex that you define. The output is the modified input. sed is great for running find and replace operations on files and this is where it comes in very handy for web developers. See the following examples:
sed '/^$/d' file.txt - Removes all blank lines from file.txt
sed 's/<br>/<br \/>/g' file.txt - Replaces all <br> tags with <br /> in file.txt
sed 's:<br>:<br />:g' file.txt - Replaces all <br> tags with <br /> (colon delimiter) in file.txt
In the first example the regex matches a line with nothing between the beginning ^ and the end $ of the line. The d specifies that lines matching the pattern should be deleted.
The second example is a typical find and replace sed command. The first part inside the slashes is the text to be matched, <br>, and the next part is the replacement text, <br \/>. The g means that sed should replace every match on a line rather than only the first one. The backslash in the replacement escapes the forward slash, which is otherwise used as the delimiter. To avoid having to escape the forward slash you can use a different delimiter, as in the third example.
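To try these without touching a real file, here is a small sketch. The sample file and its contents are made up for illustration. Note that sed writes its result to standard output and leaves file.txt unchanged, so the output is redirected to a new file here.

```shell
# Work in a throwaway directory so nothing real is modified.
workdir=$(mktemp -d)

# Hypothetical sample input: two lines with <br> tags and one blank line.
printf 'first<br>\n\nsecond<br>third<br>\n' > "$workdir/file.txt"

# Two -e expressions run in order: delete blank lines, then
# convert <br> to <br /> using ':' as the delimiter.
sed -e '/^$/d' -e 's:<br>:<br />:g' "$workdir/file.txt" > "$workdir/fixed.txt"

cat "$workdir/fixed.txt"
```

The original file.txt still has its three lines afterwards; only fixed.txt contains the cleaned-up text.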
grep
grep is good for finding lines in files that match a particular regular expression. By default grep returns the lines that match a regular expression, for example:
grep '^monkey' file.txt
would return all of the lines in file.txt that start (^) with monkey. It is also possible to use the l switch to return the filenames of the files that contain matches as well. This feature is particularly useful when working on a web server. For example if you would like to find all of the files in a website that have a particular piece of text you would first navigate to the root of the site and then run the following command:
grep -lir "some text" *
The switches are simple to explain. l means to return filenames instead of the lines that matched. i makes the search case insensitive. r makes the search recursive. The text to search for is within the quotes and the * means to look at all files. It’s quite an intensive command, so be sure to run it from a sensible directory rather than “/”.
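Here is a minimal sketch of the same search run against a made-up site tree (the file names and contents are hypothetical). It passes . instead of * so the search starts from the current directory, which is an assumption about how you might prefer to run it, not something the command requires.

```shell
# Build a tiny hypothetical site in a temporary directory.
site=$(mktemp -d)
mkdir -p "$site/includes"
printf 'Some Text in the header\n' > "$site/includes/header.php"
printf 'nothing to see here\n' > "$site/index.php"
printf 'footer with some text\n' > "$site/footer.php"

# -l: print file names only, -i: ignore case, -r: recurse into subdirectories.
matches=$(cd "$site" && grep -lir "some text" . | sort)
printf '%s\n' "$matches"
```

Only the two files containing the text are listed, including the one whose capitalisation differs.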
locate
locate searches a database for all pathnames that match the pattern specified. The database that locate uses is typically updated on a periodic basis depending on what OS you are running.
locate is good when you are working on a new web server and you don’t know the location of the file you want to work on. From the web interface you know it is called process.php, for example. You can run locate process.php and this will turn up all of the paths containing process.php found in the database. If this returns too many results you can narrow it down further with something like locate /tools/process.php, assuming “process.php” is in a directory called “tools”.
rsync
rsync is a very useful command as it allows you to sync data not only on your own machine, but with remote servers. The clever part is that rsync will only copy the data that is different thus saving on bandwidth if you would otherwise just overwrite the data with a more up to date copy. rsync does even more than just replace files that have been changed. It makes clever comparisons of the data source and the data target and only sends the parts of the file that are needed along with instructions how to merge those parts into the destination.
For web developers this comes in very handy when you need to continually copy data up to (or between) web servers. In addition rsync is a great tool for backing up data, as it only takes as long as is needed to update your backup with what’s changed.
Here’s an example of an rsync command that will copy data from the /var/www/html/dev directory on one server (muffin) to /var/www/html/live on the local server. When running this command you will be asked to log in to the source machine.
rsync -az muffin:/var/www/html/dev/ /var/www/html/live
As I mentioned above you can also use rsync to simply back up local files. To do this you can use rsync in the same way that you would use cp.
For example you could use: rsync -az /var/www/html/dev /home/muffin/backup
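One detail worth trying out is the trailing slash on the source: the first command above has one and the second does not. The sketch below uses made-up local directories to show the difference, and the -n (dry run) switch to preview a transfer. It is guarded so that it does nothing if rsync is not installed.

```shell
# Hypothetical local directories standing in for dev, live and backup.
base=$(mktemp -d)
mkdir -p "$base/dev" "$base/live" "$base/backup"
printf 'hello\n' > "$base/dev/index.html"

if command -v rsync >/dev/null 2>&1; then
    # Trailing slash on the source: copy the contents of dev into live.
    rsync -az "$base/dev/" "$base/live"
    # No trailing slash: copy the directory itself, creating backup/dev.
    rsync -az "$base/dev" "$base/backup"
    # -n (dry run) with -v lists what would be transferred without copying.
    rsync -azvn "$base/dev/" "$base/live"
fi

# Show where the files ended up.
find "$base" -type f | sort
```

After this runs, index.html sits directly in live but inside backup/dev, which is usually the thing to check before pointing rsync at a live site.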
tail
tail is a very basic command but it’s the ideal command for looking at web server logs to keep an eye out for errors. tail by default prints the last 10 lines of a text file. With the f switch tail will “follow” the output of the file as it changes. This means you can see changes to the file in real time.
Example: tail -f /var/www/vhosts/a virtualdomain.co.uk/statistics/logs/error_log will follow the output of the error log.
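If 10 lines is too few or too many, the n switch sets the count. A small sketch with a made-up log file:

```shell
# Hypothetical log file with five numbered lines.
log=$(mktemp)
for n in 1 2 3 4 5; do printf 'error %s\n' "$n"; done > "$log"

# -n sets how many trailing lines to print (the default is 10).
last_two=$(tail -n 2 "$log")
printf '%s\n' "$last_two"

# To watch a live log, combine both switches (runs until you press Ctrl-C):
#   tail -n 50 -f /path/to/error_log
```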
screen
screen is a terminal multiplexer. What that means is that it allows you to run several “windows” inside one terminal. When I was looking for tabbed alternatives to Mac OS X’s terminal I found that screen provided everything I needed with a distinct advantage: the ability to detach from a screen session on a remote server and re-connect to it later on. Thus you could be working on several different things in different directories (each in a different screen window), then detach, switch off your desktop, go home and continue from where you left off with everything where it was.
The use for web developers here is pretty obvious. You can, for example, use one window to edit files, one for running a mysql terminal and another to run top to keep a beady eye on your processes.
To give a taster of how to use screen here’s a list of a few relevant examples. With the exception of the command to re-attach your screen session, these examples assume that you are already running screen. (To start screen simply type screen then hit enter.)
- Ctrl-a c - Opens a new window
- Ctrl-a " - Shows a list of open “windows” that you can select from
- Ctrl-a k - Kills the current window
- Ctrl-a d - Detaches from your screen session
- screen -r - Re-attaches to a detached screen session
If you have any of your own favorite commands please feel free to share them in the comments.

Nice writeup Stuart. In particular, sed and grep are invaluable to anyone who works with or in unix systems.
you could mention using the wildcard * to change every .txt or .html (or whatever) file in a directory - handy for altering all to across a site…
so:
sed '/^$/d' file.txt
becomes:
sed '/^$/d' *.txt
@Phu: thanks!
@Dave: That’s a good addition thanks.
Very useful indeed. Thankyou!
“rename” is also a nice unix command that is a little less intimidating for those who haven’t got so much experience with regular expressions.
rename .html .php *.html
substitutes .html with .php in files ending in .html.
easy-peasy.
anyway, does sed work on filenames?
also from the I-really-used-it-this-week department:
find / -type f -size +500000k
will give you all the files on your system that are larger than 500 MB. Simple but undeniably useful when running df shows 95% full.
This list is great because most shell tutorials focus on the *real* geeky sysadmin stuff. Which is useless for the photoshop jockeys.
Personally it has only been in the last year that I realized how much it pays to learn basic command line. I would love to see other commenters post their web-developer-relevant commands.
@chrisb: Thanks for your comments and input. AFAIK sed can be used to rename files but only indirectly, by changing text which can then be used via redirection as a filename.
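A rough sketch of that indirect approach, assuming a directory of .html files (the file names here are made up): sed rewrites each name as text, and mv does the actual renaming.

```shell
# Hypothetical files in a temporary directory.
dir=$(mktemp -d)
touch "$dir/about.html" "$dir/index.html" "$dir/notes.txt"

for f in "$dir"/*.html; do
    # sed only transforms the name as text; mv performs the rename.
    new=$(printf '%s\n' "$f" | sed 's/\.html$/.php/')
    mv -- "$f" "$new"
done

ls "$dir"
```

The .txt file is untouched because the glob only picks up names ending in .html.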
I don’t know how I ever got by without xargs.
As chrisb mentioned, find is extremely useful. awk and cut are also very handy. man for obvious reasons (man man if it’s not obvious) and because some options for commands like xargs vary from machine to machine.
Some people that don’t use shells very often miss out on some of the builtins like history: every command line you run gets numbered, and you can rerun previous command lines by using !num. Example:
% history
1889 find . -name "*.php" | xargs -I% php -l %
1890 grep -rnI needle * | cut -f1 -d: | sort -u | xargs nedit &
1891 history
% !1889
find . -name "*.php" | xargs -I% php -l %
…
Of course you can press the up-arrow in most shells and go through your command history as well and be sure to take advantage of tab-completion for filenames and commands when possible.
Some additional options for the commands discussed in the article:
grep -v # will return lines that DON’T match the search string
rsync -u # update only, don’t overwrite newer files
rsync -n # dry run only, don’t actually do anything but list what files would be transferred - useful because rsync is so powerful and flexible, and it lets you learn and try additional options (--exclude, --delete, etc) before actually executing them.
Another good one from today: I am transitioning to the new mediatemple grid server (highly recommended), and I have an old apache httpd.conf file with LOTS of old dead Virtual host directives for sites on my current server. So there is no need to move them.
Could have used something like http://www.dnsstuff.com/ (also recommended) to manually look up each url, but instead i just created a list of the addresses called “siteslist” and used the wonderful dig command from the terminal with a Bash “for loop”:
for i in $(grep . siteslist); do dig $i >> my_looked_up_sites.txt; done
This produces a nice document of information about each site (notably the ip address) and now I can clean my server with a bit more confidence.
this technique is really useful in a wide variety of situations where you have to repeatedly do X for every occurrence of Y.
For example if you needed to copy a robots.txt file into every subdirectory of a parent “/web” directory you could do something like:
for i in $(ls -l |grep '^d' | awk '{print $9}'); do cp /web/robots.txt $i; done
Which in english is:
ls -l
give a long directory listing
grep '^d'
display only the lines for directories
awk '{print $9}'
display only the actual directory name (the 9th column)
cp /web/robots.txt
copy this file into that directory.
It seems complicated at first but if you are a web developer dealing with lots of sites then it pays to invest in this kind of administrative swiss army knife.
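One caveat with the loop above is that it splits on whitespace, so a directory name containing a space would be cut in two. A hedged alternative sketch, using made-up directory names, lets the shell expand the directories itself:

```shell
# Hypothetical parent directory with two subdirectories, one containing a space.
web=$(mktemp -d)
mkdir -p "$web/site one" "$web/site-two"
printf 'User-agent: *\n' > "$web/robots.txt"

# A pattern ending in / only matches directories; quoting "$d" keeps
# names with spaces intact.
for d in "$web"/*/; do
    cp "$web/robots.txt" "$d"
done

find "$web" -name robots.txt | sort
```

This swaps the ls, grep and awk pipeline for a glob, so it is a different technique from the one in the comment rather than a drop-in copy of it.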
could anyone tell me the way to use back-up command in ssh?
@jey: If you’re referring to rsync: when you rsync between different servers, up-to-date versions of rsync will use ssh by default.
If this is a problem try
rsync -e ssh
The Recursive Grep
alias rgrep 'find . -name \!:2 | xargs grep -i \!:1 | grep -v .svn | grep -v "~"'
Example Usage
rgrep "whatever" "*"
rgrep "print" "*.xml" | less
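The alias above uses csh/tcsh argument syntax (\!:1, \!:2), so it will not work as-is in Bash. A rough equivalent as a shell function might look like the sketch below. The name rgrep_sketch is made up, and unlike the alias it skips .svn directories and backup files up front instead of filtering them out of the results afterwards.

```shell
# rgrep_sketch TEXT NAME_PATTERN
# Case-insensitive search for TEXT in files under the current directory
# whose names match NAME_PATTERN, ignoring .svn directories and files ending in ~.
rgrep_sketch() {
    find . -type f -name "$2" ! -path '*/.svn/*' ! -name '*~' \
        -exec grep -iH -- "$1" {} +
}

# Hypothetical demo tree.
demo=$(mktemp -d)
mkdir -p "$demo/.svn"
printf 'print "hello"\n' > "$demo/page.xml"
printf 'print "backup"\n' > "$demo/page.xml~"
printf 'print "svn"\n' > "$demo/.svn/entries.xml"

result=$(cd "$demo" && rgrep_sketch "PRINT" "*.xml")
printf '%s\n' "$result"
```

The -H switch makes grep print the file name even when only one file matches. It is available in GNU and BSD grep, but treat that as an assumption on other systems.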