Muffinresearch Labs by Stuart Colville

Six useful shell commands for web developers | 12 Comments

Posted in Linux/Unix on 16th October 2006, 12:16 am by Stuart Colville

The aim of this article is to give a basic introduction to six shell commands that are useful to any web developer who regularly works on Linux or Unix boxes. All of these commands are easy to use. If you aren’t familiar with any of them, I would suggest that you try out some of the examples and explore the full capabilities of each command, as this is really just a basic taster.

sed

The sed command is used to filter text. It takes text as input and processes it line by line according to the commands and regular expressions that you define. The output is the modified input. sed is great for running find and replace operations on files, and this is where it comes in very handy for web developers. See the following examples:

sed '/^$/d' file.txt
Prints file.txt with all blank lines removed
sed 's/<br>/<br \/>/g' file.txt
Prints file.txt with all <br> tags replaced by <br />
sed 's:<br>:<br />:g' file.txt
Prints file.txt with all <br> tags replaced by <br /> (colon delimiter)

In the first example the regex matches a line with nothing between the beginning ^ and the end $ of the line. The d specifies that lines matching the pattern are deleted.

The second example is a typical find and replace sed command. The first part inside the slashes is the text to be matched, <br>, and the next part is the text to replace it with: <br \/>. The g means that sed should replace every match on a line rather than just the first. The backslash in the replacement escapes the forward slash, which is otherwise used as the delimiter. Another way to write this, avoiding the need to escape the forward slash, is to use a different delimiter, as in the third example.
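One thing worth keeping in mind is that sed, when run as in the examples above, writes its result to standard output and leaves the input file as it was. The following is only a minimal sketch of that behaviour; the scratch directory and the file names page.html and page.fixed.html are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
printf '%s\n' 'line one<br>' '' 'line two<br>' > "$workdir/page.html"

# Delete blank lines, then convert <br> to <br />, using ":" as the delimiter.
# The result is redirected to a new file; page.html itself is not changed.
sed '/^$/d' "$workdir/page.html" | sed 's:<br>:<br />:g' > "$workdir/page.fixed.html"

cat "$workdir/page.fixed.html"
```

Writing to a second file and checking it before replacing the original is a cautious way to run a find and replace across real pages.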

grep

grep is good for finding lines in files that match a particular regular expression. By default grep returns the lines that match a regular expression, for example:

grep '^monkey' file.txt

would return all of the lines in file.txt that start (^) with monkey. It is also possible to use the l switch to return the names of the files that contain matches instead of the matching lines. This feature is particularly useful when working on a web server. For example, if you would like to find all of the files in a website that contain a particular piece of text, you would first navigate to the root of the site and then run the following command:

grep -lir "some text" *

The switches are quite simple to explain. l means to return filenames instead of the lines that matched. i makes the search case insensitive. r makes the search recursive. The text to search for is within the quotes and the * means to look at all files. It’s quite an intensive command, so be sure to run it from a sensible directory rather than “/”.
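To see the three switches working together without touching a real site, the command can be tried against a small scratch directory. This is just a sketch: the directory layout and file names below are hypothetical stand-ins for a site root.

```shell
# Build a tiny stand-in for a site root (all names here are hypothetical).
site=$(mktemp -d)
mkdir -p "$site/includes"
printf '%s\n' '<p>Some Text here</p>' > "$site/index.html"
printf '%s\n' '<?php echo "some text"; ?>' > "$site/includes/footer.php"
printf '%s\n' 'nothing relevant' > "$site/about.html"

# -l: print file names only, -i: ignore case, -r: descend into directories.
# The subshell keeps the cd from affecting the rest of the session.
matches=$(cd "$site" && grep -lir "some text" * | sort)
printf '%s\n' "$matches"
```

Note that * is expanded by the shell and does not match hidden files or directories in the starting directory; using . in place of * searches those as well.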

locate

locate searches a database for all pathnames that match the pattern specified. The database that locate uses is typically updated on a periodic basis depending on what OS you are running.

locate is good for situations where you are working on a new web server and you don’t know the location of the file you want to work on. Say you know from the web interface that it is called process.php. You can run locate process.php and this will turn up all of the paths containing process.php found in the database. If this returns too many results you can narrow it down further with something like locate /tools/process.php, assuming “process.php” is in a directory called “tools”.

rsync

rsync is a very useful command as it allows you to sync data not only on your own machine, but also with remote servers. The clever part is that rsync only copies the data that is different, which saves bandwidth compared with simply overwriting everything with a more up-to-date copy. It does even more than skip unchanged files: it compares the source and the target and sends only the parts of each file that are needed, along with instructions on how to merge those parts into the destination.

For web developers this comes in very handy when you need to continually copy data up to a web server, or between web servers. In addition rsync is a great tool for backing up data, as it only takes as long as is needed to update your backup with what’s changed.

Here’s an example of an rsync command that will copy data from the /var/www/html/dev directory on one server (muffin) to /var/www/html/live on the local server. When running this command you will be asked to log in to the source machine.

rsync -az muffin:/var/www/html/dev/ /var/www/html/live

As I mentioned above, you can also use rsync to simply back up local files. To do this you can use rsync in the same way that you would use cp.

For example you could use: rsync -az /var/www/html/dev /home/muffin/backup
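The two rsync commands above differ in one small detail: the first has a trailing slash on the source path and the second does not. With rsync that changes where the files end up. The sketch below shows the difference using throwaway local directories; the directory variables and the placeholder file are made up for illustration.

```shell
# Throwaway directories standing in for a site and two backup targets.
src=$(mktemp -d)
backup_a=$(mktemp -d)
backup_b=$(mktemp -d)
mkdir "$src/dev"
printf '%s\n' '<?php // placeholder ?>' > "$src/dev/process.php"

# No trailing slash: the dev directory itself is created inside the target.
rsync -az "$src/dev" "$backup_a"

# Trailing slash: only the contents of dev are copied into the target.
rsync -az "$src/dev/" "$backup_b"

ls "$backup_a"   # dev
ls "$backup_b"   # process.php
```

If in doubt about which form you want, adding -n to the command performs a dry run that lists what would be transferred without copying anything.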

tail

tail is a very basic command but it’s the ideal command for looking at web server logs to keep an eye out for errors. tail by default prints the last 10 lines of a text file. With the f switch tail will “follow” the output of the file as it changes. This means you can see changes to the file in real time.

Example: tail -f /var/www/vhosts/avirtualdomain.co.uk/statistics/logs/error_log will follow the output of the error log.
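A quick way to get a feel for tail is to try it on a small file first. The following is a minimal sketch using a made-up log file rather than a real server log; the -n switch, which sets the number of lines shown, is not covered above but is a standard tail option.

```shell
# Create a small stand-in log file (the path and contents are made up).
log=$(mktemp)
for n in 1 2 3 4 5 6 7 8 9 10 11 12; do
  printf 'request %s\n' "$n"
done > "$log"

# Default behaviour: the last 10 lines, so the first line shown is "request 3".
tail "$log" | head -n 1

# -n picks a different number of lines; here just the final two.
last_two=$(tail -n 2 "$log")
printf '%s\n' "$last_two"

# To watch a live log for errors only, tail -f can feed grep.
# This runs until interrupted with Ctrl-C, so it is left commented out:
# tail -f "$log" | grep -i error
```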

screen

screen is a terminal multiplexer. What that means is that it allows you to run several “windows” inside one terminal. When I was looking for tabbed alternatives to Mac OS X’s Terminal I found that screen provided everything I needed with a distinct advantage: the ability to detach from a screen session on a remote server and re-connect to it later on. Thus you could be working on several different things in different directories (each in a different screen window), then detach, switch off your desktop, go home and continue from where you left off with everything where it was.

The use for web developers here is pretty obvious. You can, for example, use one window to edit files, one to run a mysql terminal and another to run top to keep a beady eye on your processes.

To give a taster of how to use screen, here’s a list of a few relevant examples. These examples assume (with the exception of the command to re-attach your screen session) that you are already running screen. (To start screen simply type screen then hit enter.)

Ctrl-a c
opens a new window
Ctrl-a "
Shows a list of open “windows” that you can select from.
Ctrl-a k
kill the current window
Ctrl-a d
Detach from your screen session
screen -r
Attach to a detached screen session

If you have any of your own favorite commands please feel free to share them in the comments. Use <code> to wrap code snippets.


  • http://ifelse.co.uk Phu

    Nice writeup Stuart. In particular, sed and grep are invaluable to anyone who works with or in unix systems.

  • Dave Everitt

    you could mention using the wildcard * to change every .txt or .html (or whatever) file in a directory – handy for altering all <br> to <br /> across a site…

    so:
    sed '/^$/d' file.txt

    becomes:
    sed '/^$/d' *.txt

  • http://muffinresearch.co.uk Stuart Colville

    @Phu: thanks!

    @Dave: That’s a good addition thanks.

  • http://www.estadobeta.com Ismael

    Very useful indeed. Thankyou!

  • http://www.pictr.org chrisb

    “rename” is also a nice unix command that is a little less intimidating for those who haven’t got so much experience with regular expressions.

    rename .html .php *.html

    substitutes .html with .php in files ending in .html.

    easy-peasy.

    anyway, does sed work on filenames?

  • http://www.pictr.org chrisb

    also from the I-really-used-it-this-week department:


    find / -type f -size +500000k

    will give you all the files on your system that are larger than 500 MB. Simple but undeniably useful when running df shows 95% full.

    This list is great because most shell tutorials focus on the *real* geeky sysadmin stuff. Which is useless for the photoshop jockeys.

    Personally it has only been in the last year that I realized how much it pays to learn basic command line. I would love to see other commenters post their web-developer-relevant commands.

  • http://muffinresearch.co.uk Stuart Colville

    @chrisb: Thanks for your comments and input. AFAIK sed can be used to rename files but only indirectly, by changing text which can then be used via redirection as a filename.

  • http://www.sprokets.net Mike Barone

    I don’t know how I ever got by without xargs.

    As chrisb mentioned, find is extremely useful.

    awk and cut are also very handy.

    man for obvious reasons (man man if it’s not obvious) and because some options for commands like xargs vary from machine to machine.

    Some people that don’t use shells very often miss out on some of the builtins like history — every command line you run gets numbered: you can rerun previous command lines by using !num. Example:


    % history
    1889 find . -name "*.php" | xargs -I% php -l %
    1890 grep -rnI needle * | cut -f1 -d: | sort -u | xargs nedit &
    1891 history
    % !1889
    find . -name "*.php" | xargs -I% php -l %

    Of course you can press the up-arrow in most shells and go through your command history as well and be sure to take advantage of tab-completion for filenames and commands when possible.

    Some additional options for the commands discussed in the article:

    grep -v #will return lines that DON’T match the search string

    rsync -u #update only, don’t overwrite newer files
    rsync -n #dry run only, don’t actually do anything but list what files would be transferred – useful because rsync is so powerful and flexible and can let you learn and try additional options (--exclude, --delete, etc) before actually executing them.

  • http://www.pictr.org chrisb

    Another good one from today: I am transitioning to the new mediatemple grid server (highly recommended), and I have an old apache httpd.conf file with LOTS of old dead Virtual host directives sites on my current server. So there is no need to move them.

    Could have used something like http://www.dnsstuff.com/ (also recommended) to manually look up each url, but instead i just created a list of the addresses called “siteslist” and used the wonderful dig command from the terminal with a Bash “for loop” :


    for i in $(grep . siteslist); do dig $i >> my_looked_up_sites.txt; done

    This produces a nice document of information about each site (notably the ip address) and now I can clean my server with a bit more confidence.

    this technique is really useful in a wide variety of situations where you have to repeatedly do X for every occurrence of Y.

    For example if you needed to copy a robots.txt file into every subdirectory of a parent “/web” directory you could do something like:


    for i in $(ls -l |grep '^d' | awk '{print $9}'); do cp /web/robots.txt $i; done

    Which in English is:

    ls -l

    give a long directory listing


    grep '^d'

    display only the lines for directories


    awk '{print $9}'

    display only the actual directory name (the 9th column)


    cp /web/robots.txt

    copy this file into that directory.

    It seems complicated at first but if you are a web developer dealing with lots of sites then it pays to invest in this kind of administrative swiss army knife.

  • jey

    could anyone tell me the way to use back-up command in ssh?

  • http://muffinresearch.co.uk Stuart Colville

    @jey: If you’re referring to rsync: when you rsync between different servers, up-to-date versions of rsync will use ssh by default.

    If this is a problem try rsync -e ssh

  • Nolan Clayton

    The Recursive Grep

    alias rgrep 'find . -name \!:2 | xargs grep -i \!:1 | grep -v .svn | grep -v "~"'

    Example Usage
    rgrep "whatever" "*"
    rgrep "print" "*.xml" | less

GNU screen: open tab in current working directory

A nice trick for having screen open a new tab in the same directory as the one you’re currently in. To use it add it to your .screenrc

# Open new window in current dir.
bind c stuff "screen -X chdir \$PWD;screen^M"
bind ^c stuff "screen -X chdir \$PWD;screen^M"

Hat tip: mteckert on SuperUser.com

Ubuntu: add-apt-repository: command not found

When you’re using a minimal Ubuntu install if you find the ‘add-apt-repository’ command is missing (it’s useful for adding PPAs and other repositories), then simply run:

sudo apt-get install python-software-properties


© Copyright 2004-12 Stuart Colville, all rights reserved. May contain traces of Muffin. Powered by WordPress. Hosting by Slicehost.com This page was baked in 0.472s.