Muffinresearch Labs by Stuart Colville

Six useful shell commands for web developers | Comments (12)

Posted in Linux/Unix on 16th October 2006, 12:16 am by Stuart

The aim of this article is to give a basic introduction to six shell commands that are useful to any web developer who regularly works on Linux or Unix boxes. All of these commands are easy to use. If you aren’t familiar with any of them, I would suggest that you try out some of the examples and then explore the full capabilities of each command, as this is really just a basic taster.

sed

The sed command is used to filter text. It takes text as input and processes it line by line according to the commands you give it, which usually contain a regex. The output is the modified input; the input file itself is left unchanged unless you save the result. sed is great for running find and replace operations on files and this is where it comes in very handy for web developers. See the following examples:

sed '/^$/d' file.txt
Prints file.txt with all blank lines removed
sed 's/<br>/<br \/>/g' file.txt
Prints file.txt with all <br> tags replaced by <br />
sed 's:<br>:<br />:g' file.txt
Prints file.txt with all <br> tags replaced by <br /> (colon delimiter)

In the first example the regex matches a line with nothing between the beginning ^ and the end $ of the line. The d specifies that every line matching the pattern should be deleted.

The second example is a typical find and replace sed command. The first part inside the slashes is the text to be matched, <br>, and the next part is the replacement text, <br \/>. The g means that sed should replace every match on a line rather than just the first one. The backslash in the replacement escapes the forward slash, which would otherwise be read as the delimiter. Another way to write this, which avoids the need to escape the forward slash, is to use a different delimiter, as in the third example.
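Since sed writes its result to standard output, keeping the result means redirecting it somewhere. Here is a minimal sketch of that, assuming a scratch directory and a sample file.txt that are created purely for this illustration:

```shell
# Scratch directory and sample file (both made up for this illustration).
workdir=$(mktemp -d)
printf 'line one<br>\n\nline two<br>\n' > "$workdir/file.txt"

# sed prints the result; redirect it to a new file to keep it.
# The two -e expressions run in order: drop blank lines, then fix the tags.
sed -e '/^$/d' -e 's:<br>:<br />:g' "$workdir/file.txt" > "$workdir/file.new"

cat "$workdir/file.new"
# rm -rf "$workdir"   # clean up when done
```

GNU sed and BSD sed both offer an -i option for editing a file in place, but its syntax differs between the two, so redirecting to a new file is the more portable habit.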

grep

grep is good for finding lines in files that match a particular regular expression. By default grep returns the lines that match a regular expression, for example:

grep '^monkey' file.txt

would return all of the lines in file.txt that start (^) with monkey. It is also possible to use the -l switch to return the names of the files that contain matches instead of the matching lines. This feature is particularly useful when working on a web server. For example, if you would like to find all of the files in a website that contain a particular piece of text, you would first navigate to the root of the site and then run the following command:

grep -lir "some text" *

The switches are quite simple to explain. -l returns filenames instead of the lines that matched. -i makes the search case insensitive. -r makes the search recursive. The text to search for is within the quotes, and the * is expanded by the shell to every file and directory in the current directory. It’s quite an intensive command, so be sure to run it from a sensible directory rather than “/”.
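To try the recursive search safely, here is a small sketch that builds a throwaway site tree first. The file names and contents are invented for the example:

```shell
# Scratch site tree (file names and contents are invented for this example).
site=$(mktemp -d)
mkdir -p "$site/inc"
printf 'Contact: Some Text here\n' > "$site/index.html"
printf '<?php // nothing relevant ?>\n' > "$site/inc/header.php"
printf 'footer with some text\n' > "$site/inc/footer.php"

# -l: names only, -i: ignore case, -r: descend into subdirectories.
matches=$(grep -lir "some text" "$site" | sort)
printf '%s\n' "$matches"
# rm -rf "$site"   # clean up when done
```

Passing a directory instead of * also picks up hidden files, which the shell leaves out when it expands *.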

locate

locate searches a database for all pathnames that match the pattern specified. The database that locate uses is typically updated on a periodic basis depending on what OS you are running.

locate is good for when you are working on a new web server and you don’t know the location of the file you want to work on. From the web interface you know it is called process.php, for example. Thus you can run locate process.php and this will turn up all of the paths in the database that contain process.php. If this returns too many results you can narrow it down further by running something like locate /tools/process.php, assuming “process.php” is in a directory called “tools”.
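Because locate answers from its database, a file created since the last update will not show up. As a hedged fallback, find walks the directory tree directly, so it always sees the current state. In this sketch the scratch directory stands in for a real document root and is an assumption of the example:

```shell
# Scratch tree standing in for a web root (hypothetical layout).
webroot=$(mktemp -d)
mkdir -p "$webroot/tools" "$webroot/old"
touch "$webroot/tools/process.php" "$webroot/old/process.php"

# Roughly "locate process.php": every file with that name under the web root.
find "$webroot" -type f -name 'process.php' | sort

# Roughly "locate /tools/process.php": -path matches against the whole path.
found=$(find "$webroot" -type f -path '*/tools/process.php')
printf '%s\n' "$found"
# rm -rf "$webroot"   # clean up when done
```

find is slower than locate on a large tree, so it pays to start it from the narrowest directory you can.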

rsync

rsync is a very useful command as it allows you to sync data not only on your own machine, but also with remote servers. The clever part is that rsync only copies the data that is different, which saves bandwidth compared with simply overwriting everything with a more up to date copy. rsync does even more than just replace files that have changed. It compares the data source with the data target and only sends the parts of each file that are needed, along with instructions on how to merge those parts into the destination.

For web developers this comes in very handy when you continually need to copy data up to a web server or between web servers. In addition, rsync is a great tool for backing up data, as a run only takes as long as is needed to update your backup with what’s changed.

Here’s an example of an rsync command that will copy data from the /var/www/html/dev directory on a remote server (called muffin here) to /var/www/html/live on the local server. When running this command you will be asked to log in to the source machine.

rsync -az muffin:/var/www/html/dev/ /var/www/html/live

As I mentioned above, you can also use rsync simply to back up local files. To do this, use rsync in the same way that you would use cp.

For example you could use: rsync -az /var/www/html/dev /home/muffin/backup
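One detail worth trying out before relying on a backup is the trailing slash on the source path, which changes what gets copied. This is a minimal local sketch, assuming rsync is installed; the scratch directories stand in for the dev and backup directories and are made up for the illustration:

```shell
# Scratch directories standing in for dev and backup (made up for this sketch).
base=$(mktemp -d)
mkdir -p "$base/dev" "$base/backup1" "$base/backup2"
printf 'hello\n' > "$base/dev/index.html"

# -a recurses and preserves permissions and times.
# -n is a dry run and -v lists the files, so nothing is copied here.
# -z (compression) only helps over a network, so it is left out.
rsync -anv "$base/dev" "$base/backup1"

# No trailing slash on the source: the dev directory itself is copied.
rsync -a "$base/dev" "$base/backup1"     # gives backup1/dev/index.html

# Trailing slash on the source: only the contents of dev are copied.
rsync -a "$base/dev/" "$base/backup2"    # gives backup2/index.html
# rm -rf "$base"   # clean up when done
```

The same rule applies to the remote example above, where the source ends in a slash and so the contents of dev land directly in live.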

tail

tail is a very basic command but it’s ideal for looking at web server logs to keep an eye out for errors. By default tail prints the last 10 lines of a text file. With the -f switch tail will “follow” the file as it grows, so you can see new lines in real time.

Example: tail -f /var/www/vhosts/a virtualdomain.co.uk/statistics/logs/error_log will follow the output of the error log.
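On a busy log it can help to combine tail with grep so that only the interesting lines are shown. This is a small sketch using a sample log whose path and contents are invented for the example:

```shell
# Sample log (path and contents are invented for this example).
log=$(mktemp)
for n in 1 2 3 4 5 6 7 8 9 10 11 12; do
    printf 'request %s ok\n' "$n"
done > "$log"
printf '[error] File does not exist: /favicon.ico\n' >> "$log"

# -n sets how many lines to show instead of the default 10.
tail -n 3 "$log"

# Filter the end of the log so only the error lines remain.
errors=$(tail -n 5 "$log" | grep -i 'error')
printf '%s\n' "$errors"

# To watch live instead (runs until interrupted with Ctrl-C):
#   tail -f "$log" | grep -i 'error'
# rm -f "$log"   # clean up when done
```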

screen

screen is a terminal multiplexer. What that means is that it allows you to run several “windows” inside one terminal. When I was looking for tabbed alternatives to Mac OS X’s Terminal I found that screen provided everything I needed with a distinct advantage, which was the ability to detach from a screen session on a remote server and re-connect to it later on. Thus you could be working on several different things in different directories (each in a different screen window), then detach, switch off your desktop, go home and continue from where you left off with everything where it was.

The use for web developers here is pretty obvious. You can, for example, use one window to edit files, one to run a mysql terminal and another to run top to keep a beady eye on your processes.

To give a taster of how to use screen, here’s a list of a few relevant examples. With the exception of the command to re-attach your screen session, these examples assume that you are already running screen. (To start screen simply type screen then hit enter.)

Ctrl-a c
Opens a new window
Ctrl-a "
Shows a list of open “windows” that you can select from
Ctrl-a k
Kills the current window
Ctrl-a d
Detaches from your screen session
screen -r
Re-attaches to a detached screen session
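The three-window layout described earlier (editor, mysql, top) could also be opened automatically each time screen starts. This is a hedged sketch of a ~/.screenrc, assuming the mysql client and top are installed; the window titles are invented for the example:

```text
# ~/.screenrc (sketch): windows to open whenever screen starts

# Window 0: a plain shell for editing files
screen -t edit 0

# Window 1: the MySQL client
screen -t mysql 1 mysql

# Window 2: the process monitor
screen -t top 2 top

# Start on the editing window
select 0
```

If you end up with more than one detached session, screen -ls lists them and screen -r accepts the name or id of the one to re-attach.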

If you have any of your own favorite commands please feel free to share them in the comments. Use <code> to wrap code snippets.

Comments

1. On October 16th, 2006 at 9:44 am Phu said:

Nice writeup Stuart. In particular, sed and grep are invaluable to anyone who works with or in unix systems.

2. On October 16th, 2006 at 12:33 pm Dave Everitt said:

you could mention using the wildcard * to change every .txt or .html (or whatever) file in a directory – handy for altering all <br> to <br /> across a site…

so:
sed '/^$/d' file.txt

becomes:
sed '/^$/d' *.txt

3. On October 16th, 2006 at 12:57 pm Stuart Colville said:

@Phu: thanks!

@Dave: That’s a good addition thanks.

4. On October 16th, 2006 at 3:04 pm Ismael said:

Very useful indeed. Thankyou!

5. On October 18th, 2006 at 9:01 pm chrisb said:

“rename” is also a nice unix command that is a little less intimidating for those who haven’t got so much experience with regular expressions.

rename .html .php *.html

substitutes .html with .php in files ending in .html.

easy-peasy.

anyway, does sed work on filenames?

6. On October 18th, 2006 at 9:14 pm chrisb said:

also from the I-really-used-it-this-week department:


find / -type f -size +500000k

will give you all the files on your system that are larger than 500 MB. Simple but undeniably useful when running df shows 95% full.

This list is great because most shell tutorials focus on the *real* geeky sysadmin stuff. Which is useless for the photoshop jockeys.

Personally it has only been in the last year that I realized how much it pays to learn basic command line. I would love to see other commenters post their web-developer-relevant commands.
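For anyone who wants to try the size search without scanning a whole disk, here is a scaled-down sketch. The scratch directory and file sizes are chosen purely for the demonstration:

```shell
# Scratch directory with one small and one larger file (sizes chosen for the demo).
dir=$(mktemp -d)
printf 'tiny\n' > "$dir/small.txt"
dd if=/dev/zero of="$dir/big.bin" bs=1024 count=3 2>/dev/null

# Same shape as the command above, scaled down: files larger than 1 KiB.
big=$(find "$dir" -type f -size +1k)
printf '%s\n' "$big"

# On a real system, -xdev keeps find on one filesystem, and discarding
# errors hides the "Permission denied" noise:
#   find / -xdev -type f -size +500000k 2>/dev/null
# rm -rf "$dir"   # clean up when done
```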

7. On October 19th, 2006 at 12:38 am Stuart Colville said:

@chrisb: Thanks for your comments and input. AFAIK sed can be used to rename files but only indirectly, by changing text which can then be used via redirection as a filename.

8. On October 21st, 2006 at 4:56 am Mike Barone said:

I don’t know how I ever got by without xargs.

As chrisb mentioned, find is extremely useful.

awk and cut are also very handy.

man for obvious reasons (man man if it’s not obvious) and because some options for commands like xargs vary from machine to machine.

Some people that don’t use shells very often miss out on some of the builtins like history — every command line you run gets numbered: you can rerun previous command lines by using !num. Example:


% history
1889 find . -name "*.php" | xargs -I% php -l %
1890 grep -rnI needle * | cut -f1 -d: | sort -u | xargs nedit &
1891 history
% !1889
find . -name "*.php" | xargs -I% php -l %

Of course you can press the up-arrow in most shells and go through your command history as well and be sure to take advantage of tab-completion for filenames and commands when possible.
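One caveat with piping find into xargs is that file names containing spaces get split into separate arguments. Here is a hedged sketch of the usual workaround; php -l is swapped for grep -l so that it runs without PHP installed, and the file names are invented for the demo:

```shell
# Scratch tree; one file name contains a space (names invented for the demo).
src=$(mktemp -d)
printf '<?php echo 1;\n' > "$src/plain.php"
printf '<?php echo 2;\n' > "$src/has space.php"

# -print0 and -0 pass names separated by NUL bytes, so the space in
# "has space.php" does not split it into two arguments.
count=$(find "$src" -name '*.php' -print0 | xargs -0 grep -l 'echo' | wc -l | tr -d ' ')
printf '%s\n' "$count"
# rm -rf "$src"   # clean up when done
```

With PHP available, the lint run from the history listing would become find . -name "*.php" -print0 | xargs -0 -n1 php -l.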

Some additional options for the commands discussed in the article:

grep -v #will return lines that DON’T match the search string

rsync -u #update only, don’t overwrite newer files
rsync -n #dry run only, don’t actually do anything but list what files would be transferred. Useful because rsync is so powerful and flexible: it lets you learn and try additional options (--exclude, --delete, etc) before actually executing them.

9. On October 25th, 2006 at 10:32 pm chrisb said:

Another good one from today: I am transitioning to the new mediatemple grid server (highly recommended), and I have an old apache httpd.conf file with LOTS of old dead virtual host directives for sites on my current server. So there is no need to move them.

Could have used something like http://www.dnsstuff.com/ (also recommended) to manually look up each url, but instead i just created a list of the addresses called “siteslist” and used the wonderful dig command from the terminal with a Bash “for loop” :


for i in $(grep . siteslist); do dig $i >> my_looked_up_sites.txt; done

This produces a nice document of information about each site (notably the ip address) and now I can clean my server with a bit more confidence.

this technique is really useful in a wide variety of situations where you have to repeatedly do X for every occurrence of Y.

For example if you needed to copy a robots.txt file into every subdirectory of a parent “/web” directory you could do something like:


for i in $(ls -l |grep '^d' | awk '{print $9}'); do cp /web/robots.txt $i; done

Which in English is:

ls -l

give a long directory listing


grep '^d'

display only the lines for directories


awk '{print $9}'

display only the actual directory name (the 9th column)


cp /web/robots.txt

copy this file into that directory.

It seems complicated at first but if you are a web developer dealing with lots of sites then it pays to invest in this kind of administrative swiss army knife.
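A note on the robots.txt loop: parsing the output of ls breaks when a directory name contains a space, because awk only prints the ninth column. Here is a sketch of an alternative that lets the shell find the directories itself; the scratch directory and site names are invented for the example:

```shell
# Scratch "web" directory with two site directories (names invented).
web=$(mktemp -d)
mkdir -p "$web/site one" "$web/site-two"
printf 'User-agent: *\nDisallow:\n' > "$web/robots.txt"

# */ expands to the subdirectories only, so ls, grep and awk are not needed,
# and quoting "$d" keeps names containing spaces intact.
for d in "$web"/*/; do
    cp "$web/robots.txt" "$d"
done

ls "$web"/*/
# rm -rf "$web"   # clean up when done
```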

10. On January 29th, 2007 at 6:45 am jey said:

could anyone tell me the way to use back-up command in ssh?

11. On January 29th, 2007 at 10:22 am Stuart Colville said:

@jey: If you’re referring to rsync: when you rsync between different servers, up-to-date versions of rsync will use ssh by default.

If this is a problem try rsync -e ssh

12. On March 8th, 2007 at 4:22 pm Nolan Clayton said:

The Recursive Grep

alias rgrep 'find . -name \!:2 | xargs grep -i \!:1 | grep -v .svn | grep -v "~"'

Example Usage
rgrep "whatever" "*"
rgrep "print" "*.xml" | less
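The alias above uses csh-style argument references (\!:1 and \!:2), so it will not work as written in bash or other Bourne-style shells. The following is a hedged sketch of a roughly equivalent shell function. It is an adaptation rather than a translation: it skips .svn directories and backup files by name instead of filtering the output lines, and the demo files are invented.

```shell
# rgrep PATTERN GLOB: case-insensitive search in files whose names match GLOB,
# skipping .svn directories and editor backup files ending in "~".
rgrep() {
    find . -path '*/.svn' -prune -o -type f -name "$2" ! -name '*~' -print0 |
        xargs -0 grep -i -H -- "$1"
}

# Demo tree (invented): one real file, one copy inside .svn, one backup file.
demo=$(mktemp -d)
mkdir -p "$demo/.svn"
printf '<a>Print me</a>\n' > "$demo/page.xml"
printf '<a>Print me</a>\n' > "$demo/.svn/page.xml"
printf '<a>Print me</a>\n' > "$demo/page.xml~"

# Run from inside the demo tree, in a subshell so the current directory is kept.
result=$(cd "$demo" && rgrep "print" "*.xml")
printf '%s\n' "$result"
# rm -rf "$demo"   # clean up when done
```

Some systems already ship a command called rgrep; a function defined in your shell takes precedence over it, so pick another name if that matters to you.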











© Copyright 2004-10 Stuart Colville, all rights reserved.