Department of
Physics & Astronomy

 

Unix Shells
A Mini-Manual


The Command-line Interface

If you're used to the point-and-click menu interface of popular graphics-based user interfaces, you probably shudder at a system where you have to already know what to do. You can, in fact, invoke a similar graphics screen on a Unix system, if that's what you want to do.

However, knowing a few of the ground rules of the command-line interface should help to dissipate the stigma. There are some very basic operations, as simple as a mouse click, that will help to make you comfortable in the Unix environment.

The Unix command-line interface is extremely powerful. Though it's just a bit tedious learning the basic underpinnings and syntax, it soon becomes very usable. Its power lies in the fact that it doubles as a programming language. If you're used to a point-and-click environment, you're accustomed to repetition. Every time you need to do some task you have to redo all the clicks. If you want to do the same operation on hundreds of files, you have to repeat your point-and-click sequence hundreds of times. Furthermore, it's often the case that if the application you're using doesn't do exactly what you want it to do, you just can't do it.

With the Unix command-line interface, the command syntax and basic Unix utility set are powerful enough that often a one- or two-line command will accomplish a whole afternoon's worth of pointing and clicking. In addition, what you can do once you can do forever. Just type the commands into a file and you've got a program that does exactly what you want it to do, and you've just added to your tool chest. Let's get started, shall we?

The Command Line

When you have logged onto a Unix/Linux/BSD system, you [nearly] always begin typing into a program called a shell. (An exception is dunsel, which invokes Win4Lin for you.) You see a command prompt waiting for you to type in a command. When you press <enter> the shell reads what you have typed and tries to make a command out of it. First it divides the line up into words, or tokens, according to the white space (spaces and tabs). The first token in your line should be the name of a command. The rest of the line, token by token, belongs to the command to do with as it sees fit. The end of the line denotes the end of the command line. A semi-colon can also mark the end of a command-line; the remaining tokens are expected to form another command-line.

The shell program recognizes a number of tokens as built-in shell commands. A few simple shell commands are 'pwd', 'cd', 'echo'. The echo command will print out the input tokens, separated by spaces.

	quark:twoods% echo hello	world
	hello world
If you want whitespace embedded in a token, including spaces, tabs, and newlines, you must surround the token with single- or double-quotes:
	quark:twoods% echo 'hello	world'
	hello	world
	quark:twoods% echo "hello
	world"
	hello
	world

In most Unix utilities, command options are specified by a dash and a letter. For instance, the echo command will not add a new line at the end of its message if the -n option is given:

	quark:twoods% echo -n This message is p; echo rinted on one line.
	This message is printed on one line.
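
One way to see how a line is divided into tokens is to ask a command how many arguments it received. The following is a minimal sketch in sh/bash syntax (not tcsh); count_args is a hypothetical helper name introduced only for this illustration.

```shell
# count_args is a hypothetical helper: it prints how many tokens it was given.
count_args() {
    echo "$#"
}

unquoted=$(count_args hello     world)    # white space separates tokens
quoted=$(count_args 'hello     world')    # quotes keep the spaces inside one token
two_cmds=$(echo one; echo two)            # ';' ends one command and starts another

echo "unquoted tokens: $unquoted"
echo "quoted tokens:   $quoted"
echo "$two_cmds"
```

The extra spaces between the unquoted words are discarded by the shell before the command ever runs, which is why echo prints them separated by a single space.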

Navigation

In Unix, files are found listed in directories. Directories may contain files and other directory listings, which can contain further listings, and so on. The root directory, named /, is at the top of the directory tree. Ultimately, everything is located somewhere under the root directory, so you can always "get there from here". Pathnames are in essence a list of directories separated by slashes. /usr/bin/ls refers to a file named ls residing in the directory /usr/bin.

The pwd (print working directory) command tells you what directory you are "in". When you log in, you start out in your "home" directory. This directory belongs to you, and is where you will do your work. (Under Unix, your ability to create and alter directories and files depends upon who you are and whom the files/directories belong to. You can't change the contents of most of the rest of the Unix file space, though you can read most but not all of it.) You can change your working directory with the command "cd". If you don't specify a directory, cd defaults to your home directory. Go there now, and use the 'ls' command to list out the contents of your cwd:

	quark:twoods% cd; ls
A file or directory is referred to by its "pathname", which tells how to find it. When you refer to a file or directory on the command line, you can use a simple pathname, such as "myfile", which means "the file named myfile in my current working directory (cwd)". If you're on a P&A system you probably have a directory named "public_html" in your home directory. Find out what's in that directory:
	quark:twoods% ls public_html
You probably have a file there named "Welcome.html". You can construct a pathname telling where to find a file that isn't in your cwd. Using the separator '/', this file's pathname relative to your home directory is "public_html/Welcome.html".

Your home directory is the "parent" directory for your public_html directory. You could also say that your home directory is "immediately above" your public_html directory. Each directory has a listing for itself called '.', and a listing for its parent directory called '..', which you can use anywhere you would use a directory name. This means that you can refer to your cwd as ".", or as "public_html/..".

You can also use an "absolute pathname" to refer to a file or directory. These pathnames always begin with '/', referring to the "root" directory. When you used the "pwd" command, above, it printed out your home directory's absolute pathname. If we call that pathname /<homedir>, your public_html/Welcome.html pathname can be referred to as:

	/<homedir>/public_html/Welcome.html
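
The relative, '.', '..', and absolute forms can be tried safely in a scratch directory. This is a sketch in sh/bash syntax; it assumes the mktemp utility is available, and the scratch directory merely stands in for a real home directory.

```shell
# Build a scratch "home" directory so that nothing real is touched.
home=$(mktemp -d)
mkdir "$home/public_html"
: > "$home/public_html/Welcome.html"    # create an empty file

cd "$home"
here=$(pwd)                             # absolute pathname of the cwd
same=$(cd public_html/.. && pwd)        # down one level, then back up via '..'

ls public_html                          # simple (relative) pathname
ls ./public_html/Welcome.html           # '.' names the cwd
ls "$here/public_html/Welcome.html"     # absolute pathname, begins with '/'
echo "$here"
echo "$same"
```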

Commands and Tokens

If the first word (token) of your command line is not recognized as a shell command, the shell interprets it as the name of a program, and tries to find that program. For instance, the 'ls' command is not a builtin shell command, but rather a Unix utility program residing in a file. There are a few places in the Unix system hierarchy where the standard Unix utilities are usually found, such as /bin, or /usr/bin. Locally installed utilities are often installed in the /usr/local/bin directory, while many graphics utilities live in /usr/X11R6/bin. If the shell didn't find the program you wanted, you can, of course, type in the pathname of the program, and the shell is sure to find it, such as /bin/find, or ./myprog.

The rest of the tokens on a line belong to the command or utility you've invoked and will be used accordingly. For instance, the builtin command echo will print out the rest of the tokens on the command line, so that

	quark:twoods% echo hello world
will print 'hello world' for you. However, before the command actually runs, the rest of the line may be altered, according to the rules of Command-Line Substitution. One of the alterations is called file-name expansion. You can use the special characters '?' and '*' to stand for single- and multiple-character wildcards, respectively, which may match existing filenames. Thus, if you type in
	quark:twoods% echo *
you will see a list of the entries in your cwd. (However, if your cwd is currently empty, the * will not be expanded.) As another example, you can print out your public_html/Welcome.html file pathname using wildcards:
	quark:twoods% echo pub*/Wel*
If you give a wild-card pattern that matches more than one pathname, the pattern will be replaced by all of the matching pathnames before being passed to the command:
	quark:twoods% echo pub*/*
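If you surround a wild-card pattern with double-quotes, file-name expansion is suppressed and the pattern is passed to the command as a single, unchanged token (variable substitution, described below, still takes place inside double-quotes). If you surround any token with single-quotes, no changes will be made to the token at all, including wild-card file name expansion.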
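
File-name expansion is easiest to watch in a directory whose contents you know. The sketch below uses sh/bash syntax, assumes mktemp is available, and uses made-up file names.

```shell
# Work in a scratch directory containing three known files.
demo=$(mktemp -d)
cd "$demo"
touch alpha.txt beta.txt gamma.dat

star=$(echo *.txt)        # '*' matches any run of characters
qmark=$(echo ?eta.txt)    # '?' matches exactly one character
single=$(echo '*.txt')    # single-quotes: the pattern is passed through untouched

echo "$star"
echo "$qmark"
echo "$single"
```

In each case it is the shell, not echo, that does the matching; echo only ever sees the finished list of tokens.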

Variables

Another thing that happens to a command line before it's passed to the command is variable substitution. You can make up a shell variable name, like FOO, and assign a value to it. (By convention, most shell variables are fully capitalized in order to make it easier to parse a shell script, in which most commands and file names are lower-case and most text is, well, mostly lower-case.)

The value of a variable is referred to by prepending a '$' to its name. Here's an example. You may be working with data located in a directory far away from your home directory and far down the directory tree, such as

	/usr/local/classes/physics101/data/spacetime/here/now
Rather than type the whole thing every time you want to cd there, assign it to FOO and you can save some typing. (Unix folks love to save typing.)
	quark:twoods% set FOO = /usr/local/classes/physics101/data/spacetime/here/now
	quark:twoods% echo $FOO
	quark:twoods% set BAR = /usr/local/classes/physics101/data/spacetime/there/then
	quark:twoods% cd $FOO
	quark:twoods% do_work
	quark:twoods% cd $BAR
	quark:twoods% do_work
	quark:twoods% cd
	quark:twoods% read_mail
	quark:twoods% cd $FOO
	  ...
Remember, no changes will be made within single-quotes, including variable substitution.

BSD and C-Shell

Here we must confuse the issue just a tad. Since its invention at Bell Labs several decades ago, Unix was used by several different groups of people before ever being released to the general public. One of the early groups was UC Berkeley. Over time, the Unix at Berkeley got added to and changed relative to the Bell Labs (AT&T) version. The latter also evolved over time, and was eventually released as System V (Five) -- SVR4 is Release 4 of System V -- eventually becoming UnixWare. The Berkeley version came to be known as BSD. Sun Solaris is SVR4, while Mac OS X is a BSD variant. Linux was written from scratch by Linus Torvalds et al., mostly along SVR4 lines.

As the shell program, sh, evolved, SVR4 settled on a variant developed by Bell Labs' Steve Bourne (the Bourne Shell), while at Berkeley Bill Joy developed a version called csh (C-Shell). Unfortunately, the syntaxes for the two shells, though similar, are not identical. Thus, the present-day descendants of these two variants, ksh and bash from sh, and tcsh from csh, likewise have different syntaxes.

Most P&A login accounts are configured with tcsh, whose syntax I used above to assign FOO. The sh/ksh/bash version would be:

	$ FOO=/usr/local/classes/physics101/data/spacetime/here/now
You can find out the name of your shell by typing:
	quark:twoods% echo $0
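
For readers working in sh, ksh, or bash, here is a minimal sketch of variable assignment and substitution in that syntax, including the effect of quotes. The path is the illustrative one used above; the tcsh form appears only as a comment.

```shell
# sh/ksh/bash assignment: no 'set', and no spaces around '='.
# (tcsh equivalent:  set FOO = /usr/local/classes/physics101/data/spacetime/here/now)
FOO=/usr/local/classes/physics101/data/spacetime/here/now

plain=$(echo $FOO)            # $FOO is replaced by its value
dquote=$(echo "$FOO/notes")   # substitution still happens inside double-quotes
squote=$(echo '$FOO')         # single-quotes: no substitution at all

echo "$plain"
echo "$dquote"
echo "$squote"
```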

The Search Path

When you type a command into the command-line which is not recognized as a built-in shell command, the shell tries to find the command token as a file somewhere on the Unix system. It looks in directories specified by the csh variable path or the sh variable PATH. If you type in a command that you think should run, but the shell complains that the command wasn't found, echo out your shell path variable. Remember to put a '$' in front of it.
	quark:twoods% echo $path	# sh:  $PATH
(Note: you needn't include the '#' and the text after it, but if you do the shell will ignore it. The '#' character is the "comment character", and the shell discards it and anything after it on each line. Use lots of comments when you write scripts. Without comments it will be easy to get lost when you want to change or add to a script over time.)

You should see a list of absolute directory pathnames separated by spaces ([t]csh) or colons (ksh/bash). If you know that a program is in a directory not included in your search path, you can specify its path, either absolute or relative to your cwd, as you type the program name on the command line.

Alternatively you can add its directory to your search path. When you do so, you should always add directories to the end of your search path rather than to the beginning. Otherwise a program in a non-standard directory could usurp the name of a standard utility and change the behavior of shell scripts, which would constitute a possibly serious security hole. (Since the Unix O/S has been developed in a networked environment since its inception, the Unix development community has always been very aware of security issues.)
Enter:

	quark:twoods% set path = ( $path /newpath )	# csh
		or
	$ PATH=$PATH:/newpath				# sh
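
The effect of appending a directory to the search path can be checked with a throwaway program. This sketch uses sh/bash syntax; hello_tool is a hypothetical program name invented for the example, and mktemp is assumed to be available.

```shell
# Create a private directory holding one tiny executable script.
newdir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from hello_tool"\n' > "$newdir/hello_tool"
chmod +x "$newdir/hello_tool"

# Append (never prepend) the directory to the search path.
PATH=$PATH:$newdir

found=$(command -v hello_tool)    # ask the shell where it finds the command
output=$(hello_tool)              # now runnable by name alone

echo "$found"
echo "$output"
```

Because the new directory sits at the end of the path, any standard utility of the same name would still be found first.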

More on Variables

There are many other shell variables defined by the shell. You can see what they are by typing:
	quark:twoods% set
Similar to shell variables are environment variables. These are variables whose definitions are passed into the environment of every program that you invoke. The variable path/PATH is an environment variable as well as a shell variable. In csh, environment variables are distinct from shell variables, and can be assigned different values from them. To see them, type:
	quark:twoods% setenv
To create and/or assign an environment variable, type:
	quark:twoods% setenv FOO bar
Notice that no '=' sign is used with setenv. Try this:
	quark:twoods% unset FOO		# Removes the FOO variable
	quark:twoods% setenv FOO bar	# Environment variable
	quark:twoods% echo $FOO	
	quark:twoods% set FOO=foo	# Shell variable
	quark:twoods% echo $FOO
	quark:twoods% tcsh		# We're invoking another shell!
	quark:twoods% echo $FOO		# The second shell executes this command
	quark:twoods% exit		# Shell #2 will now quit
	quark:twoods% echo $FOO		# ...back to the first shell...
The FOO shell variable is local to the first shell. You can see that the value of the FOO shell variable overrides the value of the FOO environment variable when both are set. The second shell inherits only the FOO environment variable.

When you change the value of an environment variable you change its value for programs invoked by the shell.

Things are a little different under sh. There, shell variables share duty as environment variables when they are explicitly "exported". They carry only one value:

	$ FOO=bar; export FOO
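
The difference between an exported and an unexported variable under sh shows up as soon as a second shell is started. In this sketch the variable names LOCAL_ONLY and SHARED are made up for the illustration.

```shell
# Start from a clean slate in case either name is already set.
unset LOCAL_ONLY SHARED

LOCAL_ONLY=foo            # plain shell variable, not exported
SHARED=bar; export SHARED # exported, so it is passed to child programs

# A child shell sees only the exported variable.
child_view=$(sh -c 'echo "local=[$LOCAL_ONLY] shared=[$SHARED]"')
echo "$child_view"
```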

Help!

You've done well to get this far! Here's how to get help.

One of the biggest aids to learning to use the shell is the manual page system. Over time the traditional Unix developers have been religious about documenting the operation of Unix. Just about any utility or functionality would have a Manual Page written explaining it. The beauty of this is that these manual pages are commonly kept on-line, readily accessible by the man utility. Each time you learn a new command, you should take the time to read from its man page. You probably don't need to read the whole thing, but if it's a good man page, whenever you have a question, the answer will be there, including usage examples and cross-references to man pages of related subjects.

	quark:twoods% man man		# Read about the man utility
If you're on Linux, you may not be impressed. Since Linux has been written from scratch, and there are [expensive] licenses on the original Unix man pages, there hasn't been enough time to develop a full set of manual pages (for free). Furthermore, the GNU Project has adopted a parallel documentation standard, relying on the utility "info", which has acted to fragment documentation efforts.
	quark:twoods% info info	# Read about the info utility
At present, Linux documentation must be termed "spotty". If you're logged into a Linux machine you may, then, want to open an ssh session to quark, a Solaris installation, in order to benefit from its more extensive man page support. You'll want to look especially at the man (or info) page for the shell you're running. For example, type man tcsh, or man bash.

Here are some utilities you'll want to learn about right away:

Manipulating Files and Directories:

	ls		# Report the contents of a directory
	mv		# Move or rename a file
	cp		# Copy a file
	ln		# Create another directory entry for a file
	rm		# Remove a directory listing
	mkdir		# Create a new directory
	rmdir		# Remove an empty directory
	mknod		# Create a device node
	chown		# Change the owner of a file or directory
	chgrp		# Change the group of a file or directory
	chmod		# Change the permissions of a file or directory

Looking into Files:

	cat		# Report the contents of a file
	touch		# Update the date attribute of a file
	tee		# Write data to a file and to stdout
	less		# Scroll through a file
	grep		# Find a character pattern in a file
	find		# Locate selective listings in a directory hierarchy
	head		# Report the first several lines of a file
	diff		# Report differences between text files
	cmp		# Report differences between non-text files
	sort		# Sort the lines in a file
	awk		# Manipulate text
	sed		# Edit text in a text stream

Around the System:

	uname		# Report the system identification information
	kpasswd		# Change your password (Kerberos)
	passwd		# Change your password (non-Kerberos)
	who		# Report users currently logged into the system
	id		# Report your user and group ids
	df		# Report the amount of free disk space
	ps		# Report the processes currently running
	lpr		# Print to a printer
	talk		# "Talk" to other users (the original instant messaging)
	wall		# Write a message to all users currently logged in
	tar		# Create a "tar" archive of files
	cpio		# Create a "cpio" archive of files
	stty		# Configure keystrokes

You'll want to choose a text file editor such as pico, vi, or emacs and become comfortable with it. Pico is probably the simplest to master, and resembles a mouse-driven editor, using directional keys instead. Unix's traditional vi (visual) editor is often preferred by facile typists. It features a powerful single-character command set and leaves your hands in typing position. An updated version with more features, vim, is often available. Emacs is also powerful, its feature set continuing to grow without bounds.

The manual page for your shell is pretty long, and documents all of the features of the shell. If it were shorter, this tutorial would likely not be needed. You should, however, spend time with it on a regular basis. Each time you do, the Unix system will become even more powerful in your hands.

Permissions

Part of the security strategy of Unix systems is the file permissions feature, which governs who gets to do what with which files and directories. Each file (or directory) has an owner and a group associated with it. Only the file's owner, and the root user, can change the permission attributes on the file. Permissions are divided into three sets:
  • Owner permissions;
  • Group permissions;
  • Permissions for all others.
Each set carries permissions for reading, writing, and executing, either enabled or disabled. You can see the permission attributes of a file or directory by using the -l option to the ls command:
	quark:twoods% ls -l /etc/passwd
	-rw-r--r--  1 root  daemon 1252 Jun 21 13:48 /etc/passwd
	quark:twoods% ls -ld /etc
	drwxr-xr-x  18 root  wheel  2048 Jul 29 09:28 /etc
The first field shows the permissions attributes. The third field names the file's owner, and the fourth field identifies its group. The first character of the permissions field is the type of the entry. This may be '-' for a file, 'd' for a directory, 'p' for a named pipe, or 'c' or 'b' for a device.

Starting with the second character of the permissions field, the owner read, write, and execute permissions are indicated by the respective letters r, w, and x. A dash signifies that the permission is withheld. The next three characters give the permissions for users who are members of the file's group, and the final three characters apply to all other users (the world). We see that everyone can read the /etc/passwd file, but only the root user can write to it and no one can execute the file. This is not surprising, since /etc/passwd is not a program. (However, the chmod program can grant execute permissions for any file, regardless of whether it is a bona-fide program or not.)

A directory's permission attributes are identical, with the distinction that execute permissions on a directory refer to the ability to access its entries. The directory must also be readable to be able to list them out. To add or remove a listing, the directory must be writable.

If a command writes an error message such as "Can't read file" or "Can't execute", take a look at the permissions with ls -l. Refer to the man page for chmod and the entry for umask in your shell documentation.
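
To watch the permissions field change, you can apply chmod to a scratch file and list it before and after. This is a sketch in sh/bash syntax, assuming mktemp is available; cut keeps only the first ten characters of the ls -l line, that is, the type character plus the nine permission characters.

```shell
# Make a scratch file and give it the same permissions as /etc/passwd above.
f=$(mktemp)
chmod 644 "$f"
before=$(ls -l "$f" | cut -c1-10)

# Add execute for the owner; take read away from group and others.
chmod u+x,go-r "$f"
after=$(ls -l "$f" | cut -c1-10)

echo "$before"
echo "$after"
```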

Job Control

You already know Unix is multi-user. It's also described as multi-tasking. It has the capacity to run many programs simultaneously. (Well, you probably knew that too.) The shell has commands allowing you to manage multiple tasks. These tasks are called processes, and their management is called job control. Ordinarily, when you invoke a command at the command-line, the command runs until it's completed, then the shell gives you another command prompt. You can see which processes you're running with the command ps:
	quark:twoods% ps

When you type in a command and end the command line with a &, the job runs "in the background", while the shell prints another command prompt. The program does not see input from your keyboard, although its output does appear on your display (unless you've redirected it). In this way, you can run a large number of lengthy processes at once.

Shells other than sh (the Bourne shell) have additional job management capabilities. First, you can suspend a process running in the foreground by pressing CTRL-Z. Each such job is associated with a job identifier number. The two most recently added jobs are also identified with a '+' and a '-'. The following shell commands give you control over suspended or background jobs:

	quark:twoods% jobs		# List background and suspended jobs
	quark:twoods% fg		# Run a listed job in the foreground
	quark:twoods% bg		# Run a listed job in the background
(In tcsh, the builtin command stop suspends a background job. Bash has no stop builtin; there you can bring the job to the foreground and press CTRL-Z, or send it a stop signal with kill -STOP.)
By default, the fg and bg commands operate on the last job added to the list ('+'), but you can specify another job with a % plus either its job id, '+', or '-'.
Again, refer to the man page for your shell for the full treatment.
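
The jobs, fg, and bg commands are meant for interactive use, so they are hard to show in a script. What can be sketched non-interactively is the background operator itself: in sh/bash syntax, the special variable $! holds the process id of the most recent background job, and the wait builtin pauses until it finishes. The file used to collect the output is a scratch file created with mktemp (assumed available).

```shell
out=$(mktemp)

# Start a slow job in the background; its output is redirected to a file.
( sleep 1; echo "background job done" ) > "$out" &
bgpid=$!

echo "the shell is free to do other work while job $bgpid runs"

wait "$bgpid"             # block until the background job has finished
status=$?                 # its exit status
result=$(cat "$out")
echo "$result (exit status $status)"
```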

Pipes

Actually, pipes are just the most prominent of several different mechanisms of redirection. You already know that programs typically write messages to your display. You probably also know that programs can open files, and read from them and/or write to them. One of Unix's characteristics is that everything looks like a file to a program. This includes your keyboard and display, as well as a CD-ROM drive and other storage devices.

When a Unix program is invoked, the shell provides the program handles to your keyboard and display as the standard input and standard output (stdin and stdout for short). A third handle, standard error (stderr) is also attached to your display.

Part of shell syntax provides for specifying other "files" as stdin, stdout, and stderr. For instance, you might write a program called write_notice. You would invoke it from the command line, then type someone's name in at the keyboard, and it would print out your message to stdout. You can also redirect its stdin and stdout so that it will read from and write to files:

	quark:twoods% write_notice < namefile > msgfile 2>errfile
Your program might want to print out an error message if it can't find the name in a database; here, that message would end up in errfile. (The 2> form belongs to sh, ksh, and bash; csh and tcsh can only redirect stdout and stderr together, using >&.) In the above example, anything already in msgfile or errfile would be overwritten. If you want your program to add to the end of a file instead of beginning it all over again, use ">>":
	quark:twoods% write_notice < namefile1 > msgfile 2>errfile
	quark:twoods% write_notice < namefile2 >> msgfile 2>errfile
You can even use > to create or truncate a file without a command:
	quark:twoods% > msgfile

Another way to redirect output is to capture it on the command line. The syntax uses backticks, '`', to enfold a command. The command within the backticks is replaced on the command line with the output from the command:

	quark:twoods% echo "My message to `cat namefile` is `write_notice < namefile `"
Ksh and Bash (but not sh) have another way to do the same thing, using the format "$( command )".
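
Since write_notice is only an imagined program, here is a sketch of the same redirections using standard utilities, in sh/bash syntax. The file names mirror the ones above but live in a scratch directory (mktemp is assumed to be available), and no_such_file is deliberately missing so that ls has something to complain about on stderr.

```shell
tmp=$(mktemp -d)

echo "first"  >  "$tmp/msgfile"     # '>' creates or overwrites the file
echo "second" >> "$tmp/msgfile"     # '>>' appends to it

# stdout goes to one file, stderr to another; ls fails on the missing file.
ls "$tmp/no_such_file" > "$tmp/listing" 2> "$tmp/errfile" || echo "ls reported an error"

whole=`cat "$tmp/msgfile"`          # backticks capture a command's output
first=$(head -n 1 "$tmp/msgfile")   # $( ) does the same in ksh and bash

echo "$whole"
echo "first line: $first"
```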

You can coordinate the operation of several programs by using the output of one program as the input for another by redirecting outputs and inputs to/from files:

	th123-13:twoods% myprog1 > ~/tmp/tmpfile1
	th123-13:twoods% myprog2 < ~/tmp/tmpfile1 > ~/tmp/tmpfile2
	th123-13:twoods% myprog3 < ~/tmp/tmpfile2 > ~/tmp/results
Great, right? But, wait, I haven't even told you about pipes yet! Remember, that's what this section is about?

In the above example, you have to wait until myprog1 is through running before you start myprog2. (If you were to put myprog1 into the background, you could start myprog2 right away, but since tmpfile1 isn't complete yet, myprog2 would reach the end of the file and quit before most of the data arrived.)

The pipe allows myprog1, myprog2, and myprog3 to run simultaneously. It uses the pipe symbol "|", and is used like this:

	th123-13:twoods% myprog1 | myprog2 |	# you can continue on the next line after a pipe
	myprog3 > ~/tmp/results
Now as soon as myprog1 writes something to its standard output, it goes directly to the standard input of myprog2, which can read it, perform its own magic, and write out for myprog3 to read. There's no practical limit to the number of processes you can string together in this way. When myprog1 finishes, myprog2 will read an End Of File, which lets it know the game's over, and eventually the last process will get the message too. Of course you can add a & to the end of the line to put everything in the background, while you continue doing other work.

One other tip: I listed a program called "tee" earlier. This program lets you monitor what's going on while still capturing data to a file. While the following program runs:

	th123-13:twoods% myprog1 | tee ~/tmp/results
you can see myprog1's output, and it will also be captured in the file ~/tmp/results.
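
Here is a small pipeline you can actually run, with standard utilities standing in for myprog1, myprog2, and myprog3. It is a sketch in sh/bash syntax: printf produces three lines, sort orders them, tee saves a copy to a scratch file (mktemp assumed available), and tr converts what continues down the pipe to upper case.

```shell
tmp=$(mktemp -d)

# Four processes run at once, each reading the previous one's output.
shouted=$(printf 'pear\napple\nfig\n' | sort | tee "$tmp/results" | tr '[:lower:]' '[:upper:]')

saved=$(cat "$tmp/results")   # the copy that tee captured mid-pipeline

echo "$shouted"
echo "$saved"
```

The file holds the data as it looked at the point where tee sat in the pipeline, before the final stage changed it.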

Once again, my favorite litany: read the shell man page for a full understanding of the features of redirection.

Programming

Are you ready to become a shell programmer now? There are just a few more items to master, then you'll be an expert. The first item is trivial. When the shell is passed the name of an executable file as a command, it tries to run it. There are three possibilities:
  • It's a binary file, compiled into machine code that will run on the cpu. The shell will call the system loader to load and run the program.
  • It's a script, containing shell commands, or perhaps perl or java code. The shell will invoke a shell (or perl or java, etc.) to read the script and execute its instructions.
  • It's not a real program at all, and won't run. The shell will invoke another shell to read and execute it, which will barf on the input, complain, and quit.
For the second option, the shell needs to know what kind of script it is. If you write a script in tcsh syntax, someone else who is running bash may come along and try to run it. Because of the different syntaxes, bash will blow up on the script.

You can provide for this by beginning the file with a special comment line. If the first line of the file begins with the characters "#!" followed by a scripting program, such as sh, ksh, perl, java, that program will be invoked to run the script. An absolute pathname is preferred:

	#!	/bin/sh
	
	echo hello world
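
Putting the pieces together, the following sketch (sh/bash syntax) writes that two-line script to a file, grants execute permission, and runs it by pathname. The file name hello and the scratch directory are assumptions made for the example; mktemp is assumed to be available.

```shell
dir=$(mktemp -d)

# Write the script; the quoted 'EOF' keeps the shell from altering its text.
cat > "$dir/hello" <<'EOF'
#!/bin/sh
# hello: a minimal script; the first line names the interpreter
echo hello world
EOF

chmod +x "$dir/hello"     # without execute permission the shell won't run it
greeting=$("$dir/hello")  # invoke the script by its pathname
echo "$greeting"
```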

Now you're really ready to program. I could write another sheaf of pages here going into it, but it's really all right there in your shell man page, and besides, I'm out of time. So, go for it!

One last tip:

There are hundreds of shell scripts waiting for your perusal, right here on your system. The file utility will identify which programs are binary programs (compiled from C source or other) and which are [probably] shell scripts. Just start reading them. Some are trivial. Some are simple but extremely elegant.
© 2004 by Jas Cluff - All Rights Reserved
webwiz@stars.sfsu.edu