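| Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. --Brian Kernighan | 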
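The Bash shell contains no built-in debugger, and only bare-bones debugging-specific commands and constructs. Syntax errors or outright typos in a script generate cryptic error messages that are often of no help in debugging a non-functional script.
Example 32-1. A buggy script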
| #!/bin/bash
# ex74.sh
# This is a buggy script.
# Where, oh where is the error?
a=37
if [$a -gt 27 ]
then
  echo $a
fi
exit $?   # 0! Why? | 
Output from script:
| ./ex74.sh: [37: command not found | 
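One way to read this message: with no space after the opening bracket, Bash takes `[37` as the name of a command and cannot find it. The sketch below is a corrected version, assuming the intent was to print `$a` when it exceeds 27. The final `exit $?` is left out so the fragment can be pasted into a larger script.

```shell
#!/bin/bash
# Corrected sketch of ex74.sh (assumes a numeric comparison was intended).

a=37

if [ "$a" -gt 27 ]   # Note the space after '[' and the quoted variable.
then
  echo "$a"
fi

#  In the buggy version, the "if" construct itself returns 0
#+ because no branch was executed, so "exit $?" reported success
#+ even though the test command was never found.
```

Quoting `"$a"` is not what fixes the error, but it keeps the test well-formed if the variable is ever empty.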
Example 32-2. Missing keyword
| #!/bin/bash
# missing-keyword.sh
# What error message will this script generate? And why?

for a in 1 2 3
do
  echo "$a"
# done     # Required keyword 'done' commented out in line 8.

exit 0     # Will not exit here!

# === #

# From command line, after script terminates:
  echo $?    # 2 | 
Output from script:
| missing-keyword.sh: line 10: syntax error: unexpected end of file | 
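Error messages may disregard comment lines in a script when reporting the line number of a syntax error.
What if the script executes, but does not work as expected? This is the all too familiar logic error.
Example 32-3. test24: another buggy script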
| #!/bin/bash
# This script is supposed to delete all filenames in current directory
#+ containing embedded spaces.
# It doesn't work.
# Why not?
badname=`ls | grep ' '`
# Try this:
# echo "$badname"
rm "$badname"
exit 0 | 
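Try to find out what is wrong with Example 32-3 by uncommenting the echo "$badname" line. Echo statements are useful for seeing whether what you expect is actually what you get.
In this particular case, rm "$badname" will not give the desired results because $badname should not be quoted. Placing it in quotes ensures that rm has only one argument (it will match only one filename). A partial fix is to remove the quotes from $badname and to reset $IFS to contain only a newline, IFS=$'\n'. However, there are simpler ways of going about it.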
| # Correct methods of deleting filenames containing spaces.
rm *\ *
rm *" "*
rm *' '*
# Thank you. S.C. | 
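The glob-based approach can be tried safely in a scratch directory. The sketch below is illustrative only: the directory comes from mktemp and the filenames are made up for the demonstration.

```shell
#!/bin/bash
# Sketch: removing only the filenames that contain spaces.
# The scratch directory and the filenames are invented for this demo.

workdir=$(mktemp -d)          # Scratch directory, so nothing real is touched.
cd "$workdir" || exit 1

touch "has space" "two  spaces here" "nospace"

rm -- *' '*                   # The glob expands to one argument per filename.
                              # "--" guards against names beginning with a dash.

remaining=$(ls)
echo "Remaining: $remaining"  # Only "nospace" is left.

cd / && rm -r "$workdir"      # Clean up the scratch directory.
```

Because the shell expands the glob itself, each matching filename reaches rm as a separate argument, and no parsing of ls output is needed.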
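Summarizing the symptoms of a buggy script,
It bombs with a "syntax error" message, or
It runs, but does not work as expected (logic error).
It runs, works as expected, but has nasty side effects (logic bomb).
Tools for debugging non-working scripts include
Inserting echo statements at critical points in the script to trace the variables, and otherwise give a snapshot of what is going on.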
|  | Even better is an echo that echoes only when debug is on. | 
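A conditional echo of this kind might look like the sketch below. The function name debug_echo and the DEBUG variable are assumptions chosen for illustration; neither is built into Bash.

```shell
#!/bin/bash
#  debug_echo: hypothetical helper name, not a Bash builtin.
#  Prints its arguments to stderr only when DEBUG is set
#+ to a non-empty value.

debug_echo () {
  if [ -n "$DEBUG" ]
  then
    echo "DEBUG: $*" >&2
  fi
}

DEBUG=on
count=3
debug_echo "count = $count"   # Printed, since DEBUG is non-empty.

DEBUG=
debug_echo "count = $count"   # Silent, since DEBUG is now empty.
```

Writing to stderr keeps the debugging output out of any pipeline that consumes the script's normal output.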
Using the tee filter to check processes or data flows at critical points.
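As a rough sketch of the tee technique, the pipeline below saves a copy of the data at one stage so it can be inspected afterward. The sample data is made up, and the snapshot file name comes from mktemp.

```shell
#!/bin/bash
# Sketch: using tee to snapshot the data between two pipeline stages.
# The sample words are invented for this demo.

snapshot=$(mktemp)

sorted_unique=$(printf '%s\n' pear apple fig apple |
                sort |
                tee "$snapshot" |   # Copy of the sorted stream, for inspection.
                uniq)

seen=$(cat "$snapshot")             # What flowed past the tee point.

echo "Final result:"
echo "$sorted_unique"
echo "Data seen at the tee point:"
echo "$seen"

rm -f "$snapshot"
```

Comparing the snapshot with the final result shows which stage changed the data: here the duplicate "apple" is still present at the tee point and is gone after uniq.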
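Setting option flags -n -v -x
sh -n scriptname checks for syntax errors without actually running the script. This is the equivalent of inserting set -n or set -o noexec into the script. Note that certain types of syntax errors can slip past this check.
sh -v scriptname echoes each command before executing it. This is the equivalent of inserting set -v or set -o verbose in the script.
The -n and -v flags work well together. sh -nv scriptname gives a verbose syntax check.
sh -x scriptname echoes the result of each command, but in an abbreviated manner. This is the equivalent of inserting set -x or set -o xtrace in the script.
Inserting set -u or set -o nounset in the script runs it, but gives an unbound variable error message and aborts the script at the first attempt to use an unset variable.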
| set -u   # Or   set -o nounset

# Setting a variable to null will not trigger the error/abort.
# unset_var=

echo $unset_var   # Unset (and undeclared) variable.

echo "Should not echo!"

# sh t2.sh
# t2.sh: line 6: unset_var: unbound variable | 
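The -n and -x flags can be combined in one session, as in the sketch below. It assumes bash is on the PATH and uses bash -n rather than sh -n, since the scratch file is meant as a Bash script. The scratch script is made up and repeats the missing-'done' mistake from Example 32-2.

```shell
#!/bin/bash
# Sketch: a syntax check with -n, then tracing one section with -x.
# The scratch script is invented for this demo.

scratch=$(mktemp)
cat > "$scratch" <<'EOF'
for a in 1 2 3
do
  echo "$a"
EOF

if bash -n "$scratch" 2>/dev/null   # Parse only; nothing is executed.
then
  syntax_ok=yes
else
  syntax_ok=no                      # Taken here: the loop is never closed.
fi
echo "Syntax OK? $syntax_ok"
rm -f "$scratch"

total=0
set -x                              # Start tracing (trace goes to stderr).
for n in 1 2 3
do
  total=$(( total + n ))
done
set +x                              # Stop tracing.
echo "total = $total"               # total = 6
```

Turning tracing on with set -x and off again with set +x limits the trace to the suspect section instead of flooding the terminal with output from the whole script.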
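Using an "assert" function to test a variable or condition at critical points in a script. (This is an idea borrowed from C.)
Example 32-4. Testing a condition with an assert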
| #!/bin/bash
# assert.sh
#######################################################################
assert ()                 #  If condition false,
{                         #+ exit from script
                          #+ with appropriate error message.
  E_PARAM_ERR=98
  E_ASSERT_FAILED=99
  if [ -z "$2" ]          #  Not enough parameters passed
  then                    #+ to assert() function.
    return $E_PARAM_ERR   #  No damage done.
  fi
  lineno=$2
  if [ ! $1 ] 
  then
    echo "Assertion failed:  \"$1\""
    echo "File \"$0\", line $lineno"    # Give name of file and line number.
    exit $E_ASSERT_FAILED
  # else
  #   return
  #   and continue executing the script.
  fi  
} # Insert a similar assert() function into a script you need to debug.    
#######################################################################
a=5
b=4
condition="$a -lt $b"     #  Error message and exit from script.
                          #  Try setting "condition" to something else
                          #+ and see what happens.
assert "$condition" $LINENO
# The remainder of the script executes only if the "assert" does not fail.
# Some commands.
# Some more commands . . .
echo "This statement echoes only if the \"assert\" does not fail."
# . . .
# More commands . . .
exit $? | 
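The exit command in a script triggers a signal 0, terminating the process, that is, the script itself. [1] It is often useful to trap the exit, forcing a "printout" of variables, for example. The trap must be set before the exit it is meant to catch, so it is usually the first command in the script.
trap specifies an action on receipt of a signal; it is also useful for debugging.
| A signal is a message sent to a process, either by the kernel or by another process, telling it to take some specified action (usually to terminate). For example, hitting Control-C sends a user interrupt, an INT signal, to a running program. | 
A simple instance: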
| trap '' 2
# Ignore interrupt 2 (Control-C), with no action specified.
trap 'echo "Control-C disabled."' 2
# Message when Control-C pressed. | 
Example 32-5. Trapping at exit
| #!/bin/bash
# Hunting variables with a trap.
trap 'echo Variable Listing --- a = $a  b = $b' EXIT
#  EXIT is the name of the signal generated upon exit from a script.
#
#  The command specified by the "trap" doesn't execute until
#+ the appropriate signal is sent.
echo "This prints before the \"trap\" --"
echo "even though the script sees the \"trap\" first."
echo
a=39
b=36
exit 0
#  Note that commenting out the 'exit' command makes no difference,
#+ since the script exits in any case after running out of commands. | 
Example 32-6. Cleaning up after Control-C
| #!/bin/bash
# logon.sh: A quick 'n dirty script to check whether you are on-line yet.
umask 177  # Make sure temp files are not world readable.
TRUE=1
LOGFILE=/var/log/messages
#  Note that $LOGFILE must be readable
#+ (as root, chmod 644 /var/log/messages).
TEMPFILE=temp.$$
#  Create a "unique" temp file name, using process id of the script.
#     Using 'mktemp' is an alternative.
#     For example:
#     TEMPFILE=`mktemp temp.XXXXXX`
KEYWORD=address
#  At logon, the line "remote IP address xxx.xxx.xxx.xxx"
#                      appended to /var/log/messages.
ONLINE=22
USER_INTERRUPT=13
CHECK_LINES=100
#  How many lines in log file to check.
trap 'rm -f $TEMPFILE; exit $USER_INTERRUPT' TERM INT
#  Cleans up the temp file if script interrupted by control-c.
echo
while [ $TRUE ]  #Endless loop.
do
  tail -n $CHECK_LINES $LOGFILE> $TEMPFILE
  #  Saves last 100 lines of system log file as temp file.
  #  Necessary, since newer kernels generate many log messages at log on.
  search=`grep $KEYWORD $TEMPFILE`
  #  Checks for presence of the "IP address" phrase,
  #+ indicating a successful logon.
  if [ ! -z "$search" ] #  Quotes necessary because of possible spaces.
  then
     echo "On-line"
     rm -f $TEMPFILE    #  Clean up temp file.
     exit $ONLINE
  else
     echo -n "."        #  The -n option to echo suppresses newline,
                        #+ so you get continuous rows of dots.
  fi
  sleep 1  
done  
#  Note: if you change the KEYWORD variable to "Exit",
#+ this script can be used while on-line
#+ to check for an unexpected logoff.
# Exercise: Change the script, per the above note,
#           and prettify it.
exit 0
# Nick Drage suggests an alternate method:
while true
  do ifconfig ppp0 | grep UP 1> /dev/null && echo "connected" && exit 0
  echo -n "."   # Prints dots (.....) until connected.
  sleep 2
done
# Problem: Hitting Control-C to terminate this process may be insufficient.
#+         (Dots may keep on echoing.)
# Exercise: Fix this.
# Stephane Chazelas has yet another alternative:
CHECK_INTERVAL=1
while ! tail -n 1 "$LOGFILE" | grep -q "$KEYWORD"
do echo -n .
   sleep $CHECK_INTERVAL
done
echo "On-line"
# Exercise: Discuss the relative strengths and weaknesses
#           of each of these various approaches. | 
Example 32-7. A simple implementation of a progress bar
| #! /bin/bash
# progress-bar2.sh
# Author: Graham Ewart (with reformatting by ABS Guide author).
# Used in ABS Guide with permission (thanks!).
# Invoke this script with bash. It doesn't work with sh.
interval=1
long_interval=10
{
     trap "exit" SIGUSR1
     sleep $interval; sleep $interval
     while true
     do
       echo -n '.'     # Use dots.
       sleep $interval
     done; } &         # Start a progress bar as a background process.
pid=$!
trap "echo !; kill -USR1 $pid; wait $pid"  EXIT        # To handle ^C.
echo -n 'Long-running process '
sleep $long_interval
echo ' Finished!'
kill -USR1 $pid
wait $pid              # Stop the progress bar.
trap EXIT
exit $? | 
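Of course, the trap command has other uses aside from debugging, such as disabling certain keystrokes within a script (see Example A-43).
Example 32-9. Running multiple processes (on an SMP box)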
| #!/bin/bash
# parent.sh
# Running multiple processes on an SMP box.
# Author: Tedman Eng
#  This is the first of two scripts,
#+ both of which must be present in the current working directory.
LIMIT=$1         # Total number of process to start
NUMPROC=4        # Number of concurrent threads (forks?)
PROCID=1         # Starting Process ID
echo "My PID is $$"
function start_thread() {
        if [ $PROCID -le $LIMIT ] ; then
                ./child.sh $PROCID&
                let "PROCID++"
        else
           echo "Limit reached."
           wait
           exit
        fi
}
while [ "$NUMPROC" -gt 0 ]; do
        start_thread;
        let "NUMPROC--"
done
while true
do
trap "start_thread" SIGRTMIN
done
exit 0
# ======== Second script follows ========
#!/bin/bash
# child.sh
# Running multiple processes on an SMP box.
# This script is called by parent.sh.
# Author: Tedman Eng
temp=$RANDOM
index=$1
shift
let "temp %= 5"
let "temp += 4"
echo "Starting $index  Time:$temp" "$@"
sleep ${temp}
echo "Ending $index"
kill -s SIGRTMIN $PPID
exit 0
# ======================= SCRIPT AUTHOR'S NOTES ======================= #
#  It's not completely bug free.
#  I ran it with limit = 500 and after the first few hundred iterations,
#+ one of the concurrent threads disappeared!
#  Not sure if this is collisions from trap signals or something else.
#  Once the trap is received, there's a brief moment while executing the
#+ trap handler but before the next trap is set.  During this time, it may
#+ be possible to miss a trap signal, thus miss spawning a child process.
#  No doubt someone may spot the bug and will be writing 
#+ . . . in the future.
# ===================================================================== #
# ----------------------------------------------------------------------#
#################################################################
# The following is the original script written by Vernia Damiano.
# Unfortunately, it doesn't work properly.
#################################################################
#!/bin/bash
#  Must call script with at least one integer parameter
#+ (number of concurrent processes).
#  All other parameters are passed through to the processes started.
INDICE=8        # Total number of process to start
TEMPO=5         # Maximum sleep time per process
E_BADARGS=65    # No arg(s) passed to script.
if [ $# -eq 0 ] # Check for at least one argument passed to script.
then
  echo "Usage: `basename $0` number_of_processes [passed params]"
  exit $E_BADARGS
fi
NUMPROC=$1              # Number of concurrent process
shift
PARAMETRI=( "$@" )      # Parameters of each process
function avvia() {
         local temp
         local index
         temp=$RANDOM
         index=$1
         shift
         let "temp %= $TEMPO"
         let "temp += 1"
         echo "Starting $index Time:$temp" "$@"
         sleep ${temp}
         echo "Ending $index"
         kill -s SIGRTMIN $$
}
function parti() {
         if [ $INDICE -gt 0 ] ; then
              avvia $INDICE "${PARAMETRI[@]}" &
                let "INDICE--"
         else
                trap : SIGRTMIN
         fi
}
trap parti SIGRTMIN
while [ "$NUMPROC" -gt 0 ]; do
         parti;
         let "NUMPROC--"
done
wait
trap - SIGRTMIN
exit $?
: <<SCRIPT_AUTHOR_COMMENTS
I had the need to run a program, with specified options, on a number of
different files, using a SMP machine. So I thought [I'd] keep running
a specified number of processes and start a new one each time . . . one
of these terminates.
The "wait" instruction does not help, since it waits for a given process
or *all* process started in background. So I wrote [this] bash script
that can do the job, using the "trap" instruction.
  --Vernia Damiano
SCRIPT_AUTHOR_COMMENTS | 
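|  | trap '' SIGNAL (two adjacent apostrophes) disables SIGNAL for the remainder of the script. trap SIGNAL restores the functioning of SIGNAL once more. This is useful to protect a critical portion of a script from an undesirable interrupt. | 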
| trap '' 2  # Signal 2 is Control-C, now disabled.
command
command
command
trap 2     # Reenables Control-C | 
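The effect can be checked without pressing any keys by having the script send the signal to itself. This is a minimal sketch that uses the signal name INT in place of the number 2, and trap - INT as the reset form. The self-directed kill stands in for a user hitting Control-C during the critical section.

```shell
#!/bin/bash
# Sketch: shielding a critical section from Control-C (INT).
# The interrupt is simulated by sending INT to the script itself.

trap '' INT                   # INT is ignored from here on.

kill -INT $$                  # Would normally interrupt the script.
survived=yes                  # Still running, because INT was ignored.
echo "Survived INT inside the critical section: $survived"

trap - INT                    # Restore the default handling of INT.
```

Keeping the shielded region as short as possible seems wise, since a script that ignores Control-C for a long time is hard to stop.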
| [1] | By convention, signal 0 is assigned to exit. |