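Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. --Brian Kernighan
The Bash shell contains no built-in debugger, only bare-bones debugging-specific commands and constructs. Syntax errors or outright typos in a script generate cryptic error messages that are often of no help in debugging a non-functional script.
Example 32-1. A buggy script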
#!/bin/bash
# ex74.sh

# This is a buggy script.
# Where, oh where is the error?

a=37

if [$a -gt 27 ]
then
  echo $a
fi

exit $?   # 0! Why?
Output from script:
./ex74.sh: [37: command not found
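The message arises because the left bracket [ is a command name and must be separated from its first argument by whitespace; without the space, the shell looks for a command called [37. The exit status is still 0 because an if construct in which no condition tests true, and which has no else branch, returns 0. A minimal corrected sketch follows; the variable result is not part of the original script and is introduced only for illustration.

```shell
#!/bin/bash
# Corrected form of the test in ex74.sh.
# "[" is a command name, so whitespace must follow it.

a=37
result="not greater"   # Hypothetical variable, added for illustration.

if [ "$a" -gt 27 ]     # Note the space after "[".
then
  result="greater"
fi

echo "$a is $result than 27"
```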
Example 32-2. Missing keyword
#!/bin/bash
# missing-keyword.sh
# What error message will this script generate? And why?

for a in 1 2 3
do
  echo "$a"
# done     # Required keyword 'done' commented out in line 8.

exit 0     # Will not exit here!

# === #

# From command line, after script terminates:
  echo $?    # 2
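Output from script:
missing-keyword.sh: line 10: syntax error: unexpected end of file
Error messages may disregard comment lines in a script when reporting the line number of a syntax error.
What if the script executes, but does not work as expected? This is the all too familiar logic error.
Example 32-3. test24: another buggy script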
#!/bin/bash

#  This script is supposed to delete all filenames in current directory
#+ containing embedded spaces.
#  It doesn't work.
#  Why not?


badname=`ls | grep ' '`

# Try this:
# echo "$badname"

rm "$badname"

exit 0
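Try to find out what is wrong with Example 32-3 by uncommenting the echo "$badname" line. Echo statements are useful for seeing whether what you expect is actually what you get.
In this particular case, rm "$badname" will not give the desired results because $badname should not be quoted. Placing it in quotes ensures that rm receives only one argument, so it can match at most one filename. A partial fix is to remove the quotes from $badname and to reset $IFS to contain only a newline, IFS=$'\n'. However, there are simpler ways of going about it.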
# Correct methods of deleting filenames containing spaces.
rm *\ *
rm *" "*
rm *' '*
# Thank you. S.C.
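The glob-based approach can be tried safely in a scratch directory. The following is a minimal sketch, not the guide's own script: the file names, the use of mktemp -d, and the nullglob setting are assumptions made for the demonstration. The loop handles each match as a single argument, and the -- guards against names that begin with a dash.

```shell
#!/bin/bash
# Sketch: delete files whose names contain spaces, one glob match at a time.
# Runs inside a scratch directory so nothing real is touched.

workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch "plain.txt" "has space.txt" "two  spaces.txt"   # Sample files (assumed).

shopt -s nullglob      # An unmatched pattern expands to nothing.
for f in *\ *
do
  rm -- "$f"           # Quoted: each filename stays one argument.
done
shopt -u nullglob

remaining=$(ls)
echo "Remaining: $remaining"

cd / && rm -rf "$workdir"   # Remove the scratch directory.
```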
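Summarizing the symptoms of a buggy script:
It bombs with a "syntax error" message, or
it runs, but does not work as expected (logic error), or
it runs, works as expected, but has nasty side effects (logic bomb).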
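Tools for debugging non-working scripts include:
Inserting echo statements at critical points in the script to trace the variables, and otherwise give a snapshot of what is going on.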
Even better is an echo that echoes only when debug is on.
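One possible shape for such a conditional echo is sketched below. The function name debug_echo and the DEBUG variable are assumptions chosen for illustration; the messages go to stderr so they do not mix with the script's normal output.

```shell
#!/bin/bash
# debug_echo: hypothetical helper name, used here for illustration.
# Prints its arguments to stderr only when DEBUG is set to a non-empty value.

debug_echo () {
  if [ -n "$DEBUG" ]
  then
    echo "DEBUG: $*" >&2
  fi
}

DEBUG=
quiet=$(debug_echo "not shown" 2>&1)         # DEBUG empty: prints nothing.

DEBUG=on
loud=$(debug_echo "value of x is 5" 2>&1)    # DEBUG set: message appears.

echo "quiet=[$quiet] loud=[$loud]"
```

In a real script, debugging output could then be switched on from the command line without editing the file, for example by invoking it as DEBUG=on ./scriptname.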
Using the tee filter to check processes or data flows at critical points.
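As a rough illustration of the idea, tee can be spliced into a pipeline to save a copy of the data at one stage while the pipeline continues unchanged. The sample data and variable names below are assumptions.

```shell
#!/bin/bash
# Sketch: tap a pipeline with tee to inspect intermediate data.

snapshot=$(mktemp)     # Receives a copy of the data after the sort stage.

count=$(printf '%s\n' pear apple pear fig | sort | tee "$snapshot" | uniq | wc -l)
count=$((count))       # Strip any padding that wc may add.

sorted=$(cat "$snapshot")
rm -f "$snapshot"

echo "Unique items: $count"
echo "Data as it looked after the sort stage:"
echo "$sorted"
```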
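Setting the option flags -n -v -x
sh -n scriptname checks for syntax errors without actually running the script. This is the equivalent of inserting set -n or set -o noexec into the script. Note that certain types of syntax errors can slip past this check.
sh -v scriptname echoes each command before executing it. This is the equivalent of inserting set -v or set -o verbose in the script.
The -n and -v flags work well together. sh -nv scriptname gives a verbose syntax check.
sh -x scriptname echoes the result of each command, but in an abbreviated manner. This is the equivalent of inserting set -x or set -o xtrace in the script.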
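The flags can also be exercised from inside a script. The sketch below assumes the script under test is a Bash script, so it calls bash -n rather than sh -n; the broken script body is a made-up example with a missing done. It also shows that tracing can be limited to a region by pairing set -x with set +x.

```shell
#!/bin/bash
# Sketch: syntax-check a script without running it, then trace a few commands.

broken=$(mktemp)
cat > "$broken" <<'EOF'
for a in 1 2 3
do
  echo "$a"
EOF

if bash -n "$broken" 2>/dev/null   # Parse only; nothing is executed.
then
  syntax_ok=yes
else
  syntax_ok=no
fi
echo "Syntax OK? $syntax_ok"
rm -f "$broken"

# Tracing can be limited to a region of a script.
set -x          # Turn tracing on (trace lines go to stderr).
x=$((2 + 3))
set +x          # Turn tracing off again.
echo "x = $x"
```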
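Inserting set -u or set -o nounset in the script lets it run, but on the first attempt to expand an unset variable it prints an unbound variable error message and aborts the script.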
set -u   # Or   set -o nounset

# Setting a variable to null will not trigger the error/abort.
# unset_var=

echo $unset_var   # Unset (and undeclared) variable.

echo "Should not echo!"

# sh t2.sh
# t2.sh: line 6: unset_var: unbound variable
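When nounset is active, variables that may legitimately be unset need a guard. A minimal sketch using standard parameter expansion follows; the names maybe_unset, greeting, and state are hypothetical.

```shell
#!/bin/bash
# Sketch: with nounset active, guard variables that may be unset.

set -u

unset maybe_unset
greeting=${maybe_unset:-"default value"}   # No abort: a default is supplied.
echo "$greeting"

if [ -z "${maybe_unset+x}" ]    # Expands to "x" only if the variable is set.
then
  state="unset"
else
  state="set"
fi
echo "maybe_unset is $state"

set +u   # Restore the default behavior.
```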
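Using an "assert" function to test a variable or condition at critical points in a script. (This is an idea borrowed from C.)
Example 32-4. Testing a condition with an assert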
#!/bin/bash
# assert.sh

#######################################################################
assert ()                 #  If condition false,
{                         #+ exit from script
                          #+ with appropriate error message.
  E_PARAM_ERR=98
  E_ASSERT_FAILED=99


  if [ -z "$2" ]          #  Not enough parameters passed
  then                    #+ to assert() function.
    return $E_PARAM_ERR   #  No damage done.
  fi

  lineno=$2

  if [ ! $1 ]
  then
    echo "Assertion failed:  \"$1\""
    echo "File \"$0\", line $lineno"    # Give name of file and line number.
    exit $E_ASSERT_FAILED
  # else
  #   return
  #   and continue executing the script.
  fi
} # Insert a similar assert() function into a script you need to debug.
#######################################################################


a=5
b=4
condition="$a -lt $b"     #  Error message and exit from script.
                          #  Try setting "condition" to something else
                          #+ and see what happens.

assert "$condition" $LINENO
# The remainder of the script executes only if the "assert" does not fail.


# Some commands.
# Some more commands . . .
echo "This statement echoes only if the \"assert\" does not fail."
# . . .
# More commands . . .

exit $?
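The exit command in a script triggers signal 0, terminating the process, that is, the script itself. [1] It is often useful to trap the exit, for example to force a "printout" of variables. The trap must be set before the script reaches its exit, so it is normally placed at the top of the script.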
The trap command specifies an action on receipt of a signal; it is also useful for debugging.
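A signal is a message sent to a process, either by the kernel or by another process, telling it to take some specified action (usually to terminate). For example, hitting Control-C sends a user interrupt, an INT signal, to a running program.
A simple instance: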
trap '' 2
# Ignore interrupt 2 (Control-C), with no action specified.

trap 'echo "Control-C disabled."' 2
# Message when Control-C pressed.
Example 32-5. Trapping at exit
#!/bin/bash
# Hunting variables with a trap.

trap 'echo Variable Listing --- a = $a  b = $b' EXIT
#  EXIT is the name of the signal generated upon exit from a script.
#
#  The command specified by the "trap" doesn't execute until
#+ the appropriate signal is sent.

echo "This prints before the \"trap\" --"
echo "even though the script sees the \"trap\" first."
echo

a=39

b=36

exit 0
#  Note that commenting out the 'exit' command makes no difference,
#+ since the script exits in any case after running out of commands.
Example 32-6. Cleaning up after Control-C
#!/bin/bash
# logon.sh: A quick 'n dirty script to check whether you are on-line yet.

umask 177  # Make sure temp files are not world readable.


TRUE=1
LOGFILE=/var/log/messages
#  Note that $LOGFILE must be readable
#+ (as root, chmod 644 /var/log/messages).
TEMPFILE=temp.$$
#  Create a "unique" temp file name, using process id of the script.
#     Using 'mktemp' is an alternative.
#     For example:
#     TEMPFILE=`mktemp temp.XXXXXX`
KEYWORD=address
#  At logon, the line "remote IP address xxx.xxx.xxx.xxx"
#                      appended to /var/log/messages.
ONLINE=22
USER_INTERRUPT=13
CHECK_LINES=100
#  How many lines in log file to check.

trap 'rm -f $TEMPFILE; exit $USER_INTERRUPT' TERM INT
#  Cleans up the temp file if script interrupted by control-c.

echo

while [ $TRUE ]  #Endless loop.
do
  tail -n $CHECK_LINES $LOGFILE > $TEMPFILE
  #  Saves last 100 lines of system log file as temp file.
  #  Necessary, since newer kernels generate many log messages at log on.
  search=`grep $KEYWORD $TEMPFILE`
  #  Checks for presence of the "IP address" phrase,
  #+ indicating a successful logon.

  if [ ! -z "$search" ] #  Quotes necessary because of possible spaces.
  then
     echo "On-line"
     rm -f $TEMPFILE    #  Clean up temp file.
     exit $ONLINE
  else
     echo -n "."        #  The -n option to echo suppresses newline,
                        #+ so you get continuous rows of dots.
  fi

  sleep 1
done


#  Note: if you change the KEYWORD variable to "Exit",
#+ this script can be used while on-line
#+ to check for an unexpected logoff.

# Exercise: Change the script, per the above note,
#           and prettify it.

exit 0


# Nick Drage suggests an alternate method:

while true
  do ifconfig ppp0 | grep UP 1> /dev/null && echo "connected" && exit 0
  echo -n "."   # Prints dots (.....) until connected.
  sleep 2
done

# Problem: Hitting Control-C to terminate this process may be insufficient.
#+         (Dots may keep on echoing.)
# Exercise: Fix this.



# Stephane Chazelas has yet another alternative:

CHECK_INTERVAL=1

while ! tail -n 1 "$LOGFILE" | grep -q "$KEYWORD"
do echo -n .
   sleep $CHECK_INTERVAL
done
echo "On-line"

# Exercise: Discuss the relative strengths and weaknesses
#           of each of these various approaches.
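The script above removes its temp file in two places: in the TERM/INT trap and on the successful path. A variation, sketched here under the assumption that a single cleanup point is preferred, is to trap EXIT instead. The work is placed in a subshell purely so that the effect of the trap can be observed afterward; the names tmpfile, status, and cleaned are hypothetical.

```shell
#!/bin/bash
# Sketch: remove a temp file on exit by trapping EXIT.
# The subshell's EXIT trap fires when the subshell terminates.

tmpfile=$(mktemp)

(
  trap 'rm -f "$tmpfile"' EXIT
  echo "scratch data" > "$tmpfile"
  exit 3        # Simulated early exit; the trap still runs.
)
status=$?       # The trap does not alter the exit status.

if [ -e "$tmpfile" ]
then
  cleaned=no
else
  cleaned=yes
fi
echo "Subshell exit status: $status, temp file removed: $cleaned"
```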
Example 32-7. A simple implementation of a progress bar
#! /bin/bash
# progress-bar2.sh
# Author: Graham Ewart (with reformatting by ABS Guide author).
# Used in ABS Guide with permission (thanks!).

# Invoke this script with bash. It doesn't work with sh.

interval=1
long_interval=10

{
     trap "exit" SIGUSR1
     sleep $interval; sleep $interval
     while true
     do
       echo -n '.'     # Use dots.
       sleep $interval
     done; } &         # Start a progress bar as a background process.

pid=$!
trap "echo !; kill -USR1 $pid; wait $pid"  EXIT        # To handle ^C.

echo -n 'Long-running process '
sleep $long_interval
echo ' Finished!'

kill -USR1 $pid
wait $pid              # Stop the progress bar.
trap EXIT

exit $?
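Of course, the trap command has other uses aside from debugging, such as disabling certain keystrokes within a script (see Example A-43).
Example 32-9. Running multiple processes (on an SMP box)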
#!/bin/bash
# parent.sh
# Running multiple processes on an SMP box.
# Author: Tedman Eng

#  This is the first of two scripts,
#+ both of which must be present in the current working directory.


LIMIT=$1         # Total number of process to start
NUMPROC=4        # Number of concurrent threads (forks?)
PROCID=1         # Starting Process ID
echo "My PID is $$"

function start_thread() {
        if [ $PROCID -le $LIMIT ] ; then
                ./child.sh $PROCID&
                let "PROCID++"
        else
           echo "Limit reached."
           wait
           exit
        fi
}

while [ "$NUMPROC" -gt 0 ]; do
        start_thread;
        let "NUMPROC--"
done


while true
do

trap "start_thread" SIGRTMIN

done

exit 0



# ======== Second script follows ========


#!/bin/bash
# child.sh
# Running multiple processes on an SMP box.
# This script is called by parent.sh.
# Author: Tedman Eng

temp=$RANDOM
index=$1
shift
let "temp %= 5"
let "temp += 4"
echo "Starting $index  Time:$temp" "$@"
sleep ${temp}
echo "Ending $index"
kill -s SIGRTMIN $PPID

exit 0


# ======================= SCRIPT AUTHOR'S NOTES ======================= #
#  It's not completely bug free.
#  I ran it with limit = 500 and after the first few hundred iterations,
#+ one of the concurrent threads disappeared!
#  Not sure if this is collisions from trap signals or something else.
#  Once the trap is received, there's a brief moment while executing the
#+ trap handler but before the next trap is set.  During this time, it may
#+ be possible to miss a trap signal, thus miss spawning a child process.

#  No doubt someone may spot the bug and will be writing
#+ . . . in the future.



# ===================================================================== #



# ----------------------------------------------------------------------#



#################################################################
# The following is the original script written by Vernia Damiano.
# Unfortunately, it doesn't work properly.
#################################################################

#!/bin/bash

#  Must call script with at least one integer parameter
#+ (number of concurrent processes).
#  All other parameters are passed through to the processes started.


INDICE=8        # Total number of process to start
TEMPO=5         # Maximum sleep time per process
E_BADARGS=65    # No arg(s) passed to script.

if [ $# -eq 0 ] # Check for at least one argument passed to script.
then
  echo "Usage: `basename $0` number_of_processes [passed params]"
  exit $E_BADARGS
fi

NUMPROC=$1              # Number of concurrent process
shift
PARAMETRI=( "$@" )      # Parameters of each process

function avvia() {
         local temp
         local index
         temp=$RANDOM
         index=$1
         shift
         let "temp %= $TEMPO"
         let "temp += 1"
         echo "Starting $index Time:$temp" "$@"
         sleep ${temp}
         echo "Ending $index"
         kill -s SIGRTMIN $$
}

function parti() {
         if [ $INDICE -gt 0 ] ; then
              avvia $INDICE "${PARAMETRI[@]}" &
                let "INDICE--"
         else
                trap : SIGRTMIN
         fi
}

trap parti SIGRTMIN

while [ "$NUMPROC" -gt 0 ]; do
         parti;
         let "NUMPROC--"
done

wait
trap - SIGRTMIN

exit $?

: <<SCRIPT_AUTHOR_COMMENTS
I had the need to run a program, with specified options, on a number of
different files, using a SMP machine. So I thought [I'd] keep running
a specified number of processes and start a new one each time . . . one
of these terminates.

The "wait" instruction does not help, since it waits for a given process
or *all* process started in background. So I wrote [this] bash script
that can do the job, using the "trap" instruction.
  --Vernia Damiano
SCRIPT_AUTHOR_COMMENTS
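trap '' SIGNAL (two adjacent apostrophes) disables SIGNAL for the remainder of the script. trap SIGNAL (or, more portably, trap - SIGNAL) restores the default handling of SIGNAL. This is useful for protecting a critical portion of a script from an undesirable interrupt.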
trap '' 2  # Signal 2 is Control-C, now disabled.
command
command
command
trap 2     # Reenables Control-C
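The effect of disabling a signal can be checked without a keyboard by having the script send the signal to itself. This is only a sketch: the self-sent kill stands in for a user pressing Control-C, and the variable survived is a hypothetical marker.

```shell
#!/bin/bash
# Sketch: shield a critical section from SIGINT, then restore the default.

trap '' INT            # Ignore SIGINT from here on.

kill -s INT "$$"       # Simulated Control-C; ignored, so the script continues.
survived=yes
echo "Still running after SIGINT: $survived"

trap - INT             # Reset SIGINT to its original disposition.
```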
[1] By convention, signal 0 is assigned to exit.