Chapter 32. Debugging

 

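Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.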

--Brian Kernighan

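The Bash shell contains no built-in debugger, and only bare-bones debugging-specific commands and constructs. Syntax errors or outright typos in the script generate cryptic error messages that are often of no help in debugging a non-functional script.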

Example 32-1. A buggy script

#!/bin/bash
# ex74.sh

# This is a buggy script.
# Where, oh where is the error?

a=37

if [$a -gt 27 ]
then
  echo $a
fi  

exit $?   # 0! Why?

Output from script:

./ex74.sh: [37: command not found

What's wrong with the above script? Hint: look just after the if.

Example 32-2. Missing keyword

#!/bin/bash
# missing-keyword.sh
# What error message will this script generate? And why?

for a in 1 2 3
do
  echo "$a"
# done     # Required keyword 'done' commented out in line 8.

exit 0     # Will not exit here!

# === #

# From command line, after script terminates:
  echo $?    # 2

Output from script:

missing-keyword.sh: line 10: syntax error: unexpected end of file
	
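Note that the error message does not necessarily reference the line in which the error occurs, but the line where the Bash interpreter finally becomes aware of the error.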

Error messages may disregard comment lines in a script when reporting the line number of a syntax error.
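
Because these errors surface only when Bash parses the script, one quick way to find them without executing anything is Bash's -n (noexec) option, covered with the other option flags later in this chapter. The following is a minimal sketch, assuming a throwaway temporary file from mktemp is acceptable; it writes the loop from Example 32-2 (with done still commented out) to that file and checks it.

```shell
#!/bin/bash
# Sketch: syntax-check a script with "bash -n" without executing it.
# The checked script is a throwaway copy of the missing-"done" loop,
#+ written to a temporary file (an assumption of this sketch).
script=$(mktemp) || exit 1
trap 'rm -f "$script"' EXIT      # Remove the temp file when we exit.

cat > "$script" <<'EOF'
for a in 1 2 3
do
  echo "$a"
# done
EOF

if bash -n "$script" 2> /dev/null
then
  status=ok
else
  status=syntax_error            # bash -n exits non-zero on a syntax error.
fi
echo "$status"
```

Since bash -n reports the problem through its exit status, the check fits naturally into an if test or a pre-commit hook.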

What if the script executes, but does not work as expected? This is the all-too-familiar logic error.

Example 32-3. test24: another buggy script

#!/bin/bash

#  This script is supposed to delete all filenames in current directory
#+ containing embedded spaces.
#  It doesn't work.
#  Why not?


badname=`ls | grep ' '`

# Try this:
# echo "$badname"

rm "$badname"

exit 0

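Try to find out what is wrong with Example 32-3 by uncommenting the echo "$badname" line. Echo statements are useful for seeing whether what you expect is actually what you get.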

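In this particular case, rm "$badname" will not give the desired results because $badname should not be quoted. Placing it in quotes ensures that rm has only one argument: the entire newline-separated list, which matches no single filename (unless only one file qualifies). A partial fix is to remove the quotes from $badname and to reset $IFS to contain only a newline, IFS=$'\n'. However, there are simpler ways of going about it.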

# Correct methods of deleting filenames containing spaces.
rm *\ *
rm *" "*
rm *' '*
# Thank you. S.C.
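
For comparison, here is a minimal sketch of the partial fix mentioned above (unquoted $badname with IFS=$'\n'). It assumes it may create and delete a scratch directory via mktemp -d; the filenames are made up for the demonstration. The set -f line is an addition beyond the original text, so that glob characters in a filename are not expanded.

```shell
#!/bin/bash
# Sketch of the partial IFS fix for Example 32-3.
# Runs in a scratch directory so no real files are touched.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch 'has space' 'also has space' nospace   # Made-up demo files.

badname=$(ls | grep ' ')     # Newline-separated names containing spaces.

old_IFS=$IFS
IFS=$'\n'                    # Word-split on newlines only.
set -f                       # Don't glob-expand the unquoted expansion.
rm $badname                  # Unquoted on purpose: one argument per name.
set +f
IFS=$old_IFS

remaining=$(ls)
echo "$remaining"            # nospace
cd / && rm -rf "$workdir"
```

It is still only partial: a filename containing a newline would be split into several arguments, which is one reason the glob patterns above are simpler.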

Summarizing the symptoms of a buggy script, either

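  1. It bombs with a "syntax error" message, or

  2. It runs, but does not work as expected (logic error), or

  3. It runs, works as expected, but has nasty side effects (logic bomb).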

Tools for debugging non-working scripts include

  1. Inserting echo statements at critical points in the script to trace the variables and otherwise give a snapshot of what is going on.

    Tip

    Even better is an echo that echoes only when debug is on.

    ### debecho (debug-echo), by Stefano Falsetto ###
    ### Will echo passed parameters only if DEBUG is set to a value. ###
    debecho () {
      if [ ! -z "$DEBUG" ]; then
         echo "$*" >&2
         #         ^^^ to stderr
      fi
    }
    
    DEBUG=on
    Whatever=whatnot
    debecho $Whatever   # whatnot
    
    DEBUG=
    Whatever=notwhat
    debecho $Whatever   # (Will not echo.)

  2. Using the tee filter to check processes or data flows at critical points.

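  3. Setting the option flags -n, -v, and -x.

    sh -n scriptname checks for syntax errors without actually running the script. This is the equivalent of inserting set -n or set -o noexec into the script. Note that certain types of syntax errors can slip past this check.

    sh -v scriptname echoes each command before executing it. This is the equivalent of inserting set -v or set -o verbose in the script.

    The -n and -v flags work well together. sh -nv scriptname gives a verbose syntax check.

    sh -x scriptname echoes each command, after its arguments have been expanded and prefixed with a + sign, before executing it. This is the equivalent of inserting set -x or set -o xtrace in the script.

    Inserting set -u or set -o nounset in the script runs it, but gives an unbound variable error message and aborts the script at the first reference to an unset variable.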

    set -u   # Or   set -o nounset
    
    # Setting a variable to null will not trigger the error/abort.
    # unset_var=
    
    echo $unset_var   # Unset (and undeclared) variable.
    
    echo "Should not echo!"
    
    # sh t2.sh
    # t2.sh: line 6: unset_var: unbound variable

  4. Using an "assert" function to test a variable or condition at critical points in a script. (This is an idea borrowed from C.)

    Example 32-4. Testing a condition with assert

    #!/bin/bash
    # assert.sh
    
    #######################################################################
    assert ()                 #  If condition false,
    {                         #+ exit from script
                              #+ with appropriate error message.
      E_PARAM_ERR=98
      E_ASSERT_FAILED=99
    
    
      if [ -z "$2" ]          #  Not enough parameters passed
      then                    #+ to assert() function.
        return $E_PARAM_ERR   #  No damage done.
      fi
    
      lineno=$2
    
      if [ ! $1 ] 
      then
        echo "Assertion failed:  \"$1\""
        echo "File \"$0\", line $lineno"    # Give name of file and line number.
        exit $E_ASSERT_FAILED
      # else
      #   return
      #   and continue executing the script.
      fi  
    } # Insert a similar assert() function into a script you need to debug.    
    #######################################################################
    
    
    a=5
    b=4
    condition="$a -lt $b"     #  Error message and exit from script.
                              #  Try setting "condition" to something else
                              #+ and see what happens.
    
    assert "$condition" $LINENO
    # The remainder of the script executes only if the "assert" does not fail.
    
    
    # Some commands.
    # Some more commands . . .
    echo "This statement echoes only if the \"assert\" does not fail."
    # . . .
    # More commands . . .
    
    exit $?

  5. Using the $LINENO variable and the caller builtin.

  6. Trapping at exit.

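    The exit command in a script triggers a signal 0, terminating the process, that is, the script itself. [1] It is often useful to trap the exit, for example to force a "printout" of variables. The trap must be set before the script exits, which is why it is usually placed as the first command in the script.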
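
Item 5 above names the $LINENO variable and the caller builtin without showing them in use. A minimal sketch of both, assuming Bash (caller is a Bash builtin); the helper names report and check_positive are hypothetical.

```shell
#!/bin/bash
# Sketch: pinpoint where a check failed, using $LINENO and "caller".
# "report" and "check_positive" are hypothetical helper names.

report () {
  # "caller 0" prints "<line> <function> <file>" describing the place
  #+ from which report was called.
  local line func file
  read -r line func file <<< "$(caller 0)"
  echo "problem in $func (line $line): $1" >&2
}

check_positive () {
  if [ "$1" -le 0 ]
  then
    report "expected a positive number, got $1"
    return 1
  fi
}

echo "This echo is on line $LINENO."   # $LINENO: current line in the script.
check_positive 5 && echo "5 passed"
check_positive -3 || echo "check failed for -3"
```

Using caller 1 instead would describe the call one level further up, here the top-level line that invoked check_positive.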

Trapping signals

trap

Specifies an action on receipt of a signal; also useful for debugging.

A simple instance:

trap '' 2
# Ignore interrupt 2 (Control-C), with no action specified. 

trap 'echo "Control-C disabled."' 2
# Message when Control-C pressed.

Example 32-5. Trapping at exit

#!/bin/bash
# Hunting variables with a trap.

trap 'echo Variable Listing --- a = $a  b = $b' EXIT
#  EXIT is the name of the signal generated upon exit from a script.
#
#  The command specified by the "trap" doesn't execute until
#+ the appropriate signal is sent.

echo "This prints before the \"trap\" --"
echo "even though the script sees the \"trap\" first."
echo

a=39

b=36

exit 0
#  Note that commenting out the 'exit' command makes no difference,
#+ since the script exits in any case after running out of commands.

Example 32-6. Cleaning up after Control-C

#!/bin/bash
# logon.sh: A quick 'n dirty script to check whether you are on-line yet.

umask 177  # Make sure temp files are not world readable.


TRUE=1
LOGFILE=/var/log/messages
#  Note that $LOGFILE must be readable
#+ (as root, chmod 644 /var/log/messages).
TEMPFILE=temp.$$
#  Create a "unique" temp file name, using process id of the script.
#     Using 'mktemp' is an alternative.
#     For example:
#     TEMPFILE=`mktemp temp.XXXXXX`
KEYWORD=address
#  At logon, the line "remote IP address xxx.xxx.xxx.xxx"
#                      appended to /var/log/messages.
ONLINE=22
USER_INTERRUPT=13
CHECK_LINES=100
#  How many lines in log file to check.

trap 'rm -f $TEMPFILE; exit $USER_INTERRUPT' TERM INT
#  Cleans up the temp file if script interrupted by control-c.

echo

while [ $TRUE ]  #Endless loop.
do
  tail -n $CHECK_LINES $LOGFILE> $TEMPFILE
  #  Saves last 100 lines of system log file as temp file.
  #  Necessary, since newer kernels generate many log messages at log on.
  search=`grep $KEYWORD $TEMPFILE`
  #  Checks for presence of the "IP address" phrase,
  #+ indicating a successful logon.

  if [ ! -z "$search" ] #  Quotes necessary because of possible spaces.
  then
     echo "On-line"
     rm -f $TEMPFILE    #  Clean up temp file.
     exit $ONLINE
  else
     echo -n "."        #  The -n option to echo suppresses newline,
                        #+ so you get continuous rows of dots.
  fi

  sleep 1  
done  


#  Note: if you change the KEYWORD variable to "Exit",
#+ this script can be used while on-line
#+ to check for an unexpected logoff.

# Exercise: Change the script, per the above note,
#           and prettify it.

exit 0


# Nick Drage suggests an alternate method:

while true
  do ifconfig ppp0 | grep UP 1> /dev/null && echo "connected" && exit 0
  echo -n "."   # Prints dots (.....) until connected.
  sleep 2
done

# Problem: Hitting Control-C to terminate this process may be insufficient.
#+         (Dots may keep on echoing.)
# Exercise: Fix this.



# Stephane Chazelas has yet another alternative:

CHECK_INTERVAL=1

while ! tail -n 1 "$LOGFILE" | grep -q "$KEYWORD"
do echo -n .
   sleep $CHECK_INTERVAL
done
echo "On-line"

# Exercise: Discuss the relative strengths and weaknesses
#           of each of these various approaches.
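
The comments in Example 32-6 mention mktemp as an alternative to temp.$$. A minimal sketch combining it with an EXIT trap, so the temporary file is removed whether the work finishes normally or is interrupted. The function name run_job is hypothetical, and running its body in a subshell is an assumption made so the demonstration can check the result afterwards.

```shell
#!/bin/bash
# Sketch: a mktemp temp file that is removed on normal exit
#+ and when interrupted (INT) or terminated (TERM).
USER_INTERRUPT=13                     # Same exit code as Example 32-6.

run_job () (                          # ( ) body: runs in its own subshell,
                                      #+ so these traps stay local to it.
  tempfile=$(mktemp) || exit 1
  trap 'rm -f "$tempfile"' EXIT       # Fires on any exit from the subshell.
  trap 'exit $USER_INTERRUPT' INT TERM  # A signal becomes an exit,
                                        #+ which then fires the EXIT trap.
  echo "working data" > "$tempfile"
  # ... real work using "$tempfile" would go here ...
  echo "$tempfile"                    # Report the name so the caller can check it.
)

leftover=$(run_job)
if [ -e "$leftover" ]
then
  echo "temp file still exists"
else
  echo "temp file removed"
fi
```

Routing INT and TERM through exit keeps the cleanup code in a single place, the EXIT trap.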

Example 32-7. A simple implementation of a progress bar

#! /bin/bash
# progress-bar2.sh
# Author: Graham Ewart (with reformatting by ABS Guide author).
# Used in ABS Guide with permission (thanks!).

# Invoke this script with bash. It doesn't work with sh.

interval=1
long_interval=10

{
     trap "exit" SIGUSR1
     sleep $interval; sleep $interval
     while true
     do
       echo -n '.'     # Use dots.
       sleep $interval
     done; } &         # Start a progress bar as a background process.

pid=$!
trap "echo !; kill -USR1 $pid; wait $pid"  EXIT        # To handle ^C.

echo -n 'Long-running process '
sleep $long_interval
echo ' Finished!'

kill -USR1 $pid
wait $pid              # Stop the progress bar.
trap EXIT

exit $?

Note

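The DEBUG argument to trap causes a specified action to execute before every simple command in a script (older versions of Bash ran it after each command). This permits tracing variables, for example.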

Example 32-8. Tracing a variable

#!/bin/bash

trap 'echo "VARIABLE-TRACE> \$variable = \"$variable\""' DEBUG
# Echoes the value of $variable before every simple command.

variable=29; line=$LINENO

echo "  Just initialized \$variable to $variable in line number $line."

let "variable *= 3"; line=$LINENO
echo "  Just multiplied \$variable by 3 in line number $line."

exit 0

#  The "trap 'command1 . . . command2 . . .' DEBUG" construct is
#+ more appropriate in the context of a complex script,
#+ where inserting multiple "echo $variable" statements might be
#+ awkward and time-consuming.

# Thanks, Stephane Chazelas for the pointer.


Output of script (abridged; the exact number of trace lines depends on the Bash version):

VARIABLE-TRACE> $variable = ""
VARIABLE-TRACE> $variable = "29"
  Just initialized $variable to 29 in line number 6.
VARIABLE-TRACE> $variable = "29"
VARIABLE-TRACE> $variable = "87"
  Just multiplied $variable by 3 in line number 10.
VARIABLE-TRACE> $variable = "87"
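
A related technique, not used in Example 32-8: Bash's $BASH_COMMAND variable holds the command about to be executed, so a DEBUG trap can record every command rather than a single variable. A minimal sketch, assuming Bash and a temporary log file from mktemp.

```shell
#!/bin/bash
# Sketch: a DEBUG trap that records each simple command before it
#+ runs, using Bash's $BASH_COMMAND variable.
trace_log=$(mktemp) || exit 1

trap 'printf "%s\n" "$BASH_COMMAND" >> "$trace_log"' DEBUG
variable=29
let "variable *= 3"
trap - DEBUG                  # Stop tracing.

trace=$(< "$trace_log")       # Read back the recorded commands.
rm -f "$trace_log"
echo "$trace"
```

Writing the trace to a file rather than to stdout keeps it from mixing with the script's normal output.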

Of course, the trap command has other uses aside from debugging, such as disabling certain keystrokes within a script (see Example A-43).

Example 32-9. Running multiple processes (on an SMP box)

#!/bin/bash
# parent.sh
# Running multiple processes on an SMP box.
# Author: Tedman Eng

#  This is the first of two scripts,
#+ both of which must be present in the current working directory.




LIMIT=$1         # Total number of processes to start
NUMPROC=4        # Number of concurrent threads (forks?)
PROCID=1         # Starting Process ID
echo "My PID is $$"

function start_thread() {
        if [ $PROCID -le $LIMIT ] ; then
                ./child.sh $PROCID&
                let "PROCID++"
        else
           echo "Limit reached."
           wait
           exit
        fi
}

while [ "$NUMPROC" -gt 0 ]; do
        start_thread;
        let "NUMPROC--"
done


while true
do

trap "start_thread" SIGRTMIN

done

exit 0



# ======== Second script follows ========


#!/bin/bash
# child.sh
# Running multiple processes on an SMP box.
# This script is called by parent.sh.
# Author: Tedman Eng

temp=$RANDOM
index=$1
shift
let "temp %= 5"
let "temp += 4"
echo "Starting $index  Time:$temp" "$@"
sleep ${temp}
echo "Ending $index"
kill -s SIGRTMIN $PPID

exit 0


# ======================= SCRIPT AUTHOR'S NOTES ======================= #
#  It's not completely bug free.
#  I ran it with limit = 500 and after the first few hundred iterations,
#+ one of the concurrent threads disappeared!
#  Not sure if this is collisions from trap signals or something else.
#  Once the trap is received, there's a brief moment while executing the
#+ trap handler but before the next trap is set.  During this time, it may
#+ be possible to miss a trap signal, thus miss spawning a child process.

#  No doubt someone may spot the bug and will be writing 
#+ . . . in the future.



# ===================================================================== #



# ----------------------------------------------------------------------#



#################################################################
# The following is the original script written by Vernia Damiano.
# Unfortunately, it doesn't work properly.
#################################################################

#!/bin/bash

#  Must call script with at least one integer parameter
#+ (number of concurrent processes).
#  All other parameters are passed through to the processes started.


INDICE=8        # Total number of processes to start
TEMPO=5         # Maximum sleep time per process
E_BADARGS=65    # No arg(s) passed to script.

if [ $# -eq 0 ] # Check for at least one argument passed to script.
then
  echo "Usage: `basename $0` number_of_processes [passed params]"
  exit $E_BADARGS
fi

NUMPROC=$1              # Number of concurrent processes
shift
PARAMETRI=( "$@" )      # Parameters of each process

function avvia() {
         local temp
         local index
         temp=$RANDOM
         index=$1
         shift
         let "temp %= $TEMPO"
         let "temp += 1"
         echo "Starting $index Time:$temp" "$@"
         sleep ${temp}
         echo "Ending $index"
         kill -s SIGRTMIN $$
}

function parti() {
         if [ $INDICE -gt 0 ] ; then
              avvia $INDICE "${PARAMETRI[@]}" &
                let "INDICE--"
         else
                trap : SIGRTMIN
         fi
}

trap parti SIGRTMIN

while [ "$NUMPROC" -gt 0 ]; do
         parti;
         let "NUMPROC--"
done

wait
trap - SIGRTMIN

exit $?

: <<SCRIPT_AUTHOR_COMMENTS
I had the need to run a program, with specified options, on a number of
different files, using a SMP machine. So I thought [I'd] keep running
a specified number of processes and start a new one each time . . . one
of these terminates.

The "wait" instruction does not help, since it waits for a given process
or *all* process started in background. So I wrote [this] bash script
that can do the job, using the "trap" instruction.
  --Vernia Damiano
SCRIPT_AUTHOR_COMMENTS
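
Both scripts above rely on SIGRTMIN traps to learn when a child has finished, and their authors note that signals can be missed. A different technique, not used by either author, is Bash's wait -n, which blocks until any one background job exits, so no signal handling is involved. A minimal sketch, assuming Bash 4.3 or later; work, MAX_JOBS, and TOTAL are hypothetical names.

```shell
#!/bin/bash
# Sketch: keep at most MAX_JOBS background jobs running at once,
#+ using "wait -n" (Bash 4.3 or later) instead of signals and traps.
MAX_JOBS=2                        # Hypothetical concurrency limit.
TOTAL=6                           # Hypothetical number of jobs.
outdir=$(mktemp -d) || exit 1

work () {                         # Hypothetical stand-in for the real task.
  sleep "0.$(( RANDOM % 3 + 1 ))" # GNU coreutils sleep accepts fractions.
  echo "job $1 done" > "$outdir/$1"
}

running=0
for (( i = 1; i <= TOTAL; i++ ))
do
  if (( running >= MAX_JOBS ))
  then
    wait -n                       # Block until any one background job ends.
    running=$(( running - 1 ))
  fi
  work "$i" &
  running=$(( running + 1 ))
done
wait                              # Wait for the jobs still running.

results=( "$outdir"/* )
echo "${#results[@]} jobs finished"   # 6 jobs finished
rm -rf "$outdir"
```

Because the parent simply waits, there is no window between a handler running and a trap being re-armed, which is the race described in the author's notes.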

Note

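trap '' SIGNAL (two adjacent apostrophes) disables SIGNAL for the remainder of the script. trap SIGNAL restores the functioning of SIGNAL once more. This is useful to protect a critical portion of a script from an undesirable interrupt.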

	trap '' 2  # Signal 2 is Control-C, now disabled.
	command
	command
	command
	trap 2     # Reenables Control-C
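
The effect can be checked by sending the signal to the script itself inside the protected section. A minimal sketch, assuming Bash ($BASHPID, the PID of the current shell process, requires Bash 4.0 or later); critical_done is a hypothetical variable name.

```shell
#!/bin/bash
# Sketch: shield a critical section from Control-C (SIGINT),
#+ then restore the default behavior.
trap '' INT              # Ignore SIGINT from here on.
kill -INT "$BASHPID"     # Simulate a Control-C sent to this shell.
critical_done=yes        # Still running: the signal was ignored.
echo "critical section finished: $critical_done"
trap - INT               # Restore SIGINT's original disposition
                         #+ (same effect as "trap INT" or "trap 2").
```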
	

Notes

[1]

By convention, signal 0 is assigned to exit.