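Bash supports a surprising number of string manipulation operations. Unfortunately, these tools lack a unified focus. Some are a subset of parameter substitution, and others fall under the functionality of the UNIX expr command. This results in inconsistent command syntax and overlapping functionality, not to mention confusion.
String Length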
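The length of a string is available through the ${#string} parameter expansion, which the example that follows also relies on. Here is a minimal sketch; stringZ is only an illustrative variable name. The expr form is an alternative that prints the number of characters matched by the regular expression '.*'.

```shell
#!/bin/bash
# Two ways to get the length of a string.
# 'stringZ' is only an illustrative variable name.
stringZ=abcABC123ABCabc

len_builtin=${#stringZ}               # Parameter expansion, no external command.
len_expr=$(expr "$stringZ" : '.*')    # expr prints the number of characters matched.

echo "$len_builtin"   # 15
echo "$len_expr"      # 15
```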
Example 10-1. Inserting a blank line between paragraphs in a text file
#!/bin/bash
# paragraph-space.sh
# Ver. 2.1, Reldate 29Jul12 [fixup]
# Inserts a blank line between paragraphs of a single-spaced text file.
# Usage: $0 <FILENAME
MINLEN=60 # Change this value? It's a judgment call.
# Assume lines shorter than $MINLEN characters ending in a period
#+ terminate a paragraph. See exercises below.
while IFS= read -r line # For as many lines as the input file has ...
do
echo "$line" # Output the line itself.
len=${#line}
if [[ "$len" -lt "$MINLEN" && "$line" =~ \.$ ]]
# if [[ "$len" -lt "$MINLEN" && "$line" =~ \[*\.\] ]]
# An update to Bash broke the previous version of this script. Ouch!
# Thank you, Halim Srama, for pointing this out and suggesting a fix.
then echo # Add a blank line immediately
fi #+ after a short line terminated by a period.
done
exit
# Exercises:
# ---------
# 1) The script usually inserts a blank line at the end
#+ of the target file. Fix this.
# 2) Line 17 only considers periods as sentence terminators.
# Modify this to include other common end-of-sentence characters,
#+ such as ?, !, and ".
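Length of Matching Substring at Beginning of String
expr match "$string" '$substring'
$substring is a regular expression.
expr "$string" : '$substring'
$substring is a regular expression.
stringZ=abcABC123ABCabc
#       |------|
#       12345678
echo `expr match "$stringZ" 'abc[A-Z]*.2'`   # 8
echo `expr "$stringZ" : 'abc[A-Z]*.2'`       # 8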
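The same measurement can be made without the external expr command. The following is a sketch, assuming Bash 3.0 or later, in which the =~ operator stores the matched text in BASH_REMATCH[0]. Note that =~ uses extended regular expressions and does not anchor the match, so an explicit ^ is needed to mimic expr. The function name match_len is hypothetical.

```shell
#!/bin/bash
# Length of the match at the beginning of a string, using only Bash.
# match_len is a hypothetical helper name, not a Bash builtin.
match_len ()
{
  local string=$1
  local regex="^($2)"       # expr anchors implicitly; =~ does not.
  if [[ $string =~ $regex ]]
  then
    echo "${#BASH_REMATCH[0]}"
  else
    echo 0                  # expr also prints 0 when nothing matches.
  fi
}

stringZ=abcABC123ABCabc
match_len "$stringZ" 'abc[A-Z]*.2'   # 8
match_len "$stringZ" 'ABC'           # 0  ('ABC' is not at the beginning.)
```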
Index
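GNU expr provides an index operator, expr index $string $chars, which prints the 1-based position in $string of the first character that belongs to the set $chars, or 0 if none does. Since not every expr implementation supports it, here is a sketch of the same lookup in pure Bash. The function name strindex is an assumption made for illustration, and the character set is assumed to be non-empty.

```shell
#!/bin/bash
# Position (1-based) of the first character in a string
#+ that belongs to a given set of characters; 0 if there is none.
# strindex is a hypothetical helper, modeled on "expr index".
# Assumes the character set in $2 is not empty.
strindex ()
{
  local string=$1 chars=$2
  local rest=${string%%["$chars"]*}   # Everything before the first hit.
  if [ "${#rest}" -eq "${#string}" ]
  then
    echo 0                            # No character of the set was found.
  else
    echo $(( ${#rest} + 1 ))
  fi
}

stringZ=abcABC123ABCabc
strindex "$stringZ" C12    # 6  (position of the first 'C')
strindex "$stringZ" 1c     # 3  ('c' at position 3 comes before '1')
strindex "$stringZ" xyz    # 0
```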
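Substring Extraction
${string:position}
Extracts substring from $string at $position.
If the $string parameter is "*" or "@", then this extracts the positional parameters, [1] starting at $position.
${string:position:length}
Extracts $length characters of substring from $string at $position.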
stringZ=abcABC123ABCabc
#       0123456789.....
#       0-based indexing.
echo ${stringZ:0} # abcABC123ABCabc
echo ${stringZ:1} # bcABC123ABCabc
echo ${stringZ:7} # 23ABCabc
echo ${stringZ:7:3} # 23A
# Three characters of substring.
# Is it possible to index from the right end of the string?
echo ${stringZ:-4} # abcABC123ABCabc
# Defaults to full string, as in ${parameter:-default}.
# However . . .
echo ${stringZ:(-4)} # Cabc
echo ${stringZ: -4} # Cabc
# Now, it works.
# Parentheses or added space "escape" the position parameter.
# Thank you, Dan Jacobson, for pointing this out.
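A length can follow a negative position as well. The sketch below assumes nothing beyond the stringZ variable used above and takes characters counted from the right-hand end. The variable names last4, mid, pos, and var4 are illustrative only.

```shell
#!/bin/bash
# Indexing from the right-hand end, with and without a length.
stringZ=abcABC123ABCabc

last4=${stringZ: -4}       # Last four characters.
mid=${stringZ: -6:3}       # Three characters, starting six from the end.
pos=-4
var4=${stringZ:$pos}       # A variable holding a negative number needs
                           #+ no extra space or parentheses.
echo "$last4"   # Cabc
echo "$mid"     # ABC
echo "$var4"    # Cabc
```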
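The position and length arguments can be "parameterized," that is, represented as a variable rather than as a numerical constant.
Example 10-2. Generating an 8-character "random" string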
#!/bin/bash
# rand-string.sh
# Generating an 8-character "random" string.
if [ -n "$1" ] # If command-line argument present,
then #+ then set start-string to it.
str0="$1"
else # Else use PID of script as start-string.
str0="$$"
fi
POS=2 # Starting from position 2 in the string.
LEN=8 # Extract eight characters.
str1=$( echo "$str0" | md5sum | md5sum )
# Doubly scramble ^^^^^^ ^^^^^^
#+ by piping and repiping to md5sum.
randstring="${str1:$POS:$LEN}"
# Can parameterize ^^^^ ^^^^
echo "$randstring"
exit $?
# bozo$ ./rand-string.sh my-password
# 1bdd88c4
# No, this is not recommended
#+ as a method of generating hack-proof passwords.
If the $string parameter is "*" or "@", then ${string:position:length} extracts a maximum of $length positional parameters, starting at $position.
echo ${*:2} # Echoes second and following positional parameters.
echo ${@:2} # Same as above.
echo ${*:2:3} # Echoes three positional parameters, starting at second.
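To try these forms without writing a separate script, the positional parameters can be set in place with the set builtin. A short sketch follows; the five words are arbitrary sample arguments.

```shell
#!/bin/bash
# Slicing the positional parameters.
set -- alpha beta gamma delta epsilon   # Sample arguments: $1 ... $5.

from_second="${*:2}"      # Second and following parameters.
three="${*:2:3}"          # At most three parameters, starting at the second.
last_two="${*: -2}"       # Counting from the end also works (note the space).

echo "$from_second"   # beta gamma delta epsilon
echo "$three"         # beta gamma delta
echo "$last_two"      # delta epsilon
```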
expr substr $string $position $length
Extracts $length characters from $string starting at $position.
stringZ=abcABC123ABCabc
#       123456789......
#       1-based indexing.
echo `expr substr $stringZ 1 2`   # ab
echo `expr substr $stringZ 4 3`   # ABC
expr match "$string" '\($substring\)'
Extracts $substring at beginning of $string, where $substring is a regular expression.
expr "$string" : '\($substring\)'
Extracts $substring at beginning of $string, where $substring is a regular expression.
stringZ=abcABC123ABCabc
#       =======
echo `expr match "$stringZ" '\(.[b-c]*[A-Z]..[0-9]\)'`   # abcABC1
echo `expr "$stringZ" : '\(.[b-c]*[A-Z]..[0-9]\)'`       # abcABC1
echo `expr "$stringZ" : '\(.......\)'`                   # abcABC1
# All of the above forms give an identical result.
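expr match "$string" '.*\($substring\)'
Extracts $substring at end of $string, where $substring is a regular expression.
expr "$string" : '.*\($substring\)'
Extracts $substring at end of $string, where $substring is a regular expression.
stringZ=abcABC123ABCabc
#                ======
echo `expr match "$stringZ" '.*\([A-C][A-C][A-C][a-c]*\)'`   # ABCabc
echo `expr "$stringZ" : '.*\(......\)'`                      # ABCabc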
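Bash's own =~ operator offers a comparable way to pull out a parenthesized subexpression without calling expr. What follows is a hedged sketch rather than an exact translation: =~ uses extended regular expressions (so the parentheses are not backslash-escaped), it does not anchor at the start of the string, and the captured text lands in BASH_REMATCH[1]. The variable names front_re and back_re are illustrative.

```shell
#!/bin/bash
# Extracting substrings with [[ ... =~ ... ]] and BASH_REMATCH.
stringZ=abcABC123ABCabc

front_re='^(.[b-c]*[A-Z]..[0-9])'     # Capture at the beginning of the string.
back_re='([A-C][A-C][A-C][a-c]*)$'    # Capture at the end of the string.

[[ $stringZ =~ $front_re ]] && front=${BASH_REMATCH[1]}
[[ $stringZ =~ $back_re  ]] && back=${BASH_REMATCH[1]}

echo "$front"   # abcABC1
echo "$back"    # ABCabc
```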
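Substring Removal
${string#substring}
Deletes shortest match of $substring from front of $string.
${string##substring}
Deletes longest match of $substring from front of $string.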
stringZ=abcABC123ABCabc
#       |----|          shortest
#       |----------|    longest
echo ${stringZ#a*C} # 123ABCabc
# Strip out shortest match between 'a' and 'C'.
echo ${stringZ##a*C} # abc
# Strip out longest match between 'a' and 'C'.
# You can parameterize the substrings.
X='a*C'
echo ${stringZ#$X} # 123ABCabc
echo ${stringZ##$X} # abc
# As above.
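Two common uses of these operators are stripping a directory prefix from a pathname and removing leading zeros from a number. A minimal sketch follows; the pathname and the function name strip_leading_zeros are made up for illustration.

```shell
#!/bin/bash
# Practical uses of ${var#pattern} and ${var##pattern}.

path=/usr/local/share/doc/readme.txt    # Hypothetical pathname.
echo "${path##*/}"    # readme.txt   (longest match of '*/' removed)
echo "${path#*/}"     # usr/local/share/doc/readme.txt
                      # (shortest match: only the leading '/')

strip_leading_zeros ()   # Hypothetical helper.
{
  local n=$1
  while [[ $n == 0?* ]]  # A leading zero followed by at least one more character.
  do
    n=${n#0}             # Remove a single leading zero.
  done
  echo "$n"
}

strip_leading_zeros 0042   # 42
strip_leading_zeros 000    # 0
```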
${string%substring}
Deletes shortest match of $substring from back of $string.
For example:
# Rename all filenames in $PWD with "TXT" suffix to a "txt" suffix.
# For example, "file1.TXT" becomes "file1.txt" . . .
SUFF=TXT
suff=txt
for i in *."$SUFF"
do
mv -f "$i" "${i%.$SUFF}.$suff"
# Leave unchanged everything *except* the shortest pattern match
#+ starting from the right-hand-side of the variable $i . . .
done ### This could be condensed into a "one-liner" if desired.
# Thank you, Rory Winston.
${string%%substring}
Deletes longest match of $substring from back of $string.
stringZ=abcABC123ABCabc
#                    ||     shortest
#        |------------|     longest
echo ${stringZ%b*c} # abcABC123ABCa
# Strip out shortest match between 'b' and 'c', from back of $stringZ.
echo ${stringZ%%b*c} # a
# Strip out longest match between 'b' and 'c', from back of $stringZ.
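The difference between the shortest and the longest match from the back shows up clearly with filenames that carry more than one suffix. A sketch, using a made-up filename:

```shell
#!/bin/bash
# ${var%pattern} versus ${var%%pattern} on a filename with two suffixes.
file=backup/archive.tar.gz     # Hypothetical filename.

no_last_suffix=${file%.*}      # Shortest match of '.*' from the back.
no_suffixes=${file%%.*}        # Longest match of '.*' from the back.
dir=${file%/*}                 # Directory part (like dirname, for simple paths).

echo "$no_last_suffix"   # backup/archive.tar
echo "$no_suffixes"      # backup/archive
echo "$dir"              # backup
```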
This operator is useful for generating filenames.
Example 10-3. Converting graphic file formats, with filename change
#!/bin/bash
# cvt.sh:
# Converts all the MacPaint image files in a directory to "pbm" format.
# Uses the "macptopbm" binary from the "netpbm" package,
#+ which is maintained by Bryan Henderson (bryanh@giraffe-data.com).
# Netpbm is a standard part of most Linux distros.
OPERATION=macptopbm
SUFFIX=pbm # New filename suffix.
if [ -n "$1" ]
then
directory=$1 # If directory name given as a script argument...
else
directory=$PWD # Otherwise use current working directory.
fi
# Assumes all files in the target directory are MacPaint image files,
#+ with a ".mac" filename suffix.
for file in "$directory"/* # Filename globbing.
do
filename=${file%.*c} # Strip ".mac" suffix off filename
#+ ('.*c' matches everything
#+ between '.' and 'c', inclusive).
$OPERATION "$file" > "$filename.$SUFFIX"
# Redirect conversion to new filename.
rm -f "$file" # Delete original files after converting.
echo "$filename.$SUFFIX" # Log what is happening to stdout.
done
exit 0
# Exercise:
# --------
# As it stands, this script converts *all* the files in the current
#+ working directory.
# Modify it to work *only* on files with a ".mac" suffix.
# *** And here's another way to do it. *** #
#!/bin/bash
# Batch convert into different graphic formats.
# Assumes imagemagick installed (standard in most Linux distros).
INFMT=png # Can be tif, jpg, gif, etc.
OUTFMT=pdf # Can be tif, jpg, gif, pdf, etc.
for pic in *."$INFMT"
do
p2=${pic%."$INFMT"} # Strip the suffix with parameter expansion.
# echo $p2
convert "$pic" "$p2.$OUTFMT"
done
exit $?
Example 10-4. Converting streaming audio files to ogg
#!/bin/bash
# ra2ogg.sh: Convert streaming audio files (*.ra) to ogg.
# Uses the "mplayer" media player program:
# http://www.mplayerhq.hu/homepage
# Uses the "ogg" library and "oggenc":
# http://www.xiph.org/
#
# This script may need appropriate codecs installed, such as sipr.so ...
# Possibly also the compat-libstdc++ package.
OFILEPREF=${1%%ra} # Strip off the "ra" suffix.
OFILESUFF=wav # Suffix for wav file.
OUTFILE="$OFILEPREF""$OFILESUFF"
E_NOARGS=85
if [ -z "$1" ] # Must specify a filename to convert.
then
echo "Usage: `basename $0` [filename]"
exit $E_NOARGS
fi
##########################################################################
mplayer "$1" -ao pcm:file="$OUTFILE"
oggenc "$OUTFILE" # Correct file extension automatically added by oggenc.
##########################################################################
rm "$OUTFILE" # Delete intermediate *.wav file.
# If you want to keep it, comment out above line.
exit $?
# Note:
# ----
# On a Website, simply clicking on a *.ram streaming audio file
#+ usually only downloads the URL of the actual *.ra audio file.
# You can then use "wget" or something similar
#+ to download the *.ra file itself.
# Exercises:
# ---------
# As is, this script converts only *.ra filenames.
# Add flexibility by permitting use of *.ram and other filenames.
#
# If you're really ambitious, expand the script
#+ to do automatic downloads and conversions of streaming audio files.
# Given a URL, batch download streaming audio files (using "wget")
#+ and convert them on the fly.
A simple emulation of getopt using substring-extraction constructs.
Example 10-5. Emulating getopt
#!/bin/bash
# getopt-simple.sh
# Author: Chris Morgan
# Used in the ABS Guide with permission.
getopt_simple()
{
echo "getopt_simple()"
echo "Parameters are '$*'"
until [ -z "$1" ]
do
echo "Processing parameter of: '$1'"
if [ "${1:0:1}" = '/' ]
then
tmp=${1:1} # Strip off leading '/' . . .
parameter=${tmp%%=*} # Extract name.
value=${tmp##*=} # Extract value.
echo "Parameter: '$parameter', value: '$value'"
eval $parameter=$value
fi
shift
done
}
# Pass all options to getopt_simple().
getopt_simple "$@"
echo "test is '$test'"
echo "test2 is '$test2'"
exit 0 # See also, UseGetOpt.sh, a modified version of this script.
---
bash getopt-simple.sh /test=value1 /test2=value2
getopt_simple()
Parameters are '/test=value1 /test2=value2'
Processing parameter of: '/test=value1'
Parameter: 'test', value: 'value1'
Processing parameter of: '/test2=value2'
Parameter: 'test2', value: 'value2'
test is 'value1'
test2 is 'value2'
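Because eval executes whatever text it is handed, a value containing shell metacharacters could run unintended commands in the function above. One possible alternative, sketched here under the assumption of Bash 3.1 or later, assigns through printf -v and rejects names that are not valid identifiers. The function name set_option is hypothetical.

```shell
#!/bin/bash
# Parsing "/name=value" arguments without eval.
# set_option is a hypothetical helper, not part of the original script.
# Limitation: option names that collide with the function's own local
#+ variables (arg, tmp, name, value) are not visible to the caller.
set_option ()
{
  local arg=$1
  [ "${arg:0:1}" = "/" ] || return 1        # Must start with '/'.
  local tmp=${arg:1}                        # Strip off leading '/'.
  local name=${tmp%%=*}                     # Text before the first '='.
  local value=${tmp#*=}                     # Text after the first '=',
                                            #+ so values may contain '='.
  [[ $name =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || return 1   # Valid identifier?
  printf -v "$name" '%s' "$value"           # Assign without evaluating $value.
}

set_option '/test=value1'
set_option '/test2=a b; echo oops'          # Stored literally, never executed.
echo "test is '$test'"
echo "test2 is '$test2'"
```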
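Substring Replacement
${string/substring/replacement}
Replace first match of $substring with $replacement. [2]
${string//substring/replacement}
Replace all matches of $substring with $replacement.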
stringZ=abcABC123ABCabc
echo ${stringZ/abc/xyz} # xyzABC123ABCabc
# Replaces first match of 'abc' with 'xyz'.
echo ${stringZ//abc/xyz} # xyzABC123ABCxyz
# Replaces all matches of 'abc' with 'xyz'.
echo ---------------
echo "$stringZ" # abcABC123ABCabc
echo ---------------
# The string itself is not altered!
# Can the match and replacement strings be parameterized?
match=abc
repl=000
echo ${stringZ/$match/$repl} # 000ABC123ABCabc
# ^ ^ ^^^
echo ${stringZ//$match/$repl} # 000ABC123ABC000
# Yes! ^ ^ ^^^ ^^^
echo
# What happens if no $replacement string is supplied?
echo ${stringZ/abc} # ABC123ABCabc
echo ${stringZ//abc} # ABC123ABC
# A simple deletion takes place.
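The substring to be replaced is matched as a shell pattern (as in filename globbing), not merely as a literal string, so wildcards and bracket expressions are allowed. A brief sketch with made-up sample strings:

```shell
#!/bin/bash
# Pattern-based replacement with ${var/pattern/replacement}.
title="Advanced Bash Scripting Guide"   # Sample strings, for illustration.
stringZ=abcABC123ABCabc

underscored=${title// /_}        # Every space becomes an underscore.
no_digits=${stringZ//[0-9]/}     # Delete every digit.
masked=${stringZ/C*C/-}          # Leftmost match, extended as far as possible.

echo "$underscored"   # Advanced_Bash_Scripting_Guide
echo "$no_digits"     # abcABCABCabc
echo "$masked"        # abcAB-abc
```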
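${string/#substring/replacement}
If $substring matches front end of $string, substitute $replacement for $substring.
${string/%substring/replacement}
If $substring matches back end of $string, substitute $replacement for $substring.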
stringZ=abcABC123ABCabc
echo ${stringZ/#abc/XYZ} # XYZABC123ABCabc
# Replaces front-end match of 'abc' with 'XYZ'.
echo ${stringZ/%abc/XYZ} # abcABC123ABCXYZ
# Replaces back-end match of 'abc' with 'XYZ'.
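Anchored replacement is handy for changing a known prefix or suffix while leaving identical text elsewhere in the string alone. A sketch with a made-up filename:

```shell
#!/bin/bash
# Anchored replacement: only the front end (#) or back end (%) may match.
file=TXT.report.TXT            # Hypothetical filename.

new_suffix=${file/%.TXT/.txt}  # Only the trailing ".TXT" is replaced.
new_prefix=${file/#TXT/txt}    # Only the leading "TXT" is replaced.
no_match=${file/#report/xxx}   # "report" is not at the front: no change.

echo "$new_suffix"   # TXT.report.txt
echo "$new_prefix"   # txt.report.TXT
echo "$no_match"     # TXT.report.TXT
```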
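A Bash script may invoke the string manipulation facilities of awk as an alternative to using its built-in operations.
Example 10-6. Alternate ways of extracting and locating substrings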
#!/bin/bash
# substring-extraction.sh
String=23skidoo1
#      012345678    Bash
#      123456789    awk
# Note different string indexing system:
# Bash numbers first character of string as 0.
# Awk numbers first character of string as 1.
echo ${String:2:4} # position 3 (0-1-2), 4 characters long
# skid
# The awk equivalent of ${string:pos:length} is substr(string,pos,length).
echo | awk '
{ print substr("'"${String}"'",3,4) # skid
}
'
# Piping an empty "echo" to awk gives it dummy input,
#+ and thus makes it unnecessary to supply a filename.
echo "----"
# And likewise:
echo | awk '
{ print index("'"${String}"'", "skid") # 3
} # (skid starts at position 3)
' # The awk equivalent of "expr index" ...
exit 0
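Splicing a shell variable into the text of the awk program, as above, breaks if the string contains a double quote. A possible alternative, sketched here, hands the value over with awk's -v option and uses a BEGIN block so that no dummy input is needed. One caveat: awk interprets backslash escape sequences in a -v assignment. The variable names sub, pos, and len are illustrative.

```shell
#!/bin/bash
# Passing a shell string to awk with -v instead of quote splicing.
String=23skidoo1

sub=$(awk -v s="$String" 'BEGIN { print substr(s, 3, 4) }')    # Extract.
pos=$(awk -v s="$String" 'BEGIN { print index(s, "skid") }')   # Locate.
len=$(awk -v s="$String" 'BEGIN { print length(s) }')          # Measure.

echo "$sub"   # skid
echo "$pos"   # 3
echo "$len"   # 9
```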
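[1] This applies to either command-line arguments or parameters passed to a function.
[2] Note that $substring and $replacement may refer to either literal strings or variables, depending on context. See the first usage example.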