File sort utility, often used as a filter in a pipe. This command sorts a text stream or file forwards or in reverse, or according to various keys or character positions. Using the -m option, it merges presorted input files. The info page lists its many capabilities and options. See Example 11-10, Example 11-11, and Example A-8.
Topological sort, reading in pairs of whitespace-separated strings and sorting according to input patterns. The original purpose of tsort was to sort a list of dependencies for an obsolete version of the ld linker in an "ancient" version of UNIX.

The results of a tsort will usually differ markedly from those of the standard sort command, above.
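A minimal illustration (the input pairs below are this editor's own example, not from the original text): each input line names a "predecessor successor" pair, and tsort emits the strings in an order consistent with every pair.

```shell
# Each line is a "this-must-come-before-that" pair.
# Edges: a->b, b->c, a->c.  The only order satisfying all three is a, b, c.
printf 'a b\nb c\na c\n' | tsort
# a
# b
# c
```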
This filter removes duplicate lines from a sorted file. It is often seen in a pipe coupled with sort.

cat list-1 list-2 list-3 | sort | uniq > final.list
# Concatenates the list files,
# sorts them,
# removes duplicate lines,
# and finally writes the result to an output file.

The useful -c option prefixes each line of the input file with its number of occurrences.

bash$ cat testfile
This line occurs only once.
This line occurs twice.
This line occurs twice.
This line occurs three times.
This line occurs three times.
This line occurs three times.

bash$ uniq -c testfile
      1 This line occurs only once.
      2 This line occurs twice.
      3 This line occurs three times.

bash$ sort testfile | uniq -c | sort -nr
      3 This line occurs three times.
      2 This line occurs twice.
      1 This line occurs only once.

The sort INPUTFILE | uniq -c | sort -nr command string produces a frequency of occurrence listing on the INPUTFILE file (the -nr options to sort cause a reverse numerical sort). This template finds use in analysis of log files and dictionary lists, and wherever the lexical structure of a document needs to be examined.
Example 16-12. Word Frequency Analysis

#!/bin/bash
# wf.sh: Crude word frequency analysis on a text file.
# This is a more efficient version of the "wf2.sh" script.


# Check for input file on command-line.
ARGS=1
E_BADARGS=85
E_NOFILE=86

if [ $# -ne "$ARGS" ]  # Correct number of arguments passed to script?
then
  echo "Usage: `basename $0` filename"
  exit $E_BADARGS
fi

if [ ! -f "$1" ]       # Check if file exists.
then
  echo "File \"$1\" does not exist."
  exit $E_NOFILE
fi


########################################################
# main ()
sed -e 's/\.//g'  -e 's/\,//g' -e 's/ /\
/g' "$1" | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr
#                           =========================
#                            Frequency of occurrence

#  Filter out periods and commas, and
#+ change space between words to linefeed,
#+ then shift characters to lowercase, and
#+ finally prefix occurrence count and sort numerically.

#  Arun Giridhar suggests modifying the above to:
#  . . . | sort | uniq -c | sort +1 [-f] | sort +0 -nr
#  This adds a secondary sort key, so instances of
#+ equal occurrence are sorted alphabetically.
#  As he explains it:
#  "This is effectively a radix sort, first on the
#+ least significant column
#+ (word or string, optionally case-insensitive)
#+ and last on the most significant column (frequency)."
#
#  As Frank Wang explains, the above is equivalent to
#+       . . . | sort | uniq -c | sort +0 -nr
#+ and the following also works:
#+       . . . | sort | uniq -c | sort -k1nr -k
########################################################

exit 0

# Exercises:
# ---------
# 1) Add 'sed' commands to filter out other punctuation,
#+   such as semicolons.
# 2) Modify the script to also filter out multiple spaces and
#+   other whitespace.

bash$ cat testfile
This line occurs only once.
This line occurs twice.
This line occurs twice.
This line occurs three times.
This line occurs three times.
This line occurs three times.

bash$ ./wf.sh testfile
      6 this
      6 occurs
      6 line
      3 times
      3 three
      2 twice
      1 only
      1 once
The expand filter converts tabs to spaces. It is often used in a pipe.

The unexpand filter converts spaces to tabs. This reverses the effect of expand.
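A quick sketch of the round trip (the tab-stop width of 4 is chosen arbitrarily for illustration; -t is a GNU coreutils option):

```shell
# expand: tab -> spaces, padding out to the next tab stop (every 4 columns here).
printf 'a\tb\n' | expand -t 4       # prints "a   b"

# unexpand: runs of leading spaces -> tabs, using the same tab stops.
# By default only initial whitespace on each line is converted.
printf '    x\n' | unexpand -t 4    # prints a TAB followed by "x"
```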
A tool for extracting fields from files. It is similar to the print $N command set in awk, but more limited. It may be simpler to use cut in a script than awk. Particularly important are the -d (delimiter) and -f (field specifier) options.
Using cut to obtain a listing of the mounted filesystems:

cut -d ' ' -f1,2 /etc/mtab

Using cut to list the OS and kernel version:

uname -a | cut -d" " -f1,3,11,12

Using cut to extract message headers from an e-mail folder:

bash$ grep '^Subject:' read-messages | cut -c10-80
Re: Linux suitable for mission-critical apps?
MAKE MILLIONS WORKING AT HOME!!!
Spam complaint
Re: Spam complaint

Using cut to parse a file:

# List all the users in /etc/passwd.

FILENAME=/etc/passwd

for user in $(cut -d: -f1 $FILENAME)
do
  echo $user
done

# Thanks, Oleg Philon for suggesting this.

cut -d ' ' -f2,3 filename is equivalent to awk -F'[ ]' '{ print $2, $3 }' filename
It is even possible to specify a linefeed as a delimiter. The trick is to actually embed a linefeed (RETURN) in the command sequence.

Thanks, Jaka Kranjc, for pointing this out.

See also Example 16-48.
Tool for merging together different files into a single, multi-column file. In combination with cut, useful for creating system log files.

bash$ cat items
alphabet blocks
building blocks
cables

bash$ cat prices
$1.00/dozen
$2.50 ea.
$3.75

bash$ paste items prices
alphabet blocks $1.00/dozen
building blocks $2.50 ea.
cables  $3.75
Consider this a special-purpose cousin of paste. This powerful utility allows merging two files in a meaningful fashion, which essentially creates a simple version of a relational database.

The join command operates on exactly two files, but pastes together only those lines with a common tagged field (usually a numerical label), and writes the result to stdout. The files to be joined should be sorted according to the tagged field for the matchups to work properly.

File: 1.data

100 Shoes
200 Laces
300 Socks

File: 2.data

100 $40.00
200 $1.00
300 $2.00

bash$ join 1.data 2.data
File: 1.data 2.data

100 Shoes $40.00
200 Laces $1.00
300 Socks $2.00

The tagged field appears only once in the output.
Lists the beginning of a file to stdout. The default is 10 lines, but a different number can be specified. The command has a number of interesting options.

Example 16-13. Which files are scripts?

#!/bin/bash
# script-detector.sh: Detects scripts within a directory.

TESTCHARS=2    # Test first 2 characters.
SHABANG='#!'   # Scripts begin with a "sha-bang."

for file in *  # Traverse all the files in current directory.
do
  if [[ `head -c$TESTCHARS "$file"` = "$SHABANG" ]]
  #      head -c2                      #!
  #  The '-c' option to "head" outputs a specified
  #+ number of characters, rather than lines (the default).
  then
    echo "File \"$file\" is a script."
  else
    echo "File \"$file\" is *not* a script."
  fi
done

exit 0

# Exercises:
# ---------
# 1) Modify this script to take as an optional argument
#+   the directory to scan for scripts
#+   (rather than just the current working directory).
#
# 2) As it stands, this script gives "false positives" for
#+   Perl, awk, and other scripting language scripts.
#    Correct this.
Example 16-14. Generating 10-digit random numbers

#!/bin/bash
# rnd.sh: Outputs a 10-digit random number

# Script by Stephane Chazelas.

head -c4 /dev/urandom | od -N4 -tu4 | sed -ne '1s/.* //p'


# =================================================================== #

# Analysis
# --------

# head:
# -c4 option takes first 4 bytes.

# od:
# -N4 option limits output to 4 bytes.
# -tu4 option selects unsigned decimal format for output.

# sed:
# -n option, in combination with "p" flag to the "s" command,
# outputs only matched lines.


# The author of this script explains the action of 'sed', as follows.

# head -c4 /dev/urandom | od -N4 -tu4 | sed -ne '1s/.* //p'
# ----------------------------------> |

# Assume output up to "sed" --------> |
# is 0000000 1198195154\n

#  sed begins reading characters: 0000000 1198195154\n.
#  Here it finds a newline character,
#+ so it is ready to process the first line (0000000 1198195154).
#  It looks at its <range><action>s. The first and only one is

#   range     action
#   1         s/.* //p

#  The line number is in the range, so it executes the action:
#+ tries to substitute the longest string ending with a space in the line
#  ("0000000 ") with nothing (//), and if it succeeds, prints the result
#  ("p" is a flag to the "s" command here, this is different
#+ from the "p" command).

#  sed is now ready to continue reading its input. (Note that before
#+ continuing, if -n option had not been passed, sed would have printed
#+ the line once again).

#  Now, sed reads the remainder of the characters, and finds the
#+ end of the file.
#  It is now ready to process its 2nd line (which is also numbered '$' as
#+ it's the last one).
#  It sees it is not matched by any <range>, so its job is done.

#  In a few words, this sed command means:
#  "On the first line only, remove any character up to the right-most space,
#+ then print it."

# A better way to do this would have been:
#           sed -e 's/.* //;q'

# Here, two <range><action>s (could have been written
#           sed -e 's/.* //' -e q):

#   range                    action
#   nothing (matches line)   s/.* //
#   nothing (matches line)   q (quit)

#  Here, sed only reads its first line of input.
#  It performs both actions, and prints the line (substituted) before
#+ quitting (because of the "q" action) since the "-n" option is not passed.

# =================================================================== #

# An even simpler alternative to the above one-line script would be:
#           head -c4 /dev/urandom | od -An -tu4

exit
Lists the (tail) end of a file to stdout. The default is 10 lines, but this can be changed with the -n option. Commonly used to keep track of changes to a system log file, using the -f option, which outputs lines appended to the file.

#!/bin/bash

filename=sys.log

cat /dev/null > $filename; echo "Creating / cleaning out file."
#  Creates the file if it does not already exist,
#+ and truncates it to zero length if it does.
#  : > filename   and   > filename also work.

tail /var/log/messages > $filename
# /var/log/messages must have world read permission for this to work.

echo "$filename contains tail end of system log."

exit 0
Example 16-15. Using tail to monitor the system log

To list a specific line of a text file, pipe the output of head to tail -n 1. For example, head -n 8 database.txt | tail -n 1 lists the 8th line of the file database.txt.

To set a variable to a given block of a text file:

var=$(head -n $m $filename | tail -n $n)
See also Example 16-5, Example 16-39, and Example 32-6.

grep

A multi-purpose file search tool that uses Regular Expressions. The name comes from g/re/p -- global - regular expression - print. It was originally a command/filter in the venerable ed line editor.

grep pattern [file...]

Searches the target file(s) for occurrences of pattern, where pattern may be literal text or a Regular Expression.

bash$ grep '[rst]ystem.$' osinfo.txt
The GPL governs the distribution of the Linux operating system.
If no target file(s) specified, grep works as a filter on stdout, as in a pipe.

bash$ ps ax | grep clock
765 tty1     S      0:00 xclock
901 pts/1    S      0:00 grep clock

The -i option causes a case-insensitive search.

The -w option matches only whole words.

The -l option lists only the files in which matches were found, but not the matching lines.
The -r (recursive) option searches files in the current working directory and all subdirectories below it.

The -n option lists the matching lines, together with line numbers.

bash$ grep -n Linux osinfo.txt
2:This is a file containing information about Linux.
6:The GPL governs the distribution of the Linux operating system.

The -v (or --invert-match) option filters out matches.

grep pattern1 *.txt | grep -v pattern2

# Matches all lines in "*.txt" files containing "pattern1",
# but ***not*** "pattern2".

The -c (--count) option gives a numerical count of matches, rather than actually listing the matches.

grep -c txt *.sgml   # (number of occurrences of "txt" in "*.sgml" files)

#   grep -cz .
#            ^ dot
# means count (-c) zero-separated (-z) items matching "."
# that is, non-empty ones (containing at least 1 character).
#
printf 'a b\nc d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz .     # 3
printf 'a b\nc d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz '$'   # 5
printf 'a b\nc d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz '^'   # 5
#
printf 'a b\nc d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -c '$'    # 9
# By default, newline chars (\n) separate items to match.

# Note that the -z option is GNU "grep" specific.

# Thanks, S.C.
The --color (or --colour) option marks the matching string in color (on the console or in an xterm window). Since grep prints out each entire line containing the matching pattern, this lets you see exactly what is being matched. See also the -o option, which shows only the matching portion of the line.

Example 16-16. Printing out the From lines in stored e-mail messages

#!/bin/bash
# from.sh

#  Emulates the useful 'from' utility in Solaris, BSD, etc.
#  Echoes the "From" header line in all messages
#+ in your e-mail directory.


MAILDIR=~/mail/*               #  No quoting of variable. Why?
# Maybe check if-exists $MAILDIR:   if [ -d $MAILDIR ] . . .
GREP_OPTS="-H -A 5 --color"    #  Show file, plus extra context lines
                               #+ and display "From" in color.
TARGETSTR="^From"              # "From" at beginning of line.

for file in $MAILDIR           #  No quoting of variable.
do
  grep $GREP_OPTS "$TARGETSTR" "$file"
  #    ^^^^^^^^^^              #  Again, do not quote this variable.
  echo
done

exit $?

#  You might wish to pipe the output of this script to 'more'
#+ or redirect it to a file . . .

When invoked with more than one target file given, grep specifies which file contains matches.

bash$ grep Linux osinfo.txt misc.txt
osinfo.txt:This is a file containing information about Linux.
osinfo.txt:The GPL governs the distribution of the Linux operating system.
misc.txt:The Linux operating system is steadily gaining in popularity.

To force grep to show the filename when searching only one target file, simply give /dev/null as the second file.
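For instance (the temporary file is this editor's own illustration):

```shell
# With a single target file, grep normally omits the filename prefix:
tmp=$(mktemp)
echo "hello Linux" > "$tmp"
grep Linux "$tmp"              # hello Linux

# Adding /dev/null as a dummy second file forces the "filename:" prefix,
# since grep now sees more than one target:
grep Linux "$tmp" /dev/null    # /tmp/tmp.XXXXXX:hello Linux

rm -f "$tmp"
```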
SUCCESS=0                      # if grep lookup succeeds
word=Linux
filename=data.file

grep -q "$word" "$filename"    #  The "-q" option
                               #+ causes nothing to echo to stdout.
if [ $? -eq $SUCCESS ]
# if grep -q "$word" "$filename"   can replace lines 5 - 7.
then
  echo "$word found in $filename"
else
  echo "$word not found in $filename"
fi
grep returns an exit status of 0 if a successful match exists, which makes it useful in a conditional test in a script, especially in combination with the -q option to suppress output.

Example 32-6 demonstrates how to use grep to search for a word pattern in a system logfile.

Example 16-17. Emulating grep in a script

#!/bin/bash
# grp.sh: Rudimentary reimplementation of grep.

E_BADARGS=85

if [ -z "$1" ]    # Check for argument to script.
then
  echo "Usage: `basename $0` pattern"
  exit $E_BADARGS
fi

echo

for file in *     # Traverse all files in $PWD.
do
  output=$(sed -n /"$1"/p $file)       # Command substitution.

  if [ ! -z "$output" ]                # What happens if "$output" is not quoted?
  then
    echo -n "$file: "
    echo "$output"
  fi              #  sed -ne "/$1/s|^|${file}: |p"  is equivalent to above.

  echo
done

echo

exit 0

# Exercises:
# ---------
# 1) Add newlines to output, if more than one match in any given file.
# 2) Add features.
How can grep search for two (or more) separate patterns? What if you want grep to display all lines in a file or files that contain both "pattern1" and "pattern2"?

One method is to pipe the result of grep pattern1 to grep pattern2.

For example, given the following file:

# Filename: tstfile

This is a sample file.
This is an ordinary text file.
This file does not contain any unusual text.
This file is not unusual.
Here is some text.

Now, let's search this file for lines containing both "file" and "text" . . .

bash$ grep file tstfile
# Filename: tstfile
This is a sample file.
This is an ordinary text file.
This file does not contain any unusual text.
This file is not unusual.

bash$ grep file tstfile | grep text
This is an ordinary text file.
This file does not contain any unusual text.

Now, for an interesting recreational use of grep . . .

Example 16-18. Crossword puzzle solver

#!/bin/bash
# cw-solver.sh
# This is actually a wrapper around a one-liner (line 46).

#  Crossword puzzle and anagramming word game solver.
#  You know *some* of the letters in the word you're looking for,
#+ so you need a list of all valid words
#+ with the known letters in given positions.
#  For example: w...i....n
#               1???5????10
#  w in position 1, 3 unknowns, i in the 5th, 4 unknowns, n at the end.
#  (See comments at end of script.)


E_NOPATT=71
DICT=/usr/share/dict/word.lst
#                    ^^^^^^^^   Looks for word list here.
#  ASCII word list, one word per line.
#  If you happen to need an appropriate list,
#+ download the author's "yawl" word list package.
#  http://ibiblio.org/pub/Linux/libs/yawl-0.3.2.tar.gz
#  or
#  http://bash.deta.in/yawl-0.3.2.tar.gz


if [ -z "$1" ]    #  If no word pattern specified
then              #+ as a command-line argument . . .
  echo            #+ . . . then . . .
  echo "Usage:"   #+ Usage message.
  echo
  echo ""$0" \"pattern,\""
  echo "where \"pattern\" is in the form"
  echo "xxx..x.x..."
  echo
  echo "The x's represent known letters,"
  echo "and the periods are unknown letters (blanks)."
  echo "Letters and periods can be in any position."
  echo "For example, try:   sh cw-solver.sh w...i....n"
  echo
  exit $E_NOPATT
fi

echo
# ===============================================
# This is where all the work gets done.
grep ^"$1"$ "$DICT"   # Yes, only one line!
#    |    |
# ^ is start-of-word regex anchor.
# $ is end-of-word regex anchor.

#  From _Stupid Grep Tricks_, vol. 1,
#+ a book the ABS Guide author may yet get around
#+ to writing . . . one of these days . . .
# ===============================================
echo

exit $?  # Script terminates here.

#  If there are too many words generated,
#+ redirect the output to a file.

$ sh cw-solver.sh w...i....n

wellington
workingman
workingmen

egrep -- extended grep -- is the same as grep -E. This uses a somewhat different, extended set of Regular Expressions, which can make the search a bit more flexible. It also allows the boolean | (or) operator.

bash$ egrep 'matches|Matches' file.txt
Line 1 matches.
Line 3 Matches.
Line 4 contains matches, but also Matches

fgrep -- fast grep -- is the same as grep -F. It does a literal string search (no Regular Expressions), which usually speeds things up a bit.

On some Linux distros, egrep and fgrep are symbolic links to, or aliases for grep, but invoked with the -E and -F options, respectively.
#!/bin/bash
# dict-lookup.sh

#  This script looks up definitions in the 1913 Webster's Dictionary.
#  This Public Domain dictionary is available for download
#+ from various sites, including
#+ Project Gutenberg (http://www.gutenberg.org/etext/247).
#
#  Convert it from DOS to UNIX format (with only LF at end of line)
#+ before using it with this script.
#  Store the file in plain, uncompressed ASCII text.
#  Set DEFAULT_DICTFILE variable below to path/filename.


E_BADARGS=85
MAXCONTEXTLINES=50                        # Maximum number of lines to show.
DEFAULT_DICTFILE="/usr/share/dict/webster1913-dict.txt"
                                          # Default dictionary file pathname.
                                          # Change this as necessary.
#  Note:
#  ----
#  This particular edition of the 1913 Webster's
#+ begins each entry with an uppercase letter
#+ (lowercase for the remaining characters).
#  Only the *very first line* of an entry begins this way,
#+ and that's why the search algorithm below works.


if [[ -z $(echo "$1" | sed -n '/^[A-Z]/p') ]]
#  Must at least specify word to look up, and
#+ it must start with an uppercase letter.
then
  echo "Usage: `basename $0` Word-to-define [dictionary-file]"
  echo
  echo "Note: Word to look up must start with capital letter,"
  echo "with the rest of the word in lowercase."
  echo "--------------------------------------------"
  echo "Examples: Abandon, Dictionary, Marking, etc."
  exit $E_BADARGS
fi


if [ -z "$2" ]                            #  May specify different dictionary
                                          #+ as an argument to this script.
then
  dictfile=$DEFAULT_DICTFILE
else
  dictfile="$2"
fi

# ---------------------------------------------------------
Definition=$(fgrep -A $MAXCONTEXTLINES "$1 \\" "$dictfile")
#                  Definitions in form "Word \..."
#
#  And, yes, "fgrep" is fast enough
#+ to search even a very large text file.


#  Now, snip out just the definition block.

echo "$Definition" |
sed -n '1,/^[A-Z]/p' |
#  Print from first line of output
#+ to the first line of the next entry.
sed '$d' | sed '$d'
#  Delete last two lines of output
#+ (blank line and first line of next entry).
# ---------------------------------------------------------

exit $?

# Exercises:
# ---------
# 1)  Modify the script to accept any type of alphabetic input
#   + (uppercase, lowercase, mixed case), and convert it
#   + to an acceptable format for processing.
#
# 2)  Convert the script to a GUI application,
#   + using something like 'gdialog' or 'zenity' . . .
#     The script will then no longer take its argument(s)
#   + from the command-line.
#
# 3)  Modify the script to parse one of the other available
#   + Public Domain Dictionaries, such as the U.S. Census Bureau Gazetteer.
Example 16-19. Looking up definitions in Webster's 1913 Dictionary

See also Example A-41 for an example of speedy fgrep lookup on a large text file.
To search compressed files, use zgrep, zegrep, or zfgrep. These also work on non-compressed files, though slower than plain grep, egrep, fgrep. They are handy for searching through a mixed set of files, some compressed, some not. To search bzipped files, use bzgrep.

look
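A brief sketch of zgrep (the temporary file is this editor's invention; zgrep ships with the gzip package):

```shell
# zgrep transparently decompresses gzip files before searching:
tmp=$(mktemp)
echo "needle in a haystack" | gzip > "$tmp.gz"
zgrep -h needle "$tmp.gz"      # needle in a haystack
rm -f "$tmp" "$tmp.gz"
```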
The look command works like grep, but does a lookup on a "dictionary," a sorted word list. By default, look searches for a match in /usr/share/dict/words, but a different dictionary file may be specified.

#!/bin/bash
# lookup: Does a dictionary lookup on each word in a data file.

file=words.data  # Data file from which to read words to test.

echo
echo "Testing file $file"
echo

while [ "$word" != end ]  # Last word in data file.
do               # ^^^
  read word      # From data file, because of redirection at end of loop.
  look $word > /dev/null  # Don't want to display lines in dictionary file.
  #  Searches for words in the file /usr/share/dict/words
  #+ (usually a link to linux.words).
  lookup=$?      # Exit status of 'look' command.

  if [ "$lookup" -eq 0 ]
  then
    echo "\"$word\" is valid."
  else
    echo "\"$word\" is invalid."
  fi
done <"$file"    # Redirects stdin to $file, so "reads" come from there.

echo

exit 0

# ----------------------------------------------------------------
# Code below line will not execute because of "exit" command above.

# Stephane Chazelas proposes the following, more concise alternative:

while read word && [[ $word != end ]]
do if look "$word" > /dev/null
   then echo "\"$word\" is valid."
   else echo "\"$word\" is invalid."
   fi
done <"$file"

exit 0
sed, awk
sed
wc gives a "word count" on a file or I/O stream:

bash$ wc /usr/share/doc/sed-4.1.2/README
13  70  447 README
[13 lines  70 words  447 characters]

wc -w gives only the word count.

wc -l gives only the line count.

wc -c gives only the byte count.

wc -m gives only the character count.

wc -L gives only the length of the longest line.
Using wc to count how many .txt files are in the current working directory:

$ ls *.txt | wc -l
#  Will work as long as none of the "*.txt" files
#+ have a linefeed embedded in their name.

#  Alternative ways of doing this are:
#      find . -maxdepth 1 -name \*.txt -print0 | grep -cz .
#      (shopt -s nullglob; set -- *.txt; echo $#)

# Thanks, S.C.
Using wc to total up the size of all the files whose names begin with letters in the range d - h:

bash$ wc [d-h]* | grep total | awk '{print $3}'
71832

Using wc to count the instances of the word "Linux" in the main source file for this book:

bash$ grep Linux abs-book.sgml | wc -l
138
Certain commands include some of the functionality of wc as options.

... | grep foo | wc -l
# This frequently used construct can be more concisely rendered.

... | grep -c foo
# Just use the "-c" (or "--count") option of grep.

# Thanks, S.C.
A character translation filter.

Must use quoting and/or brackets, as appropriate. Quotes prevent the shell from reinterpreting the special characters in tr command sequences. Brackets should be quoted to prevent expansion by the shell.

Either tr "A-Z" "*" <filename or tr A-Z \* <filename changes all the uppercase letters in filename to asterisks (writes to stdout). On some systems this may not work, but tr A-Z '[**]' will.
The -d option deletes a range of characters.

echo "abcdef"                 # abcdef
echo "abcdef" | tr -d b-d     # aef


tr -d 0-9 <filename
# Deletes all digits from the file "filename".
The --squeeze-repeats (or -s) option deletes all but the first instance of a string of consecutive characters. This option is useful for removing excess whitespace.

bash$ echo "XXXXX" | tr --squeeze-repeats 'X'
X
The -c "complement" option inverts the character set to match. With this option, tr acts only upon those characters not matching the specified set.

bash$ echo "acfdeb123" | tr -c b-d +
+c+d+b++++

bash$ echo "abcd2ef1" | tr '[:alpha:]' -
----2--1
Example 16-21. toupper: Transforms a file to all uppercase.

#!/bin/bash
# Changes a file to all uppercase.

E_BADARGS=85

if [ -z "$1" ]  # Standard check for command-line arg.
then
  echo "Usage: `basename $0` filename"
  exit $E_BADARGS
fi

tr a-z A-Z <"$1"

# Same effect as above, but using POSIX character set notation:
#        tr '[:lower:]' '[:upper:]' <"$1"
# Thanks, S.C.

#     Or even . . .
#     cat "$1" | tr a-z A-Z
#     Or dozens of other ways . . .

exit 0

#  Exercise:
#  Rewrite this script to give the option of changing a file
#+ to *either* upper or lowercase.
#  Hint: Use either the "case" or "select" command.
Example 16-22. lowercase: Changes all filenames in working directory to lowercase.

#!/bin/bash
#
#  Changes every filename in working directory to all lowercase.
#
#  Inspired by a script of John Dubois,
#+ which was translated into Bash by Chet Ramey,
#+ and considerably simplified by the author of the ABS Guide.


for filename in *                # Traverse all files in directory.
do
  fname=`basename $filename`
  n=`echo $fname | tr A-Z a-z`   # Change name to lowercase.
  if [ "$fname" != "$n" ]        # Rename only files not already lowercase.
  then
    mv $fname $n
  fi
done

exit $?


# Code below this line will not execute because of "exit".
#--------------------------------------------------------#
# To run it, delete script above line.

# The above script will not work on filenames containing blanks or newlines.
# Stephane Chazelas therefore suggests the following alternative:


for filename in *    # Not necessary to use basename,
                     # since "*" won't return any file containing "/".
do n=`echo "$filename/" | tr '[:upper:]' '[:lower:]'`
#                          POSIX char set notation.
#                          Slash added so that trailing newlines are not
#                          removed by command substitution.
   # Variable substitution:
   n=${n%/}          # Removes trailing slash, added above, from filename.
   [[ $filename == $n ]] || mv "$filename" "$n"
                     # Checks if filename already lowercase.
done

exit $?
Example 16-23. du: DOS to UNIX text file conversion.

#!/bin/bash
# Du.sh: DOS to UNIX text file converter.

E_WRONGARGS=85

if [ -z "$1" ]
then
  echo "Usage: `basename $0` filename-to-convert"
  exit $E_WRONGARGS
fi

NEWFILENAME=$1.unx

CR='\015'  # Carriage return.
           # 015 is octal ASCII code for CR.
           # Lines in a DOS text file end in CR-LF.
           # Lines in a UNIX text file end in LF only.

tr -d $CR < $1 > $NEWFILENAME
# Delete CR's and write to new file.

echo "Original DOS text file is \"$1\"."
echo "Converted UNIX text file is \"$NEWFILENAME\"."

exit 0

# Exercise:
# --------
# Change the above script to convert from UNIX to DOS.
Example 16-24. rot13: ultra-weak encryption.

#!/bin/bash
# rot13.sh: Classic rot13 algorithm,
#           encryption that might fool a 3-year old
#           for about 10 minutes.

# Usage: ./rot13.sh filename
# or     ./rot13.sh <filename
# or     ./rot13.sh and supply keyboard input (stdin)

cat "$@" | tr 'a-zA-Z' 'n-za-mN-ZA-M'   # "a" goes to "n", "b" to "o" ...
#  The   cat "$@"   construct
#+ permits input either from stdin or from files.

exit 0
Example 16-25. Generating "Crypto-Quote" Puzzles

#!/bin/bash
# crypto-quote.sh: Encrypt quotes

#  Will encrypt famous quotes in a simple monoalphabetic substitution.
#  The result is similar to the "Crypto Quote" puzzles
#+ seen in the Op Ed pages of the Sunday paper.


key=ETAOINSHRDLUBCFGJMQPVWZYXK
# The "key" is nothing more than a scrambled alphabet.
# Changing the "key" changes the encryption.

# The 'cat "$@"' construction gets input either from stdin or from files.
# If using stdin, terminate input with a Control-D.
# Otherwise, specify filename as command-line parameter.

cat "$@" | tr "a-z" "A-Z" | tr "A-Z" "$key"
#        | to uppercase   | encrypt
# Will work on lowercase, uppercase, or mixed-case quotes.
# Passes non-alphabetic characters through unchanged.


# Try this script with something like:
# "Nothing so needs reforming as other people's habits."
# --Mark Twain
#
# Output is:
# "CFPHRCS QF CIIOQ MINFMBRCS EQ FPHIM GIFGUI'Q HETRPQ."
# --BEML PZERC

# To reverse the encryption:
# cat "$@" | tr "$key" "A-Z"


#  This simple-minded cipher can be broken by an average 12-year old
#+ using only pencil and paper.

exit 0

#  Exercise:
#  --------
#  Modify the script so that it will either encrypt or decrypt,
#+ depending on command-line argument(s).

#!/bin/bash
# jabh.sh

x="wftedskaebjgdBstbdbsmnjgz"

echo $x | tr "a-z" 'oh, turtleneck Phrase Jar!'

# Based on the Wikipedia "Just another Perl hacker" article.
A filter that wraps lines of input to a specified width. This is especially useful with the -s option, which breaks lines at word spaces (see Example 16-26 and Example A-1).
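For example (the width of 20 columns is picked arbitrarily for illustration):

```shell
str="The quick brown fox jumps over the lazy dog"

echo "$str" | fold -w 20       # May split a word across two lines.
echo "$str" | fold -s -w 20    # -s breaks only at spaces, keeping words whole.
```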
Simple-minded file formatter, used as a filter in a pipe to "wrap" long lines of text output.
Example 16-26. Formatted file listing.

#!/bin/bash

WIDTH=40                    # 40 columns wide.

b=`ls /usr/local/bin`       # Get a file listing...

echo $b | fmt -w $WIDTH

# Could also have been done by
#    echo $b | fold - -s -w $WIDTH

exit 0

See also Example 16-5.
A powerful alternative to fmt is Kamil Toman's par utility, available from http://www.cs.berkeley.edu/~amc/Par/.
This deceptively named filter removes reverse line feeds from an input stream. It also attempts to replace whitespace with equivalent tabs. The chief use of col is in filtering the output from certain text processing utilities, such as groff and tbl.
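A minimal demonstration of col's -b option, which additionally strips backspace-overstrike sequences (the kind nroff emits for boldface); the input string here is this editor's own illustration:

```shell
# "b\bb" means: print b, backspace, print b again (overstrike bold).
# With -b, col keeps only the last character written at each position.
printf 'b\bbo\bol\bld\bd\n' | col -b    # bold
```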
Column formatter. This filter transforms list-type text output into a "pretty-printed" table by inserting tabs at appropriate places.
Example 16-27. Using column to format a directory listing

#!/bin/bash
# colms.sh
# A minor modification of the example file in the "column" man page.


(printf "PERMISSIONS LINKS OWNER GROUP SIZE MONTH DAY HH:MM PROG-NAME\n" \
; ls -l | sed 1d) | column -t
#         ^^^^^^           ^^

#  The "sed 1d" in the pipe deletes the first line of output,
#+ which would be "total        N",
#+ where "N" is the total number of files found by "ls -l".

# The -t option to "column" pretty-prints a table.

exit 0
Column removal filter. This removes columns (characters) from a file and writes the file, lacking the range of specified columns, back to stdout. colrm 2 4 <filename removes the second through fourth characters from each line of the text file filename.

If the file contains tabs or nonprintable characters, this may cause unpredictable behavior. In such cases, consider using expand and unexpand in a pipe preceding colrm.
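A sketch of that precaution (sample string invented for illustration): expanding tabs first makes character positions predictable before colrm removes them.

```shell
# "ab<TAB>cd" expands to "ab      cd" (tab stop 8),
# then colrm drops columns 3 through 8 (the six spaces), leaving "abcd".
printf 'ab\tcd\n' | expand -t 8 | colrm 3 8
# abcd
```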
Line numbering filter: nl filename lists filename to stdout, but inserts consecutive numbers at the beginning of each non-blank line. If filename is omitted, it operates on stdin.

The output of nl is very similar to cat -b, since, by default, nl does not list blank lines.
Example 16-28. nl: A self-numbering script.

#!/bin/bash
# line-number.sh

# This script echoes itself twice to stdout with its lines numbered.

echo "           line number = $LINENO" # 'nl' sees this as line 4
#                                         (nl does not number blank lines).
#                                         'cat -n' sees it correctly as line #6.

nl `basename $0`

echo; echo  # Now, let's try it with 'cat -n'

cat -n `basename $0`
# The difference is that 'cat -n' numbers the blank lines.
# Note that 'nl -ba' will also do so.

exit 0
# -----------------------------------------------------------------
Print formatting filter. This will paginate files (or stdout) into sections suitable for hard copy printing or viewing on screen. Various options permit row and column manipulation, joining lines, setting margins, numbering lines, adding page headers, and merging files, among other things. The pr command combines much of the functionality of nl, paste, fold, column, and expand.

pr -o 5 --width=65 fileZZZ | more gives a nicely paginated listing to screen of fileZZZ with margins set at 5 and 65.

A particularly useful option is -d, forcing double-spacing (same effect as sed -G).
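A small illustration of -d (the -t option is added here only to suppress pr's page header and trailer, so that just the double-spaced text shows; the sample lines are invented):

```shell
# -d inserts a blank line after each input line; -t drops the page header/footer.
printf 'line one\nline two\n' | pr -d -t
```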
The GNU gettext package is a set of utilities for localizing and translating the text output of programs into foreign languages. While originally intended for C programs, it now supports quite a number of programming and scripting languages.

The gettext program works on shell scripts. See the info page.
A program for generating binary message catalogs. It is used for localization.

A utility for converting file(s) to a different encoding (character set). Its chief use is for localization.
# Convert a string from UTF-8 to UTF-16 and print to the BookList
function write_utf8_string {
  STRING=$1
  BOOKLIST=$2
  echo -n "$STRING" | iconv -f UTF8 -t UTF16 | \
  cut -b 3- | tr -d \\n >> "$BOOKLIST"
}

#  From Peter Knowles' "booklistgen.sh" script
#+ for converting files to Sony Librie/PRS-50X format.
#  (http://booklistgensh.peterknowles.com)
Consider this a fancier version of iconv, above. This very versatile utility converts a file to a different encoding scheme. Note that recode is not part of the standard Linux installation.
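Typical recode usage follows a from..to syntax; since recode may not be installed, the equivalent iconv round trip is shown as a runnable check (the filename and sample string are invented for illustration):

```shell
# recode converts a file in place:
#   recode ISO-8859-1..UTF-8 somefile.txt

# The same conversion with iconv, as a stream filter:
echo 'café' | iconv -f UTF-8 -t ISO-8859-1 | iconv -f ISO-8859-1 -t UTF-8
# café
```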
TeX and Postscript are text markup languages used for preparing copy for printing or formatted video display.

TeX is Donald Knuth's elaborate typesetting system. It is often convenient to write a shell script encapsulating all the options and arguments passed to one of these markup languages.

Ghostscript (gs) is a GPL-ed Postscript interpreter.
Utility for processing TeX and pdf files. Found in /usr/bin on many Linux distros, it is actually a shell wrapper that calls Perl to invoke Tex.

texexec --pdfarrange --result=Concatenated.pdf *pdf

#  Concatenates all the pdf files in the current working directory
#+ into the merged file, Concatenated.pdf . . .
#  (The --pdfarrange option repaginates a pdf file. See also --pdfcombine.)
#  The above command-line could be parameterized and put into a shell script.
Utility for converting a plain text file to PostScript.

For example, enscript filename.txt -p filename.ps produces the PostScript output file filename.ps.
Yet another text markup and display formatting language is groff. This is the enhanced GNU version of the venerable UNIX roff/troff display and typesetting package. Manpages use groff.

The tbl table processing utility is considered part of groff, as its function is to convert table markup into groff commands.

The eqn equation processing utility is likewise part of groff, and its function is to convert equation markup into groff commands.
Example 16-29. manview: Viewing formatted manpages

#!/bin/bash
# manview.sh: Formats the source of a man page for viewing.

#  This script is useful when writing man page source.
#  It lets you look at the intermediate results on the fly
#+ while working on it.

E_WRONGARGS=85

if [ -z "$1" ]
then
  echo "Usage: `basename $0` filename"
  exit $E_WRONGARGS
fi

# ---------------------------
groff -Tascii -man $1 | less
# From the man page for groff.
# ---------------------------

#  If the man page includes tables and/or equations,
#+ then the above code will barf.
#  The following line can handle such cases.
#
#   gtbl < "$1" | geqn -Tlatin1 | groff -Tlatin1 -mtty-char -man
#
#   Thanks, S.C.

exit $?   # See also the "maned.sh" script.
See also Example A-39.
The lex lexical analyzer produces programs for pattern matching. This has been replaced by the nonproprietary flex on Linux systems.

The yacc utility creates a parser based on a set of specifications. This has been replaced by the nonproprietary bison on Linux systems.
[1] This is only true of the GNU version of tr, not the generic versions often found on commercial UNIX systems.