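I recently brushed up on awk, sed, tr, and similar commands, and then noticed that LeetCode has a few problems dedicated to shell (shell in leetcode), so I worked through them as well. Below are my solutions.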
- TOC
{:toc}
tenth-line ref
How would you print just the 10th line of a file?
For example, assume that file.txt has the following content:
Line 1
Line 2
Line 3
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10
Your script should output the tenth line, which is:
Line 10
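In other words, print the 10th line of a file.
My first attempt was
    head -n 10 file.txt | tail -n 1
which is wrong: it prints the last of the first 10 lines, so a file with fewer than 10 lines still produces output when it should print nothing.
Three solutions were accepted (AC):
- head -n 10 file.txt | tail -n +10
  This works: take the first 10 lines, then print from the 10th line of that output onward.
- sed -n '10p' file.txt
  -n suppresses sed's default behavior of echoing every line of file.txt.
- awk 'NR==10' file.txt, or equivalently awk 'NR==10{print $0}' file.txt
  For awk, see AWK简明教程 (a concise AWK tutorial).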
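To make the difference concrete, here is a small sketch that runs the first attempt and the three accepted commands against a file that has only 5 lines. The temp file and the variable names are made up for this demo and are not part of the LeetCode problem.

```shell
# Hypothetical demo file with only 5 lines (not the LeetCode input).
short=$(mktemp)
printf 'Line %s\n' 1 2 3 4 5 > "$short"

first_try=$(head -n 10 "$short" | tail -n 1)   # last of the (at most) 10 lines
via_tail=$(head -n 10 "$short" | tail -n +10)  # output starts at line 10
via_sed=$(sed -n '10p' "$short")               # -n: print only what 'p' selects
via_awk=$(awk 'NR==10' "$short")               # NR is the current line number

echo "first try: '$first_try'"   # prints: first try: 'Line 5'
echo "tail +10:  '$via_tail'"    # empty between the quotes
echo "sed:       '$via_sed'"     # empty between the quotes
echo "awk:       '$via_awk'"     # empty between the quotes

rm -f "$short"
```

Only the first attempt prints anything for the short file, which is exactly the case the judge rejects.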
valid-phone-numbers ref
Given a text file file.txt that contains list of phone numbers (one per line), write a one liner bash script to print all valid phone numbers.
You may assume that a valid phone number must appear in one of the following two formats: (xxx) xxx-xxxx or xxx-xxx-xxxx. (x means a digit)
You may also assume each line in the text file must not contain leading or trailing white spaces.
For example, assume that file.txt has the following content:
987-123-4567
123 456 7890
(123) 456-7890
Your script should output the following valid phone numbers:
987-123-4567
(123) 456-7890
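We need to extract the lines that match a specific pattern, which is a job for regular expressions. grep, sed, or awk can all do it.
    awk '$0 ~ /^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$|^[0-9]{3}-[0-9]{3}-[0-9]{4}$/' file.txt
In awk, $0 is the whole line, and /reg1|reg2/ matches when either reg1 or reg2 matches. Alternatively, drop the "$0 ~" and factor out the part the two regexes share:
    awk '/^(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}$/' file.txt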
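Since grep and sed are mentioned as alternatives, here is a sketch of how the same factored regex might be used with them. It assumes a grep and sed that accept -E for extended regular expressions; the temp file and variable names are illustrative only.

```shell
# The factored extended regex, stored once so both tools share it.
re='^(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}$'

phones=$(mktemp)    # hypothetical temp file holding the sample input
printf '%s\n' '987-123-4567' '123 456 7890' '(123) 456-7890' > "$phones"

# grep -E prints the matching lines directly.
with_grep=$(grep -E "$re" "$phones")

# sed -n suppresses the default output; /re/p prints only matching lines.
with_sed=$(sed -n -E "/$re/p" "$phones")

echo "$with_grep"
echo "$with_sed"

rm -f "$phones"
```

Both commands print 987-123-4567 and (123) 456-7890 for the sample input, the same lines the awk version selects.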
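The sed on Mac OS X is the BSD version (watch out for how BSD sed handles \n, \t, and similar escapes), while Linux ships the GNU version.
For example, replacing spaces with newlines on a Mac takes
    sed -E 's/ /\'$'\n/g' words.txt
whereas on Linux this is enough:
    sed 's/ /\n/g' words.txt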
Reference
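One way to sidestep the BSD/GNU sed difference for this particular substitution is tr, which interprets '\n' itself. The following is a minimal sketch; split_words is a name invented for this example.

```shell
# tr translates each space to a newline; -s then squeezes the runs of
# newlines that consecutive spaces would otherwise produce.
split_words() {
    tr -s ' ' '\n'
}

printf 'the day  is sunny\n' | split_words
# prints four lines: the, day, is, sunny
```

Leading spaces at the very start of the input would still yield one empty first line, so this is a sketch for well-formed input rather than a drop-in replacement.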
transpose-file ref
Given a text file file.txt, transpose its content.
You may assume that each row has the same number of columns and each field is separated by the ‘ ‘ character.
For example, if file.txt has the following content:
name age
alice 21
ryan 30
Output the following:
name alice ryan
age 21 30
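This is essentially matrix transposition, done in shell.
    ncol=`head -n1 file.txt | wc -w`
The idea is to pull out each column with cut and then echo it as a single line, since echoing the unquoted result drops the newlines. However, this solution (ref) exceeded the memory limit.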
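The rest of that script is not shown here, so the following is only a guess at what a cut-based transpose could look like, not the referenced solution itself. The function name transpose_cut and the temp file are assumptions made for illustration.

```shell
# Hypothetical reconstruction; transpose_cut is an illustrative name.
transpose_cut() {
    file=$1
    # Column count = number of words on the first line.
    # tr strips the padding that some wc implementations add.
    ncol=$(head -n1 "$file" | wc -w | tr -d ' ')
    i=1
    while [ "$i" -le "$ncol" ]; do
        # Unquoted $(...) is split on newlines and echo rejoins the
        # pieces with single spaces, giving one output row per column.
        echo $(cut -d' ' -f"$i" "$file")
        i=$((i + 1))
    done
}

sample=$(mktemp)    # temp file standing in for file.txt
printf '%s\n' 'name age' 'alice 21' 'ryan 30' > "$sample"
transpose_cut "$sample"
rm -f "$sample"
```

Note that this rereads the whole file once per column, and the unquoted expansion would also glob-expand fields such as *, so it is fragile on large or unusual inputs.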
The final solution uses awk.
    awk '{for(i=1;i<=NF;i++){ a[i]=a[i] sprintf("%s ", $i); }} END { for (i=1;i<=NF;i++) print a[i]; }' file.txt | sed 's/ $//'
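Here is how it works:
- After the first line is processed:
  a[1]="name ", a[2]="age "
- After the second line is processed:
  a[1]="name alice ", a[2]="age 21 "
- …
- Once every line is processed, printing a[1] gives
  "name alice ryan "
  and the trailing space after the last column is then removed.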
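Note that the space before the newline has to be stripped at the end. This approach works on GNU/Linux, but on Mac the output apparently needs to be reversed. The difference comes from for (i in a): awk does not specify the iteration order of for-in, and on Mac it came out reversed.
    # OK on GNU/Linux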
Alternatively, put the space at the front and strip it at the end:
    awk '{for(i=1;i<=NF;i++){ a[i]=a[i] " " $i; }} END { for (i=1;i<=NF;i++) print a[i]; }' file.txt | sed 's/^ //'
This draws on ref and SO (Stack Overflow).
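As a variation on the awk approach, the separator can be added only between fields, which removes the need for the sed cleanup. This is a sketch under the problem's assumption that every row has the same number of columns; transpose_awk and the variable n are names chosen for this example.

```shell
# Illustrative variant: no leading or trailing space is ever produced.
transpose_awk() {
    awk '
    {
        # The first row starts each output line; later rows append " field".
        for (i = 1; i <= NF; i++) {
            a[i] = (NR == 1 ? $i : a[i] " " $i)
        }
        n = NF    # remember the column count for the END block
    }
    END {
        # An explicit index loop keeps the column order well defined.
        for (i = 1; i <= n; i++) print a[i]
    }' "$1"
}

sample=$(mktemp)    # temp file standing in for file.txt
printf '%s\n' 'name age' 'alice 21' 'ryan 30' > "$sample"
transpose_awk "$sample"
rm -f "$sample"
```

For the sample input this prints "name alice ryan" and then "age 21 30". Iterating by index rather than with for-in avoids depending on an unspecified iteration order.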
Word Frequency ref
Write a bash script to calculate the frequency of each word in a text file words.txt.
For simplicity sake, you may assume:
- words.txt contains only lowercase characters and space ‘ ‘ characters.
- Each word must consist of lowercase characters only.
- Words are separated by one or more whitespace characters.
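For example, assume that words.txt has the following content:
the day is sunny the the
the sunny is is
Your script should output the following, sorted by descending frequency:
the 4
is 3
sunny 2
day 1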
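The task is to count how often each word occurs and list the words from most to least frequent. That is easy with sort and uniq.
On Mac, mind the newline quoting: \'$'\n'. The sort options used are -n (numeric sort; -d is dictionary order), -r (reverse), -k (which column to sort on), and -t (field separator). uniq -c yields "number string", so the two columns are swapped afterwards, and empty lines also have to be dropped.
    sed -E 's/[[:space:]]+/\'$'\n/g' words.txt | sort | uniq -c | sort -n -r -k1 | awk '{print $2 " "$1}'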
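As a side illustration of the sort options listed above, the following sketch sorts some made-up "word:count" records by their numeric second field, largest first. The records and the variable name are invented for the demo.

```shell
# -t ':'  use ':' as the field separator
# -k2,2   sort on field 2 only
# -n      compare as numbers (so 10 sorts above 2)
# -r      reverse, i.e. largest first
sorted=$(printf '%s\n' 'a:2' 'b:10' 'c:1' | sort -t ':' -k2,2 -n -r)
echo "$sorted"
# prints: b:10, then a:2, then c:1
```

Without -n the comparison would be textual, and "10" would sort before "2".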
Final answer (with GNU sed, as on Linux):
    sed -E 's/[[:space:]]+/\n/g' words.txt | grep -v "^$" | sort | uniq -c | sort -n -r -k1 | awk '{print $2 " "$1}'
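For comparison, the counting can also be done inside awk with an associative array, which avoids the sed newline quoting altogether. This is an alternative sketch, not the submitted answer; word_freq and count are names chosen for the example.

```shell
# Illustrative alternative: count in awk, then sort by the count column.
word_freq() {
    # Default field splitting already handles repeated spaces and empty
    # lines, so no separate cleanup step is needed.
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) print w, count[w] }' "$1" |
        sort -k2,2nr    # numeric, descending, on the second column
}

words=$(mktemp)    # temp file standing in for words.txt
printf '%s\n' 'the day is sunny the the' 'the sunny is is' > "$words"
word_freq "$words"
rm -f "$words"
```

On this two-line input the output is "the 4", "is 3", "sunny 2", "day 1". The final sort is what fixes the order, since for-in iteration order in awk is unspecified.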