Selecting names of variables from .txt file using the terminal and regular expressions
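I am trying to create a file that contains only the variable names.
I am using regular expressions and the bash terminal. The main .txt file contains the following: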
" 1. symboling: -3, -2, -1, 0, 1, 2, 3.
2. normalized-losses: continuous from 65 to 256.
3. make: alfa-romero, audi, bmw, chevrolet, dodge, honda,
4. fuel-type: diesel, gas.
5. aspiration: std, turbo.
6. num-of-doors: four, two.
7. body-style: hardtop, wagon, sedan, hatchback, convertible.
8. drive-wheels: 4wd, fwd, rwd.
9. engine-location: front, rear.
10. wheel-base: continuous from 86.6 120.9.
11. length: continuous from 141.1 to 208.1.
12. width: continuous from 60.3 to 72.3.
13. height: continuous from 47.8 to 59.8.
14. curb-weight: continuous from 1488 to 4066.
15. engine-type: dohc, dohcv, l, ohc, ohcf, ohcv, rotor.
16. num-of-cylinders: eight, five, four, six, three, twelve, two.
17. engine-size: continuous from 61 to 326.
18. fuel-system: 1bbl, 2bbl, 4bbl, idi, mfi, mpfi, spdi, spfi.
19. bore: continuous from 2.54 to 3.94.
20. stroke: continuous from 2.07 to 4.17.
21. compression-ratio: continuous from 7 to 23.
22. horsepower: continuous from 48 to 288.
23. peak-rpm: continuous from 4150 to 6600.
24. city-mpg: continuous from 13 to 49.
25. highway-mpg: continuous from 16 to 54.
26. price: continuous from 5118 to 45400."
I want a file like this:
"symboling
normalized-losses
make
fuel-type
.
.
.
"
My attempt:
I know that the regular expression that selects the right information (but with the numbers still attached) is:
([0-9]+\.\s([a-z]+-[a-z]+-[a-z]+))|([0-9]+\.\s[a-z]+-[a-z]+)|([0-9]+\.\s[a-z]+)
Then I tried the following command in bash:
egrep "([0-9]+\.\s([a-z]+-[a-z]+-[a-z]+))|([0-9]+\.\s[a-z]+-[a-z]+)|([0-9]+\.\s[a-z]+)" file.txt > names_col.txt
But it did not work as I expected. Any suggestions would be great!
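One thing worth noting about the egrep attempt: without -o, grep prints every matching line in full, so the numbers and descriptions stay in the output. Below is a minimal sketch of a grep-based variant, assuming a grep that supports -o (GNU and BSD grep do; it is not required by POSIX). The file name sample.txt and its four lines are hypothetical stand-ins for the real file.

```shell
#!/usr/bin/env bash
# Hypothetical sample: a few lines in the same format as the question's file.
printf '%s\n' \
  '" 1. symboling: -3, -2, -1, 0, 1, 2, 3.' \
  ' 2. normalized-losses: continuous from 65 to 256.' \
  ' 6. num-of-doors: four, two.' \
  ' 26. price: continuous from 5118 to 45400."' > sample.txt

# Without -o, grep prints each matching line unchanged (numbers included).
grep -E '[0-9]+\. [a-z-]+' sample.txt

# With -o, only the matched text is printed. Matching the name together
# with its trailing colon avoids picking up words from the descriptions;
# sed then strips that colon.
grep -oE '[a-z]+(-[a-z]+)*:' sample.txt | sed 's/:$//' > names_col.txt
cat names_col.txt
```

The pattern [a-z]+(-[a-z]+)* covers names with any number of hyphenated parts, so the three alternatives from the original regex are not needed.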
Using sed
$ sed '/^$/d;s/ [^[:alpha:]]*\([^:]*\)[^"]*/\1/' input_file
"symboling
normalized-losses
make
fuel-type
aspiration
num-of-doors
body-style
drive-wheels
engine-location
wheel-base
length
width
height
curb-weight
engine-type
num-of-cylinders
engine-size
fuel-system
bore
stroke
compression-ratio
horsepower
peak-rpm
city-mpg
highway-mpg
price"
sed -En 's/^(.*[0-9]\.\s)([a-z-]*)(:.*$)/\2/p' file.txt > names_col.txt
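Both sed commands above depend on the exact spacing of the input (a space before each number, and \s is a GNU extension). As a rough sketch of a variant that does not care about leading whitespace or the surrounding quotes, assuming a sed that accepts -E (GNU and BSD sed both do): sample.txt below is a hypothetical stand-in for the real file.

```shell
#!/usr/bin/env bash
# Hypothetical sample, including a blank line and a line with no leading space.
printf '%s\n' \
  '" 1. symboling: -3, -2, -1, 0, 1, 2, 3.' \
  '' \
  '2. normalized-losses: continuous from 65 to 256.' \
  ' 10. wheel-base: continuous from 86.6 120.9.' \
  ' 26. price: continuous from 5118 to 45400."' > sample.txt

# ^[^[:alpha:]]*  skips the quote, spaces, number and dot before the name
# ([a-z-]+):      captures the name, which must be followed by a colon
# .*$             swallows the rest of the line (including a closing quote)
# -n with the p flag prints only lines where the substitution happened,
# so blank lines are dropped without a separate delete command.
sed -nE 's/^[^[:alpha:]]*([a-z-]+):.*$/\1/p' sample.txt > names_col.txt
cat names_col.txt
```

Unlike the output shown above, this drops the opening and closing double quotes as well, on the assumption that they only delimit the pasted file contents.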
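With your shown samples, please try the following awk program. Simple explanation: it uses awk's gsub (global substitution) with the regexes :[^"]* and [[:space:]]+[^.]*\.[[:space:]]+ to replace two things with an empty string: everything from the colon up to the next ", and the run of spaces, the text up to the first ., and the spaces that follow it. It then checks NF, so a line is printed only if it is not blank.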
awk '{gsub(/:[^"]*|[[:space:]]+[^.]*\.[[:space:]]+/,"")} NF' Input_file
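A different way to get the same names in awk is to split each line on the colon instead of deleting around the name. This is only a sketch under the assumption that every variable line contains exactly one name before its first colon; sample.txt is a hypothetical stand-in for the real file.

```shell
#!/usr/bin/env bash
# Hypothetical sample in the same format as the question's file.
printf '%s\n' \
  '" 1. symboling: -3, -2, -1, 0, 1, 2, 3.' \
  '' \
  ' 3. make: alfa-romero, audi, bmw, chevrolet, dodge, honda,' \
  ' 16. num-of-cylinders: eight, five, four, six, three, twelve, two.' \
  ' 26. price: continuous from 5118 to 45400."' > sample.txt

# -F':'   makes the colon the field separator, so $1 is "<number>. <name>"
# NF > 1  keeps only lines that actually contain a colon (skips blanks)
# sub()   strips everything before the first letter of $1
awk -F':' 'NF > 1 { sub(/^[^a-zA-Z]*/, "", $1); print $1 }' sample.txt > names_col.txt
cat names_col.txt
```

The bracket expression [^a-zA-Z] is used rather than a POSIX class such as [[:alpha:]] because some older awk implementations do not support character classes.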