How to extract the line for a given word only once from a file that contains many lines with the same word
I have a data file containing one month of data. The file format is as follows:
VAAU Observations at 00Z 02 Aug 2017
-------------------------------------------------------------------------------------------
PRES HGHT TEMP DWPT FRPT RELH RELI MIXR DRCT SKNT THTA THTE THTV
hPa m C C C % % g/kg deg knot K K K
-------------------------------------------------------------------------------------------
1000.0 66
942.0 579 22.6 20.3 20.3 87 87 16.20 270 4 300.8 348.6 303.8
925.0 747 21.6 19.9 19.9 90 90 16.09 265 10 301.4 348.9 304.3
850.0 1481 18.8 17.1 17.1 90 90 14.65 275 19 305.8 350.0 308.5
812.0 1873 17.3 14.1 14.1 82 82 12.60 275 22 308.2 346.6 310.6
...................
Station information and sounding indices
Station identifier: VAAU
Station number: 43014
Observation time: 170801/0000
Station latitude: 19.85
Station longitude: 75.40
Station elevation: 579.0
Showalter index: 0.92
Lifted index: 0.99
LIFT computed using virtual temperature: 0.46
SWEAT index: 255.81
K index: 34.70
Cross totals index: 19.70
Vertical totals index: 20.10
Totals totals index: 39.80
Convective Available Potential Energy: 5.98
CAPE using virtual temperature: 9.37
Convective Inhibition: -81.35
CINS using virtual temperature: -69.07
Equilibrum Level: 617.53
Equilibrum Level using virtual temperature: 523.66
Level of Free Convection: 662.87
LFCT using virtual temperature: 669.25
Bulk Richardson Number: 4.12
Bulk Richardson Number using CAPV: 6.44
Temp [K] of the Lifted Condensation Level: 292.45
Pres [hPa] of the Lifted Condensation Level: 894.64
Mean mixed layer potential temperature: 301.92
Mean mixed layer mixing ratio: 16.03
1000 hPa to 500 hPa thickness: 5818.00
Precipitable water [mm] for entire sounding: 51.19
The same block is repeated for every day of the month.
From this file I only want to extract Station identifier, Station number, Station latitude and Station longitude, and only once.
I tried a Python script but did not get the output I wanted. I also tried grep:
grep -E "Station number|Station latitude|Station longitude|Station identifier" wrkk_2017.out
for line in open('vaau_2017.out'):
    rec = line.strip()
    words = ["Station identifier:", "Station number:", "Station latitude:", "Station longitude"]
    for rec in words:
        if rec in line:
            print(line)
            break
I need Station identifier:..., Station number:...., Station latitude:......, Station longitude:.... only once, but I get them as many times as they occur in the file.
You can add a list of booleans that records whether each word has already been found:
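# words as defined in the question
words = ["Station identifier:", "Station number:", "Station latitude:", "Station longitude:"]
still_left = [True] * len(words)  # still_left[i] stays True until words[i] has been printed
with open('vaau_2017.out') as f:
    for line in f:
        for i, w in enumerate(words):
            if w in line and still_left[i]:
                print(line, end='')  # line already ends with '\n'
                still_left[i] = False
        if sum(still_left) == 0:  # every word found: stop reading the file
            break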
Example:
s = '''id: 1
num: 2
lat: 3
lon: 4
id: 1
num: 2
lat: 3
lon: 4'''
words = ['id', 'num', 'lat', 'lon']
still_left = [True] * len(words)
for line in s.splitlines():  # for line in open('vaau_2017.out'):
    for i, w in enumerate(words):
        if w in line and still_left[i]:
            print(line)
            still_left[i] = False
# id: 1
# num: 2
# lat: 3
# lon: 4
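If you want to stop reading the file as soon as all the words have been found, you can add
if sum(still_left)==0:
    break
after the inner for i, w ... loop, at the level of the outer for line ... loop.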
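The same bookkeeping can also be done with a dictionary keyed by label, which records the value that was found instead of only a True/False flag. This is a minimal sketch, assuming every wanted line has the form `Label: value`; the function name `first_station_info` is hypothetical and not part of the answer above:

```python
import io

def first_station_info(lines, labels):
    """Return {label: value} for the first occurrence of each label.

    Assumes wanted lines look like 'Label: value'. Stops reading as soon
    as every label has been found, so later daily blocks are never read.
    """
    found = {}
    for line in lines:
        label, sep, value = line.partition(':')
        label = label.strip()
        if sep and label in labels and label not in found:
            found[label] = value.strip()
            if len(found) == len(labels):
                break  # all labels seen: skip the rest of the file
    return found

sample = io.StringIO(
    "Station identifier: VAAU\n"
    "Station number: 43014\n"
    "Observation time: 170801/0000\n"
    "Station latitude: 19.85\n"
    "Station longitude: 75.40\n"
    "Station identifier: VAAU\n"
    "Station number: 43014\n"
)
labels = ["Station identifier", "Station number", "Station latitude", "Station longitude"]
print(first_station_info(sample, labels))
# {'Station identifier': 'VAAU', 'Station number': '43014', 'Station latitude': '19.85', 'Station longitude': '75.40'}
```

With the real file, pass the open file object instead of the sample: `with open('vaau_2017.out') as f: print(first_station_info(f, labels))`. Labels are compared exactly after stripping, so "Station number" will not accidentally match a longer label that merely contains it.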
You can use regular expressions:
import re

a = 'Station information and sounding indices Station identifier: VAAU Station number: 43014 Observation time: 170801/0000 Station latitude: 19.85 Station longitude: 75.40 Station elevation: 579.0 Showalter index: 0.92 Lifted index: 0.99 LIFT computed using virtual temperature: 0.46 SWEAT index: 255.81 K index: 34.70 Cross totals index: 19.70 Vertical totals index: 20.10'
station_identifier = re.search(r'Station identifier: ([A-Z]+)', a).group(1)
print(station_identifier)  # VAAU
station_number = re.search(r'Station number: ([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)', a).group(1)
print(station_number)  # 43014
station_latitude = re.search(r'Station latitude: ([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)', a).group(1)
print(station_latitude)  # 19.85
station_longitude = re.search(r'Station longitude: ([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)', a).group(1)
print(station_longitude)  # 75.40
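Because `re.search` returns only the first match, another option is to read the whole file into one string and search it once per field; the repeated daily blocks further down are then ignored without any bookkeeping. This is a sketch under the assumption that the file fits comfortably in memory; the helper name `station_info_from_text`, the dictionary keys and the simplified number patterns are illustrative choices, not taken from the answer above:

```python
import re

# One pattern per field. [ \t]* (rather than \s*) keeps a missing value
# from accidentally capturing the first token of the next line.
PATTERNS = {
    "identifier": r"Station identifier:[ \t]*(\S+)",
    "number": r"Station number:[ \t]*(\d+)",
    "latitude": r"Station latitude:[ \t]*([+-]?\d+(?:\.\d+)?)",
    "longitude": r"Station longitude:[ \t]*([+-]?\d+(?:\.\d+)?)",
}

def station_info_from_text(text):
    """Return the first value of each field, or None if the field is missing."""
    info = {}
    for name, pattern in PATTERNS.items():
        m = re.search(pattern, text)  # re.search stops at the first match
        info[name] = m.group(1) if m else None
    return info

text = """Station identifier: VAAU
Station number: 43014
Station latitude: 19.85
Station longitude: 75.40
Station identifier: VAAU
Station number: 43014
"""
print(station_info_from_text(text))
# {'identifier': 'VAAU', 'number': '43014', 'latitude': '19.85', 'longitude': '75.40'}
```

For the real data this would be used as `with open('vaau_2017.out') as f: info = station_info_from_text(f.read())`. The trade-off is that the whole month is read into memory, whereas the line-by-line approaches can stop after the first block.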
Further reading:
https://www.programiz.com/python-programming/regex
Edit:
A solution to your problem:
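import re

filename = "vaau_2017.out"
# None means "not found yet"; each value is taken from its first occurrence only
station_identifier = station_number = station_latitude = station_longitude = None
with open(filename) as f:
    for line in f:
        if station_identifier is None and 'Station identifier' in line:
            station_identifier = re.search(r'Station identifier: ([A-Z]+)', line).group(1)
            print(station_identifier)  # VAAU
        if station_number is None and 'Station number' in line:
            station_number = re.search(r'Station number: ([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)', line).group(1)
            print(station_number)  # 43014
        if station_latitude is None and 'Station latitude' in line:
            station_latitude = re.search(r'Station latitude: ([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)', line).group(1)
            print(station_latitude)  # 19.85
        if station_longitude is None and 'Station longitude' in line:
            station_longitude = re.search(r'Station longitude: ([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)', line).group(1)
            print(station_longitude)  # 75.40
        # all four values found in the first block: stop reading the file
        if None not in (station_identifier, station_number, station_latitude, station_longitude):
            break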