How to extract data from a list with additional spaces in between them in Python
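My code is trying to extract fields from a file (format: group, team, val1, val2). Some results are correct when a line has no additional spaces, but lines with additional spaces in between produce wrong results.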
data = {}
with open('source.txt') as f:
    for line in f:
        print("this is the line data: ", line)
        needed = line.split()[0:2]
        print("this is what i need: ", needed)
source.txt #-- format: group, team, val1, val2
alpha diehard group 1 54,00.01
bravo nevermindteam 3 500,000.00
charlie team ultimatum 1 27,722.29 (0.45)
charlie team ultimatum 10 252,336,733.383 (2.06)
delta beyond-imagination 2 11 ()
echo double doubt 5 143,299.00 (1)
echo double doubt 8 145,300 (5.01)
falcon revengers 3 0.1234
falcon revengers 5 9.19
lima almost done 6 45.00181 (.9)
romeo ontheway home 12 980
I am trying to extract only the values before val1. #-- group, team
alpha diehard group
bravo nevermindteam
charlie team ultimatum
delta beyond-imagination
echo double doubt
falcon revengers
lima almost done
romeo ontheway home
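Try this:
with open('source.txt') as f:
    for line in f:
        # Keep only purely alphabetic tokens. Note that a hyphenated token such as
        # 'beyond-imagination' fails isalpha() and is dropped as well.
        new_line = ' '.join(filter(lambda s: s.isalpha(), line.split(' ')))
        print(new_line)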
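The code is sensitive to the number of spaces.
Use a regular expression:
import re
with open('source.txt', 'r') as f:
    # Strip the trailing run of digits, punctuation and whitespace from every line.
    # flags must be passed by keyword: the fourth positional argument of re.sub is count.
    text = re.sub(r'[0-9,.()$\s]+\n', '\n', f.read() + '\n', flags=re.M)
    print(text)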
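To make the regular-expression idea above easier to try without a file, here is a minimal per-line sketch. The helper name `name_part` and the in-memory sample lines are assumptions made for illustration, and the pattern assumes that names never end in a digit or in one of the stripped punctuation characters.

```python
import re

def name_part(line):
    """Return the leading words of a line, without the trailing numeric fields."""
    # Remove the trailing run of digits, commas, dots, parentheses and whitespace.
    stripped = re.sub(r"[\d,.()\s]+$", "", line)
    # Collapse any repeated spaces left between the remaining words.
    return " ".join(stripped.split())

# Sample lines in the same format as source.txt (hypothetical in-memory copy).
samples = [
    "alpha diehard group 1 54,00.01\n",
    "delta beyond-imagination 2 11 ()\n",
    "lima almost   done 6 45.00181 (.9)\n",
]
for sample in samples:
    print(name_part(sample))
```

Working line by line keeps the pattern short and avoids having to reason about newlines inside the character class.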
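This is how I did it: basically, loop through the words and stop when you hit a number:
data = {}
with open('source.txt') as f:
    for line in f:
        print("this is the line data: ", line)
        split_line = line.split()
        # Default to the whole line in case no numeric token is found.
        needed = split_line
        for i in range(len(split_line)):
            if split_line[i].isnumeric():
                # Everything before the first numeric token is the name part.
                needed = split_line[0:i]
                break
        print("this is what i need: ", needed)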
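The same "stop at the first number" idea can be sketched more compactly with `itertools.takewhile`. The function name `leading_words` is hypothetical, and the sketch assumes, as the sample data suggests, that val1 is always a plain integer token.

```python
from itertools import takewhile

def leading_words(line):
    """Return the tokens that come before the first purely numeric token."""
    tokens = line.split()  # split() with no argument ignores repeated spaces
    return list(takewhile(lambda token: not token.isnumeric(), tokens))

print(leading_words("romeo ontheway   home 12 980"))
print(leading_words("delta beyond-imagination 2 11 ()"))
```

If a line contains no numeric token, all of its words are returned, and an empty line yields an empty list.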
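Use a regular expression.
import re
with open('source.txt') as f:
    for line in f:
        # Lazily capture everything before the first digit on the line.
        found = re.search(r"(.*?)\d", line)
        if found is None:
            continue  # skip lines without any digit, e.g. blank lines
        needed = found.group(1).split()[0:3]
        print(needed)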
Output:
['alpha', 'diehard', 'group']
['bravo', 'nevermindteam']
['charlie', 'team', 'ultimatum']
['charlie', 'team', 'ultimatum']
['delta', 'beyond-imagination']
['echo', 'double', 'doubt']
['echo', 'double', 'doubt']
['falcon', 'revengers']
['falcon', 'revengers']
['lima', 'almost', 'done']
['romeo', 'ontheway', 'home']
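If the goal is to fill the `data` dictionary from the question, the word lists above could be split into a group and a team. The sketch below assumes that the first word is the group and the remaining words form the team name; that reading of the "group, team" format, the helper name `split_group_team`, and the in-memory sample lines are all assumptions.

```python
import re

def split_group_team(line):
    """Return (group, team) for one line, or None if the line has no digit."""
    found = re.match(r"(.*?)\s*\d", line)
    if found is None:
        return None
    words = found.group(1).split()
    if not words:
        return None
    # Assumption: first word is the group, the rest is the team name.
    return words[0], " ".join(words[1:])

lines = [
    "alpha diehard group 1 54,00.01",
    "echo double   doubt 5 143,299.00 (1)",
    "echo double doubt 8 145,300 (5.01)",
]
data = {}
for line in lines:
    pair = split_group_team(line)
    if pair is not None:
        group, team = pair
        data[group] = team
print(data)  # {'alpha': 'diehard group', 'echo': 'double doubt'}
```

Repeated groups simply overwrite the same key here; a different container would be needed if one group can have several teams.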