Extract departure and arrival from a list
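I'm trying to extract some parameters from a list whose structure and length vary. Basically, these parameters are the departure and arrival addresses of a route. The list is built from a natural-language sentence, so it doesn't follow any particular template: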
1st example : ['go', 'Buzenval', 'from', 'Chatelet']
2nd example : ['How', 'go', 'street', 'Saint', 'Augustin', 'from', 'Buzenval']
3rd example : ['go', 'from', '33', 'street', 'Republique', 'to', '12','street','Napoleon']
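For each case I've managed to build a second, very similar list in which the departure and arrival are replaced by the literal words 'departure' and 'arrival'. For the examples above, I get: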
1st example : ['go', 'arrival', 'from', 'departure']
2nd example : ['How', 'go', 'arrival', 'from', 'departure']
3rd example : ['go', 'from', 'departure', 'to', 'arrival']
Now that I have both lists, I want to identify the departure and the arrival:
1st example : departure = ['Chatelet'], arrival = ['Buzenval']
2nd example : departure = ['Buzenval'], arrival = ['street','Saint','Augustin']
3rd example : departure = ['33','street','Republique'], arrival = ['12','street','Napoleon']
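Basically, the parameters are everything that differs between the two lists, but I need to tell which one is the departure and which one is the arrival. I think regex could help me here, but I don't know how.
Thanks for your help!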
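The idea that "the parameters are everything that differs between the two lists" maps directly onto a sequence diff. Here is a minimal sketch using the standard library's `difflib.SequenceMatcher` (a diff, not regex): it aligns the template list with the sentence and, for each `replace` block whose template side is a single placeholder, records the sentence words that stand in its place. The function name `diff_places` is hypothetical, and the sketch assumes each placeholder occurs once in the template and every other template word appears unchanged in the sentence.

```python
from difflib import SequenceMatcher

def diff_places(names, modes, placeholders=("departure", "arrival")):
    """Sketch: map each placeholder in `modes` to the words it replaces in `names`.

    Assumes each placeholder occurs once in `modes` and every other template
    word appears unchanged in `names`.
    """
    result = {}
    # autojunk=False stops the matcher from ignoring frequent words in long lists
    matcher = SequenceMatcher(a=modes, b=names, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A 'replace' block covering exactly one template word that is a placeholder
        if tag == "replace" and i2 - i1 == 1 and modes[i1] in placeholders:
            result[modes[i1]] = names[j1:j2]
    return result

print(diff_places(['go', 'Buzenval', 'from', 'Chatelet'],
                  ['go', 'arrival', 'from', 'departure']))
# {'arrival': ['Buzenval'], 'departure': ['Chatelet']}
```

Because the matching is purely positional, an address that itself contains a template word such as 'from' could be aligned differently than intended.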
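Regex would certainly help here, but I tried a simpler approach. It works as long as the pattern you described holds for all your inputs. I'm showing it for the first example; you can apply the same logic to the others and adapt the code (the third example, where 'from' directly follows 'go' and 'to' introduces the arrival, needs different indices):
Code:
first = ['go', 'Buzenval', 'from', 'Chatelet']  # First example
start = first.index('go')
end = first.index('from')
arrival = first[start + 1:end]  # words between 'go' and 'from'
departure = first[end + 1:]     # words after 'from'
print("Departure: {0} , Arrival: {1}".format(departure, arrival))
Output: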
Departure: ['Chatelet'] , Arrival: ['Buzenval']
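The answer above mentions regex. One way to use it, sketched here under assumptions, is to turn the placeholder list into a pattern: the shared words become literals, 'departure' and 'arrival' become named groups, and the space-joined sentence is matched against the whole pattern. The name `extract_with_regex` is hypothetical; it assumes no token contains a space and each placeholder occurs once in the template. Unlike the fixed slicing above, the same code covers all three examples.

```python
import re

PLACEHOLDERS = ("departure", "arrival")

def extract_with_regex(names, modes):
    """Sketch: build a regex from the template list and match the sentence.

    Assumes no token contains a space and each placeholder occurs once in modes.
    Returns (departure, arrival), or None if the sentence does not fit the template.
    """
    parts = []
    for token in modes:
        if token in PLACEHOLDERS:
            # Named group: one or more space-separated words, as few as possible
            parts.append(rf"(?P<{token}>\S+(?: \S+)*?)")
        else:
            # Shared words such as 'go' or 'from' must appear literally
            parts.append(re.escape(token))
    match = re.fullmatch(" ".join(parts), " ".join(names))
    if match is None:
        return None
    return match.group("departure").split(" "), match.group("arrival").split(" ")

print(extract_with_regex(
    ['go', 'from', '33', 'street', 'Republique', 'to', '12', 'street', 'Napoleon'],
    ['go', 'from', 'departure', 'to', 'arrival']))
# (['33', 'street', 'Republique'], ['12', 'street', 'Napoleon'])
```

The lazy groups stop each placeholder at the first following keyword that still lets the rest of the sentence match, so an address that itself contains 'to' or 'from' would be cut short.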
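I found a way to solve your three examples. One thing you should change is the variable names; I wasn't sure what to call them. (This is the old version: slow and hard to follow. The one below is better.)
def extract_places(names, modes):
    # Words that appear unchanged in both lists ('go', 'from', ...)
    keywords = set(modes).intersection(names)
    extracted = [[] for _ in modes]
    j = 0  # current position in names
    for i, mode in enumerate(modes):
        if mode.lower() in keywords:
            if mode.lower() != names[j].lower():
                while mode.lower() != names[j].lower():
                    extracted[i - 1].append(names[j])
                    j += 1
            else:
                extracted[i].append(names[j])
                j += 1
        else:
            # Not a keyword (e.g. 'departure'/'arrival'): collect words until the next keyword
            if names[j].lower() not in keywords:
                while j < len(names) and names[j].lower() not in keywords:
                    extracted[i].append(names[j])
                    j += 1
    extracted = dict(zip(modes, extracted))
    return extracted["arrival"], extracted["departure"]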
I found another way that may be easier to understand. It is also about ten times faster than the first one, so you may want to use it instead.
def partition(l, word):  # Helper to split a list or tuple at a specific element
    i = l.index(word)
    return l[:i], l[i + 1:]

def extract_places(names, modes):
    keywords = set(modes).intersection(names)
    # (template segment, sentence segment) pairs, split on each keyword in turn
    mapped = [(modes, names)]
    for word in keywords:
        new_mapped = []
        for mode, name in mapped:
            if word in mode:
                m1, m2 = partition(mode, word)
                n1, n2 = partition(name, word)
                if m1:
                    new_mapped.append((m1, n1))
                if m2:
                    new_mapped.append((m2, n2))
            else:
                new_mapped.append((mode, name))
        mapped = new_mapped
    # In these examples each remaining template segment is a single placeholder
    mapped = {m[0]: n for m, n in mapped}
    return mapped['arrival'], mapped['departure']
Both versions are used in exactly the same way:
for example in ((['go', 'Buzenval', 'from', 'Chatelet'],
                 ['go', 'arrival', 'from', 'departure']),
                (['How', 'go', 'street', 'Saint', 'Augustin', 'from', 'Buzenval'],
                 ['How', 'go', 'arrival', 'from', 'departure']),
                (['go', 'from', '33', 'street', 'Republique', 'to', '12', 'street', 'Napoleon'],
                 ['go', 'from', 'departure', 'to', 'arrival'])):
    print(extract_places(*example))
Output (the same for both):
(['Buzenval'], ['Chatelet'])
(['street', 'Saint', 'Augustin'], ['Buzenval'])
(['12', 'street', 'Napoleon'], ['33', 'street', 'Republique'])
Example from the Python interpreter (itertools.groupby splits the list into runs of keywords and non-keywords):
>>> import itertools
>>> key = None
>>> arr = ['go', 'from', '33', 'street', 'Republique', 'to', '12', 'street', 'Napoleon']
>>>
>>> for k, group in itertools.groupby(arr, lambda x: x in ['go', 'to', 'from']):
...     if k:
...         key = list(group)[-1]
...         continue
...     if key is not None:
...         if key == 'from':
...             tag = 'departure'
...         else:
...             tag = 'arrival'
...         print(tag, list(group))
...         key = None
...
departure ['33', 'street', 'Republique']
arrival ['12', 'street', 'Napoleon']
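The interpreter session above runs on the third list only. Wrapping the same `itertools.groupby` logic in a function makes it easy to check all three examples; this is a minimal sketch in which the name `extract_by_keywords` is hypothetical, it returns a dict instead of printing, and it keeps the session's assumption that 'from' introduces the departure while any other keyword ('go' or 'to') introduces the arrival.

```python
import itertools

def extract_by_keywords(words, keywords=("go", "to", "from")):
    """Sketch: the groupby session above as a reusable function.

    Assumes 'from' introduces the departure and 'go'/'to' introduce the arrival.
    """
    result = {}
    key = None
    for is_keyword, group in itertools.groupby(words, lambda x: x in keywords):
        if is_keyword:
            # Several keywords in a row ('go', 'from'): the last one decides
            key = list(group)[-1]
            continue
        if key is not None:
            tag = "departure" if key == "from" else "arrival"
            result[tag] = list(group)
            key = None
    return result

print(extract_by_keywords(['go', 'Buzenval', 'from', 'Chatelet']))
# {'arrival': ['Buzenval'], 'departure': ['Chatelet']}
```

Words before the first keyword (such as 'How' in the second example) are skipped because `key` is still `None` at that point.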
This should work for you:
l1 = ['go', 'Buzenval', 'from', 'Chatelet']
l2 = ['How', 'go', 'street', 'Saint', 'Augustin', 'from', 'Buzenval']
l3 = ['go', 'from', '33', 'street', 'Republique', 'to', '12', 'street', 'Napoleon']

def get_locations(inlist):
    marker = 0     # index of the current word
    end_dep = 0    # index where the arrival starts
    start_dep = 0  # index where the departure starts
    for word in inlist:
        if word == "go":
            if inlist[marker + 1] != "from":
                end_dep = marker + 1
            else:
                start_dep = marker + 2
        if word == "from" and start_dep == 0:
            start_dep = marker + 1
        if word == "to":
            end_dep = marker + 1
        marker += 1
    if end_dep > start_dep:
        # Departure comes first: it ends just before the keyword that introduces the arrival
        start_loc = inlist[start_dep:end_dep - 1]
        end_loc = inlist[end_dep:]
    else:
        # Arrival comes first: it ends just before 'from'
        start_loc = inlist[start_dep:]
        end_loc = inlist[end_dep:start_dep - 1]
    return start_loc, end_loc

directions = get_locations(l3)  # change to l1 / l2 to see the other outputs
print("departure = " + str(directions[0]))
print("arrival = " + str(directions[1]))