Python - How to loop through each index position in a list?
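Given a list [[["source1"], ["target1"], "alignment1"], [["source2"], ["target2"], "alignment2"], ...], I want to extract the words in the source that are aligned with words in the target.
For example, for the English-German sentence pair The hat is on the table . - Der Hut liegt auf dem Tisch ., I want to print the following: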
The - Der
hat - Hut
is - liegt
on - auf
the - dem
table - Tisch
. - .
So I wrote the following:
en_de = [
[['The', 'hat', 'is', 'on', 'the', 'table', '.'], ['Der', 'Hut', 'liegt', 'auf', 'dem', 'Tisch', '.'], '0-0 1-1 2-2 3-3 4-4 5-5 6-6'],
[['The', 'picture', 'is', 'on', 'the', 'wall', '.'], ['Das', 'Bild', 'hängt', 'an', 'der', 'Wand', '.'], '0-0 1-1 2-2 3-3 4-4 5-5 6-6'],
[['The', 'bottle', 'is', 'under', 'the', 'sink', '.'], ['Die', 'Flasche', 'ist', 'under', 'dem', 'Waschbecken', '.'], '0-0 1-1 2-2 3-3 4-4 5-5 6-6']
]
for group in en_de:
    src_sent = group[0]
    tgt_sent = group[1]
    aligns = group[2]
    split_aligns = aligns.split()
    hyphen_split = [align.split("-") for align in split_aligns]
    align_index = hyphen_split[0]
    print(src_sent[int(align_index[0])],"-", tgt_sent[int(align_index[1])])
As expected, this prints the words at index position 0 of src_sent and tgt_sent:
The - Der
The - Das
The - Die
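Now I don't know how to print the words at every index position of src_sent and tgt_sent. I could manually update align_index to each new index position in the sentence pair, but in the full dataset some sentences have up to 25 index positions.
Is there a way to loop through every index position?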
When I try:
align_index = hyphen_split[0:]
print(src_sent[int(align_index[0])],"-", tgt_sent[int(align_index[1])])
I get a TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
Clearly align_index can't be a list, but I'm not sure how to turn it into something that does what I want.
Any suggestions or help would be greatly appreciated. Thanks in advance.
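For reference, a minimal sketch (using only the first alignment string from the data above) of what hyphen_split and the [0:] slice actually contain, which is where the TypeError comes from:

```python
# Alignment string of the first sentence pair from the question
aligns = '0-0 1-1 2-2 3-3 4-4 5-5 6-6'
hyphen_split = [align.split("-") for align in aligns.split()]

# Each element is a [source_index, target_index] pair of *strings*
print(hyphen_split[0])   # ['0', '0']

# [0:] slices the outer list, i.e. it copies the whole list of pairs,
# so its first element is still a list, not a string
align_index = hyphen_split[0:]
print(align_index[0])    # ['0', '0']

try:
    int(align_index[0])  # int() of a list raises TypeError
except TypeError as exc:
    print("TypeError:", exc)

# Indexing twice (which pair, then which side of the pair) gives strings
print(int(align_index[0][0]), int(align_index[0][1]))  # 0 0
```

Slicing with [0:] does not select one pair; taking one pair at a time, which is what a loop over hyphen_split does, gives you the two strings that int() expects.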
IIUC, you want this:
en_de = [
[['The', 'hat', 'is', 'on', 'the', 'table', '.'], ['Der', 'Hut', 'liegt', 'auf', 'dem', 'Tisch', '.'], '0-0 1-1 2-2 3-3 4-4 5-5 6-6'],
[['The', 'picture', 'is', 'on', 'the', 'wall', '.'], ['Das', 'Bild', 'hängt', 'an', 'der', 'Wand', '.'], '0-0 1-1 2-2 3-3 4-4 5-5 6-6'],
[['The', 'bottle', 'is', 'under', 'the', 'sink', '.'], ['Die', 'Flasche', 'ist', 'under', 'dem', 'Waschbecken', '.'], '0-0 1-1 2-2 3-3 4-4 5-5 6-6']
]
for sentences in en_de:
    for en, de in zip(*sentences[:2]):
        print(f'{en} - {de}')
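This prints the English and German words side by side for each sentence. It only works if the words always correspond one-to-one by position; if the alignment is always linear (0-0 1-1 2-2 ...), you don't need the alignment string at all.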
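To illustrate that caveat, a sketch with a hypothetical sentence pair (not from the question's data) whose lengths differ, plus a hypothetical helper, positional_pairs, that refuses to pair words by position when the lengths don't match:

```python
# Hypothetical pair (not from the question): "is raining" becomes the single
# German word "regnet", so the sentences have different lengths.
src = ['It', 'is', 'raining', '.']
tgt = ['Es', 'regnet', '.']

# zip() pairs purely by position and silently stops at the shorter list
pairs = list(zip(src, tgt))
for en, de in pairs:
    print(f'{en} - {de}')
# It - Es
# is - regnet
# raining - .


def positional_pairs(src, tgt):
    """Pair words by position, raising instead of guessing when lengths differ."""
    if len(src) != len(tgt):
        raise ValueError(f'length mismatch: {len(src)} vs {len(tgt)}')
    return list(zip(src, tgt))
```

On Python 3.10 and later, zip(src, tgt, strict=True) performs the same length check and raises ValueError on a mismatch.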
If the alignment is not always linear, you need to take it into account as well:
en_de = [
[['The', 'hat', 'is', 'on', 'the', 'table', '.'], ['Der', 'Hut', 'liegt', 'auf', 'dem', 'Tisch', '.'], '0-0 1-1 2-2 3-3 4-4 5-5 6-6'],
[['The', 'picture', 'is', 'on', 'the', 'wall', '.'], ['Das', 'Bild', 'hängt', 'an', 'der', 'Wand', '.'], '0-0 1-1 2-2 3-3 4-4 5-5 6-6'],
[['The', 'bottle', 'is', 'under', 'the', 'sink', '.'], ['Die', 'Flasche', 'ist', 'under', 'dem', 'Waschbecken', '.'], '0-0 1-1 2-2 3-3 4-4 5-5 6-6']
]
for sentences in en_de:
    # alternative to the below for loop
    # alignment = [(int(a), int(b)) for a, b in [p.split('-') for p in sentences[2].split()]]
    alignment = []
    for pair in sentences[2].split():
        e, g = pair.split('-')
        alignment.append((int(e), int(g)))
    english = [sentences[0][i] for i, _ in alignment]
    german = [sentences[1][i] for _, i in alignment]
    for en, ge in zip(english, german):
        print(f'{en} - {ge}')
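A sketch of why the alignment matters, assuming a hypothetical pair (not from the question's data) in which German word order swaps two words; the parsing steps are the same as in the answer above:

```python
# Hypothetical non-linear pair: German puts "es" before "gesehen",
# so the alignment links 2-3 and 3-2 cross.
sentences = [['I', 'have', 'seen', 'it', '.'],
             ['Ich', 'habe', 'es', 'gesehen', '.'],
             '0-0 1-1 2-3 3-2 4-4']

# Same steps as the answer: parse "i-j" links, then look up each side
alignment = [(int(e), int(g)) for e, g in (p.split('-') for p in sentences[2].split())]
english = [sentences[0][i] for i, _ in alignment]
german = [sentences[1][i] for _, i in alignment]
aligned = list(zip(english, german))

# Purely positional pairing, for comparison
positional = list(zip(sentences[0], sentences[1]))

print(aligned[2:4])     # [('seen', 'gesehen'), ('it', 'es')]
print(positional[2:4])  # [('seen', 'es'), ('it', 'gesehen')]
```

With crossing links, only the alignment-driven version pairs "seen" with "gesehen".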
You forgot to loop over the hyphen_split list:
for group in en_de:
    src_sent = group[0]
    tgt_sent = group[1]
    aligns = group[2]
    split_aligns = aligns.split()
    hyphen_split = [align.split("-") for align in split_aligns]
    for align_index in hyphen_split:
        print(src_sent[int(align_index[0])],"-", tgt_sent[int(align_index[1])])
The only change from your code is in the last two lines.
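A slightly tidier variant of the same loop, as a sketch: aligned_lines is a hypothetical helper name, and the example group is made up to show that a target word linked from two source words (a many-to-one alignment) needs no extra handling, since the loop simply follows every link:

```python
def aligned_lines(group):
    """Return one 'source - target' string per alignment link in the group."""
    # Unpack the group instead of indexing group[0], group[1], group[2]
    src_sent, tgt_sent, aligns = group
    # Unpack each "i-j" link directly into two names instead of [0] and [1]
    return [f'{src_sent[int(s)]} - {tgt_sent[int(t)]}'
            for s, t in (link.split('-') for link in aligns.split())]


# Hypothetical group (not from the question's data): "is raining" maps to the
# single German word "regnet", so target index 1 appears in two links.
group = [['It', 'is', 'raining', '.'], ['Es', 'regnet', '.'], '0-0 1-1 2-1 3-2']

for line in aligned_lines(group):
    print(line)
# It - Es
# is - regnet
# raining - regnet
# . - .
```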