Python - Group Sequential Array Members
I want to edit my text like this:
arr = []
# arr is full of tokenized words from my text
For example:
"Abraham Lincoln Hotel is very beautiful place and i want to go there with
Barbara Palvin. Also there are stores like Adidas ,Nike , Reebok."
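Edit: Basically, I want to detect proper names and group them by using istitle() and isalpha() in a for statement, like this:
for word in arr:
    if word.istitle() and word.isalpha():
        pass  # group this word with its capitalized neighbours
In this example, words keep being appended until the first letter of the next word is not uppercase.
arr[0:3] = [" ".join(arr[0:3])]
# arr[0] is now 'Abraham Lincoln Hotel'
This is the new arr I want: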
['Abraham Lincoln Hotel'] is very beautiful place and i want to go there with
['Barbara Palvin']. ['Also'] there are stores like ['Adidas'], ['Nike'],
['Reebok'].
"Also" is not a problem for me; it will be useful when I try to match against my dataset.
Is this what you are asking for?
sentence = "Abraham Lincoln Hotel is very beautiful place and i want to go there with Barbara Palvin. Also there are stores like Adidas ,Nike , Reebok."
chars = ".!?,"  # Characters you want to remove from the words in the array
table = str.maketrans(chars, " " * len(chars))  # Create a table for replacing characters
sentence = sentence.translate(table)  # Replace characters with spaces
arr = sentence.split()  # Split the string into an array wherever whitespace occurs
print(arr)
The output is:
['Abraham',
'Lincoln',
'Hotel',
'is',
'very',
'beautiful',
'place',
'and',
'i',
'want',
'to',
'go',
'there',
'with',
'Barbara',
'Palvin',
'Also',
'there',
'are',
'stores',
'like',
'Adidas',
'Nike',
'Reebok']
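A note about this code: any character in the chars variable will be removed from the strings in the array. The explanation is in the code comments.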
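To make the role of the translation table concrete, here is a minimal sketch that wraps the tokenizing step in two helper functions. The names `tokenize` and `tokenize_by_deleting` are hypothetical and not part of the original answer. It compares replacing punctuation with spaces (what the code above does) against deleting it outright through the three-argument form of `str.maketrans`.

```python
def tokenize(text, chars=".!?,"):
    """Replace every character in `chars` with a space, then split on whitespace."""
    table = str.maketrans(chars, " " * len(chars))
    return text.translate(table).split()


def tokenize_by_deleting(text, chars=".!?,"):
    """Delete every character in `chars` (third argument of maketrans), then split."""
    table = str.maketrans("", "", chars)
    return text.translate(table).split()


# Words separated only by a comma stay separate when the comma becomes a space...
print(tokenize("Adidas,Nike,Reebok."))              # ['Adidas', 'Nike', 'Reebok']
# ...but fuse into one token when the comma is simply deleted.
print(tokenize_by_deleting("Adidas,Nike,Reebok."))  # ['AdidasNikeReebok']
```

Replacing with spaces is the safer choice when the input may lack a space after a comma, which is presumably why the answer does it that way.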
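To remove the non-names, just do the following:
import string
new_arr = []
for i in arr:
    if i[0] in string.ascii_uppercase:
        new_arr.append(i)
This code will include every word that starts with an uppercase letter, including words such as "Also" that are capitalized only because they begin a sentence.
To fix this, you need to change chars to:
chars = ","
and change the code above to: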
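import string
new_arr = []
end = ".!?"
for idx, word in enumerate(arr):
    # Skip a capitalized word if the previous word ended a sentence
    follows_end = idx > 0 and arr[idx - 1][-1] in end
    if word[0] in string.ascii_uppercase and not follows_end:
        new_arr.append(word)
This will output: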
['Abraham',
'Lincoln',
'Hotel',
'Barbara',
'Palvin.',
'Adidas',
'Nike',
'Reebok.']
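The output above still carries the trailing periods on 'Palvin.' and 'Reebok.', because only commas were removed during tokenizing. As a minimal sketch, assuming the goal is clean names, the same filter can strip sentence-ending punctuation as it collects words. The function name `capitalized_mid_sentence` is hypothetical.

```python
import string

SENTENCE_END = ".!?"


def capitalized_mid_sentence(words):
    """Keep capitalized words that do not directly follow a sentence end,
    stripping trailing sentence-ending punctuation from each kept word."""
    kept = []
    for idx, word in enumerate(words):
        follows_end = idx > 0 and words[idx - 1][-1] in SENTENCE_END
        if word[0] in string.ascii_uppercase and not follows_end:
            kept.append(word.rstrip(SENTENCE_END))
    return kept


words = "with Barbara Palvin. Also there are Adidas Nike Reebok.".split()
print(capitalized_mid_sentence(words))
# ['Barbara', 'Palvin', 'Adidas', 'Nike', 'Reebok']
```

Note that this still returns a flat list; it does not join adjacent names into one entry.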
You can do it like this:
sentence = "Abraham Lincoln Hotel is very beautiful place and i want to go there with Barbara Palvin. Also there are stores like Adidas, Nike, Reebok."
all_words = sentence.split()
last_word_index = -100  # sentinel: no capitalized word seen yet
proper_nouns = []
for idx, word in enumerate(all_words):
    if word.istitle() and word.isalpha():
        if last_word_index == idx - 1:
            # The previous word was capitalized too: extend the last entry
            proper_nouns[-1] = proper_nouns[-1] + " " + word
        else:
            proper_nouns.append(word)
        last_word_index = idx
print(proper_nouns)
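This code will:
- Split all the words into a list
- Iterate over all the words, and for each capitalized word:
  - if the previous word was also capitalized, append it to the last entry in the list
  - otherwise, store the word as a new entry in the list
  - record the index at which a capitalized word was last found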
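One caveat with the snippet above: `isalpha()` is tested on tokens that still carry punctuation, so "Palvin." and "Adidas," are skipped. The following is a minimal sketch of a variant, not the answerer's code, that strips punctuation before testing and treats any punctuation on a token as a group boundary. The function name `group_proper_nouns` and the flag `can_extend` are hypothetical.

```python
import string


def group_proper_nouns(text):
    """Join runs of adjacent capitalized words, breaking runs at punctuation."""
    groups = []
    can_extend = False  # True if the next word may join the last group
    for token in text.split():
        word = token.strip(string.punctuation)
        starts_clean = token[0] not in string.punctuation
        ends_clean = token[-1] not in string.punctuation
        if word.istitle() and word.isalpha():
            if can_extend and starts_clean:
                groups[-1] += " " + word
            else:
                groups.append(word)
            # Trailing punctuation (e.g. "Palvin.") closes the current group
            can_extend = ends_clean
        else:
            can_extend = False
    return groups


sentence = ("Abraham Lincoln Hotel is very beautiful place and i want to go "
            "there with Barbara Palvin. Also there are stores like "
            "Adidas ,Nike , Reebok.")
print(group_proper_nouns(sentence))
# ['Abraham Lincoln Hotel', 'Barbara Palvin', 'Also', 'Adidas', 'Nike', 'Reebok']
```

A boolean flag replaces the index bookkeeping here because only "was the previous token an extendable name?" matters. Whether punctuation should always split a group is an assumption that suits this example text.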