How to convert each word of each row of a dataframe to a numeric value
I have been given this dataframe.
The output I want, based on the dictionary, looks like this
**Given the following dictionary:-**
d = {'I': 30,'am': 45,'good': 90,'boy': 50,'We':100,'are':70,'going':110}
How can I do this using Python?
I tried the following, but it failed :(
dataframe['new'] = data['documents'].apply(lambda x: dictionary[x])
Please help me. Thanks in advance.
You can use explode to get the words, then map them to your dictionary and reshape your dataframe:
MAPPING = {'I': 30,'am': 45,'good': 90,'boy': 50,'We':100,'are':70,'going':110}
df['documents'] = (df['documents'].str.split().explode().map(MAPPING).astype(str)
.groupby(level=0).agg(list).str.join(' '))
print(df)
# Output
id documents
0 0 30 45 90 50
1 1 100 70 110
2 2 30 45 110
Step by step
Phase 1: Explode
# Split phrase into words
>>> out = df['documents'].str.split()
0 [I, am, good, boy]
1 [We, are, going]
2 [I, am, going]
Name: documents, dtype: object
# Explode lists into scalar values
>>> out = out.explode()
0 I
0 am
0 good
0 boy
1 We
1 are
1 going
2 I
2 am
2 going
Name: documents, dtype: object
Phase 2: Convert
# Convert words with your dict mapping and convert as string
>>> out = out.map(MAPPING).astype(str)
0 30
0 45
0 90
0 50
1 100
1 70
1 110
2 30
2 45
2 110
Name: documents, dtype: object # <- .astype(str)
Phase 3: Reshape
# Group by index (level=0) then aggregate to a list
>>> out = out.groupby(level=0).agg(list)
0 [30, 45, 90, 50]
1 [100, 70, 110]
2 [30, 45, 110]
Name: documents, dtype: object
# Join your list of words
>>> out = out.str.join(' ')
0 30 45 90 50
1 100 70 110
2 30 45 110
Name: documents, dtype: object
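One thing the pipeline above does not cover is a word that is missing from the mapping. In that case map() produces NaN, the column becomes float, and astype(str) would yield strings such as '30.0' and 'nan'. A minimal sketch of one way to handle this, assuming a placeholder of -1 is acceptable (the placeholder value and the sample sentence 'We are lost' are illustrative assumptions, not part of the question):

```python
import pandas as pd

MAPPING = {'I': 30, 'am': 45, 'good': 90, 'boy': 50, 'We': 100, 'are': 70, 'going': 110}

# Hypothetical sample data: 'lost' is deliberately absent from MAPPING
df = pd.DataFrame({'id': range(2), 'documents': ['I am good boy', 'We are lost']})

# One word per row, keeping the original row index
words = df['documents'].str.split().explode()

# Unknown words map to NaN, which turns the whole column into floats
codes = words.map(MAPPING)

# Replace NaN with a placeholder (-1 is an arbitrary choice), then go back to int and str
codes = codes.fillna(-1).astype(int).astype(str)

# Rebuild one string per original row
result = codes.groupby(level=0).agg(' '.join)
print(result)
```

Converting back to int before str is what keeps the values as '30' rather than '30.0'.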
Instead of looking up d[x], where x is the whole sentence, you should look up d[w] for each word w in the sentence x.
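To make the difference concrete, here is a small sketch in plain Python, without pandas, using the dictionary from the question. The variable names sentence and values are only illustrative:

```python
d = {'I': 30, 'am': 45, 'good': 90, 'boy': 50, 'We': 100, 'are': 70, 'going': 110}

sentence = 'I am good boy'

# Looking up the whole sentence fails: the dictionary has no such key
try:
    d[sentence]
except KeyError as err:
    print('KeyError:', err)

# Looking up each word separately works
values = [d[w] for w in sentence.split()]
print(values)
```

This is the same failure that the lambda in the question runs into, since apply passes it one full sentence at a time.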
You can use .split() to split a string into a list of words. Then you can use a list comprehension or map to look up each word of that list in the dictionary:
import pandas as pd
df = pd.DataFrame({'id': range(3), 'documents': ['I am good boy', 'We are going', 'I am going']})
print(df)
# id documents
# 0 0 I am good boy
# 1 1 We are going
# 2 2 I am going
d = {'I': 30,'am': 45,'good': 90,'boy': 50,'We':100,'are':70,'going':110}
df['new'] = df['documents'].apply(lambda s: list(map(d.get, s.split())))
# or alternatively:
# df['new'] = df['documents'].apply(lambda s: [d.get(w) for w in s.split()])
print(df)
# id documents new
# 0 0 I am good boy [30, 45, 90, 50]
# 1 1 We are going [100, 70, 110]
# 2 2 I am going [30, 45, 110]
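Important note: I recommend using d.get(w) rather than d[w]. If w is not in the dictionary, d[w] raises a KeyError. d.get, on the other hand, accepts a default value and never raises a KeyError for a missing key. By default, d.get(w) returns None if w is not in d, but you can specify a default value of your own: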
df = pd.DataFrame({'id': range(4), 'documents': ['I am good boy', 'We are going', 'I am going', 'I am good words not going in dictionary']})
df['new'] = df['documents'].apply(lambda s: [d.get(w, 37) for w in s.split()])
print(df)
# id documents new
# 0 0 I am good boy [30, 45, 90, 50]
# 1 1 We are going [100, 70, 110]
# 2 2 I am going [30, 45, 110]
# 3 3 I am good words not going in dictionary [30, 45, 90, 37, 37, 110, 37, 37]
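If a space-separated string is wanted in each row rather than a list, the same lookup can be wrapped in a join. This is only a sketch: the default of 37 is carried over from the example above, and the sample sentence 'We are lost' is an illustrative assumption.

```python
import pandas as pd

d = {'I': 30, 'am': 45, 'good': 90, 'boy': 50, 'We': 100, 'are': 70, 'going': 110}

# Hypothetical sample data: 'lost' is not a key of d
df = pd.DataFrame({'id': range(2), 'documents': ['I am good boy', 'We are lost']})

# Look up each word (falling back to 37), convert to str, and join with spaces
df['new'] = df['documents'].apply(
    lambda s: ' '.join(str(d.get(w, 37)) for w in s.split())
)
print(df)
```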