How to parse a pandas DataFrame object
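I read a CSV file into a pandas DataFrame, then get dummy variables for some columns and concatenate them back. However, I have a column named "Genre" that contains values such as "comedy, drama" and "action, comedy". When I get the dummies and concatenate them, one column is created for each whole string, but I want to split the values. For example, I want the columns 'Genre.comedy', 'Genre.Drama', 'Genre.action' rather than 'Genre.comedy,drama' and 'Genre.action,comedy'.
Here is my code: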
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import csv
from sklearn import preprocessing
trainset = pd.read_csv("/Users/yada/Downloads/IMDBMovieData.csv", encoding='latin-1')
X = trainset.drop(['Description', 'Runtime'], axis=1)
features = ['Genre','Actors']
for f in features:
    X_dummy = pd.get_dummies(X[f], prefix = f)
    X = X.drop([f], axis = 1)
    X = pd.concat((X, X_dummy), axis = 1)
Here is a row from my CSV file: [image: csv]
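I think you need str.get_dummies with add_prefix. str.get_dummies splits each string on the given separator before encoding, so every genre gets its own indicator column: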
features = ['Genre','Actors']
for f in features:
    # split on ', ' and build one indicator column per value
    X_dummy = X[f].str.get_dummies(', ').add_prefix(f + '.')
    X = X.drop([f], axis = 1)
    X = pd.concat((X, X_dummy), axis = 1)
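To make the difference between the two calls concrete, here is a minimal sketch on a toy "Genre" column. The data is made up for illustration and only stands in for the real CSV; the variable names `whole` and `split` are hypothetical.

```python
import pandas as pd

# Toy data standing in for the real CSV (values are made up for illustration).
df = pd.DataFrame({"Genre": ["comedy, drama", "action, comedy", "action"]})

# pd.get_dummies treats each whole string as a single category.
whole = pd.get_dummies(df["Genre"], prefix="Genre")
print(sorted(whole.columns))

# Series.str.get_dummies splits on the separator first,
# then builds one indicator column per distinct token.
split = df["Genre"].str.get_dummies(sep=", ").add_prefix("Genre.")
print(list(split.columns))
```

The first call keeps "action, comedy" as one label, which is the behaviour described in the question; the second yields one column per genre.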
Or, with pop in a list comprehension, shown on sample data:
trainset = pd.DataFrame({'Description':list('abc'),
                         'Genre':['comedy, drama','action, comedy','action'],
                         'Actors':['a, b','a, c','d, a'],
                         'Runtime':[1,3,5],
                         'E':[5,3,6],
                         'F':list('aaa')})
print (trainset)
  Description           Genre Actors  Runtime  E  F
0           a   comedy, drama   a, b        1  5  a
1           b  action, comedy   a, c        3  3  a
2           c          action   d, a        5  6  a
X = trainset.drop(['Description', 'Runtime'], axis=1)
features = ['Genre','Actors']
X_dummy_list = [X.pop(f).str.get_dummies(', ').add_prefix(f + '.') for f in features]
X = pd.concat([X] + X_dummy_list , axis = 1)
print (X)
   E  F  Genre.action  Genre.comedy  Genre.drama  Actors.a  Actors.b  \
0  5  a             0             1            1         1         1
1  3  a             1             1            0         1         0
2  6  a             1             0            0         1         0

   Actors.c  Actors.d
0         0         0
1         1         0
2         0         1
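The separator passed to str.get_dummies must match the data exactly, so a value written as 'comedy,drama' (no space) would not be split by ', '. If the real file mixes spacing styles, one possible approach is to normalize the strings first. This is only a sketch: the messy values below are hypothetical and not taken from the actual CSV.

```python
import pandas as pd

# Hypothetical messy values with inconsistent spacing around the commas.
genre = pd.Series(["comedy,drama", "action , comedy", " action"], name="Genre")

# Trim the ends and collapse any whitespace around commas,
# so that a single separator (",") matches every row.
cleaned = genre.str.strip().str.replace(r"\s*,\s*", ",", regex=True)

dummies = cleaned.str.get_dummies(sep=",").add_prefix("Genre.")
print(dummies)
```

If capitalization also varies (for example 'Drama' and 'drama'), adding `.str.lower()` before the split would merge those into one column.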