Pandas scatter_matrix - plot categorical variables
I am looking at the well-known Titanic dataset from the Kaggle competition: http://www.kaggle.com/c/titanic-gettingStarted/data
I loaded and processed the data as follows:
# import required libraries
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# load the data from the file
df = pd.read_csv('./data/train.csv')
# import the scatter_matrix functionality
# (pandas.tools.plotting was removed in later pandas releases; use pandas.plotting)
from pandas.plotting import scatter_matrix
# define colors list, to be used to plot survived either red (=0) or green (=1)
colors=['red','green']
# make a scatter plot
scatter_matrix(df,figsize=[20,20],marker='x',c=df.Survived.apply(lambda x:colors[x]))
df.info()
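How can I add categorical columns such as Sex and Embarked to the plot?
You need to convert the categorical variables to numbers in order to plot them.
Example (assuming the column 'Sex' holds the gender data, with 'M' for male and 'F' for female):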
import numpy as np
# start with NaN everywhere, then fill in a code for each known value
df['Sex_int'] = np.nan
df.loc[df['Sex'] == 'M', 'Sex_int'] = 0
df.loc[df['Sex'] == 'F', 'Sex_int'] = 1
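Now all males are represented by 0 and all females by 1. Rows with an unknown gender (if any) stay NaN and are left out of the plot.
The rest of your code should work fine with the updated DataFrame.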
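As a small illustration of the approach above, here is a sketch on a made-up four-row frame (the sample rows are an assumption, not Titanic data). It uses the labels 'male' and 'female', which is how the Sex column is spelled in the mapping answer further down this thread, rather than 'M' and 'F'.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the Titanic training data.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Sex": ["male", "female", "female", None],
})

# Start with NaN, then assign a numeric code to each known label.
df["Sex_int"] = np.nan
df.loc[df["Sex"] == "male", "Sex_int"] = 0
df.loc[df["Sex"] == "female", "Sex_int"] = 1

# The row with the missing label keeps NaN in Sex_int.
print(df)
```

If the labels in your file differ, the comparisons simply match nothing and the whole column stays NaN, so it is worth checking df['Sex'].unique() first.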
After some googling and remembering that there is something like the .map() function, I fixed it as follows:
colors=['red','green'] # color codes for survived : 0=red or 1=green
# create mapping Series for gender so it can be plotted
gender = pd.Series([0,1],index=['male','female'])
df['gender']=df.Sex.map(gender)
# create mapping Series for Embarked so it can be plotted
# (four codes: this relies on Embarked having exactly four unique values)
embarked = pd.Series([0,1,2,3],index=df.Embarked.unique())
df['embarked']=df.Embarked.map(embarked)
# add survived also back to the df
# (target is defined elsewhere in my code and is not shown here)
df['survived']=target
Now I can plot again, and delete the added columns afterwards.
Thanks, everyone, for your replies.
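For reference, the mapping idea can be put together end to end roughly like this. It is only a sketch: the six sample rows are invented, a plain dict is passed to .map() instead of a mapping Series, and the Agg backend is selected only so the script runs without a display. scatter_matrix plots numeric columns only, which is why the encoded columns show up while the original string columns do not.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical sample rows; the real data would come from train.csv.
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Age": [22.0, 38.0, 26.0, 35.0, 27.0, 54.0],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Embarked": ["S", "C", "S", "S", "Q", "S"],
})

# Encode each categorical column as integers with .map().
sample["gender"] = sample["Sex"].map({"male": 0, "female": 1})
port_codes = {port: code for code, port in enumerate(sample["Embarked"].unique())}
sample["embarked"] = sample["Embarked"].map(port_codes)

# One color per row: red for Survived == 0, green for Survived == 1.
colors = ["red", "green"]
point_colors = [colors[s] for s in sample["Survived"]]

# Only the numeric columns (Survived, Age, gender, embarked) are drawn.
axes = scatter_matrix(sample, figsize=(8, 8), marker="x", c=point_colors)
```

The sample has no missing values on purpose. With missing values, the per-row color list may no longer line up with the points that are actually drawn, so dropping or filling those rows first is the safer route.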
Here is my solution:
# convert string column to category
df.Sex = df.Sex.astype('category')
# create additional column for its codes
df['Sex_code'] = df.Sex.cat.codes
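The same category-based approach also works for Embarked. The sketch below uses a made-up five-row column (an assumption, not the real data) to show which code each label gets and what happens to a missing value.

```python
import pandas as pd

# Hypothetical sample with one missing port of embarkation.
sample = pd.DataFrame({"Embarked": ["S", "C", None, "Q", "S"]})

# Convert to the category dtype and read off the integer codes.
sample["Embarked"] = sample["Embarked"].astype("category")
sample["Embarked_code"] = sample["Embarked"].cat.codes

# Categories are sorted, so the lookup from code back to label is:
code_to_label = dict(enumerate(sample["Embarked"].cat.categories))
print(code_to_label)                    # {0: 'C', 1: 'Q', 2: 'S'}
print(sample["Embarked_code"].tolist())  # [2, 0, -1, 1, 2]
```

Note that missing values get the code -1 rather than NaN, so they would be plotted as an extra group unless you filter them out or replace -1 beforehand.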