Produce separate scatterplots of every column against every other column for a Python dataframe with empty cells
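I am trying to automate correlation plotting for a large dataframe. The goal is to plot every column against every other column as a scatterplot with a regression line through it. Each column represents a different variable, and a column may contain empty cells, integers, and string values (my attempted code and a working example are below).
Sample data and code: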
Age Height Weight Sex
21 180 54 M
56 171 65 V
23 NaN 84 V
NaN 195 71 M
42 165 67 V
84 167 93 M
12 NaN 88 M
31 152 73 V
NaN 184 NaN V
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df_subset = pd.DataFrame({
    "Age": [21, 56, 23, np.nan, 42, 84, 12, 31, np.nan],
    "Height": [180, 171, np.nan, 195, 165, 167, np.nan, 152, 184],
    "Weight": [54, 65, 84, 71, 67, 93, 88, 73, np.nan],
    "Sex": ['M', 'V', 'V', 'M', 'V', 'M', 'M', 'V', 'V'],
})
print(df_subset)

col_choice = ["Age", "Height", "Weight"]
for pos1, axis1 in enumerate(col_choice):  # Pick a first col
    for pos2, axis2 in enumerate(col_choice[pos1+1:]):  # Pick a later col
        plt.scatter(df_subset.loc[:, axis1], df_subset.loc[:, axis2])  # scatter plot
        a, b = np.polyfit(df_subset.loc[:, axis1], df_subset.loc[:, axis2], 1)  # determining parameters for regression line
        x = df_subset.loc[:, axis1]
        plt.plot(x, a*x + b)  # regression line on scatter-plot
        plt.show()
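Solution: for each pair of columns, drop the rows that have a missing value in either column before plotting and fitting.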
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df_subset = pd.DataFrame({
    "Age": [21, 56, 23, np.nan, 42, 84, 12, 31, np.nan],
    "Height": [180, 171, np.nan, 195, 165, 167, np.nan, 152, 184],
    "Weight": [54, 65, 84, 71, 67, 93, 88, 73, np.nan],
    "Sex": ['M', 'V', 'V', 'M', 'V', 'M', 'M', 'V', 'V'],
})
print(df_subset)

col_choice = ["Age", "Height", "Weight"]
for pos1, axis1 in enumerate(col_choice):  # Pick a first col
    for pos2, axis2 in enumerate(col_choice[pos1+1:]):  # Pick a later col
        df = df_subset[[axis1, axis2]].dropna()  # keep only rows where both columns have a value
        print(df)
        plt.scatter(df.iloc[:, 0], df.iloc[:, 1])  # scatter plot
        a, b = np.polyfit(df.iloc[:, 0], df.iloc[:, 1], 1)  # determining parameters for regression line
        x = df.iloc[:, 0]
        plt.plot(x, a*x + b)  # regression line on scatter-plot
        plt.show()
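The question also mentions columns holding strings, which the solution sidesteps by listing the numeric columns by hand in col_choice. As a rough sketch of one way to pick them automatically, the function below coerces every column to numbers and skips columns that end up empty. The name pairwise_fits and the min_points parameter are made up for this illustration and are not part of the original answer; it returns the fit parameters only, and plotting each pair would work the same way as in the solution above.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def pairwise_fits(frame, min_points=2):
    """Return {(col_x, col_y): (slope, intercept)} for each pair of numeric columns.

    Hypothetical helper for illustration; not from the original answer.
    """
    # Coerce every column to numbers; strings such as 'M'/'V' become NaN.
    numeric = frame.apply(pd.to_numeric, errors="coerce")
    # Drop columns that contain no numeric values at all (e.g. "Sex").
    numeric = numeric.dropna(axis=1, how="all")

    fits = {}
    for col_x, col_y in combinations(numeric.columns, 2):
        # Keep only rows where both columns of this pair have a value.
        pair = numeric[[col_x, col_y]].dropna()
        if len(pair) < min_points:
            continue  # too few points for a straight-line fit
        slope, intercept = np.polyfit(pair[col_x], pair[col_y], 1)
        fits[(col_x, col_y)] = (slope, intercept)
    return fits
```

Calling pairwise_fits(df_subset) on the sample data should give one entry each for Age/Height, Age/Weight and Height/Weight, with the Sex column skipped. Note that coercion silently turns any stray string inside an otherwise numeric column into a missing value, which may or may not be what you want.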