How to reshape a dataframe with pandas?

Question

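I have a dataframe that contains product sales for every day from 2018 to 2021. The dataframe has four columns (Date, Place, ProductCategory and Sales). Starting from the first two columns (Date, Place), I want to fill in the gaps using the available data. Once the data has been added, I want to delete the rows that have no data in ProductCategory. I want to do this in Python with pandas.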

A sample of my dataset looks like this:

I want the dataframe to look like this:

Answer 1

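Forward-fill the Date and Place columns, which propagates the last valid observation forward over the gaps that follow it. Then drop the rows that still contain NA.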

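# Assign the result back: fillna(method='ffill') is deprecated in recent pandas,
# and inplace=True on a column selection may not modify df
df['Date'] = df['Date'].ffill()
df['Place'] = df['Place'].ffill()
# Drop every row that still contains a missing value
df.dropna(inplace=True)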
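
To see the effect on a small frame, here is a minimal sketch. The sample values are invented for illustration (they are not the asker's data); only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: Date and Place appear only on the first row of each group
df = pd.DataFrame({
    "Date": ["2018-01-01", np.nan, np.nan, "2018-01-02", np.nan],
    "Place": ["Berlin", np.nan, np.nan, "Paris", np.nan],
    "ProductCategory": ["A", "B", np.nan, "A", "C"],
    "Sales": [10.0, 20.0, np.nan, 5.0, 7.0],
})

# Forward-fill the two key columns from the last valid row above
df[["Date", "Place"]] = df[["Date", "Place"]].ffill()

# Drop the rows that still contain any missing value
result = df.dropna().reset_index(drop=True)
print(result)
```

In this sample only the third row is removed, because it has no ProductCategory and no Sales even after Date and Place are filled.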

Answer 2

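Use forward filling to replace the empty values with the nearest value above them: df[['Date', 'Place']] = df[['Date', 'Place']].ffill(). Next, drop the rows where ProductCategory is missing: df.dropna(subset=['ProductCategory'], inplace=True). Congratulations, you now have the df you want.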

Documentation: pandas fillna function, pandas dropna function
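
Restricting dropna to one column matters when other columns also have gaps. The sketch below uses invented values to contrast the two calls; the names dropped_any and dropped_subset are just illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the last row has a category but no Sales figure
df = pd.DataFrame({
    "Date": ["2018-01-01", np.nan, np.nan],
    "Place": ["Berlin", np.nan, np.nan],
    "ProductCategory": ["A", np.nan, "B"],
    "Sales": [10.0, 3.0, np.nan],
})
df[["Date", "Place"]] = df[["Date", "Place"]].ffill()

# Drops a row if ANY column is missing
dropped_any = df.dropna()

# Drops a row only if ProductCategory is missing
dropped_subset = df.dropna(subset=["ProductCategory"])

print(len(dropped_any), len(dropped_subset))
```

Here the row with category "B" survives the subset version even though its Sales value is missing, which is closer to what the question asks for.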

Answer 3

Plot the frequency of each category in the column. In the bar chart, the tallest bars are the values that occur most often.

df['column'].value_counts().plot.bar()

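Then use the index to get the most frequent value: index[0] gives the most repeated value and index[1] gives the second most repeated one, so you can pick whichever fits your requirements.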

most_frequent_attribute = df['column'].value_counts().index[0]
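
As a quick check without a plot, this sketch shows what value_counts returns on a made-up series; the values are assumptions chosen so there are no ties.

```python
import pandas as pd

# Hypothetical column with one missing entry
s = pd.Series(["A", "B", "A", "C", "A", "B", None], name="column")

# value_counts sorts by frequency (descending) and skips missing values by default
counts = s.value_counts()

most_frequent = counts.index[0]
second_most_frequent = counts.index[1]
print(most_frequent, second_most_frequent)
```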

Then fill the missing values with that most frequent value:

df['column'] = df['column'].fillna(most_frequent_attribute)

To fill several columns the same way, wrap the logic in a function, like this:

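def impute_nan(df, column):
    # mode() returns the most frequent value(s); take the first one
    most_frequent_category = df[column].mode()[0]
    # Assign back instead of calling fillna(inplace=True) on a column selection
    df[column] = df[column].fillna(most_frequent_category)

for feature in ['column1', 'column2']:
    impute_nan(df, feature)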
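
A variant that returns a new frame instead of modifying its argument can be easier to test. The following is a sketch under assumptions: impute_mode is a hypothetical helper name, and the sample values are invented.

```python
import numpy as np
import pandas as pd

def impute_mode(df, columns):
    """Return a copy of df with NaNs in `columns` replaced by each column's mode."""
    out = df.copy()
    for col in columns:
        modes = out[col].mode(dropna=True)
        # A column that is entirely NaN has no mode; leave it untouched
        if not modes.empty:
            out[col] = out[col].fillna(modes.iloc[0])
    return out

# Hypothetical sample with one gap per column
df = pd.DataFrame({
    "column1": ["x", "y", "x", np.nan],
    "column2": [1.0, np.nan, 2.0, 2.0],
})

filled = impute_mode(df, ["column1", "column2"])
print(filled)
```

The original df keeps its gaps, so the filled and unfilled versions can be compared side by side.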

Tags: python, reshape, dataframe, melt, pandas