List mean values of all possible column combinations
I have a dataset that looks like this:
Value Type country mean
-1.975767 Weather Brazil
-0.540979 Fruits China
-2.359127 Fruits China
-2.815604 Corona China
-0.712323 Weather UK
-0.929755 Weather Brazil
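I want to calculate the overall mean for every distinct combination of type and country. For example:
mean for Weather in Brazil = (-1.975767 - 0.929755) / 2
Then I want to add these combinations to another table: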
df2 = pd.DataFrame()
country type mean count
Brazil Weather 2
Brazil Corona
China Corona 1
China Fruits 2
I can calculate the mean like this:
print(df.groupby(["type", "country"])["value"].mean())
But how do I save these values in a new df in the desired format?
Edit:
This works:
df_new = df.groupby(["type", "country"], as_index = False)["value"].mean()
But if I try to add the count in the same way:
df_new = df.groupby(["type", "country"], as_index = False).count()
it converts every column into counts instead of adding a count column after the mean column.
You can use the as_index parameter of groupby:
df_new = df.groupby(["type", "country"], as_index = False)["value"].mean()
The result is then a standard DataFrame:
type country value
0 Corona China -2.815604
1 Fruits China -1.450053
2 Weather Brazil -1.452761
3 Weather UK -0.712323
Edit: How can we add another column with count? You can simply append a new column that holds the result of another groupby, like this:
# original answer
df_new = df.groupby(["type", "country"], as_index = False)["value"].mean().rename(columns={'value':'mean'})
# Add count also
df_new['count'] = df.groupby(["type", "country"])["value"].count().tolist()
df_new
Output:
type country mean count
0 Corona China -2.815604 1
1 Fruits China -1.450053 2
2 Weather Brazil -1.452761 2
3 Weather UK -0.712323 1
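As an alternative to running two separate groupby calls, both statistics can be computed in one pass with named aggregation. This is only a sketch: it rebuilds the question's sample data and assumes the columns are named `value`, `type`, and `country` in lowercase, as in the groupby calls above.

```python
import pandas as pd

# Sample data from the question; lowercase column names are an assumption
# made to match the groupby calls used in this thread.
df = pd.DataFrame({
    "value": [-1.975767, -0.540979, -2.359127, -2.815604, -0.712323, -0.929755],
    "type": ["Weather", "Fruits", "Fruits", "Corona", "Weather", "Weather"],
    "country": ["Brazil", "China", "China", "China", "UK", "Brazil"],
})

# Named aggregation: each keyword becomes an output column,
# built from (source column, aggregation function).
df_new = (
    df.groupby(["type", "country"], as_index=False)
      .agg(mean=("value", "mean"), count=("value", "count"))
)
print(df_new)
```

Because the mean and the count come from the same groupby, the two columns cannot drift out of alignment, which is a risk when a list from a second groupby is appended by position.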
Personally, I prefer JANO's answer, but this is what I came up with:
import pandas as pd

dataframe = pd.DataFrame({"Value":[-1.23, -1.65, -0.123, -0.67, 2.456], "Type":["Weather", "Fruits", "Corona", "Corona", "Weather"], "country": ["Brazil", "China", "China", "Iran", "Iran"]})
# Collect one row per (country, Type) pair
resultDataframe = {"Value":[], "Type":[], "country":[]}
for country in dataframe["country"].unique():
    tempDataframe = dataframe[dataframe["country"] == country]
    # Select "Value" explicitly so the string column "country" is not averaged
    a = tempDataframe.groupby(by="Type")["Value"].mean().reset_index()
    for index, row in a.iterrows():
        resultDataframe["Value"].append(row["Value"])
        resultDataframe["Type"].append(row["Type"])
        resultDataframe["country"].append(country)
pd.DataFrame(resultDataframe)
   Value     Type  country
0  -1.23   Weather  Brazil
1  -0.123  Corona   China
2  -1.65   Fruits   China
3  -0.67   Corona   Iran
4  2.456   Weather  Iran
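The desired table in the question also lists Brazil / Corona, a pair that has no rows in the data. If every possible country/type pair should appear, one option is to reindex the grouped result against the full Cartesian product of the two columns. The sketch below assumes lowercase column names and that a missing pair should show a NaN mean and a count of 0; `stats` and `all_pairs` are illustrative names.

```python
import pandas as pd

# Sample data from the question (lowercase column names assumed).
df = pd.DataFrame({
    "value": [-1.975767, -0.540979, -2.359127, -2.815604, -0.712323, -0.929755],
    "type": ["Weather", "Fruits", "Fruits", "Corona", "Weather", "Weather"],
    "country": ["Brazil", "China", "China", "China", "UK", "Brazil"],
})

# Mean and count for the combinations that actually occur.
stats = df.groupby(["country", "type"])["value"].agg(["mean", "count"])

# Every country/type pair, whether or not it occurs in the data.
all_pairs = pd.MultiIndex.from_product(
    [df["country"].unique(), df["type"].unique()],
    names=["country", "type"],
)

# Reindexing adds the missing pairs as rows filled with NaN.
df2 = stats.reindex(all_pairs).reset_index()
# Report a count of 0 for pairs without data; their mean stays NaN.
df2["count"] = df2["count"].fillna(0).astype(int)
print(df2)
```

With three countries and three types this gives nine rows, of which four carry real statistics.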