使用来自多个数据框的数据将列添加到数据框
Adding columns to a data frame with data coming from multiple dataframes
我想要这个矩阵作为结果:
int = {
"vendor":['A','B','C','D','E'],
"country":['Spain','Spain','Germany','Italy','Italy'],
"yeardum":['2015','2020','2014','2016','2019'],
"sales_year_data":['15','205','24','920','1310'],
"country_image_data":['2','5','-6','7','-1'],
}
df_inv = pd.DataFrame(int)
df_inv中列“sales_year_data”的数据来自df1=
sales_year_data = {
"country":['Spain','France','Germany','Belgium','Italy'],
"2014":['45','202','24','216','219'],
"2015":['15','55','214','2016','209'],
"2016":['615','2333','205','207','920'],
"2017":['1215','255','234','2116','101'],
"2018":['415','1320','214','2516','2019'],
"2019":['215','220','5614','416','1310'],
"2020":['205','202','44','296','2011'],
}
df1 = pd.DataFrame(sales_year_data)
正如您在df_inv的“sales_year_data”列中看到的那样,数字15是df1中2015年和西班牙的交集,数字205是西班牙和西班牙的交集2020年,24在德国和2014年的交集等等。
df_inv中列“country_image_data”的数据来自df2
country_change_data = {
"country":['Spain','Spain','Germany','Italy','Italy'],
"2014":['4','2','-6','6','9'],
"2015":['2','5','-5','2','3'],
"2016":['5','3','5','7','9'],
"2017":['8','7','5','6','1'],
"2018":['5','1','4','6','2'],
"2019":['1','2','4','6','-1'],
"2020":['5','2','4','6','2'],
}
df2 = pd.DataFrame(country_change_data)
正如您在df_inv的“country_change_data”列中看到的那样,数字2是df2中2015年与西班牙的交集,数字5是西班牙与西班牙的交集2020,-6在德国和2014年的交点等等。
如果我的原始数据框是:
inv = {
"vendor":['A','B','C','D','E'],
"country":['Spain','Spain','Germany','Italy','Italy'],
"yeardum":['2015','2020','2014','2016','2019'],
}
df0 = pd.DataFrame(inv)
我如何在感兴趣的交叉路口自动搜索各种 df1 和 df2 以构建 df_inv departing prom df0?
这样就可以了。
sales_counters = {}
country_counters = {}
new_df_data = []
for row in df0.iloc:
c = row['country']
y = row['yeardum']
sales_idx = sales_counters[c] = sales_counters.get(c, -1) + 1
country_idx = country_counters[c] = country_counters.get(c, -1) + 1
d1 = df1[df1['country'] == c]
d2 = df2[df2['country'] == c]
sales_year = d1.iloc[min(sales_idx, d1.shape[0]-1)][y]
country_image = d2.iloc[min(country_idx, d2.shape[0]-1)][y]
new_df_data.append([sales_year, country_image])
df0 = pd.concat([df0, pd.DataFrame(new_df_data)], axis=1).rename({0: 'sales_year_data', 1: 'country_image_data'}, axis=1)
测试:
>>> df0
vendor country yeardum sales_year_data country_image_data
0 A Spain 2015 15 2
1 B Spain 2020 205 2
2 C Germany 2014 24 -6
3 D Italy 2016 920 7
4 E Italy 2019 1310 -1
我想要这个矩阵作为结果:
int = {
"vendor":['A','B','C','D','E'],
"country":['Spain','Spain','Germany','Italy','Italy'],
"yeardum":['2015','2020','2014','2016','2019'],
"sales_year_data":['15','205','24','920','1310'],
"country_image_data":['2','5','-6','7','-1'],
}
df_inv = pd.DataFrame(int)
df_inv中列“sales_year_data”的数据来自df1=
sales_year_data = {
"country":['Spain','France','Germany','Belgium','Italy'],
"2014":['45','202','24','216','219'],
"2015":['15','55','214','2016','209'],
"2016":['615','2333','205','207','920'],
"2017":['1215','255','234','2116','101'],
"2018":['415','1320','214','2516','2019'],
"2019":['215','220','5614','416','1310'],
"2020":['205','202','44','296','2011'],
}
df1 = pd.DataFrame(sales_year_data)
正如您在df_inv的“sales_year_data”列中看到的那样,数字15是df1中2015年和西班牙的交集,数字205是西班牙和西班牙的交集2020年,24在德国和2014年的交集等等。
df_inv中列“country_image_data”的数据来自df2
country_change_data = {
"country":['Spain','Spain','Germany','Italy','Italy'],
"2014":['4','2','-6','6','9'],
"2015":['2','5','-5','2','3'],
"2016":['5','3','5','7','9'],
"2017":['8','7','5','6','1'],
"2018":['5','1','4','6','2'],
"2019":['1','2','4','6','-1'],
"2020":['5','2','4','6','2'],
}
df2 = pd.DataFrame(country_change_data)
正如您在df_inv的“country_change_data”列中看到的那样,数字2是df2中2015年与西班牙的交集,数字5是西班牙与西班牙的交集2020,-6在德国和2014年的交点等等。
如果我的原始数据框是:
inv = {
"vendor":['A','B','C','D','E'],
"country":['Spain','Spain','Germany','Italy','Italy'],
"yeardum":['2015','2020','2014','2016','2019'],
}
df0 = pd.DataFrame(inv)
我如何在感兴趣的交叉路口自动搜索各种 df1 和 df2 以构建 df_inv departing prom df0?
这样就可以了。
sales_counters = {}
country_counters = {}
new_df_data = []
for row in df0.iloc:
c = row['country']
y = row['yeardum']
sales_idx = sales_counters[c] = sales_counters.get(c, -1) + 1
country_idx = country_counters[c] = country_counters.get(c, -1) + 1
d1 = df1[df1['country'] == c]
d2 = df2[df2['country'] == c]
sales_year = d1.iloc[min(sales_idx, d1.shape[0]-1)][y]
country_image = d2.iloc[min(country_idx, d2.shape[0]-1)][y]
new_df_data.append([sales_year, country_image])
df0 = pd.concat([df0, pd.DataFrame(new_df_data)], axis=1).rename({0: 'sales_year_data', 1: 'country_image_data'}, axis=1)
测试:
>>> df0
vendor country yeardum sales_year_data country_image_data
0 A Spain 2015 15 2
1 B Spain 2020 205 2
2 C Germany 2014 24 -6
3 D Italy 2016 920 7
4 E Italy 2019 1310 -1