How to method chain .agg() and .assign() functions in Pandas
I want to replicate this Dplyr query in Pandas, but I am having trouble getting .agg() and .assign() to work together in a chain. Any suggestions would be appreciated.
Dplyr code:
counties_selected %>%
group_by(state) %>%
summarize(total_area = sum(land_area),
total_population = sum(population)) %>%
mutate(density = total_population / total_area) %>%
arrange(desc(density))
Attempt in Pandas:
In the .assign() part I point the variables back at the original DataFrame, but nothing else has worked.
counties.\
groupby('state').\
agg(total_area = ('land_area', 'sum'),
total_population = ('population', 'sum')).\
reset_index().\
assign(density = counties['total_population'] / counties['total_area']).\
arrange('density', ascending = False).\
head()
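The problem is that you need a lambda to operate on the chained data, that is, the DataFrame already produced by the previous methods in the chain. Change: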
assign(density = counties['total_population'] / counties['total_area'])
to:
assign(density = lambda x: x['total_population'] / x['total_area'])
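To illustrate why the callable is needed, here is a minimal sketch on a made-up counties frame (the values are invented for illustration, not taken from the question). The original frame has no total_population column, so indexing it raises a KeyError, whereas a callable passed to assign() receives the aggregated frame.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration
counties = pd.DataFrame({
    'state': ['A', 'A', 'B'],
    'land_area': [10.0, 30.0, 20.0],
    'population': [100, 300, 800],
})

# Intermediate result of the chain: one row per state
totals = (counties.groupby('state')
          .agg(total_area=('land_area', 'sum'),
               total_population=('population', 'sum'))
          .reset_index())

# The original frame only has the raw columns, so this lookup fails
try:
    counties['total_population']
    lookup_failed = False
except KeyError:
    lookup_failed = True

# A callable passed to assign() is called with the frame being assigned to
with_density = totals.assign(
    density=lambda x: x['total_population'] / x['total_area'])
print(with_density)
```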
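Another problem is the sorting: pandas DataFrames have no arrange method, so use DataFrame.sort_values. Change:
arrange('density', ascending = False)
to:
sort_values('density', ascending = False)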
All together, wrapping the chain in parentheses lets each method call start on a new line with a leading dot:
df = (counties.groupby('state')
.agg(total_area = ('land_area', 'sum'),
total_population = ('population', 'sum'))
.reset_index()
.assign(density = lambda x: x['total_population'] / x['total_area'])
.sort_values('density', ascending = False)
.head())
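As a quick check, the full chain can be run end to end on a small frame. The data below is hypothetical and only serves to show the ordering that sort_values produces; the column names follow the question.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration
counties = pd.DataFrame({
    'state': ['X', 'X', 'Y', 'Z'],
    'land_area': [50.0, 50.0, 10.0, 40.0],
    'population': [1000, 1000, 900, 400],
})

df = (counties.groupby('state')
      .agg(total_area=('land_area', 'sum'),          # summarize()
           total_population=('population', 'sum'))
      .reset_index()                                  # turn 'state' back into a column
      .assign(density=lambda x: x['total_population'] / x['total_area'])  # mutate()
      .sort_values('density', ascending=False)        # arrange(desc(density))
      .head())

print(df)
```

The rows keep their pre-sort index labels; appending .reset_index(drop=True) would renumber them if that matters.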
With datar, dplyr code can be ported to Python easily, without learning the pandas API:
from datar.all import f, group_by, summarize, sum, mutate, arrange, desc
counties_selected >> \
group_by(f.state) >> \
summarize(total_area = sum(f.land_area),
total_population = sum(f.population)) >> \
mutate(density = f.total_population / f.total_area) >> \
arrange(desc(f.density))
I am the author of the package. Feel free to open an issue if you have any questions.