Analyzing multiple tables in Python pandas
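I have a bunch of tables that I want to run calculations on. I had trouble doing this in base Python, so I am trying pandas. The tables have different numbers of companies (rows) and different numbers of years (columns), and each year itself has 3 columns.
Problem 1: From the original table I need to calculate (profit - loss + other) for each year and put the result into a new table that has only the years.
I tried to do this in Python, but I think I need to use pandas.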
def status(profit, loss, other):
    # status for one year: profit - loss + other
    return profit - loss + other

def marketpart(profit, loss, sum_profit, sum_loss):
    # sum_profit: total profit of all companies in that column
    # sum_loss: total loss of all companies in that column
    return (profit / sum_profit) / (loss / sum_loss)
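Problem 2: For the second function I need to sum each column, then divide the two columns and export the result to a new table.
How can I convert these functions into pandas functions that I can use on this table?
Dropbox link to download my table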
Try:
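import pandas as pd

df = pd.read_excel('Downloads/products2.xlsx', index_col=[0])
# Split column names of the form year_measure into a two-level column index
df.columns = df.columns.str.split('_', expand=True)
# Move the year level into the rows, then compute status per company and year
dfm = df.stack(0).eval('status = profit-loss+other')
# Total profit and total loss over all companies within each year
dfm[['sum_profit', 'sum_loss']] = dfm.groupby(level=[1])[['profit', 'loss']].transform('sum')
df_out = dfm.eval('formula = (profit / sum_profit) / (loss / sum_loss)')
# Move the years back into the columns, with the year as the top column level
df_out = df_out.unstack(1).swaplevel(0,1, axis=1).sort_index(axis=1)
df_out.to_excel('newexcel.xlsx')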
Output:
2017 2018 \
formula loss other profit status sum_loss sum_profit formula
companies
company1 2.445652 500 3000 3000 5500 7500 18400 1.278539
company2 1.970109 600 2800 2900 5100 7500 18400 1.722114
company3 1.403986 900 3200 3100 5400 7500 18400 1.059361
company4 0.407609 2000 3100 2000 3100 7500 18400 1.664130
company5 5.706522 100 500 1400 1800 7500 18400 0.487062
company6 1.019022 800 800 2000 2000 7500 18400 0.547945
company7 0.733696 1500 1900 2700 3100 7500 18400 1.095890
company8 0.481719 1100 3000 1300 3200 7500 18400 0.649417
... 2019 2020 \
loss other ... status sum_loss sum_profit formula loss other
companies ...
company1 2000 5000 ... 4000 16800 22000 1.064260 3000 4000
company2 1400 3400 ... 1300 16800 22000 1.365895 1700 500
company3 2000 2400 ... 5100 16800 22000 1.170374 3100 1500
company4 1800 400 ... 1900 16800 22000 1.451264 2200 1300
company5 3000 1300 ... 3700 16800 22000 1.814079 1200 1700
company6 2000 4000 ... 3400 16800 22000 0.640263 3400 3600
company7 2000 4400 ... 0 16800 22000 0.414647 3500 1200
company8 1800 3200 ... 1200 16800 22000 0.979603 2000 1400
profit status sum_loss sum_profit
companies
company1 4400 5400 20100 27700
company2 3200 2000 20100 27700
company3 5000 3400 20100 27700
company4 4400 3500 20100 27700
company5 3000 3500 20100 27700
company6 3000 3200 20100 27700
company7 2000 -300 20100 27700
company8 2700 2100 20100 27700
[8 rows x 28 columns]
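If the goal is a separate table per result with only the years as columns, as the question asks, one possible variation is to skip the stacking and select each measure directly from the two-level columns. The sketch below is not the answer's method. It assumes the column names follow the year_measure pattern implied by the split on '_', and it uses a small hand-built frame with the first two companies and the first two years from the output above in place of the Excel file. The names `status` and `marketpart` are taken from the question.

```python
import pandas as pd

# Stand-in for the Excel file: two companies, two years (values from the output above).
df = pd.DataFrame(
    {
        "2017_profit": [3000, 2900],
        "2017_loss": [500, 600],
        "2017_other": [3000, 2800],
        "2018_profit": [3500, 3300],
        "2018_loss": [2000, 1400],
        "2018_other": [5000, 3400],
    },
    index=pd.Index(["company1", "company2"], name="companies"),
)

# Assumed naming scheme: "<year>_<measure>" becomes a (year, measure) column index.
df.columns = df.columns.str.split("_", expand=True)

# One companies-by-years frame per measure.
profit = df.xs("profit", axis=1, level=1)
loss = df.xs("loss", axis=1, level=1)
other = df.xs("other", axis=1, level=1)

# Problem 1: status per company and year, columns are the years only.
status = profit - loss + other

# Problem 2: each company's share of total profit divided by its share of total loss.
# profit.sum() and loss.sum() are per-year totals and align on the year columns.
marketpart = (profit / profit.sum()) / (loss / loss.sum())

print(status)
print(marketpart)
```

Because this sample contains only two companies, the `marketpart` values differ from the `formula` column above, which uses totals over all eight companies. Each resulting frame can be written out with `to_excel` in the same way as `df_out`.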
pd.wide_to_long() is your friend: it pulls hierarchically named columns out into melt-like output:
pd.wide_to_long(df, ['2017','2018','2019','2020'],
                sep='_', suffix=r'(profit|loss|other)', i='companies', j='value')
2017 2018 2019 2020
companies value
company1 profit 3000 3500 3000 4400
company2 profit 2900 3300 3000 3200
company3 profit 3100 2900 3500 5000
company4 profit 2000 4100 3000 4400
company5 profit 1400 2000 2500 3000
company6 profit 2000 1500 2000 3000
company7 profit 2700 3000 2000 2000
company8 profit 1300 1600 3000 2700
company1 loss 500 2000 3000 3000
company2 loss 600 1400 2000 1700
company3 loss 900 2000 400 3100
company4 loss 2000 1800 3000 2200
company5 loss 100 3000 800 1200
company6 loss 800 2000 600 3400
company7 loss 1500 2000 5000 3500
company8 loss 1100 1800 2000 2000
company1 other 3000 5000 4000 4000
company2 other 2800 3400 300 500
company3 other 3200 2400 2000 1500
company4 other 3100 400 1900 1300
company5 other 500 1300 2000 1700
company6 other 800 4000 2000 3600
company7 other 1900 4400 3000 1200
company8 other 3000 3200 200 1400
The rest is straightforward.
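One way the remaining steps might look is sketched below. It is only an illustration, assuming that `companies` is an ordinary column (not the index) and that the other columns are named year_measure. The sample frame reuses the first two companies and years from the table above, and the names `long_df`, `share`, `status` and `marketpart` are chosen here for the example.

```python
import pandas as pd

# Stand-in for the original wide table, with "companies" as a regular column.
df = pd.DataFrame(
    {
        "companies": ["company1", "company2"],
        "2017_profit": [3000, 2900],
        "2017_loss": [500, 600],
        "2017_other": [3000, 2800],
        "2018_profit": [3500, 3300],
        "2018_loss": [2000, 1400],
        "2018_other": [5000, 3400],
    }
)

# Rows become (company, measure) pairs; columns become the years.
long_df = pd.wide_to_long(
    df, ["2017", "2018"],
    sep="_", suffix=r"(profit|loss|other)", i="companies", j="value",
)

# Problem 1: select each measure by its index level and combine them.
status = (
    long_df.xs("profit", level="value")
    - long_df.xs("loss", level="value")
    + long_df.xs("other", level="value")
)

# Problem 2: divide every value by the per-year total of its measure,
# then take the ratio of the profit share to the loss share.
share = long_df / long_df.groupby(level="value").transform("sum")
marketpart = share.xs("profit", level="value") / share.xs("loss", level="value")

print(status)
print(marketpart)
```

Both results are indexed by company with one column per year, so they can be exported to separate sheets or files as needed.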