Analyzing multiple tables in Python pandas

Question

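I have a bunch of tables that I want to run calculations on. I had trouble doing this in plain Python, so I am trying pandas. The tables have different numbers of companies (rows) and different numbers of years (columns), and each year itself has 3 columns.

Question 1: From the original table, I need to calculate (profit - loss + other) for each year and put the result into a new table that contains only the years.

I tried doing this in Python, but I think I need to use pandas.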

def status():
    profit = []
    loss = []
    other = []

    status = profit - loss + other

status()

def marketpart():
    profit = []
    loss = []
    sum_profit = []     # sum of profit of all companies in that column
    sum_loss = []       # sum of loss of all companies in that column

    formula = (profit / sum_profit) / (loss / sum_loss)

marketpart(formula)

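Question 2: For the second function, I need to sum the columns, then divide the two columns and export the result to a new table.

How do I convert these functions into pandas functions that I can use on this table?

Dropbox link to download my table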

Answer 1

Try:

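import pandas as pd
import numpy as np

# Use the first column (the company names) as the row index
df = pd.read_excel('Downloads/products2.xlsx', index_col=[0])

# Split names such as '2017_profit' into a two-level column index (year, metric)
df.columns = df.columns.str.split('_', expand=True)

# Move the year level into the rows, then compute status per (company, year)
dfm = df.stack(0).eval('status = profit-loss+other')

# Per-year totals over all companies, repeated on every row of that year
dfm[['sum_profit', 'sum_loss']] = dfm.groupby(level=[1])[['profit', 'loss']].transform('sum')

df_out = dfm.eval('formula = (profit / sum_profit) / (loss / sum_loss)')

# Move the years back into the columns, with the year as the top column level
df_out = df_out.unstack(1).swaplevel(0,1, axis=1).sort_index(axis=1)
df_out.to_excel('newexcel.xlsx')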

Output:

               2017                                                    2018  \
            formula  loss other profit status sum_loss sum_profit   formula   
companies                                                                     
company1   2.445652   500  3000   3000   5500     7500      18400  1.278539   
company2   1.970109   600  2800   2900   5100     7500      18400  1.722114   
company3   1.403986   900  3200   3100   5400     7500      18400  1.059361   
company4   0.407609  2000  3100   2000   3100     7500      18400  1.664130   
company5   5.706522   100   500   1400   1800     7500      18400  0.487062   
company6   1.019022   800   800   2000   2000     7500      18400  0.547945   
company7   0.733696  1500  1900   2700   3100     7500      18400  1.095890   
company8   0.481719  1100  3000   1300   3200     7500      18400  0.649417   

                       ...   2019                          2020              \
           loss other  ... status sum_loss sum_profit   formula  loss other   
companies              ...                                                    
company1   2000  5000  ...   4000    16800      22000  1.064260  3000  4000   
company2   1400  3400  ...   1300    16800      22000  1.365895  1700   500   
company3   2000  2400  ...   5100    16800      22000  1.170374  3100  1500   
company4   1800   400  ...   1900    16800      22000  1.451264  2200  1300   
company5   3000  1300  ...   3700    16800      22000  1.814079  1200  1700   
company6   2000  4000  ...   3400    16800      22000  0.640263  3400  3600   
company7   2000  4400  ...      0    16800      22000  0.414647  3500  1200   
company8   1800  3200  ...   1200    16800      22000  0.979603  2000  1400   

                                             
          profit status sum_loss sum_profit  
companies                                    
company1    4400   5400    20100      27700  
company2    3200   2000    20100      27700  
company3    5000   3400    20100      27700  
company4    4400   3500    20100      27700  
company5    3000   3500    20100      27700  
company6    3000   3200    20100      27700  
company7    2000   -300    20100      27700  
company8    2700   2100    20100      27700  

[8 rows x 28 columns]
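
To see what the groupby(...).transform('sum') step contributes, here is a minimal sketch that isolates it for a single year. It assumes the 2017 profit and loss figures shown in the output above; the index level name year and the variables totals and formula are only illustrative and do not come from the original workbook.

```python
import pandas as pd

# 2017 figures for all eight companies, copied from the output above
companies = [f"company{i}" for i in range(1, 9)]
dfm = pd.DataFrame(
    {
        "profit": [3000, 2900, 3100, 2000, 1400, 2000, 2700, 1300],
        "loss": [500, 600, 900, 2000, 100, 800, 1500, 1100],
    },
    index=pd.MultiIndex.from_product(
        [companies, ["2017"]], names=["companies", "year"]
    ),
)

# transform('sum') keeps the original shape: every row receives
# the total of the group (here: the year) it belongs to
totals = dfm.groupby(level="year")[["profit", "loss"]].transform("sum")
dfm["sum_profit"] = totals["profit"]
dfm["sum_loss"] = totals["loss"]

# Each company's share of profit divided by its share of loss
formula = (dfm["profit"] / dfm["sum_profit"]) / (dfm["loss"] / dfm["sum_loss"])
print(dfm.assign(formula=formula))
```

Because the totals are already aligned row by row, the division needs no explicit loop or merge. With these inputs the totals are 18400 and 7500, matching the sum_profit and sum_loss columns for 2017 above.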

Answer 2

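pd.wide_to_long() is your friend. It extracts hierarchically named columns into a melt-like output (this assumes companies is a regular column, not the index):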

pd.wide_to_long(df, ['2017','2018','2019','2020'],
    sep='_', suffix=r'(profit|loss|other)', i='companies', j='value')

                  2017  2018  2019  2020
companies value                         
company1  profit  3000  3500  3000  4400
company2  profit  2900  3300  3000  3200
company3  profit  3100  2900  3500  5000
company4  profit  2000  4100  3000  4400
company5  profit  1400  2000  2500  3000
company6  profit  2000  1500  2000  3000
company7  profit  2700  3000  2000  2000
company8  profit  1300  1600  3000  2700
company1  loss     500  2000  3000  3000
company2  loss     600  1400  2000  1700
company3  loss     900  2000   400  3100
company4  loss    2000  1800  3000  2200
company5  loss     100  3000   800  1200
company6  loss     800  2000   600  3400
company7  loss    1500  2000  5000  3500
company8  loss    1100  1800  2000  2000
company1  other   3000  5000  4000  4000
company2  other   2800  3400   300   500
company3  other   3200  2400  2000  1500
company4  other   3100   400  1900  1300
company5  other    500  1300  2000  1700
company6  other    800  4000  2000  3600
company7  other   1900  4400  3000  1200
company8  other   3000  3200   200  1400

The rest is straightforward.
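
One possible way to finish from the wide_to_long() output, as a sketch rather than the answerer's own code: take a cross-section per value label and combine the resulting companies-by-years tables. The small frame below uses only company1 and company2 for 2017 and 2018, with figures copied from the tables above; the names long_df, status and market_part are hypothetical.

```python
import pandas as pd

# Two companies and two years, copied from the tables above
df = pd.DataFrame({
    "companies": ["company1", "company2"],
    "2017_profit": [3000, 2900],
    "2017_loss": [500, 600],
    "2017_other": [3000, 2800],
    "2018_profit": [3500, 3300],
    "2018_loss": [2000, 1400],
    "2018_other": [5000, 3400],
})

long_df = pd.wide_to_long(
    df, ["2017", "2018"],
    sep="_", suffix=r"(profit|loss|other)", i="companies", j="value",
)

# Selecting one label of the 'value' level leaves a companies x years table
profit = long_df.xs("profit", level="value")
loss = long_df.xs("loss", level="value")
other = long_df.xs("other", level="value")

# Question 1: one status column per year
status = profit - loss + other

# Question 2: column sums run over all companies in the frame
market_part = (profit / profit.sum()) / (loss / loss.sum())

print(status)
print(market_part)
```

Since this sample holds only two companies, the market_part values will differ from those computed on the full table, where the sums cover all eight companies. Either result can be written out with to_excel().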
