Need help unpacking a nested array into a pandas dataframe

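I am running some code that generates an array with the following shape: (18433, 17, 600 to 885). I need to unpack it into a pandas dataframe with 17 columns and rows containing the data for 18433 entities, each with 600 to 885 time-series entries. The code that generates the array is shown below. I am relatively new to Python and have reached the limit of my technical ability. I tried unpacking it with a for loop, but it takes a very long time. Is there a more efficient library or method?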

# Generate full monthly cash flow arrays    
# define constant input parameters

eloss = 0
weight = 1.0
prod_wt = 1.0
inv_wt = 1.0
stx_oil = 0.0795
stx_gas = 0.0795
stx_ngl = 0.0795
adval = 0
aban = 150000

# Create function for slicing the volume array and calculating the monthly cash flow
def econ_ncf_iter(r):
    # Look up the inputs for property r and return its monthly cash flow array
    return econ_cf(index = r, uid = prop_list.loc[r, 'PROPNUM'], wi = prop_list.loc[r, 'WI'],
                   nri = prop_list.loc[r, 'NRI'], roy = prop_list.loc[r, 'Royalty'], eloss = eloss,
                   weight = weight, prod_wt = prod_wt, inv_wt = inv_wt,
                   shrink = np.round(prop_list.loc[r, 'SHRINK'] / 100, 6),
                   btu = np.round(prop_list.loc[r, 'BTU'] / 1000, 6),
                   ngl_yield = np.round(prop_list.loc[r, 'NGL/GAS'], 6),
                   pri_oil = np.extract(oilprice[r][0] == prop_list.loc[r, 'PROPNUM'], oilprice[r][1]),
                   pri_gas = np.extract(gasprice[r][0] == prop_list.loc[r, 'PROPNUM'], gasprice[r][1]),
                   paj_oil = prop_list.loc[r, 'PAJ_OIL'],
                   paj_gas = np.extract(gasdiff[r][0] == prop_list.loc[r, 'PROPNUM'], gasdiff[r][1]),
                   paj_ngl = prop_list.loc[r, 'PAJ_NGL'], stx_oil = stx_oil, stx_gas = stx_gas, stx_ngl = stx_ngl,
                   adval = adval, opc_fix = np.round(prop_list.loc[r, 'OPC/T'], 2),
                   opc_oil = np.round(prop_list.loc[r, 'OIL_OPEX'], 2),
                   opc_gas = np.round(prop_list.loc[r, 'GAS_OPEX'], 2),
                   capex = np.round(prop_list.loc[r, 'CAPITAL'] * 1000, 2), aban = aban)

# generate net cash flow array

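# np.vectorize is a convenience wrapper around a Python-level loop;
# otypes = [object] keeps each returned 2-D array as one element of the result
vecon_ncf = np.vectorize(econ_ncf_iter, otypes = [object])
ncf_arr_packed = vecon_ncf(R)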

I figured it out, and it turned out to be pretty easy:

ncf_pd_dflist = []
columns = ['UID', 'Month', 'Grs Oil', 'Grs Gas', 'Net Oil', 'Net Gas', 'Net NGL', 'Oil Revenue', 'Gas Revenue', 
       'NGL Revenue', 'Total Revenue', 'Total Tax', 'OPEX', 'Operating Income', 'Cumulative Op CF', 'Net Cashflow',
       'Cumulative Net CF']

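from tqdm import tqdm

# total= gives a manually updated bar its length; tqdm(len(R)) would pass the integer as the iterable
pbar = tqdm(total=len(R))
for r in R:
    # each packed entry is (17, n_months); transpose so the 17 fields become columns
    ncf_pd_dflist.append(pd.DataFrame(np.transpose(ncf_arr_packed[r])))
    pbar.update()
pbar.close()
ncf_pd = pd.concat(ncf_pd_dflist)
ncf_pd.columns = columns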

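Simple code that loops over the array and builds a list of pandas dataframes. Once the loop finishes, I concatenate the list into a single dataframe. It takes about 5 seconds to complete.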
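If building thousands of small dataframes ever becomes the slow part, one possible variation is to join the raw arrays first and construct a single dataframe at the end. The sketch below is only an illustration under assumptions: `packed`, `lengths`, and the four column names are hypothetical stand-ins for `ncf_arr_packed` and the real 17 columns, and it assumes every entry is a purely numeric 2-D array with the same number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ncf_arr_packed: an object array whose entries
# are (n_columns, n_months) arrays with differing n_months.
rng = np.random.default_rng(0)
lengths = [3, 5, 4]
packed = np.empty(len(lengths), dtype=object)
for i, n in enumerate(lengths):
    packed[i] = rng.random((4, n))

columns = ['A', 'B', 'C', 'D']  # placeholder names, one per row of each entry

# Join all entries along the time axis, then transpose once so that
# each row is one month of one entity and each column is one field.
stacked = np.concatenate(list(packed), axis=1).T
ncf_pd = pd.DataFrame(stacked, columns=columns)

# Record which entity each row came from.
ncf_pd.insert(0, 'entity', np.repeat(np.arange(len(packed)), lengths))
```

In this form pandas is called once instead of once per entity; whether that is measurably faster than the 5 seconds above would have to be timed on the real data.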

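Although you have already found a solution, here is a generic alternative with no explicit loop. It takes a few simple steps:

  • If the desired horizontal axis (your middle axis) is not the last one, swap them.
  • Reshape to a 2D array of horizontal rows.
  • Create the DataFrame with a MultiIndex built from the Cartesian product of the other axes.

Assuming the array is arr: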

x, y, z = arr.shape
df = pd.DataFrame(arr.swapaxes(1, 2).reshape(x*z, -1),
                  pd.MultiIndex.from_product([np.arange(x), np.arange(z)]))
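To check the reshaping on something small, here is a self-contained sketch of the same steps. It assumes a regular 3-D array, i.e. every entity has the same number of time steps; the series in the question vary from 600 to 885 entries, so they would need padding to a common length before this applies. The shape `(2, 3, 4)` and the names `entity`, `step`, and `col0`..`col2` are made up for the illustration.

```python
import numpy as np
import pandas as pd

# Small regular array: 2 entities, 3 fields (the middle axis), 4 time steps.
x, y, z = 2, 3, 4
arr = np.arange(x * y * z).reshape(x, y, z)

df = pd.DataFrame(
    # Move the field axis last, then flatten (entity, step) into rows.
    arr.swapaxes(1, 2).reshape(x * z, -1),
    index=pd.MultiIndex.from_product(
        [np.arange(x), np.arange(z)], names=['entity', 'step']
    ),
    columns=[f'col{j}' for j in range(y)],
)

# The row for entity i at step k holds arr[i, :, k].
row = df.loc[(1, 2)]
```

The row order produced by the reshape (entity varying slowest, step fastest) matches the order of `MultiIndex.from_product`, which is why the index lines up with the data.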