How to loop through a 3-D array and multiple dataframes?

Question

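Background

multi_data is a 3-D array of shape (10, 5, 5). For this example, multi_data = np.arange(250).reshape(10,5,5). Each of the 10 matrices has 5x5 states (A-E). The matrices are in sequential order and represent time in years, in increments of 1: matrix [0] holds the year 1 values, and so on up to matrix [9] for year 10.

Example of multi_data at year 1, multi_data[0]: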

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

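Customers typically make a purchase within a few years (not immediately upon signing up); for example, this customer makes a purchase in year 3. The matrix calculation for this customer therefore starts at year 3. Each user has a current_state (A-E), and I need to transform the user data so that I can multiply it by the matrices. For example, customer1 has current state B, so the amount is the second element of the array customer1 = np.array([0, 1000, 0, 0, 0]).

Dataframe 1 (customers)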

cust_id | state | amount | year
1       | B     | 1000   | 3
2       | D     | 500    | 2


import numpy as np

multi_data = np.arange(250).reshape(10,5,5)
customer1 = np.array([0, 1000, 0, 0, 0])
output = customer1
results = []
# customer purchases at year 3 hence I am multiplying customer1 by matrix at year 3
for arr in multi_data[3:4]:
    output = output @ arr
    results.append(output)

Example output: results = [array([80000, 81000, 82000, 83000, 84000])]

I then need to multiply the results by dataframe 2.

dataframe_2

| year  | lim %
|   1   |  0.19
|   2   |  0.11
|   3   |  0.02
|   10  |  0.23

So at year 3 I multiply the results by lim %.

# Select into a new name so dataframe2 keeps all years for later steps.
lim_year3 = dataframe2.loc[dataframe2['year'] == 3, 'LimitPerc'].values

results = lim_year3 * results

Example output:

[array([1600,1620,1640,1660,1680])]
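
The arithmetic of the two steps above can be checked in a few lines. This is only a sketch: it reuses the matrix index from the snippet (multi_data[3]) and hard-codes the year 3 lim % value of 0.02 from dataframe_2 rather than reading it from a dataframe.

```python
import numpy as np

multi_data = np.arange(250).reshape(10, 5, 5)
customer1 = np.array([0, 1000, 0, 0, 0])  # amount 1000 in state B

# Same matrix index as the snippet above (multi_data[3]).
after_matrix = customer1 @ multi_data[3]
print(after_matrix)  # [80000 81000 82000 83000 84000]

# Scale by the year 3 'lim %' value (0.02), hard-coded here for illustration.
after_limit = 0.02 * after_matrix
print(after_limit)
```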

I then need to multiply these results by the year 4 matrix, then by the year 4 lim %, and so on until year 10.

Like this:

customer1 = np.array([1600, 1620, 1640, 1660, 1680])
output = customer1
results = []
for arr in multi_data[4:5]: # multiplying by year 4 matrix (multi_data)
    output = output @ arr
    results.append(output)

# Select into a new name so dataframe2 keeps all years for later steps.
lim_year4 = dataframe2.loc[dataframe2['year'] == 4, 'LimitPerc'].values

results = lim_year4 * results

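Is there a simpler way to do this with less manual work? I need to continue this calculation until year 10 for each customer. I need to save the results for each customer after every calculation.

Additional info: I am currently running through all the customer years as shown below, but my problem is that I have a lot of vlookup-type calculations, like dataframe2, that need to be applied between each year for each customer, and I have to save the results for each customer after each calculation.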

results_dict = {}
for _id, c, y in zip(cust_id ,cust_amt, year):
    results = []
    for m in multi_data[y:]:
        c = c @ m
        results.append(c)
    results_dict[_id] = results

Answer 1

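Unfortunately, since you need all the intermediate results, I don't think this can be optimized much, so you will need to loop. If you did not need the intermediate results, you could precompute the matrix product for each year up to year 10. However, that is not useful here.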
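
For reference, the precomputation mentioned above could look roughly like the sketch below. It is not part of the solution that follows. The names `matrices` and `suffix_products` are made up for illustration, and small random float matrices stand in for multi_data so that the chained products stay in a comfortable numeric range.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for multi_data: 10 small 5x5 float matrices.
matrices = rng.random((10, 5, 5))

# suffix_products[i] = matrices[i] @ matrices[i + 1] @ ... @ matrices[-1]
suffix_products = [None] * len(matrices)
running = np.eye(5)
for i in range(len(matrices) - 1, -1, -1):
    running = matrices[i] @ running
    suffix_products[i] = running

# A customer starting in year 3 (index 2), state B (index 1), amount 1000:
start_index, state_index, amount = 2, 1, 1000
final = amount * suffix_products[start_index][state_index]
print(final)
```

Each customer's final-year vector then costs a single row lookup and a scalar multiplication. Because the per-year lookup values are scalars, they could also be folded into the precomputed products, but none of the intermediate yearly results would be available.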

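To integrate the look-ups into your loop, you can put all the lookup dataframes in a list and then use DataFrame indexing to query the values. In addition, you can convert the state to an integer index. Note that you do not need to create the customer1 vector: since only one position is non-zero, you can extract the relevant row of the matrix directly and multiply it by the amount.

Sample data: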
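
The row-extraction shortcut can be verified with a small sketch. The variable names are illustrative, and year 1 is assumed to be stored at index 0, as described in the question.

```python
import numpy as np

multi_data = np.arange(250).reshape(10, 5, 5)
state_list = ['A', 'B', 'C', 'D', 'E']

state, amount, year = 'B', 1000, 3
state_index = state_list.index(state)  # 'B' -> 1
matrix = multi_data[year - 1]          # year 1 is stored at index 0

# Explicit vector with a single non-zero entry, as in the question.
vector = np.zeros(len(state_list), dtype=multi_data.dtype)
vector[state_index] = amount
via_vector = vector @ matrix

# Shortcut: scale the matching row of the matrix directly.
via_row = amount * matrix[state_index]

print(np.array_equal(via_vector, via_row))  # True
```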

import pandas as pd
import numpy as np

customer_data = pd.DataFrame({"cust_id": [1, 2, 3, 4, 5, 6, 7, 8],
                              "state": ['B', 'E', 'D', 'A', 'B', 'E', 'C', 'A'],
                              "cust_amt": [1000,300, 500, 200, 400, 600, 200, 300],
                              "year":[3, 3, 4, 3, 4, 2, 2, 4],
                              "group":[10, 25, 30, 40, 55, 60, 70, 85]})

state_list = ['A','B','C','D','E']

# All lookups should be dataframes with the year and/or group and the value like these.
lookup1 = pd.DataFrame({'year': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                       'lim %': 0.1})
# One row per (group, year) pair.
lookup2 = pd.DataFrame([{'group': g, 'lookup_val': 0.1, 'year': y}
                        for g in customer_data['group'].unique()
                        for y in range(1, 11)])

multi_data = np.arange(250).reshape(10,5,5)

Preprocessing:

# Put all lookups in order of calculation in this list.
lookups = [lookup1, lookup2]

# Preprocessing.
# Transform the state to categorical code to use it as array index.
customer_data['state'] = pd.Categorical(customer_data['state'], 
                                        categories=state_list, 
                                        ordered=True).codes

# Set index on lookups.
for i in range(len(lookups)):
    if 'group' in lookups[i].columns:
        lookups[i] = lookups[i].set_index(['year', 'group'])
    else:
        lookups[i] = lookups[i].set_index(['year'])
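
To show what the two preprocessing steps produce, here is a small standalone sketch. The frames `by_year` and `by_year_group` and the values 0.5 and 0.7 are made up for illustration; the lim % values are taken from the question.

```python
import pandas as pd

state_list = ['A', 'B', 'C', 'D', 'E']

# Categorical codes map each state to its position in state_list.
states = pd.Series(['B', 'E', 'D', 'A'])
codes = pd.Categorical(states, categories=state_list, ordered=True).codes
print(codes.tolist())  # [1, 4, 3, 0]

# A lookup indexed by year only.
by_year = pd.DataFrame({'year': [1, 2, 3],
                        'lim %': [0.19, 0.11, 0.02]}).set_index('year')
print(by_year.loc[3].iat[0])  # 0.02

# A lookup indexed by (year, group); .loc takes a tuple key.
by_year_group = pd.DataFrame({'year': [3, 3],
                              'group': [10, 25],
                              'lookup_val': [0.5, 0.7]}).set_index(['year', 'group'])
print(by_year_group.loc[(3, 25)].iat[0])  # 0.7
```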

Computing the results:

results = {}
for customer, state, amount, start, group in customer_data.itertuples(name=None, index=False):
    for year in range(start, len(multi_data)+1):
        if year == start:
            results[customer] = [[amount * multi_data[year-1, state, :]]]
        else:
            results[customer].append([results[customer][-1][-1] @ multi_data[year-1]])
                
        for lookup in lookups:
            if isinstance(lookup.index, pd.MultiIndex):
                value = lookup.loc[(year, group)].iat[0]
            else:
                value = lookup.loc[year].iat[0]
            results[customer][-1].append(value * results[customer][-1][-1])

Accessing the results:

# Here are examples of how you obtain the results from the dictionary:

# Customer 1 at start year, first result.
results[1][0][0]

# Customer 1 at start year, second result (which would be after lookup1 here).
results[1][0][1]

# Customer 1 at start year, third result (which would be after lookup2 here).
results[1][0][2]

# Customer 1 at year start+1, first result.
results[1][1][0]

# ...

# Customer c at year start+y, result k+1.
results[c][y][k]
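
If the results need to be saved, one option is to flatten the nested dictionary into a long table. The sketch below assumes the same nesting as above (customer, then year offset, then calculation step) but uses a tiny made-up `results` dictionary and a hypothetical `start_years` mapping so that it runs on its own.

```python
import numpy as np
import pandas as pd

state_list = ['A', 'B', 'C', 'D', 'E']

# Hypothetical miniature of the results dictionary:
# results[customer][year_offset][step] is a length-5 array.
results = {
    1: [[np.arange(5.0), np.arange(5.0) * 0.1],
        [np.arange(5.0) + 1, (np.arange(5.0) + 1) * 0.1]],
}
start_years = {1: 3}  # start year per customer (hypothetical)

rows = []
for customer, per_year in results.items():
    for offset, steps in enumerate(per_year):
        for step, values in enumerate(steps):
            rows.append({'cust_id': customer,
                         'year': start_years[customer] + offset,
                         'step': step,
                         **dict(zip(state_list, values))})

results_df = pd.DataFrame(rows)
print(results_df.shape)  # (4, 8)
```

The resulting frame has one row per customer, year and calculation step, and could then be written out with, for example, `results_df.to_csv(...)`.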

Tags: python, arrays, loops, numpy, pandas
