Calculate Different Types of Spend - Pandas/Numpy - Python

I have 2 dataframes:

df1
+------------+-------------+------+
| Product ID | Cost Method | Rate |
+------------+-------------+------+
|         10 | CPM         | 10   |
|         20 | CPC         | 0.3  |
|         30 | CPCV        | 0.4  |
|         40 | FLF         | 100  |
|         50 | VAD         | 0    |
|         60 | CPM         | 0.1  |
+------------+-------------+------+

df2
+--------+------------+-------------+--------+-----------------+
|  Date  | Product ID | Impressions | Clicks | Completed Views |
+--------+------------+-------------+--------+-----------------+
| 01-Jan |         10 |         300 |      4 |               0 |
| 02-Jan |         20 |          30 |      3 |               0 |
| 03-Jan |         30 |         200 |      4 |              20 |
| 02-Jan |         40 |         300 |      4 |               0 |
| 02-Jan |         40 |         500 |      4 |               0 |
| 03-Jan |         40 |         200 |      3 |               0 |
| 04-Jan |         90 |        3000 |      3 |               0 |
| 05-Jan |         50 |        3000 |      5 |               0 |
+--------+------------+-------------+--------+-----------------+

The ideal output looks like this:

+--------+------------+-------------+--------+-----------------+--------+
|  Date  | Product ID | Impressions | Clicks | Completed Views | Spend  |
+--------+------------+-------------+--------+-----------------+--------+
| 01-Jan |         10 |         300 |      4 |               0 |        |
| 02-Jan |         20 |          30 |      3 |               0 |        |
| 03-Jan |         30 |         200 |      4 |              20 |        |
| 02-Jan |         40 |         300 |      4 |               0 |        |
| 02-Jan |         40 |         500 |      4 |               0 |        |
| 03-Jan |         40 |         200 |      3 |               0 |  $-    |
| 04-Jan |         90 |        3000 |      3 |               0 |  $-    |
| 05-Jan |         50 |        3000 |      5 |               0 |  $-    |
+--------+------------+-------------+--------+-----------------+--------+

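Where:

  1. Products are matched by their ID. If an ID can't be matched, the spend for that product is calculated as 0
  2. FLF is calculated as the sum of total impressions for that product per day, and if that sum is over a certain minimum limit, e.g. 600 impressions, then the rate is applied. If there are two or more entries for the same day, then the rate is divided equally by the count of times it appears in the same day
  3. If a product is VAD, then the spend is 0
  4. CPC is calculated as the rate times the number of clicks
  5. CPM is calculated as rate * (impressions / 1000)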
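
To make rules 4 and 5 concrete, here is a small worked example using two rows from the tables above (product 20 is CPC at a rate of 0.3, product 10 is CPM at a rate of 10). The helper names `cpc_spend` and `cpm_spend` are hypothetical and only serve the illustration.

```python
def cpc_spend(rate, clicks):
    # Rule 4: rate times the number of clicks
    return rate * clicks


def cpm_spend(rate, impressions):
    # Rule 5: rate * (impressions / 1000)
    return rate * (impressions / 1000)


# Product 20 on 02-Jan: CPC, rate 0.3, 3 clicks
spend_cpc = cpc_spend(0.3, 3)
# Product 10 on 01-Jan: CPM, rate 10, 300 impressions
spend_cpm = cpm_spend(10, 300)

# Rounded for display because 0.3 * 3 is not exactly 0.9 in floating point
print(round(spend_cpc, 2), round(spend_cpm, 2))
```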

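I will answer you, although I really shouldn't. You are new to Stack Overflow (SO), so let this be an educational post. Rest assured, the tone of this post is not meant to be condescending or harsh.


First, to ask a proper question (please read this), you need to do two things:

  • Explain what you have tried (provide code samples!) and explain where your problem lies. Your question in its current form definitely doesn't meet that bar. There are 5 or 6 completely different things in it, and it feels like you are looking for someone to do your homework.
  • Provide a workable example.

For the workable example, you kind of did that, but the format you chose is really annoying, because the data can't be loaded directly with pd.read_clipboard(). People here volunteer their time, and if they have to spend 5 or 10 minutes recreating your data, they probably won't do it.

Here is how I would do it: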

Here is the first dataframe. Load it with df1 = pd.read_clipboard(index_col=0):

ProductID CostMethod   Rate
10               CPM   10.0
20               CPC    0.3
30              CPCV    0.4
40               FLF  100.0
50               VAD    0.0
60               CPM    0.1

Here is the second dataframe. Load it with df2 = pd.read_clipboard(index_col=0):

ProductID  Date  Impressions  Clicks  CompletedViews
10         01-Jan          300       4               0
20         02-Jan           30       3               0
30         03-Jan          200       4              20
40         02-Jan          300       4               0
40         02-Jan          500       4               0
40         03-Jan          200       3               0
90         04-Jan         3000       3               0
50         05-Jan         3000       5               0
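
If copying to the clipboard is inconvenient (for example in a script or on a headless machine), the same two frames can be built from the text directly. This is only a sketch of an alternative, assuming whitespace-separated columns as shown above; `df1_text` and `df2_text` are names I made up.

```python
import io

import pandas as pd

df1_text = """ProductID CostMethod Rate
10 CPM 10.0
20 CPC 0.3
30 CPCV 0.4
40 FLF 100.0
50 VAD 0.0
60 CPM 0.1
"""

df2_text = """ProductID Date Impressions Clicks CompletedViews
10 01-Jan 300 4 0
20 02-Jan 30 3 0
30 03-Jan 200 4 20
40 02-Jan 300 4 0
40 02-Jan 500 4 0
40 03-Jan 200 3 0
90 04-Jan 3000 3 0
50 05-Jan 3000 5 0
"""

# sep=r"\s+" splits on any run of whitespace; index_col=0 makes ProductID the index
df1 = pd.read_csv(io.StringIO(df1_text), sep=r"\s+", index_col=0)
df2 = pd.read_csv(io.StringIO(df2_text), sep=r"\s+", index_col=0)

print(df1)
print(df2)
```

Note that the dates stay plain strings here ("01-Jan"), which is enough for grouping rows by day.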

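Now, as far as your homework goes, here is a suggested solution. I trust that you will try to understand what this code does rather than just reuse it.

Step 1: Merge the two dataframes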

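I'm doing a left merge on df2, which is really important: every row of df2 is kept, even when its product ID has no match in df1. Read more in the pandas documentation on Merging.

df3 = df2.merge(df1, left_index=True, right_index=True, how='left')
df3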
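
To see why the left merge matters, here is a minimal sketch on a cut-down version of the data (the small frames and the names `left_merged` and `inner_merged` are made up for illustration). The `indicator=True` option adds a `_merge` column that shows which rows found no match.

```python
import pandas as pd

# Cut-down rate table: product 90 is deliberately missing
rates = pd.DataFrame(
    {"CostMethod": ["CPM", "FLF"], "Rate": [10.0, 100.0]},
    index=pd.Index([10, 40], name="ProductID"),
)
# Cut-down activity table: product 90 has no rate
activity = pd.DataFrame(
    {"Date": ["01-Jan", "02-Jan", "02-Jan", "04-Jan"],
     "Impressions": [300, 300, 500, 3000]},
    index=pd.Index([10, 40, 40, 90], name="ProductID"),
)

# Left merge: all 4 activity rows survive, product 90 gets NaN rate columns
left_merged = activity.merge(
    rates, left_index=True, right_index=True, how="left", indicator=True
)
# Inner merge: the unmatched product 90 row is silently dropped
inner_merged = activity.merge(
    rates, left_index=True, right_index=True, how="inner"
)

print(left_merged)
print(len(left_merged), len(inner_merged))
```

With an inner merge the row for product 90 would disappear, and it could never be assigned a spend of 0 afterwards.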

Step 2: Compute your spend

We'll write a custom function, and then do a dataframe.apply

def calc_spend(row):
    """
    Accepts a row of the dataframe (df3.apply(calc_spend, axis=1)),
    and computes the spend according to these rules:
    * If costMethod is NaN, then zero
    * Where FLF is calculated as the sum of total impressions for that product per day, 
        and if that sum is over a certain minimum limit, 
        e.g. 600 impressions, then the rate is applied. 
        If there are two or more entries for the same day, 
        then the rate is divided equally by the count of times it appears in the same day
    * Where, if a product is VAD, then the spend is 0
    * Where CPC is calculated as the rate times the number of clicks
    * Where CPM is calculated as rate*(impression / 1000)
    """

    if row.CostMethod == 'FLF':
        # Calc the sum of total impressions for that product
        # I'm using boolean indexing to select the rows where both productID and Date
        # are the same as the current row
        filterdateproductid = (df3.Date == row.Date) & (df3.index == row.name)
        total_impressions = df3.loc[filterdateproductid, 'Impressions'].sum()
        if total_impressions < 600:
            # The spec doesn't say what happens below the minimum; adjust as needed
            spend = total_impressions
        else:
            count = df3.loc[filterdateproductid].shape[0]
            rate = row.Rate / count # If you use python 2.7 make sure you do "from __future__ import division"
            spend = rate * total_impressions / 1000.0

    elif row.CostMethod == 'VAD':
        spend = 0

    elif row.CostMethod == 'CPC':
        spend = row.Rate * row.Clicks

    elif row.CostMethod == 'CPM':
        spend = row.Rate * row.Impressions / 1000.0

    else: # Includes the case where CostMethod is NaN (no match in df1)
        spend = 0

    return spend
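
In case `row.name` and `row.Rate` in the function look mysterious: with `axis=1`, apply hands each row to the function as a Series, so column values are available as attributes and the row's index label (here the product ID) is in `row.name`. A tiny sketch on a made-up frame (`toy` and its values are hypothetical):

```python
import pandas as pd

toy = pd.DataFrame(
    {"Clicks": [4, 3], "Rate": [0.3, 0.5]},
    index=pd.Index([20, 21], name="ProductID"),
)

# row.name is the index label of the row being processed
labels = toy.apply(lambda row: row.name, axis=1)

# Column values are reachable as attributes on the row
products = toy.apply(lambda row: row.Rate * row.Clicks, axis=1)

print(labels.tolist())
print(products.round(2).tolist())
```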

Now we can simply apply the function:

df3['Spend'] = df3.apply(calc_spend, axis=1)
df3

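You may notice that the "Spend" I computed isn't exactly the same as yours, but that's because your initial specification of how to compute it wasn't very good. You can easily change the calc_spend function to meet your requirements.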
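
Since the title mentions NumPy: on larger data, a row-by-row apply gets slow, and the same rules can be expressed with column operations. The following is only a sketch of that idea, not a drop-in replacement. It mirrors the branches of calc_spend (including its handling of FLF below 600 impressions, and CPCV falling through to 0), it keeps ProductID as a regular column instead of the index, and the name `merged` is my own.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "ProductID": [10, 20, 30, 40, 50, 60],
    "CostMethod": ["CPM", "CPC", "CPCV", "FLF", "VAD", "CPM"],
    "Rate": [10.0, 0.3, 0.4, 100.0, 0.0, 0.1],
})
df2 = pd.DataFrame({
    "ProductID": [10, 20, 30, 40, 40, 40, 90, 50],
    "Date": ["01-Jan", "02-Jan", "03-Jan", "02-Jan",
             "02-Jan", "03-Jan", "04-Jan", "05-Jan"],
    "Impressions": [300, 30, 200, 300, 500, 200, 3000, 3000],
    "Clicks": [4, 3, 4, 4, 4, 3, 3, 5],
    "CompletedViews": [0, 0, 20, 0, 0, 0, 0, 0],
})

# Left merge so unmatched products (90) are kept with NaN CostMethod/Rate
merged = df2.merge(df1, on="ProductID", how="left")

# Per product and day: total impressions and number of entries.
# "count" equals the row count here because Impressions has no missing values.
per_day = merged.groupby(["ProductID", "Date"])["Impressions"]
day_total = per_day.transform("sum")
day_count = per_day.transform("count")

# Same FLF branch as calc_spend: below 600 keep the total, otherwise apply
# the rate split across the day's entries
flf = np.where(
    day_total < 600,
    day_total,
    merged["Rate"] / day_count * day_total / 1000.0,
)

conditions = [
    merged["CostMethod"] == "FLF",
    merged["CostMethod"] == "CPC",
    merged["CostMethod"] == "CPM",
]
choices = [
    flf,
    merged["Rate"] * merged["Clicks"],
    merged["Rate"] * merged["Impressions"] / 1000.0,
]
# default=0.0 covers VAD, CPCV and unmatched products (NaN CostMethod)
merged["Spend"] = np.select(conditions, choices, default=0.0)

print(merged[["Date", "ProductID", "CostMethod", "Spend"]])
```

np.select picks, for each row, the choice belonging to the first condition that is true, which plays the role of the if/elif chain.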