Generate random values and map them to a column based on condition in pandas

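I am trying to generate a synthetic dataset. I have managed to generate a few columns, but I need to generate one column of random numbers based on a condition on another column.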

import random

import numpy as np
import pandas as pd

# `check` is defined elsewhere; it is assumed to be a sequence of candidate dates
def create_trans_dataset(num=1):
    output=[
            {"trans_date": np.random.choice(check),
             "trans_details":np.random.choice(["airtime_purchase",
                                               "customer_transfer",
                                               "deposit_funds",
                                               "withdrawal"],
                                              # one probability per choice, summing to 1
                                              p=[0.2, 0.2, 0.2, 0.4]),
             "trans_status": np.random.choice(["completed", "reversed",
                                               "processing"],
                                               p=[0.9, 0.05, 0.05])
           }
            for x in range(num)
          ]
    return output

trans_dataset = pd.DataFrame(create_trans_dataset(num=20))

def map_values(row, values_dict):
    return values_dict[row]

# Each random.randint call runs only once, when the dict is built,
# so every row of a given category receives the same number
values_dict = {"airtime_purchase": random.randint(5, 5000),
               "customer_transfer": random.randint(100, 35000),
               "deposit_funds": random.randint(100, 35000),
               "withdrawal": random.randint(100, 35000)
            }

trans_dataset['amount_transacted'] = trans_dataset['trans_details'].apply(map_values, args = (values_dict,))

My current solution generates a single constant for each of "airtime_purchase", "customer_transfer", "deposit_funds", and "withdrawal". My current output is:

   trans_date  trans_details      trans_status  amount_transacted
0  2020-02-27  customer_transfer  completed     30165
1  2020-03-03  airtime_purchase   completed     14945
2  2020-01-02  withdrawal         completed     14595
3  2020-01-01  withdrawal         completed     26700
4  2020-02-18  airtime_purchase   completed     22860
5  2020-02-22  airtime_purchase   completed     17930
6  2020-01-01  airtime_purchase   completed     24370
7  2020-01-20  customer_transfer  completed     8735
8  2020-03-12  deposit_funds      completed     1065
9  2020-03-20  airtime_purchase   completed     27170

My desired output has a separate random number for every customer_transfer, airtime_purchase, deposit_funds, and withdrawal row, as shown below.

   trans_date  trans_details      trans_status  amount_transacted
0  2020-02-27  customer_transfer  completed     3015
1  2020-03-03  airtime_purchase   completed     1495
2  2020-01-02  withdrawal         completed     1595
3  2020-01-01  withdrawal         completed     2600
4  2020-02-18  airtime_purchase   completed     2890
5  2020-02-22  airtime_purchase   completed     930
6  2020-01-01  airtime_purchase   completed     370
7  2020-01-20  customer_transfer  completed     9635
8  2020-03-12  deposit_funds      completed     5005
9  2020-03-20  airtime_purchase   completed     2817

I think you can simply do:

import numpy as np
import pandas as pd

def create_trans_dataset(num=1):
    output=[
            # trans_date is a placeholder integer here, standing in for the real dates
            {"trans_date": np.random.randint(0,100),
             "trans_details":np.random.choice(["airtime_purchase",
                                               "customer_transfer",
                                               "deposit_funds",
                                               "withdrawal"],
                                              p=[0.2, 0.2, 0.2, 0.4]),
             "trans_status": np.random.choice(["completed", "reversed",
                                               "processing"],
                                               p=[0.9, 0.05, 0.05])
           }
            for x in range(num)
          ]
    return output

trans_dataset = pd.DataFrame(create_trans_dataset(num=100))

# Boolean mask for the rows that take the smaller airtime range
is_airtime = trans_dataset['trans_details'] == 'airtime_purchase'

# Draw one value per row and store it in its own column, leaving trans_details intact
trans_dataset['amount_transacted'] = 0
trans_dataset.loc[~is_airtime, 'amount_transacted'] = np.random.randint(100, 35000, (~is_airtime).sum())
trans_dataset.loc[is_airtime, 'amount_transacted'] = np.random.randint(5, 5000, is_airtime.sum())

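This generates random numbers between 100 and 35000 for customer_transfer, deposit_funds, and withdrawal rows, and between 5 and 5000 for airtime_purchase rows. Each row gets its own independent draw, so values within a category differ from row to row (though they are not guaranteed to be unique). Note that np.random.randint excludes the upper bound, unlike random.randint.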
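If the ranges end up differing for more than one category, a lookup of per-category bounds may scale better than one mask per category. The following is only a sketch under some assumptions: the name AMOUNT_RANGES, the helper add_random_amounts, and the seed parameter are made up for illustration, and the bounds are simply the ones from the question's values_dict. It relies on numpy's Generator.integers accepting array bounds, so each row gets its own draw from its own range.

```python
import numpy as np
import pandas as pd

# Hypothetical per-category (low, high) bounds, copied from the question's values_dict
AMOUNT_RANGES = {
    "airtime_purchase": (5, 5000),
    "customer_transfer": (100, 35000),
    "deposit_funds": (100, 35000),
    "withdrawal": (100, 35000),
}

def add_random_amounts(df, ranges, seed=None):
    """Return a copy of df with one independent random amount per row."""
    rng = np.random.default_rng(seed)
    # Look up each row's bounds from its trans_details label
    low = df["trans_details"].map(lambda key: ranges[key][0]).to_numpy()
    high = df["trans_details"].map(lambda key: ranges[key][1]).to_numpy()
    out = df.copy()
    # Array bounds are applied element-wise; endpoint=True makes `high` inclusive,
    # matching the behaviour of random.randint in the question
    out["amount_transacted"] = rng.integers(low, high, endpoint=True)
    return out

# Small illustrative frame; in practice this would be trans_dataset
demo = pd.DataFrame({"trans_details": ["airtime_purchase", "withdrawal",
                                       "airtime_purchase", "deposit_funds"]})
result = add_random_amounts(demo, AMOUNT_RANGES, seed=0)
print(result)
```

Passing a seed makes the synthetic data reproducible; leaving it as None gives fresh values on every run. A label that is missing from the dict raises a KeyError, which is probably what you want for a synthetic dataset.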