如何合并 Python 中某些列的值

How to Merge Values from Some Columns in Python

我在下面的 Python 中有一个数据框:

import pandas as pd
df = pd.DataFrame({
    'CRDACCT_DLQ_CYC_1_MNTH_AGO' : [3, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'], 
    'CRDACCT_DLQ_CYC_2_MNTH_AGO': [4, 3, 3, 3, 3, 3, 2, 0, 5, 4, 3, 2, 0, 2, 2, 2, 2, 2, 2, 0, 2, 2, 0, 2], 
    'CRDACCT_DLQ_CYC_3_MNTH_AGO': [8, 7, 6, 5, 4, 3, 2, 'F', 'F', 0, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'F', 'C', 'C', 'F', 'F'], 
    'CRDACCT_DLQ_CYC_4_MNTH_AGO' : [0, 2, 'F', 'F', 'C', 'C', 'C', 'C', 0, 2, 0, 2, 0, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'F', 'C', 'F'], 
    'CRDACCT_DLQ_CYC_5_MNTH_AGO' : [2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'], 
    'CRDACCT_DLQ_CYC_6_MNTH_AGO' : [2, 2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 0, 2, 0, 2, 0], 
    'CRDACCT_DLQ_CYC_7_MNTH_AGO' : [3, 3, 2, 'C', 'C', 'C', 'F', 0, 6, 5, 4, 3, 2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'], 
    'CRDACCT_DLQ_CYC_8_MNTH_AGO' : [5, 4, 4, 3, 3, 2, 3, 2, 2, 2, 1, 2, 0, 2, 'C', 'C', 0, 2, 2, 2, 'C', 'C', 0, 'Z'], 
    'CRDACCT_DLQ_CYC_9_MNTH_AGO' : [2, 2, 'C', 0, 2, 0, 2, 'C', 'C', 'C', 'C', 'C', 0, 3, 2, 'C', 'F', 'C', 'F', 'F', 'F', 'F', 'F', 'F'], 
    'CRDACCT_DLQ_CYC_10_MNTH_AGO' : [5, 4, 3, 2, 3, 2, 0, 2, 0, 2, 'C', 'C', 'F', 2, 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'C'], 
    'CRDACCT_DLQ_CYC_11_MNTH_AGO' : [4, 3, 2, 'F', 2, 0, 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z'], 
    'CRDACCT_DLQ_CYC_12_MNTH_AGO' : ['F', 8, 7, 6, 5, 4, 3, 2, 'C', 'C', 'C', 0, 2, 'C', 'C', 0, 2, 0, 3, 2, 'C', 'C', 'F', 2]
})

df.head()

我想创建一个新列,其中包含 CRDACCT_DLQ_CYC_1_MNTH_AGO、CRDACCT_DLQ_CYC_2_MNTH_AGO、.....、CRDACCT_DLQ_CYC_12_MNTH_AGO 的合并值 .假设新列命名为 HISTORY_DLQ.

如果我打印那个新列,预期结果如下所示

print(df['HISTORY_DLQ'])

#Output consists 24 rows of merging values of each column CRDACCT_DLQ_CYC_1_MNTH_AGO,..., CRDACCT_DLQ_CYC_12_MNTH_AGO.
[34802235254F,237222342438, C36FC224C327,...,C2FFC0CZFCZ2]

将您的列转换为字符串,然后连接每行列:

df['HISTORY_DLQ'] = df.astype(str).apply(''.join, axis=1)
print(df['HISTORY_DLQ'])

# Output:
0     34802235254F
1     237222342438
2     C36FC224C327
3     C35FCCC302F6
4     C34CCCC32325
5     C33CCCC20204
6     C22CCCF320Z3
7     C0FCCC02C2Z2
8     C5F0CC62C0ZC
9     C402CC52C2ZC
10    C3C0CC41CCZC
11    C2C2CC32CCZ0
12    C0C0CC200FZ2
13    C2C2CC2232ZC
14    C2CCCCCC2FZC
15    C2CCCCCCCFZ0
16    C2CCCCC0FFZ2
17    C2CCCCC2CFZ0
18    C2CCCCC2FFZ3
19    C0FCC0C2FFZ2
20    C2CCC2CCFFZC
21    C2CFC0CCFFZC
22    C0FCC2C0FFZF
23    C2FFC0CZFCZ2
dtype: object

您可以将列转换为字符串,然后使用 .sum() 跨列连接每一行的字符串,如下所示:

df['HISTORY_DLQ'] = df.astype(str).sum(axis=1)
字符串上的

.sum() 就像您对 'abc' + 'def' 所做的那样,并将字符串连接成 'abcdef'。在 axis=1 上使用它时,它适用于跨列的每一行。这样,就达到了我们想要的结果。

如果您的数据框包含您不想合并其值的其他列,您可以在我们应用上述逻辑之前通过 .filter() 仅过滤相关列:

df['HISTORY_DLQ'] = df.filter(regex=r'CRDACCT_DLQ_CYC_\d+_MNTH_AGO').astype(str).sum(axis=1)

在这里,我们使用正则表达式 r'CRDACCT_DLQ_CYC_\d+_MNTH_AGO' 过滤以 CRDACCT_DLQ_CYC_ 开头后跟一位或多位数字 \d+ 的列名,然后是 _MNTH_AGO 表示第 12 位列。

结果:

print(df['HISTORY_DLQ'])

0     34802235254F
1     237222342438
2     C36FC224C327
3     C35FCCC302F6
4     C34CCCC32325
5     C33CCCC20204
6     C22CCCF320Z3
7     C0FCCC02C2Z2
8     C5F0CC62C0ZC
9     C402CC52C2ZC
10    C3C0CC41CCZC
11    C2C2CC32CCZ0
12    C0C0CC200FZ2
13    C2C2CC2232ZC
14    C2CCCCCC2FZC
15    C2CCCCCCCFZ0
16    C2CCCCC0FFZ2
17    C2CCCCC2CFZ0
18    C2CCCCC2FFZ3
19    C0FCC0C2FFZ2
20    C2CCC2CCFFZC
21    C2CFC0CCFFZC
22    C0FCC2C0FFZF
23    C2FFC0CZFCZ2
Name: HISTORY_DLQ, dtype: object