How to Merge Values from Some Columns in Python
I have the following DataFrame in Python:
import pandas as pd
df = pd.DataFrame({
'CRDACCT_DLQ_CYC_1_MNTH_AGO' : [3, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'CRDACCT_DLQ_CYC_2_MNTH_AGO': [4, 3, 3, 3, 3, 3, 2, 0, 5, 4, 3, 2, 0, 2, 2, 2, 2, 2, 2, 0, 2, 2, 0, 2],
'CRDACCT_DLQ_CYC_3_MNTH_AGO': [8, 7, 6, 5, 4, 3, 2, 'F', 'F', 0, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'F', 'C', 'C', 'F', 'F'],
'CRDACCT_DLQ_CYC_4_MNTH_AGO' : [0, 2, 'F', 'F', 'C', 'C', 'C', 'C', 0, 2, 0, 2, 0, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'F', 'C', 'F'],
'CRDACCT_DLQ_CYC_5_MNTH_AGO' : [2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'CRDACCT_DLQ_CYC_6_MNTH_AGO' : [2, 2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 0, 2, 0, 2, 0],
'CRDACCT_DLQ_CYC_7_MNTH_AGO' : [3, 3, 2, 'C', 'C', 'C', 'F', 0, 6, 5, 4, 3, 2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'CRDACCT_DLQ_CYC_8_MNTH_AGO' : [5, 4, 4, 3, 3, 2, 3, 2, 2, 2, 1, 2, 0, 2, 'C', 'C', 0, 2, 2, 2, 'C', 'C', 0, 'Z'],
'CRDACCT_DLQ_CYC_9_MNTH_AGO' : [2, 2, 'C', 0, 2, 0, 2, 'C', 'C', 'C', 'C', 'C', 0, 3, 2, 'C', 'F', 'C', 'F', 'F', 'F', 'F', 'F', 'F'],
'CRDACCT_DLQ_CYC_10_MNTH_AGO' : [5, 4, 3, 2, 3, 2, 0, 2, 0, 2, 'C', 'C', 'F', 2, 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'C'],
'CRDACCT_DLQ_CYC_11_MNTH_AGO' : [4, 3, 2, 'F', 2, 0, 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z'],
'CRDACCT_DLQ_CYC_12_MNTH_AGO' : ['F', 8, 7, 6, 5, 4, 3, 2, 'C', 'C', 'C', 0, 2, 'C', 'C', 0, 2, 0, 3, 2, 'C', 'C', 'F', 2]
})
df.head()
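I want to create a new column that concatenates the values of CRDACCT_DLQ_CYC_1_MNTH_AGO, CRDACCT_DLQ_CYC_2_MNTH_AGO, ..., CRDACCT_DLQ_CYC_12_MNTH_AGO. Let's call the new column HISTORY_DLQ.
If I print the new column, the expected result looks like this: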
print(df['HISTORY_DLQ'])
# The output should have 24 rows, each concatenating that row's values of CRDACCT_DLQ_CYC_1_MNTH_AGO, ..., CRDACCT_DLQ_CYC_12_MNTH_AGO.
[34802235254F,237222342438, C36FC224C327,...,C2FFC0CZFCZ2]
Convert your columns to strings, then join the columns of each row:
df['HISTORY_DLQ'] = df.astype(str).apply(''.join, axis=1)
print(df['HISTORY_DLQ'])
# Output:
0 34802235254F
1 237222342438
2 C36FC224C327
3 C35FCCC302F6
4 C34CCCC32325
5 C33CCCC20204
6 C22CCCF320Z3
7 C0FCCC02C2Z2
8 C5F0CC62C0ZC
9 C402CC52C2ZC
10 C3C0CC41CCZC
11 C2C2CC32CCZ0
12 C0C0CC200FZ2
13 C2C2CC2232ZC
14 C2CCCCCC2FZC
15 C2CCCCCCCFZ0
16 C2CCCCC0FFZ2
17 C2CCCCC2CFZ0
18 C2CCCCC2FFZ3
19 C0FCC0C2FFZ2
20 C2CCC2CCFFZC
21 C2CFC0CCFFZC
22 C0FCC2C0FFZF
23 C2FFC0CZFCZ2
Name: HISTORY_DLQ, dtype: object
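The astype(str) step matters because str.join only accepts strings, and several of the question's columns hold Python ints (CRDACCT_DLQ_CYC_2_MNTH_AGO is entirely numeric). A minimal sketch on a hypothetical two-row frame (the name small is illustrative only) shows the join failing without the conversion and succeeding with it:

```python
import pandas as pd

# Hypothetical two-row frame: the first column mixes ints and 'C' codes,
# the second is purely numeric (int64), like CRDACCT_DLQ_CYC_2_MNTH_AGO.
small = pd.DataFrame({
    'CRDACCT_DLQ_CYC_1_MNTH_AGO': [3, 'C'],
    'CRDACCT_DLQ_CYC_2_MNTH_AGO': [4, 0],
})

# Without astype(str), str.join receives int values and raises TypeError.
try:
    small.apply(''.join, axis=1)
except TypeError as exc:
    print('join failed:', exc)

# Converting every cell to str first makes the row-wise join work.
history = small.astype(str).apply(''.join, axis=1)
print(history.tolist())  # ['34', 'C0']
```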
You can convert the columns to strings and then use .sum() to concatenate each row's strings across the columns, like this:
df['HISTORY_DLQ'] = df.astype(str).sum(axis=1)
Calling .sum() on strings works just like 'abc' + 'def', concatenating them into 'abcdef'. With axis=1, the sum is taken across the columns of each row, which produces the result we want.
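To see the + folding that .sum() relies on, here is a minimal sketch on a hypothetical two-column frame of strings (parts, by_sum and by_join are illustrative names). dtype=object is set explicitly so the cells stay plain Python str objects; the row-wise sum should match the ''.join approach from the previous answer:

```python
import pandas as pd

# Two columns of strings, like the output of astype(str) in the answers above.
# dtype=object keeps plain Python str values in the cells.
parts = pd.DataFrame({'a': ['abc', 'x'], 'b': ['def', 'y']}, dtype=object)

# Summing a Series of strings folds them together with +, like 'abc' + 'def'.
print(parts.iloc[0].sum())  # abcdef

# With axis=1 the same fold runs across the columns of every row.
by_sum = parts.sum(axis=1)
by_join = parts.apply(''.join, axis=1)
print(by_sum.tolist())                      # ['abcdef', 'xy']
print(by_sum.tolist() == by_join.tolist())  # True
```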
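If your DataFrame also contains columns whose values you don't want to merge, you can select only the relevant columns with .filter() before applying the logic above: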
df['HISTORY_DLQ'] = df.filter(regex=r'CRDACCT_DLQ_CYC_\d+_MNTH_AGO').astype(str).sum(axis=1)
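Here the regular expression r'CRDACCT_DLQ_CYC_\d+_MNTH_AGO' selects the column names that contain CRDACCT_DLQ_CYC_, followed by one or more digits (\d+), followed by _MNTH_AGO; in this DataFrame that matches exactly the 12 month columns. Note that .filter(regex=...) uses re.search, so the pattern is not anchored: add ^ and $ if other column names might contain it as a substring.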
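Both answers also concatenate the columns in whatever order they appear in the DataFrame. A minimal sketch, assuming a hypothetical frame (demo) with an ID column, a backup column whose name merely contains the pattern, and month columns stored out of order, combines an anchored regex with a sort on the captured month number:

```python
import re

import pandas as pd

# Hypothetical frame: an ID column, a backup column that merely contains the
# pattern, and the month columns stored out of order (10 before 1 and 2).
demo = pd.DataFrame({
    'ACCT_ID': [101, 102],
    'CRDACCT_DLQ_CYC_10_MNTH_AGO': [5, 'F'],
    'CRDACCT_DLQ_CYC_1_MNTH_AGO': [3, 'C'],
    'CRDACCT_DLQ_CYC_2_MNTH_AGO': [4, 0],
    'OLD_CRDACCT_DLQ_CYC_1_MNTH_AGO': ['X', 'X'],
})

# filter(regex=...) uses re.search, so ^ and $ are needed to reject the OLD_ column.
pattern = r'^CRDACCT_DLQ_CYC_(\d+)_MNTH_AGO$'
months = demo.filter(regex=pattern)

# Reorder by the captured month number so months 1, 2, ..., 10 concatenate in sequence.
ordered = sorted(months.columns, key=lambda c: int(re.match(pattern, c).group(1)))
demo['HISTORY_DLQ'] = months[ordered].astype(str).apply(''.join, axis=1)
print(demo['HISTORY_DLQ'].tolist())  # ['345', 'C0F']
```

Without the sort, row 0 would come out as '534', because the 10-months-ago column comes first in the frame.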
Result:
print(df['HISTORY_DLQ'])
0 34802235254F
1 237222342438
2 C36FC224C327
3 C35FCCC302F6
4 C34CCCC32325
5 C33CCCC20204
6 C22CCCF320Z3
7 C0FCCC02C2Z2
8 C5F0CC62C0ZC
9 C402CC52C2ZC
10 C3C0CC41CCZC
11 C2C2CC32CCZ0
12 C0C0CC200FZ2
13 C2C2CC2232ZC
14 C2CCCCCC2FZC
15 C2CCCCCCCFZ0
16 C2CCCCC0FFZ2
17 C2CCCCC2CFZ0
18 C2CCCCC2FFZ3
19 C0FCC0C2FFZ2
20 C2CCC2CCFFZC
21 C2CFC0CCFFZC
22 C0FCC2C0FFZF
23 C2FFC0CZFCZ2
Name: HISTORY_DLQ, dtype: object
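One caveat applies to both answers if the real data has missing values (the sample above has none): in pandas 1.x/2.x, astype(str) renders NaN as the string 'nan', and in a purely numeric column a NaN forces float64, so 4 comes out as '4.0'. A minimal sketch of one workaround, assuming a single-character placeholder '-' is acceptable for a missing month; the helper cell_to_code and the placeholder are hypothetical choices, not part of either answer:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a gap; the question's sample has no missing values.
raw = pd.DataFrame({
    'CRDACCT_DLQ_CYC_1_MNTH_AGO': [3, 'C'],     # mixed ints and codes -> object
    'CRDACCT_DLQ_CYC_2_MNTH_AGO': [4, np.nan],  # the NaN makes this float64
})


def cell_to_code(value, placeholder='-'):
    """Hypothetical helper: render one cell as its delinquency code string."""
    if pd.isna(value):
        return placeholder              # explicit marker instead of 'nan'
    if isinstance(value, float) and value.is_integer():
        return str(int(value))          # 4.0 -> '4'
    return str(value)


# Build each row's history cell by cell instead of via astype(str).
history = raw.apply(lambda row: ''.join(cell_to_code(v) for v in row), axis=1)
print(history.tolist())  # ['34', 'C-']
```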