Apply recode pattern to many columns
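I have a DataFrame with the following columns:
Name, Year, V1, V2, V5, V10, V12...
The table contains about 40 Vx variables, each taking values from 1 to 5. I want to recode them as
1-3 = 0 and
4-5 = 1
I know how to do the replacement for a single column, like this:
Table['V1_F'] = Table['V1'].apply(lambda x: 0 if x < 4 else 1)
But I don't know how to apply it efficiently to many columns. Is there a way to avoid writing this replacement code for every column?
Ideally it would be something like "do this for all columns except Name and Year".
Any help is welcome.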
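Get all the column names to recode into a variable, compare them against the threshold to get a boolean mask, then convert True/False to 1/0 by casting to integers:
cols = Table.columns.difference(['Name','Year'])
Table[cols] = (Table[cols] >= 4).astype(int)
Or use numpy.where:
import numpy as np
Table[cols] = np.where(Table[cols] < 4, 0, 1)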
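To make the two variants above concrete, here is a minimal sketch on a made-up three-row table. The data values are assumptions for illustration only; the column names follow the question. It writes the flags to new `_F` columns, as the question's single-column example did, instead of overwriting the originals.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: names follow the question, values are invented.
Table = pd.DataFrame({
    "Name": ["a", "b", "c"],
    "Year": [2001, 2002, 2003],
    "V1": [1, 4, 5],
    "V2": [3, 2, 4],
})

# Every column except the two identifier columns.
cols = Table.columns.difference(["Name", "Year"])

# Boolean mask cast to 0/1, stored under *_F names so the originals survive.
flags = (Table[cols] >= 4).astype(int).add_suffix("_F")
result = Table.join(flags)

# np.where over the same columns yields the same 0/1 values.
same = np.array_equal(np.where(Table[cols] < 4, 0, 1), flags.to_numpy())

print(result)
print(same)
```

Note that `Index.difference` returns the remaining names in sorted order, so with 40 columns the new `_F` columns may appear as V1_F, V10_F, V12_F, V2_F rather than in the original order. The values are unaffected.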
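Two possible solutions are listed below:
- applymap, if you need a more complex function
- your logic is binary, so build a boolean truth matrix and convert it back to an integer representation
import numpy as np
import pandas as pd
# demo data: Name, Year and ten V columns with values 1-5 (randint's upper bound is exclusive)
df = pd.DataFrame({**{"Name":np.random.choice(["this","that","other"],15),"Year":np.random.choice(range(1990,2021),15)},
**{f"V{i}":np.random.randint(1,6,15) for i in range(10)}})
df2 = df.copy()
# columns to recode: everything whose name starts with "V"
v_cols = [c for c in df.columns if c.startswith("V")]
# solution 1: element-wise function (pandas >= 2.1 deprecates applymap in favour of DataFrame.map)
df.loc[:, v_cols] = df.loc[:, v_cols].applymap(lambda v: 0 if v <= 3 else 1)
# solution 2: boolean matrix cast to int; > 3 so that 4-5 become 1, matching solution 1
df2.loc[:, v_cols] = (df2.loc[:, v_cols] > 3).astype(int)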
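Name | Year | V0 | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 |
---|---|---|---|---|---|---|---|---|---|---|---|
this | 1998 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
that | 2010 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
this | 2004 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
this | 1992 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
this | 1990 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
this | 2020 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
this | 2016 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
other | 1997 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
that | 2000 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
that | 2020 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
that | 1991 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
other | 2015 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
this | 2020 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
other | 2005 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |
other | 2008 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |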
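When the recode is more involved than a single threshold, an explicit lookup table can be easier to read than a lambda. The following is only a sketch under assumptions: the toy data, the `recode` dictionary, and the `v_cols` name are illustrative and not part of either answer. It selects the columns with `DataFrame.filter` and a regular expression, then maps each column through the dictionary with `Series.map`.

```python
import pandas as pd

# Hypothetical toy data: names follow the thread, values are invented.
df = pd.DataFrame({
    "Name": ["this", "that", "other"],
    "Year": [1998, 2010, 2004],
    "V1": [1, 3, 5],
    "V10": [4, 2, 3],
})

# Columns named "V" followed by digits; "Name" and "Year" do not match.
v_cols = df.filter(regex=r"^V\d+$").columns

# Explicit lookup table (an assumption for this sketch): 1-3 -> 0, 4-5 -> 1.
recode = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1}

# Map every selected column through the lookup table, column by column.
df[v_cols] = df[v_cols].apply(lambda s: s.map(recode))

print(df)
```

Compared with `startswith("V")`, the regular expression would not pick up an unrelated column such as a hypothetical `Version`. Any value missing from the dictionary becomes NaN under `Series.map`, which makes unexpected codes visible instead of silently turning them into 0 or 1.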