How to combine conditional formatting and str.contains in pandas dataframe to create new column?
I'm trying to add new columns to a pandas dataframe based on the text in an existing column. For example, here is my data:
>>> data
No Description
1 Extention Slack 1 Month
2 Extention Slack 1 Year
3 Slack 6 Month
4 Slack 3 Month
What I need is:
No Description M M+1 M+2 M+3 M+4 M+5 M+6 ... M+11
1 Extention Slack 1 Month 1 0 0 0 0 0 0 0
2 Extention Slack 1 Year 1 1 1 1 1 1 1 1
3 Slack 6 Month 1 1 1 1 1 1 0 0
4 Slack 3 Month 1 1 1 0 0 0 0 0
What I tried:
import numpy as np
data['M'] = np.where(data['Description'].str.contains('1 Year'), 1, 0)
How can I do this?
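Note that `str.contains` returns a boolean Series, so `np.where(mask, 1, 0)` gives the same values as casting the mask to `int`. A single pattern such as `'1 Year'` only flags rows that contain exactly that text. A minimal sketch, assuming the four descriptions shown in the expected output:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the expected output in the question
data = pd.DataFrame({
    "No": [1, 2, 3, 4],
    "Description": ["Extention Slack 1 Month", "Extention Slack 1 Year",
                    "Slack 6 Month", "Slack 3 Month"],
})

# Boolean Series: True only where the literal text "1 Year" appears
mask = data["Description"].str.contains("1 Year", regex=False)

# np.where over the mask yields 0/1 values, identical to mask.astype(int)
data["M"] = np.where(mask, 1, 0)
print(data["M"].tolist())  # [0, 1, 0, 0]
```

Every row should end up with `M = 1`, so the duration has to be parsed out of the text instead of being matched against one fixed pattern.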
From the Description column, you want to use the {time} {time_label} part (such as 1 Year or 1 Month) to decide where to put a 1 or a 0 across the 12 months.
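As an alternative to splitting on whitespace, `str.extract` with a regular expression can pull out the number and the unit in one step and turn them into a duration in months. This is a sketch that assumes every description ends with `<number> Month` or `<number> Year`; the regex and the `total_months` column name are illustrative choices, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "Description": ["Extention Slack 1 Month", "Extention Slack 1 Year",
                    "Slack 6 Month", "Slack 3 Month"],
})

# Group 0 captures the number, group 1 the unit, anchored at the end of the text
parts = df["Description"].str.extract(r"(\d+)\s+(Month|Year)$")

# Convert the unit to months: one Year counts as 12 Months
to_months = {"Month": 1, "Year": 12}
df["total_months"] = parts[0].astype(int) * parts[1].map(to_months)
print(df["total_months"].tolist())  # [1, 12, 6, 3]
```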
Here is one way to do it:
# create two temporary columns from the last two words of Description
# time: the numeric value (e.g. "6"); time_label: its unit ("Month" or "Year")
df['time'], df['time_label'] = df.Description.str.split().apply(lambda x: pd.Series(x[-2:])).values.T
# define the number of months in each unit
mapping = {"Month":1, "Year":12}
for month in range(12):
    # the if/else is only here to name the columns M, M+1, M+2, ...
    # you can remove it if you accept M+0, M+1, ...
    if month == 0:
        df["M"] = np.where(df.time.astype(int)*df.time_label.map(mapping) >= month+1, 1, 0)
    else:
        df["M"+"+"+str(month)] = np.where(df.time.astype(int)*df.time_label.map(mapping) >= month+1, 1, 0)
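The loop above compares each row's duration with one month offset at a time. The same comparison can be done for all 12 columns at once with NumPy broadcasting. This sketch parses the duration with `str.extract` (assuming descriptions end with `<number> Month` or `<number> Year`) and builds the same `M`, `M+1`, ..., `M+11` columns; it is an alternative, not the original answer's method:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "No": [1, 2, 3, 4],
    "Description": ["Extention Slack 1 Month", "Extention Slack 1 Year",
                    "Slack 6 Month", "Slack 3 Month"],
})

# Duration of each row in months (assumes a trailing "<number> Month|Year")
parts = df["Description"].str.extract(r"(\d+)\s+(Month|Year)$")
total = (parts[0].astype(int) * parts[1].map({"Month": 1, "Year": 12})).to_numpy()

# Shape (rows, 1) against shape (12,) broadcasts to (rows, 12):
# cell (i, k) is 1 when row i lasts longer than k months
flags = (total[:, None] > np.arange(12)).astype(int)

cols = ["M"] + [f"M+{k}" for k in range(1, 12)]
result = pd.concat([df, pd.DataFrame(flags, columns=cols, index=df.index)], axis=1)
print(result)
```

Because durations are whole numbers, `total > k` is the same test as `>= month+1` in the loop.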
A fully reproducible example:
import pandas as pd
import numpy as np
from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO
data = """
No Description
1 "Extention Slack 1 Month"
2 "Extention Slack 1 Year"
3 "Slack 6 Month"
4 "Slack 3 Month"
"""
# StringIO(data) simulates reading the data from a file
# replace df with your own dataframe
df = pd.read_table(StringIO(data), sep=r"\s+")
# create two temporary columns from the last two words of Description
# time: the numeric value (e.g. "6"); time_label: its unit ("Month" or "Year")
df['time'], df['time_label'] = df.Description.str.split().apply(lambda x: pd.Series(x[-2:])).values.T
# define the number of months in each unit
mapping = {"Month":1, "Year":12}
for month in range(12):
    # the if/else is only here to name the columns M, M+1, M+2, ...
    if month == 0:
        df["M"] = np.where(df.time.astype(int)*df.time_label.map(mapping) >= month+1, 1, 0)
    else:
        df["M"+"+"+str(month)] = np.where(df.time.astype(int)*df.time_label.map(mapping) >= month+1, 1, 0)
# remove the temporary columns
df.drop(['time','time_label'], axis=1, inplace=True)
print(df)
Output:
No Description M M+1 M+2 M+3 M+4 M+5 M+6 M+7 M+8 \
0 1 Extention Slack 1 Month 1 0 0 0 0 0 0 0 0
1 2 Extention Slack 1 Year 1 1 1 1 1 1 1 1 1
2 3 Slack 6 Month 1 1 1 1 1 1 0 0 0
3 4 Slack 3 Month 1 1 1 0 0 0 0 0 0
M+9 M+10 M+11
0 0 0 0
1 1 1 1
2 0 0 0
3 0 0 0
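Since the question title asks about combining conditions with `str.contains`, here is a sketch of that route as well: `np.select` picks a duration from a list of `str.contains` masks, and `np.where` then fills each column. It assumes the set of possible durations is known in advance (the `patterns` lookup below is hypothetical). Plain substring matching is fragile, because `'1 Year'` would also match `'11 Year'`, so parsing the number as in the answer scales better:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "No": [1, 2, 3, 4],
    "Description": ["Extention Slack 1 Month", "Extention Slack 1 Year",
                    "Slack 6 Month", "Slack 3 Month"],
})
desc = df["Description"]

# Hypothetical lookup: duration text -> number of months
patterns = {"1 Year": 12, "6 Month": 6, "3 Month": 3, "1 Month": 1}
# One boolean mask per known duration text
conditions = [desc.str.contains(p, regex=False) for p in patterns]
durations = list(patterns.values())

# np.select takes the duration of the first matching mask, 0 if none match
total = np.select(conditions, durations, default=0)

for k in range(12):
    name = "M" if k == 0 else f"M+{k}"
    # 1 while the duration still covers month offset k
    df[name] = np.where(total > k, 1, 0)
print(df)
```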