How to combine conditional formatting and str.contains in a pandas DataFrame to create new columns?

Question

I'm trying to add new columns to a pandas DataFrame based on the text in an existing column. For example, this is my data:

 >>> data

 No    Description
 1     Extention Slack 1 Month
 2     Extention Slack 1 Year
 3     Slack 6 Month
 4     Slack 3 Month

What I need is:

 No    Description                 M    M+1   M+2  M+3  M+4   M+5  M+6 ... M+11
 1     Extention Slack 1 Month    1    0     0    0    0     0    0       0
 2     Extention Slack 1 Year     1    1     1    1    1     1    1       1
 3     Slack 6 Month              1    1     1    1    1     1    0       0
 4     Slack 3 Month              1    1     1    0    0     0    0       0

What I tried:

import numpy as np
data['M'] = np.where(data['Description'].str.contains('1 Year'), 1, 0)

How can I do this?

Answer 1

From the Description column, you want to use the {time} {time_label} part (such as 1 Year or 1 Month) to decide where to put a 1 or a 0 across the 12 months.
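
The core of this is turning each description into a total number of months. As a minimal sketch, assuming every description ends in "<number> Month" or "<number> Year", that step alone can be done with str.extract and a Month/Year mapping (the months column name is only for illustration):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "No": [1, 2, 3, 4],
    "Description": ["Extention Slack 1 Month", "Extention Slack 1 Year",
                    "Slack 6 Month", "Slack 3 Month"],
})

# How many months each unit represents
mapping = {"Month": 1, "Year": 12}

# Capture the trailing "<number> <unit>" part (assumes every row has one)
parts = df["Description"].str.extract(r"(\d+)\s+(Month|Year)$")

# Total duration in months = number * months per unit
df["months"] = parts[0].astype(int) * parts[1].map(mapping)

print(df[["Description", "months"]])
```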

Here is one way to do what you want:

# create two temporary columns from the last two words of Description
# time: the numeric value (e.g. "6"); time_label: its unit ("Month" or "Year")
df['time'], df['time_label'] = df.Description.str.split().apply(lambda x: pd.Series(x[-2:])).values.T

# define the numeric equivalent of Month and Year 
mapping = {"Month":1, "Year":12}

for month in range(12):
    # the if/else only exists to name the columns M, M+1, M+2, ...
    # you can drop it if names like M+0, M+1, ... are acceptable
    if month == 0:
        df["M"] = np.where(df.time.astype(int)*df.time_label.map(mapping) >= month+1, 1, 0)
    else:
        df["M"+"+"+str(month)] = np.where(df.time.astype(int)*df.time_label.map(mapping) >= month+1, 1, 0)
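
The loop above fills one column per month with np.where. As an alternative sketch (NumPy broadcasting, not the method used in this answer), the whole 0/1 matrix can be built in one step: month k is covered exactly when the total number of months is greater than k, which is the same test as `>= month+1` in the loop. It assumes the same "<number> Month|Year" ending; flags and cols are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "No": [1, 2, 3, 4],
    "Description": ["Extention Slack 1 Month", "Extention Slack 1 Year",
                    "Slack 6 Month", "Slack 3 Month"],
})

mapping = {"Month": 1, "Year": 12}
parts = df["Description"].str.extract(r"(\d+)\s+(Month|Year)$")
months = (parts[0].astype(int) * parts[1].map(mapping)).to_numpy()

# Shape (n_rows, 1) compared with shape (12,) broadcasts to (n_rows, 12):
# entry [i, k] is 1 while row i still covers month k
flags = (months[:, None] > np.arange(12)).astype(int)

# Column names M, M+1, ..., M+11
cols = ["M"] + [f"M+{k}" for k in range(1, 12)]
df = df.join(pd.DataFrame(flags, columns=cols, index=df.index))
print(df)
```

This avoids building twelve separate np.where calls, at the cost of a slightly less obvious indexing expression.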

A fully reproducible example:

import pandas as pd 
import numpy as np 
from io import StringIO  # Python 3; the StringIO module only exists in Python 2

data = """
 No    Description
 1     "Extention Slack 1 Month"
 2     "Extention Slack 1 Year"
 3     "Slack 6 Month"
 4     "Slack 3 Month"
"""
# StringIO(data) simulates reading the data from a file
# replace df with your own DataFrame
df = pd.read_table(StringIO(data), sep=r"\s+")

# create two temporary columns from the last two words of Description
# time: the numeric value (e.g. "6"); time_label: its unit ("Month" or "Year")
df['time'], df['time_label'] = df.Description.str.split().apply(lambda x: pd.Series(x[-2:])).values.T

# define the numeric equivalent of Month and Year 
mapping = {"Month":1, "Year":12}

for month in range(12):
    # the if/else only exists to name the columns M, M+1, M+2, ...
    if month == 0:
        df["M"] = np.where(df.time.astype(int)*df.time_label.map(mapping) >= month+1, 1, 0)
    else:
        df["M"+"+"+str(month)] = np.where(df.time.astype(int)*df.time_label.map(mapping) >= month+1, 1, 0)

# remove temporary columns 
df.drop(['time','time_label'], axis=1, inplace=True)

print(df)

Output:

   No              Description  M  M+1  M+2  M+3  M+4  M+5  M+6  M+7  M+8  \
0   1  Extention Slack 1 Month  1    0    0    0    0    0    0    0    0   
1   2   Extention Slack 1 Year  1    1    1    1    1    1    1    1    1   
2   3            Slack 6 Month  1    1    1    1    1    1    0    0    0   
3   4            Slack 3 Month  1    1    1    0    0    0    0    0    0   

   M+9  M+10  M+11  
0    0     0     0  
1    1     1     1  
2    0     0     0  
3    0     0     0
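
Since the question started from np.where and str.contains, here is a sketch that stays closer to that style: str.contains decides the unit, np.select turns it into a month count, and np.where fills each column. It assumes every description ends in "<number> Month" or "<number> Year"; num and months are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "No": [1, 2, 3, 4],
    "Description": ["Extention Slack 1 Month", "Extention Slack 1 Year",
                    "Slack 6 Month", "Slack 3 Month"],
})

desc = df["Description"]

# The number in front of the unit, e.g. "6" in "Slack 6 Month"
num = desc.str.extract(r"(\d+)\s+(?:Month|Year)$", expand=False).astype(int)

# One str.contains condition per unit; np.select takes the first match
conditions = [desc.str.contains(r"Year$"), desc.str.contains(r"Month$")]
choices = [num * 12, num]
df["months"] = np.select(conditions, choices, default=0)

# Same column naming as the answer: M, M+1, ..., M+11
for k in range(12):
    name = "M" if k == 0 else f"M+{k}"
    df[name] = np.where(df["months"] > k, 1, 0)

print(df)
```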

conditional

python-3.x

pandas

anaconda