Pandas: Inserting rows based on conditions?

Consider the following dataset. Input dataset:

id      status   type  location     bb_count      vo_count   tv_count
123     open      r     hongkong       1             0          4      
456     open      r     hongkong       1             7          2
456     closed    p     India          0             6          1

Output dataset: if any of (bb_count, tv_count, vo_count) is greater than 0, I need to insert a row for that product type.

id      status   type  location        product      
123     open      r     hongkong       bb            
123     open      r     hongkong       tv          
456     open      r     hongkong       bb             
456     open      r     hongkong       vo          
456     open      r     hongkong       tv             
456     closed    p     India          vo            
456     closed    p     India          tv

What I tried:

def insert_row(df):
    if df["bb_count"] > 0:
        print("inserting bb row")
    if df["tv_count"] > 0:
        print("inserting tv row")
    if df["vo_count"] > 0:
        print("inserting vo row")

df.apply(insert_row, axis=1)

But I am not getting the expected output.

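You are not changing the DataFrame in your function at all; you are only printing some statements. You don't really need a custom function for what you want to do.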
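To make that concrete, here is a minimal sketch. The DataFrame construction is an assumption reconstructed from the input table in the question, and the function is a shortened version of the asker's. It suggests that `apply` with a function that only prints returns nothing useful and leaves `df` untouched.

```python
import pandas as pd

# Input data reconstructed from the question's table (an assumption).
df = pd.DataFrame({
    "id": [123, 456, 456],
    "status": ["open", "open", "closed"],
    "type": ["r", "r", "p"],
    "location": ["hongkong", "hongkong", "India"],
    "bb_count": [1, 1, 0],
    "vo_count": [0, 7, 6],
    "tv_count": [4, 2, 1],
})

def insert_row(row):
    # With axis=1 each call receives a single row; printing adds no rows.
    if row["bb_count"] > 0:
        print("inserting bb row")

before = df.copy()
result = df.apply(insert_row, axis=1)

# The function returns None for every row, so the result holds only
# missing values, and the original DataFrame is unchanged.
print(result.isna().all())
print(df.equals(before))
```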

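Try:

  1. Use melt to create a DataFrame with the required structure.
  2. Filter to keep only the rows where the value is greater than 0.
  3. Re-format the "product" column as needed (remove "_count").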

melted = df.melt(["id", "status", "type", "location"],
                 ["bb_count", "vo_count", "tv_count"],
                 var_name="product")
output = melted[melted["value"].gt(0)].drop("value", axis=1)
# Parentheses are required for the method chain to span several lines
output["product"] = (output["product"].str.replace("_count", "")
                                      .replace({"bb": "broadband",
                                                "vo": "fixedvoice",
                                                "tv": "television"}))

>>> output
    id  status type  location     product
0  123    open    r  hongkong   broadband
1  456    open    r  hongkong   broadband
4  456    open    r  hongkong  fixedvoice
5  456  closed    p     India  fixedvoice
6  123    open    r  hongkong  television
7  456    open    r  hongkong  television
8  456  closed    p     India  television
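Note that the melted result is grouped by product rather than by original row. If the row order from the question matters, one possible variation is sketched below. It assumes pandas 1.1 or later for the `ignore_index` parameter of `melt`; the DataFrame construction and the names `id_cols` and `ordered` are illustrative, not part of the answer above, and the short product codes are kept as in the question.

```python
import pandas as pd

# Input data reconstructed from the question's table (an assumption).
df = pd.DataFrame({
    "id": [123, 456, 456],
    "status": ["open", "open", "closed"],
    "type": ["r", "r", "p"],
    "location": ["hongkong", "hongkong", "India"],
    "bb_count": [1, 1, 0],
    "vo_count": [0, 7, 6],
    "tv_count": [4, 2, 1],
})

id_cols = ["id", "status", "type", "location"]

# ignore_index=False keeps each row's original index label after melting.
melted = df.melt(id_cols, ["bb_count", "vo_count", "tv_count"],
                 var_name="product", ignore_index=False)

ordered = (
    melted[melted["value"].gt(0)]
    .drop(columns="value")
    .assign(product=lambda d: d["product"].str.replace("_count", "", regex=False))
    # mergesort is stable, so bb/vo/tv keep their relative order per row
    .sort_index(kind="mergesort")
    .reset_index(drop=True)
)
print(ordered)
```

Sorting on the preserved index brings the rows of each original record back together, which is the layout the question asks for.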

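Your problem seems to be that you are checking the values of an entire column. Try something like the following, which loops over the rows of the full DataFrame: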

def insert_row(df):
    for i in range(len(df)):
        # .iloc looks up by position, so this works with any index
        if df["bb_count"].iloc[i] > 0:
            pass  # insert bb row
        if df["tv_count"].iloc[i] > 0:
            pass  # insert tv row
        if df["vo_count"].iloc[i] > 0:
            pass  # insert vo row

        # continue with rest of function
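For illustration, one way the placeholders in such a loop might be filled in is sketched below. This is only an assumption about what "inserting a row" could look like: the function name `expand_products`, the list-of-dicts approach, and the DataFrame construction are hypothetical and not taken from the answer.

```python
import pandas as pd

# Input data reconstructed from the question's table (an assumption).
df = pd.DataFrame({
    "id": [123, 456, 456],
    "status": ["open", "open", "closed"],
    "type": ["r", "r", "p"],
    "location": ["hongkong", "hongkong", "India"],
    "bb_count": [1, 1, 0],
    "vo_count": [0, 7, 6],
    "tv_count": [4, 2, 1],
})

def expand_products(df):
    """Return one row per (original row, product) pair with a positive count."""
    id_cols = ["id", "status", "type", "location"]
    products = ["bb", "vo", "tv"]
    rows = []
    for _, row in df.iterrows():
        for product in products:
            if row[f"{product}_count"] > 0:
                # Copy the identifying columns and tag the product.
                record = {col: row[col] for col in id_cols}
                record["product"] = product
                rows.append(record)
    return pd.DataFrame(rows, columns=id_cols + ["product"])

expanded = expand_products(df)
print(expanded)
```

Collecting plain dicts and building the DataFrame once at the end avoids growing a DataFrame row by row, which tends to be slow. For large inputs the vectorized approaches in the other answers are likely the better choice.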

Another way, without melt, that keeps the rows in order:

mapping = {'bb': 'broadband', 'vo': 'fixedvoice', 'tv': 'television'}

out = (
    df.set_index(['id', 'status', 'type', 'location'])
      .rename(columns=lambda x: x.split('_')[0])
      .rename(columns=mapping)
      .rename_axis(columns='product')
      .stack().loc[lambda x: x > 0]
      .index.to_frame(index=False)
)

Output:

>>> out
    id  status type  location     product
0  123    open    r  hongkong   broadband
1  123    open    r  hongkong  television
2  456    open    r  hongkong   broadband
3  456    open    r  hongkong  fixedvoice
4  456    open    r  hongkong  television
5  456  closed    p     India  fixedvoice
6  456  closed    p     India  television

Step-by-step:

>>> out = df.set_index(['id', 'status', 'type', 'location'])
                          bb_count  vo_count  tv_count
id  status type location                              
123 open   r    hongkong         1         0         4
456 open   r    hongkong         1         7         2
    closed p    India            0         6         1

>>> out = out.rename(columns=lambda x: x.split('_')[0])
                          bb  vo  tv
id  status type location            
123 open   r    hongkong   1   0   4
456 open   r    hongkong   1   7   2
    closed p    India      0   6   1

>>> out = out.rename(columns=mapping)
                          broadband  fixedvoice  television
id  status type location                                   
123 open   r    hongkong          1           0           4
456 open   r    hongkong          1           7           2
    closed p    India             0           6           1

>>> out = out.rename_axis(columns='product')
product                   broadband  fixedvoice  television
id  status type location                                   
123 open   r    hongkong          1           0           4
456 open   r    hongkong          1           7           2
    closed p    India             0           6           1

>>> out = out.stack().loc[lambda x: x > 0]
id   status  type  location  product   
123  open    r     hongkong  broadband     1
                             television    4
456  open    r     hongkong  broadband     1
                             fixedvoice    7
                             television    2
     closed  p     India     fixedvoice    6
                             television    1
dtype: int64

>>> out = out.index.to_frame(index=False)
    id  status type  location     product
0  123    open    r  hongkong   broadband
1  123    open    r  hongkong  television
2  456    open    r  hongkong   broadband
3  456    open    r  hongkong  fixedvoice
4  456    open    r  hongkong  television
5  456  closed    p     India  fixedvoice
6  456  closed    p     India  television

Not perfect code, but it works and you get the idea:

import pandas as pd

df = pd.DataFrame(
    [
        ["open", "r", "Hongkong", 1, 0, 4],
        ["open", "r", "Hongkong", 1, 7, 2],
        ["closed", "p", "India", 0, 6, 1],
    ]
)

df_bb = df.iloc[:,[0,1,2,3]].rename(columns={0:"status", 1:"type", 2:"location", 3:"count"})
df_vo = df.iloc[:,[0,1,2,4]].rename(columns={0:"status", 1:"type", 2:"location", 4:"count"})
df_tv = df.iloc[:,[0,1,2,5]].rename(columns={0:"status", 1:"type", 2:"location", 5:"count"})

df_bb["product"] = df_bb["count"].map(lambda x: "bb" if x > 0 else 0)
df_vo["product"] = df_vo["count"].map(lambda x: "vo" if x > 0 else 0)
df_tv["product"] = df_tv["count"].map(lambda x: "tv" if x > 0 else 0)

df_combined = pd.concat(
    [df_bb, df_vo, df_tv]
)

df_final = df_combined[df_combined["product"] != 0].iloc[:,[0,1,2,4]]
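The same filter-and-concatenate idea can be written more compactly. The sketch below is a hypothetical variation, not the answerer's code: it assumes named columns as in the question (including `id`), and the names `id_cols`, `parts`, and `combined` are made up for illustration.

```python
import pandas as pd

# Input data reconstructed from the question's table (an assumption).
df = pd.DataFrame({
    "id": [123, 456, 456],
    "status": ["open", "open", "closed"],
    "type": ["r", "r", "p"],
    "location": ["hongkong", "hongkong", "India"],
    "bb_count": [1, 1, 0],
    "vo_count": [0, 7, 6],
    "tv_count": [4, 2, 1],
})

id_cols = ["id", "status", "type", "location"]

# One filtered slice per product, each tagged with its product code.
parts = [
    df.loc[df[f"{p}_count"] > 0, id_cols].assign(product=p)
    for p in ("bb", "vo", "tv")
]

combined = (
    pd.concat(parts)
    # A stable sort on the original index regroups rows by source record
    # while keeping the bb/vo/tv order within each record.
    .sort_index(kind="mergesort")
    .reset_index(drop=True)
)
print(combined)
```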