Pandas: Inserting rows based on conditions?
Imagine the dataset below.
Input dataset
id status type location bb_count vo_count tv_count
123 open r hongkong 1 0 4
456 open r hongkong 1 7 2
456 closed p India 0 6 1
Output dataset
If any of (bb_count, tv_count, vo_count) is greater than 0, I need to insert one row for that product type.
id status type location product
123 open r hongkong bb
123 open r hongkong tv
456 open r hongkong bb
456 open r hongkong vo
456 open r hongkong tv
456 closed p India vo
456 closed p India tv
What I tried:
def insert_row(df):
    if df["bb_count"] > 0:
        print("inserting bb row")
    if df["tv_count"] > 0:
        print("inserting tv row")
    if df["vo_count"] > 0:
        print("inserting vo row")
df.apply(insert_row, axis=1)
But I am not getting the exact output.
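You are not changing the DataFrame in your function at all; you are only printing statements. You don't really need a custom function to do what you want.
Try:
- melt the DataFrame to create the required long structure.
- Filter to keep the rows whose value is greater than 0.
- Re-format the "product" column as needed (remove "_count").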
melted = df.melt(["id", "status", "type", "location"],
                 ["bb_count", "vo_count", "tv_count"],
                 var_name="product")
output = melted[melted["value"].gt(0)].drop("value", axis=1)
# Parentheses are needed so the chained .replace() can continue on the next line.
output["product"] = (output["product"].str.replace("_count", "")
                                      .replace({"bb": "broadband",
                                                "vo": "fixedvoice",
                                                "tv": "television"}))
>>> output
id status type location product
0 123 open r hongkong broadband
1 456 open r hongkong broadband
4 456 open r hongkong fixedvoice
5 456 closed p India fixedvoice
6 123 open r hongkong television
7 456 open r hongkong television
8 456 closed p India television
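For anyone who wants to run the melt approach end to end, here is a minimal self-contained sketch. The sample DataFrame is rebuilt from the question's input table. The `ignore_index=False` argument and the stable `sort_index` call are optional extras that are not part of the answer above; they are one way to get back the original row order, assuming the input has a unique index. The short product names are kept here rather than mapped to long names.

```python
import pandas as pd

# Input table from the question.
df = pd.DataFrame({
    "id": [123, 456, 456],
    "status": ["open", "open", "closed"],
    "type": ["r", "r", "p"],
    "location": ["hongkong", "hongkong", "India"],
    "bb_count": [1, 1, 0],
    "vo_count": [0, 7, 6],
    "tv_count": [4, 2, 1],
})

id_cols = ["id", "status", "type", "location"]
count_cols = ["bb_count", "vo_count", "tv_count"]

# Keep the original row labels so the result can be re-sorted afterwards.
melted = df.melt(id_vars=id_cols, value_vars=count_cols,
                 var_name="product", ignore_index=False)

output = (
    melted[melted["value"].gt(0)]      # keep products with a positive count
    .drop(columns="value")
    .sort_index(kind="mergesort")      # stable sort: back to input row order
    .reset_index(drop=True)
)
# Strip the "_count" suffix as a literal string, not a regex.
output["product"] = output["product"].str.replace("_count", "", regex=False)
print(output)
```

Because mergesort is stable, products within one input row stay in the order of `count_cols`.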
Your problem seems to be that you are checking the values of an entire column at once. Try something like:
def insert_row(df):
    for i in range(len(df)):
        if df["bb_count"][i] > 0:
            '''inserting bb row'''
        if df["tv_count"][i] > 0:
            '''inserting tv row'''
        if df["vo_count"][i] > 0:
            '''inserting vo row'''
        # continue with rest of function
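To make the loop above concrete, one possible way to fill in the "inserting" steps is to collect the new rows in a list and build the result DataFrame once at the end. This is only a sketch: the function name `expand_products` and the `product_cols` mapping are hypothetical, and it iterates over `to_dict("records")` instead of indexing each column by position.

```python
import pandas as pd

def expand_products(df, product_cols):
    """Return one row per (input row, product) pair whose count is > 0.

    product_cols maps a count column name to the product label to emit.
    """
    id_cols = [c for c in df.columns if c not in product_cols]
    rows = []
    for record in df.to_dict("records"):
        for col, product in product_cols.items():
            if record[col] > 0:
                new_row = {c: record[c] for c in id_cols}
                new_row["product"] = product
                rows.append(new_row)
    return pd.DataFrame(rows, columns=id_cols + ["product"])

# Input table from the question.
df = pd.DataFrame({
    "id": [123, 456, 456],
    "status": ["open", "open", "closed"],
    "type": ["r", "r", "p"],
    "location": ["hongkong", "hongkong", "India"],
    "bb_count": [1, 1, 0],
    "vo_count": [0, 7, 6],
    "tv_count": [4, 2, 1],
})

result = expand_products(df, {"bb_count": "bb", "vo_count": "vo", "tv_count": "tv"})
print(result)
```

Building the frame once from a list avoids repeatedly appending to a DataFrame inside the loop, which tends to be slow. For large inputs a vectorized reshape is likely still the better choice.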
Another way, without melt, that keeps the rows in order:
mapping = {'bb': 'broadband', 'vo': 'fixedvoice', 'tv': 'television'}
out = (
    df.set_index(['id', 'status', 'type', 'location'])
      .rename(columns=lambda x: x.split('_')[0])
      .rename(columns=mapping)
      .rename_axis(columns='product')
      .stack().loc[lambda x: x > 0]
      .index.to_frame(index=False)
)
Output:
>>> out
id status type location product
0 123 open r hongkong broadband
1 123 open r hongkong television
2 456 open r hongkong broadband
3 456 open r hongkong fixedvoice
4 456 open r hongkong television
5 456 closed p India fixedvoice
6 456 closed p India television
Step-by-step:
>>> out = df.set_index(['id', 'status', 'type', 'location'])
bb_count vo_count tv_count
id status type location
123 open r hongkong 1 0 4
456 open r hongkong 1 7 2
closed p India 0 6 1
>>> out = out.rename(columns=lambda x: x.split('_')[0])
bb vo tv
id status type location
123 open r hongkong 1 0 4
456 open r hongkong 1 7 2
closed p India 0 6 1
>>> out = out.rename(columns=mapping)
broadband fixedvoice television
id status type location
123 open r hongkong 1 0 4
456 open r hongkong 1 7 2
closed p India 0 6 1
>>> out = out.rename_axis(columns='product')
product broadband fixedvoice television
id status type location
123 open r hongkong 1 0 4
456 open r hongkong 1 7 2
closed p India 0 6 1
>>> out = out.stack().loc[lambda x: x > 0]
id status type location product
123 open r hongkong broadband 1
television 4
456 open r hongkong broadband 1
fixedvoice 7
television 2
closed p India fixedvoice 6
television 1
dtype: int64
>>> out = out.index.to_frame(index=False)
id status type location product
0 123 open r hongkong broadband
1 123 open r hongkong television
2 456 open r hongkong broadband
3 456 open r hongkong fixedvoice
4 456 open r hongkong television
5 456 closed p India fixedvoice
6 456 closed p India television
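The steps above can also be run as one self-contained script. The sketch below rebuilds the sample data from the question and, as a small simplification that is not in the answer, maps the full `*_count` column names to product names in a single `rename` instead of two.

```python
import pandas as pd

# Input table from the question.
df = pd.DataFrame({
    "id": [123, 456, 456],
    "status": ["open", "open", "closed"],
    "type": ["r", "r", "p"],
    "location": ["hongkong", "hongkong", "India"],
    "bb_count": [1, 1, 0],
    "vo_count": [0, 7, 6],
    "tv_count": [4, 2, 1],
})

mapping = {
    "bb_count": "broadband",
    "vo_count": "fixedvoice",
    "tv_count": "television",
}

# Move the descriptive columns into the index, then pivot the count
# columns into a new innermost index level named "product".
stacked = (
    df.set_index(["id", "status", "type", "location"])
      .rename(columns=mapping)
      .rename_axis(columns="product")
      .stack()
)

# Keep positive counts; the surviving index entries are the wanted rows.
out = stacked[stacked > 0].index.to_frame(index=False)
print(out)
```

Recent pandas versions may emit a FutureWarning about the `stack` implementation when this runs.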
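Not perfect code, but it works and you get the idea:
import pandas as pd

# Sample data without the id column; columns are labelled 0..5 by default.
df = pd.DataFrame(
    [
        ["open", "r", "Hongkong", 1, 0, 4],
        ["open", "r", "Hongkong", 1, 7, 2],
        ["closed", "p", "India", 0, 6, 1],
    ]
)
# One frame per product: the three descriptive columns plus that product's count.
df_bb = df.iloc[:, [0, 1, 2, 3]].rename(columns={0: "status", 1: "type", 2: "location", 3: "count"})
df_vo = df.iloc[:, [0, 1, 2, 4]].rename(columns={0: "status", 1: "type", 2: "location", 4: "count"})
df_tv = df.iloc[:, [0, 1, 2, 5]].rename(columns={0: "status", 1: "type", 2: "location", 5: "count"})
# Tag each row with the product name when its count is positive, otherwise 0.
df_bb["product"] = df_bb["count"].map(lambda x: "bb" if x > 0 else 0)
df_vo["product"] = df_vo["count"].map(lambda x: "vo" if x > 0 else 0)
df_tv["product"] = df_tv["count"].map(lambda x: "tv" if x > 0 else 0)
# Stack the three frames, keep the tagged rows, and drop the count column.
df_combined = pd.concat(
    [df_bb, df_vo, df_tv]
)
df_final = df_combined[df_combined["product"] != 0].iloc[:, [0, 1, 2, 4]]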
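The same filter-and-concatenate idea can be written more compactly with named columns and a loop over the products. This is a sketch under a few assumptions: the DataFrame is rebuilt from the question's input table (including the `id` column), and the `products` dictionary is an illustrative helper, not something from the answer above.

```python
import pandas as pd

# Input table from the question.
df = pd.DataFrame({
    "id": [123, 456, 456],
    "status": ["open", "open", "closed"],
    "type": ["r", "r", "p"],
    "location": ["hongkong", "hongkong", "India"],
    "bb_count": [1, 1, 0],
    "vo_count": [0, 7, 6],
    "tv_count": [4, 2, 1],
})

id_cols = ["id", "status", "type", "location"]
products = {"bb_count": "bb", "vo_count": "vo", "tv_count": "tv"}

# For each product, select the rows with a positive count and label them.
parts = [
    df.loc[df[col] > 0, id_cols].assign(product=name)
    for col, name in products.items()
]
df_final = pd.concat(parts, ignore_index=True)
print(df_final)
```

As in the answer above, the rows come out grouped by product rather than by input row.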