Pandas: split a cell by delimiter into new rows
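I have many rows like the one shown below. I am trying to create a new row for each value in the very long "zones" column. Each new row should repeat the data from the first three columns and hold the next zone value in that column.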
[Image: row example]
[Image: desired output]
I believe the explode method in pandas is what I need, but my data is not being exploded out of the list the way I expected.
import pandas as pd

# Constants and public variables
df = pd.read_excel("input.xlsx", sheet_name=0, usecols='D,G,H,K')
df = df.dropna()
new_header = df.iloc[0] #grab the first row for the header
df = df[1:] #take the data less the header row
df.columns = new_header #set the header row as the df header
zones = [['CA2,SW1,SW3,SW2,STH2,STH3,STH0,NTH0,DR1,DR2,MID1,MID2,MID3,NE1,NE2,NE3,NTH1,NTH2,NTH3,PLN1,PLN2,PLN3,NW1,NW2'],['CA2,SW1,SW3,SW2,STH2,STH3,STH0,NTH0,DR1,DR2,MID1,MID2,MID3,NE1,NE2,NE3,NTH1,NTH2,NTH3,PLN1,PLN2,PLN3,NW1,NW2'],['CA2,SW1,SW3,SW2,STH2,STH3,STH0,NTH0,DR1,DR2,MID1,MID2,MID3,NE1,NE2,NE3,NTH1,NTH2,NTH3,PLN1,PLN2,PLN3,NW1,NW2']]
replace_values = ['All Zones', 'All Zones ', 'all']
df = df.replace(to_replace=replace_values, value=zones)
df = df.explode("ZONES")
df.to_csv("outout.csv")
Try this:
import pandas as pd
# Sample data mirroring one row of the question
df = pd.DataFrame({
    'id': [3609112],
    'reg_price': [3.99],
    'promo_price': [3.99],
    'zones': ["CA2,SW1,SW3,SW2"],
})
def convert_to_list(row):
    # Split the comma-separated string into a real list so explode() has elements to expand
    return row.split(',')
df['zones'] = df['zones'].apply(convert_to_list)
print(df.explode('zones'))
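Applied to the question's data, the same idea might look like the sketch below. In the question, each inner list in `zones` holds a single comma-joined string, so explode has only one element per cell to expand; splitting the string first gives it real list elements. The sample rows, the shortened `ALL_ZONES` string, and the helper name `explode_zones` are assumptions made for illustration, not part of the original post.

```python
import pandas as pd

# Shortened stand-in for the full zone list in the question (assumption)
ALL_ZONES = "CA2,SW1,SW3,SW2"

# Hypothetical rows standing in for the Excel sheet
df = pd.DataFrame({
    "id": [3609112, 3609113],
    "reg_price": [3.99, 4.49],
    "promo_price": [3.99, 3.99],
    "ZONES": ["All Zones ", "NE1, NE2"],
})

def explode_zones(frame, column="ZONES", all_zones=ALL_ZONES):
    """Return a copy of frame with one row per zone value."""
    out = frame.copy()
    # Normalize whitespace so 'All Zones' and 'All Zones ' are treated alike
    cleaned = out[column].astype(str).str.strip()
    # Swap the "all zones" placeholders for the comma-separated zone string
    is_all = cleaned.str.lower().isin(["all zones", "all"])
    # Split every cell into a list; explode only expands list-like cells
    out[column] = cleaned.mask(is_all, all_zones).str.split(",")
    out = out.explode(column, ignore_index=True)
    # Remove stray spaces left around individual zone names
    out[column] = out[column].str.strip()
    return out

result = explode_zones(df)
print(result)
```

The other columns are repeated automatically for each zone, so `result.to_csv(...)` can be called on it the same way as in the question.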