Shifting certain rows to the left in a pandas DataFrame
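I have a pandas DataFrame of sports data. The columns are Name, Age, BirthCity, BirthCountry, Rookie, Weight, and Problem. For American players, the raw data lists the birth city as "City,State", so splitting on the comma delimiter produced two variables. As a result, every American player's row is shifted one column to the right, and I had to add a "Problem" variable to hold the overflow.
How can I shift only the Americans back to the left across thousands of observations? Thanks!
What I have (pardon the table formatting):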
Name   Age  BirthCity  BirthCountry  Rookie  Weight  Problem
Frank  32   Seattle    WA            USA     N       200
Jake   24   Geneva     Switzerland   Y       210
Desired:
Name   Age  BirthCity  BirthCountry  Rookie  Weight
Frank  32   Seattle    USA           N       200
Jake   24   Geneva     Switzerland   Y       210
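To make the problem reproducible, here is a minimal sketch that rebuilds the two sample rows as a DataFrame. The literal values come from the table above; the `shifted_rows` name and the idea of detecting affected rows through a non-missing `Problem` value are my own assumptions, and they only hold if `Problem` is empty for every correctly parsed row.

```python
import numpy as np
import pandas as pd

# Misaligned sample data from the question: Frank's state ("WA") landed in
# BirthCountry, pushing every later value one column to the right.
df = pd.DataFrame(
    {
        "Name": ["Frank", "Jake"],
        "Age": [32, 24],
        "BirthCity": ["Seattle", "Geneva"],
        "BirthCountry": ["WA", "Switzerland"],
        "Rookie": ["USA", "Y"],
        "Weight": ["N", 210],
        "Problem": [200.0, np.nan],
    }
)

# Assumption: only shifted rows have a value in the overflow column.
shifted_rows = df["Problem"].notna()
print(df[shifted_rows])
```

On this sample, the mask picks out the same rows as testing `df['Rookie'] == 'USA'`.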
It is not so easy:
# select the shifted rows with a boolean mask
mask = df['Rookie'] == 'USA'
c = ['BirthCountry','Rookie','Weight','Problem']
# shift the columns left; converting to strings first is necessary
df.loc[mask, c] = df.loc[mask, c].astype(str).shift(-1, axis=1)
# convert column Weight to float and then to int
df['Weight'] = df['Weight'].astype(float).astype(int)
# remove column Problem
df = df.drop('Problem', axis=1)
print(df)
    Name  Age BirthCity BirthCountry Rookie  Weight
0  Frank   32   Seattle          USA      N     200
1   Jake   24    Geneva  Switzerland      Y     210
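For anyone who wants to run the mask-and-shift idea end to end, here is a self-contained sketch. It is a variation rather than the exact code above: it casts the frame to `object` instead of `str` so the numbers stay numbers while they move between columns. That choice, the `fixed` name, and the sample values are assumptions on my part.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame(
    {
        "Name": ["Frank", "Jake"],
        "Age": [32, 24],
        "BirthCity": ["Seattle", "Geneva"],
        "BirthCountry": ["WA", "Switzerland"],
        "Rookie": ["USA", "Y"],
        "Weight": ["N", 210],
        "Problem": [200.0, np.nan],
    }
)

mask = df["Rookie"] == "USA"
cols = ["BirthCountry", "Rookie", "Weight", "Problem"]

# Cast to object so values can move across columns regardless of dtype.
fixed = df.astype(object)
# Shift the affected rows one column to the left; the last column becomes NaN.
fixed.loc[mask, cols] = fixed.loc[mask, cols].shift(-1, axis=1)

# Drop the overflow column and restore numeric dtypes.
fixed = fixed.drop(columns="Problem")
fixed["Age"] = fixed["Age"].astype(int)
fixed["Weight"] = fixed["Weight"].astype(float).astype(int)
print(fixed)
```

The final `astype` calls would raise if any shifted row still held a non-numeric weight, which is a cheap way to notice rows that were shifted by more than one column.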
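One way is to selectively delete the value at position 3 (remember that Python counts from 0) in the affected rows while appending an extra NaN at the end. Then drop the final Problem column.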
# df, start with this dataframe
#
# Name Age BirthCity BirthCountry Rookie Weight Problem
# 0 Frank 32 Seattle WA USA N 200.0
# 1 Jake 24 Geneva Switzerland Y 210 NaN
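import numpy as np

def shifter(row):
    # drop the value at position 3 and pad the end with NaN
    return np.hstack((np.delete(np.array(row), [3]), [np.nan]))

mask = df['Rookie'] == 'USA'
# result_type='broadcast' (pandas >= 0.23) keeps the original index and columns
df.loc[mask, :] = df.loc[mask, :].apply(shifter, axis=1, result_type='broadcast')
df = df.drop(['Problem'], axis=1)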
# Name Age BirthCity BirthCountry Rookie Weight
# 0 Frank 32 Seattle USA N 200
# 1 Jake 24 Geneva Switzerland Y 210
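With thousands of rows, a row-wise `apply` can be slow. As a rough sketch of the same delete-and-pad idea done on the underlying array, something like the following might work. The helper name `shift_rows_left` and its `start` parameter are hypothetical, and the sample values are taken from the question.

```python
import numpy as np
import pandas as pd


def shift_rows_left(df, mask, start):
    """Remove the value at column position `start` in the masked rows and
    pad the last column of those rows with NaN (hypothetical helper)."""
    values = df.to_numpy(dtype=object, copy=True)
    rows = mask.to_numpy()
    # Boolean indexing on the right-hand side returns a copy, so the
    # overlapping source and target columns do not interfere.
    values[rows, start:-1] = values[rows, start + 1:]
    values[rows, -1] = np.nan
    return pd.DataFrame(values, index=df.index, columns=df.columns)


df = pd.DataFrame(
    {
        "Name": ["Frank", "Jake"],
        "Age": [32, 24],
        "BirthCity": ["Seattle", "Geneva"],
        "BirthCountry": ["WA", "Switzerland"],
        "Rookie": ["USA", "Y"],
        "Weight": ["N", 210],
        "Problem": [200.0, np.nan],
    }
)

mask = df["Rookie"] == "USA"
result = shift_rows_left(df, mask, start=3).drop(columns="Problem")
# The array round trip leaves every column as object, so restore the numbers.
result["Age"] = result["Age"].astype(int)
result["Weight"] = result["Weight"].astype(float).astype(int)
print(result)
```

The trade-off is that everything passes through an `object` array, so dtypes have to be restored by hand afterwards.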