An alternative way to select multiple columns and apply fillna() to them using pandas
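I am trying to select three columns, ["attacktype1","attacktype2","attacktype3"], whose data type is integer, from a DataFrame using pandas. I want to fill the missing values in these columns with 0 and sum the columns into a new column, ["Total_attacks"].
The dataset can be downloaded from: https://s3.amazonaws.com/datasetsgun/data/terror.csv
I have tried applying fillna(0) to one column at a time and then summing them into a single new column.
My first way: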
import numpy as np
import pandas as pd
da1 = pd.read_csv('terror.csv', sep=',', header=0, encoding='latin', na_values=['Missing', ' '])
da1.head()
#Handling missing values
da1['attacktype3'] = da1['attacktype3'].fillna(0)
da1['attacktype2'] = da1['attacktype2'].fillna(0)
da1['attacktype1'] = da1['attacktype1'].fillna(0)
da1['total_attacks'] = da1['attacktype3'] + da1['attacktype2'] + da1['attacktype1']
# country_txt is a column of country names. I want "total_attacks" only for India, so the condition applied is country_txt=='India'.
a1 = da1.query("country_txt=='India'").agg({'total_attacks':np.sum})
print(a1)
My second way (does not work):
da1 = pd.read_csv('terror.csv', sep = ',', header=0 , encoding='latin' , na_values=['Missing', ' '])
da1.head()
#Handling missing values
check1=Df.country_txt=="India"
store=Df[["attacktype1","attacktype2","attacktype3"]].apply(lambda x:x.fillna(0))
Total_attack=Df.loc[check1,store].sum(axis=1)
print(Total_attack)
I want to apply fillna(0) to multiple columns in a single line and also total those columns in an alternative, more efficient way.
The error that I get when I use my second way is:
ValueError: Cannot index with multidimensional key
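The second attempt fails because store is a DataFrame, while DataFrame.loc expects column labels (such as a list of column names) as its second indexer. First filter by boolean indexing with DataFrame.loc, then replace the missing values with DataFrame.fillna: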
check1 = Df.country_txt == "India"
cols = ["attacktype1","attacktype2","attacktype3"]
Df['Total_attack'] = Df.loc[check1, cols].fillna(0).sum(axis=1)
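To see what this assignment produces, here is a minimal sketch on a small made-up frame. The toy values and the intermediate name row_totals are assumptions for illustration only; they are not taken from terror.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for terror.csv
Df = pd.DataFrame({
    "country_txt": ["India", "India", "Iraq", "India"],
    "attacktype1": [3, 2, 7, 1],
    "attacktype2": [np.nan, 6, np.nan, np.nan],
    "attacktype3": [np.nan, np.nan, 2, 9],
})

check1 = Df["country_txt"] == "India"
cols = ["attacktype1", "attacktype2", "attacktype3"]

# Select the India rows and the three columns, fill NaN with 0,
# then sum across each row (axis=1)
row_totals = Df.loc[check1, cols].fillna(0).sum(axis=1)

# Assignment aligns on the index, so rows outside the mask get NaN
Df["Total_attack"] = row_totals
print(Df)
```

Note that sum skips NaN by default (skipna=True), so on this toy frame Df.loc[check1, cols].sum(axis=1) gives the same row totals without the fillna(0) call. Keeping fillna(0) makes the intent explicit.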
For a single scalar output, add sum:
Total_attack = Df['Total_attack'].sum()
print (Total_attack)
35065.0
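If only the scalar is needed, the intermediate column can probably be skipped. The sketch below assumes a small made-up frame (not the real dataset) and a hypothetical variable name, grand_total. It sums each selected column and then sums those column totals.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for terror.csv
Df = pd.DataFrame({
    "country_txt": ["India", "India", "Iraq", "India"],
    "attacktype1": [3, 2, 7, 1],
    "attacktype2": [np.nan, 6, np.nan, np.nan],
    "attacktype3": [np.nan, np.nan, 2, 9],
})

check1 = Df["country_txt"] == "India"
cols = ["attacktype1", "attacktype2", "attacktype3"]

# First sum() gives one total per column; second sum() adds those totals
grand_total = Df.loc[check1, cols].fillna(0).sum().sum()

# Same number via the row-wise route used above
via_rows = Df.loc[check1, cols].fillna(0).sum(axis=1).sum()
print(grand_total, via_rows)
```

Both routes add up the same cells, so they should agree. The direct version just avoids writing a Total_attack column that is NaN for every non-India row.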