How to subtract rows of a Pandas dataframe based upon some conditions?
I am analysing this dataset.
After running the code below, I have a cleaned version of the data.
covid_df.drop(columns = ["Sno", "Time"], inplace = True)
covid_df["State/UnionTerritory"] = covid_df["State/UnionTerritory"].replace({
    "Bihar****": "Bihar",
    "Maharashtra***": "Maharashtra",
    "Madhya Pradesh***": "Madhya Pradesh",
    "Karanataka": "Karnataka",
    "Telangana": "Telengana",
    "Himanchal Pradesh": "Himachal Pradesh",
    "Dadra and Nagar Haveli": "Dadra and Nagar Haveli and Daman and Diu",
    "Daman & Diu": "Dadra and Nagar Haveli and Daman and Diu"
})
invalid_states = ["Cases being reassigned to states", "Unassigned"]
for invalid_state in invalid_states:
    invalid_state_index = covid_df.loc[covid_df["State/UnionTerritory"] == invalid_state, :].index
    covid_df.drop(index = invalid_state_index, inplace = True)
covid_df = covid_df.groupby(["State/UnionTerritory", "Date"], as_index = False).sum()
covid_df["Date"] = pd.to_datetime(covid_df["Date"])
covid_df.sort_values(by = ["State/UnionTerritory", "Date"], inplace = True)
This cleaned data holds the cumulative case counts for each State/UnionTerritory on each Date. How can I extract the daily new cases for each State/UnionTerritory?
This is what I tried:
daily_cases_data = [list(covid_df.iloc[0, 2:])]
for index in range(1, covid_df.shape[0]):
    previous_row = covid_df.iloc[index - 1, :]
    current_row = covid_df.iloc[index, :]
    if previous_row["State/UnionTerritory"] == current_row["State/UnionTerritory"]:
        daily_cases_data.append(list(current_row[2:] - previous_row[2:]))
    else:
        daily_cases_data.append(list(current_row[2:]))
Is there a more efficient way?
Edited answer: use groupby.shift
As shown here:
df = pd.DataFrame(
    {
        'state': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c'],
        'cumul': [1, 2, 5, 3, 4, 5, 8, 9, 9]
    }
)
df['quantity'] = df['cumul'] - df.groupby('state')['cumul'].shift()
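With groupby.shift, the first row of every state has no previous value, so its quantity comes out as NaN. A minimal sketch of one way to handle that, assuming you want the first day's quantity to equal the first cumulative value (as the loop in the question does), is to fill those gaps from cumul:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "state": ["a", "a", "a", "b", "b", "c", "c", "c", "c"],
        "cumul": [1, 2, 5, 3, 4, 5, 8, 9, 9],
    }
)

# Previous cumulative value within the same state (NaN on each state's first row).
previous = df.groupby("state")["cumul"].shift()

# Daily quantity; where there is no previous row, fall back to the cumulative value.
df["quantity"] = (df["cumul"] - previous).fillna(df["cumul"]).astype(int)
print(df)
```

The astype(int) at the end is optional; it only undoes the float conversion that the intermediate NaN values cause.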
Previous answer:
You can use shift.
For example:
df = pd.DataFrame({'cumul': [0, 2, 3, 5, 7]})
df['quantity'] = df['cumul'] - df['cumul'].shift(1)
quantity will then be:
quantity
0 NaN
1 2.0
2 1.0
3 2.0
4 2.0
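Then you can use fillna, or simply set the zeroth value of quantity to the zeroth value of cumul.
Edit: apply your conditions first to prepare the dataframe :-)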
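The two ways of repairing the leading NaN mentioned above could look roughly like this. It is only a sketch on the same toy column; the names filled and patched are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"cumul": [0, 2, 3, 5, 7]})
df["quantity"] = df["cumul"] - df["cumul"].shift(1)

# Option 1: fill the missing first difference from the cumulative column.
filled = df["quantity"].fillna(df["cumul"])

# Option 2: overwrite only the zeroth value, by position.
patched = df["quantity"].copy()
patched.iloc[0] = df["cumul"].iloc[0]

print(filled.tolist())
print(patched.tolist())
```

Both give the same result here. fillna would also fill any NaN further down the column, so the positional overwrite is the narrower change if the data can contain other gaps.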
It feels strange to answer my own question, but this worked for me:
grouped_df = covid_df.groupby("State/UnionTerritory")
daily_cases_df = pd.DataFrame()
for state in covid_df["State/UnionTerritory"].unique():
    group = grouped_df.get_group(state)
    cases_group = group.iloc[:, 2:] - group.shift(1).iloc[:, 2:]
    cases_group.iloc[0, :] = group.iloc[0, 2:]
    group = pd.concat([group.iloc[:, :2], cases_group], axis = 1)
    daily_cases_df = pd.concat([daily_cases_df, group])
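For comparison, the same per-state result can probably be obtained without an explicit Python loop by calling diff on the grouped frame. This is a sketch on a small stand-in frame, not the real dataset: the sample rows and the Deaths and Confirmed columns are assumptions, and it relies on the frame already being sorted by state and date with all count columns after the first two.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned covid_df: sorted by state and date,
# with cumulative counts in every column after the first two.
covid_df = pd.DataFrame(
    {
        "State/UnionTerritory": ["Bihar", "Bihar", "Bihar", "Kerala", "Kerala"],
        "Date": pd.to_datetime(
            ["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-01", "2020-03-02"]
        ),
        "Deaths": [0, 1, 1, 2, 2],
        "Confirmed": [3, 5, 9, 10, 14],
    }
)

value_columns = list(covid_df.columns[2:])

# Row-to-row difference within each state; each state's first row becomes NaN.
daily = covid_df.groupby("State/UnionTerritory")[value_columns].diff()

# Keep the cumulative value on each state's first row, as the loop version does.
daily = daily.fillna(covid_df[value_columns]).astype(int)

daily_cases_df = pd.concat([covid_df.iloc[:, :2], daily], axis=1)
print(daily_cases_df)
```

This avoids growing daily_cases_df with pd.concat inside a loop, which copies the accumulated frame on every iteration.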