Keep the last n real values of uneven rows in a dataframe?
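I am collecting heart rate values over time. The length of time over which data was collected differs for each subject. I would like a table of the last 2 seconds of collected data.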
import pandas as pd
import numpy as np

# Example data: date/time, subject ID (CID), then one heart rate reading per second.
# Rows differ in length; pandas pads the shorter ones with NaN.
example_s = [["4/20/21 4:20", 302, 0, 0, 1, 2, 3],
             ["2/17/21 9:20", 135, 1, 1.4, 8, 10, np.nan, np.nan],
             ["2/17/21 9:20", 111, 5, 5, 1, np.nan, np.nan, np.nan, np.nan]]
example_s_table = pd.DataFrame(example_s, columns=['Date_Time', 'CID', 0, 1, 2, 3, 4, 5, 6])

# Desired result: the last three valid readings of each row.
desired_outcome = [["4/20/21 4:20", 302, 1, 2, 3],
                   ["2/17/21 9:20", 135, 1.4, 8, 10],
                   ["2/17/21 9:20", 111, 5, 5, 1]]
desired_outcome_table = pd.DataFrame(desired_outcome, columns=['Date_Time', 'CID', "Second 1", "Second 2", "Second 3"])
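I can see how to collect a single instance of the data from the example shown here, but I would like to know how to quickly add multiple values to my table: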
desired_outcome_table["Last Second"]=example_s_table.iloc[:,1:].ffill(axis=1).iloc[:, -1]
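For context, here is a minimal sketch of why that one-liner yields exactly one value per row. It assumes a small stand-in frame named `readings` (a hypothetical name) holding only the measurement columns. Forward-filling along each row copies the last valid reading into the trailing NaN cells, so the final column ends up holding it:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the measurement columns only (no Date_Time or CID).
readings = pd.DataFrame(
    [[0, 0, 1, 2, 3, np.nan, np.nan],
     [1, 1.4, 8, 10, np.nan, np.nan, np.nan],
     [5, 5, 1, np.nan, np.nan, np.nan, np.nan]],
    dtype=float,
)

# Forward-fill left to right within each row, then read the final column.
filled = readings.ffill(axis=1)
last_second = filled.iloc[:, -1]
print(last_second.tolist())
```

Because the fill overwrites every trailing cell with the same value, this trick cannot recover the second-to-last or third-to-last readings, which is the part I am stuck on.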
Try masking the zeros, dropping the NaN values in each row, and keeping the last three that remain:
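df = example_s_table.copy()
df = df.set_index(['Date_Time', 'CID'])  # keep the ID columns out of the value columns
df_out = (
    df.mask(df.eq(0))  # treat zeros as missing
      # per row: drop NaN, keep the last three values, reset the labels to 0..2
      .apply(lambda x: pd.Series(x.dropna().tail(3).values), axis=1)
      .rename(columns=lambda x: f'Second {x+1}')
)
df_out['Last Second'] = df_out['Second 3']
print(df_out.reset_index())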
Output:
      Date_Time  CID  Second 1  Second 2  Second 3  Last Second
0  4/20/21 4:20  302       1.0       2.0       3.0          3.0
1  2/17/21 9:20  135       1.4       8.0      10.0         10.0
2  2/17/21 9:20  111       5.0       5.0       1.0          1.0
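If the number of values to keep needs to vary, the same idea can be wrapped in a small helper. This is only a sketch under stated assumptions: `last_n_valid` and `readings` are hypothetical names, the frame is assumed to hold numeric measurement columns only, and the row for CID 999 is made up to show a subject with fewer than n readings. Unlike the snippet above it does not mask zeros, so apply `frame.mask(frame.eq(0))` first if zeros should count as missing.

```python
import numpy as np
import pandas as pd


def last_n_valid(frame: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return the last n non-NaN values of each row, left-padded with NaN."""
    if n < 1:
        raise ValueError("n must be at least 1")
    values = frame.to_numpy(dtype=float)
    out = np.full((len(frame), n), np.nan)
    for i, row in enumerate(values):
        valid = row[~np.isnan(row)][-n:]  # at most n trailing valid readings
        if valid.size:
            out[i, n - valid.size:] = valid  # right-align within the n columns
    columns = [f"Second {k + 1}" for k in range(n)]
    return pd.DataFrame(out, index=frame.index, columns=columns)


# Measurement columns only, indexed by subject ID; CID 999 is a made-up short row.
readings = pd.DataFrame(
    [[0, 0, 1, 2, 3, np.nan, np.nan],
     [1, 1.4, 8, 10, np.nan, np.nan, np.nan],
     [5, 5, 1, np.nan, np.nan, np.nan, np.nan],
     [7, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]],
    index=pd.Index([302, 135, 111, 999], name="CID"),
)
result = last_n_valid(readings, 3)
print(result)
```

Right-aligning short rows keeps the most recent reading in the final column for every subject, so `result.iloc[:, -1]` can stand in for the "Last Second" column.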