Pandas: ignore missing dates when finding a percentile

Question

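I have a DataFrame, and I am trying to find the percentile rank of a datetime value within one of its columns. The data and the code I am using are shown below.

DataFrame: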

student, attempts, time
student 1,14, 9/3/2019  12:32:32 AM
student 2,2, 9/3/2019  9:37:14 PM
student 3, 5
student 4, 16, 9/5/2019  8:58:14 PM

student2Info = [14, 4, pd.Timestamp('2019-09-04 00:26:36')]
data['time'] = pd.to_datetime(data['time_0001'], errors='coerce')
perc1_first = stats.percentileofscore(data['time'].notnull(), student2Info[2], 'rank')

Here student2Info[2] holds the datetime for one particular student. When I run this, I get the following error:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Any ideas on how to compute the percentile correctly when some of the times in the column are missing?

Answer 1

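You need to convert the timestamps into a unit that percentileofscore can understand, such as plain numbers. Also, calling notnull() on the column returns a boolean mask (one True/False per row), not the filtered values. You can use that mask to filter the DataFrame, but passing the mask itself to percentileofscore does not give it the timestamps, so I have updated that for you. Here is a working example: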

import pandas as pd
import scipy.stats as stats

data = pd.DataFrame.from_dict({
    "student": [1, 2, 3, 4],
    "attempts": [14, 2, 5, 16],
    "time_0001": [
        "9/3/2019  12:32:32 AM",
        "9/3/2019  9:37:14 PM",
        "",
        "9/5/2019  8:58:14 PM"
    ]
})

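student2Info = [14, 4, pd.Timestamp('2019-09-04 00:26:36')]
# Empty or unparseable strings become NaT instead of raising
data['time'] = pd.to_datetime(data['time_0001'], errors='coerce')
# Drop the missing times, then convert each timestamp to its day ordinal
# (an integer day count, so the time of day is discarded)
valid_days = data['time'].dropna().map(pd.Timestamp.toordinal)
perc1_first = stats.percentileofscore(valid_days, student2Info[2].toordinal(), 'rank')
print(perc1_first)  #-> 66.66666666666667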
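
One thing to be aware of: toordinal() only keeps the date, so two times on the same day are treated as ties. If the time of day should count, a possible variation is to rank on elapsed seconds instead. The sketch below is only an illustration under my own assumptions: the times are written as ISO strings with None for the missing row, and the midday score is a made-up value that is not in the question.

```python
import pandas as pd
from scipy import stats

# The same four rows as above, written as ISO strings; None is the missing time.
times = pd.to_datetime(
    pd.Series([
        "2019-09-03 00:32:32",
        "2019-09-03 21:37:14",
        None,
        "2019-09-05 20:58:14",
    ]),
    errors="coerce",
)

# Hypothetical score for illustration (not from the question).
score = pd.Timestamp("2019-09-03 12:00:00")

# notnull() is only a boolean mask; dropna() returns the actual valid values.
mask = times.notnull()
valid = times.dropna()

# Day-level ranking, as in the answer above: both 9/3 rows tie with the score.
valid_days = valid.map(lambda ts: ts.toordinal())
perc_days = stats.percentileofscore(
    valid_days.to_numpy(), score.toordinal(), kind="rank"
)

# Second-level ranking: measure every time as seconds since the earliest one.
origin = valid.min()
valid_seconds = (valid - origin).dt.total_seconds()
score_seconds = (score - origin).total_seconds()
perc_seconds = stats.percentileofscore(
    valid_seconds.to_numpy(), score_seconds, kind="rank"
)

print(mask.tolist())
print(perc_days, perc_seconds)
```

With this made-up score the day-level rank comes out at 50.0, while the second-level rank is about 33.3, because only one of the three recorded times is actually earlier than noon on 9/3. For the timestamp in the question both approaches agree, so which one to use depends on how fine a resolution you need.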

Tags: python, pandas, percentile