Pandas 忽略缺失日期以找到百分位数
Pandas ignore missing dates to find percentiles
我有一个数据框。我正在尝试查找日期时间的百分位数。我正在使用函数:
数据框:
student, attempts, time
student 1,14, 9/3/2019 12:32:32 AM
student 2,2, 9/3/2019 9:37:14 PM
student 3, 5
student 4, 16, 9/5/2019 8:58:14 PM
studentInfo2 = [14, 4, Timestamp('2019-09-04 00:26:36')]
data['time'] = pd.to_datetime(data['time_0001'], errors='coerce')
perc1_first = stats.percentileofscore(data['time'].notnull(), student2Info[2], 'rank')
其中 student2Info[2] 保存特定学生的日期时间。当我尝试这样做时,出现错误:
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
关于如何在列中缺少时间的情况下正确计算百分位数的任何想法?
您需要将时间戳转换为percentileofscore
可以理解的单位。此外,pd.DataFrame.notnull()
return 是一个布尔列表,您可以使用它来过滤 DataFrame
,它不会 return 过滤列表,所以我已经为您更新了它。这是一个工作示例:
import pandas as pd
import scipy.stats as stats
data = pd.DataFrame.from_dict({
"student": [1, 2, 3, 4],
"attempts": [14, 2, 5, 16],
"time_0001": [
"9/3/2019 12:32:32 AM",
"9/3/2019 9:37:14 PM",
"",
"9/5/2019 8:58:14 PM"
]
})
student2Info = [14, 4, pd.Timestamp('2019-09-04 00:26:36')]
data['time'] = pd.to_datetime(data['time_0001'], errors='coerce')
perc1_first = stats.percentileofscore(data[data['time'].notnull()].time.transform(pd.Timestamp.toordinal), student2Info[2].toordinal(), 'rank')
print(perc1_first) #-> 66.66666666666667
我有一个数据框。我正在尝试查找日期时间的百分位数。我正在使用函数:
数据框:
student, attempts, time
student 1,14, 9/3/2019 12:32:32 AM
student 2,2, 9/3/2019 9:37:14 PM
student 3, 5
student 4, 16, 9/5/2019 8:58:14 PM
studentInfo2 = [14, 4, Timestamp('2019-09-04 00:26:36')]
data['time'] = pd.to_datetime(data['time_0001'], errors='coerce')
perc1_first = stats.percentileofscore(data['time'].notnull(), student2Info[2], 'rank')
其中 student2Info[2] 保存特定学生的日期时间。当我尝试这样做时,出现错误:
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
关于如何在列中缺少时间的情况下正确计算百分位数的任何想法?
您需要将时间戳转换为percentileofscore
可以理解的单位。此外,pd.DataFrame.notnull()
return 是一个布尔列表,您可以使用它来过滤 DataFrame
,它不会 return 过滤列表,所以我已经为您更新了它。这是一个工作示例:
import pandas as pd
import scipy.stats as stats
data = pd.DataFrame.from_dict({
"student": [1, 2, 3, 4],
"attempts": [14, 2, 5, 16],
"time_0001": [
"9/3/2019 12:32:32 AM",
"9/3/2019 9:37:14 PM",
"",
"9/5/2019 8:58:14 PM"
]
})
student2Info = [14, 4, pd.Timestamp('2019-09-04 00:26:36')]
data['time'] = pd.to_datetime(data['time_0001'], errors='coerce')
perc1_first = stats.percentileofscore(data[data['time'].notnull()].time.transform(pd.Timestamp.toordinal), student2Info[2].toordinal(), 'rank')
print(perc1_first) #-> 66.66666666666667