Problem calculating quantiles for a number series using data.table in R
I need to calculate quantiles for the number series in each row of a data.table in R.
Table:
2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
11.7 10.7 10.8 11.8 12.2 13.8 7.0 10.2 11.2 6.8 7.4 9.1 9.5 9.4 9.3 15.6 11.3 13.0 10.9 10.5
NA NA 9.5 11.3 16.6 12.2 NA NA 69.6 NA NA 12.4 10.8 10.5 8.8 9.9 NA 7.7 12.1 NA
9.1 8.7 29.9 23.1 18.3 23.5 21.5 23.0 18.2 28.8 39.9 16.4 16.9 23.4 18.8 31.9 26.2 22.4 29.2 25.2
14.7 17.5 21.1 19.4 20.0 14.5 14.1 12.6 9.9 12.6 6.4 9.6 18.5 14.3 26.2 10.7 6.4 6.9 7.1 9.0
I want to calculate quantiles for each row of the Table above. Please see my code below; I need the values placed on each row, as shown in "Output".
year_cols <- c(2000:2019)
Table[, c("10","25","50","75","100") := quantile(.SD, na.rm = TRUE, c(0.1,0.25,0.5,0.75,1.0)), .SDcols = as.character(year_cols)]
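How can I calculate the quantiles for each row as shown below? I would appreciate it if someone could help modify my code so that I can display the quantiles for each row using data.table in R.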
Output:
2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 10% 25% 50% 75% 100%
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
11.7 10.7 10.8 11.8 12.2 13.8 7.0 10.2 11.2 6.8 7.4 9.1 9.5 9.4 9.3 15.6 11.3 13.0 10.9 10.5 7.36 9.37 10.75 11.72 15.60
NA NA 9.5 11.3 16.6 12.2 NA NA 69.6 NA NA 12.4 10.8 10.5 8.8 9.9 NA 7.7 12.1 NA
9.1 8.7 29.9 23.1 18.3 23.5 21.5 23.0 18.2 28.8 39.9 16.4 16.9 23.4 18.8 31.9 26.2 22.4 29.2 25.2
14.7 17.5 21.1 19.4 20.0 14.5 14.1 12.6 9.9 12.6 6.4 9.6 18.5 14.3 26.2 10.7 6.4 6.9 7.1 9.0
One option is to group by row
year_cols <- as.character(2000:2019)
Table[, c("10%", "25%", "50%", "75%", "100%") :=
as.list(quantile(unlist(.SD), na.rm = TRUE,
c(0.1,0.25,0.5,0.75,1.0))), by = seq_len(nrow(Table)),
.SDcols = year_cols]
Table
# 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 10%
#1: NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#2: 11.7 10.7 10.8 11.8 12.2 13.8 7.0 10.2 11.2 6.8 7.4 9.1 9.5 9.4 9.3 15.6 11.3 13.0 10.9 10.5 7.36
#3: NA NA 9.5 11.3 16.6 12.2 NA NA 69.6 NA NA 12.4 10.8 10.5 8.8 9.9 NA 7.7 12.1 NA 8.87
#4: 9.1 8.7 29.9 23.1 18.3 23.5 21.5 23.0 18.2 28.8 39.9 16.4 16.9 23.4 18.8 31.9 26.2 22.4 29.2 25.2 15.67
#5: 14.7 17.5 21.1 19.4 20.0 14.5 14.1 12.6 9.9 12.6 6.4 9.6 18.5 14.3 26.2 10.7 6.4 6.9 7.1 9.0 6.85
# 25% 50% 75% 100%
#1: NA NA NA NA
#2: 9.375 10.75 11.725 15.6
#3: 9.800 11.05 12.250 69.6
#4: 18.275 23.05 26.850 39.9
#5: 9.450 13.35 17.750 26.2
Another option is rowQuantiles from matrixStats, after converting to a matrix
library(matrixStats)
Table[, c("10%", "25%", "50%", "75%", "100%") :=
as.data.frame(rowQuantiles(as.matrix(.SD), na.rm = TRUE,
probs = c(0.1,0.25,0.5,0.75,1.0))), .SDcols = as.character(year_cols)]
Table
# 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 10%
#1: NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#2: 11.7 10.7 10.8 11.8 12.2 13.8 7.0 10.2 11.2 6.8 7.4 9.1 9.5 9.4 9.3 15.6 11.3 13.0 10.9 10.5 7.36
#3: NA NA 9.5 11.3 16.6 12.2 NA NA 69.6 NA NA 12.4 10.8 10.5 8.8 9.9 NA 7.7 12.1 NA 8.87
#4: 9.1 8.7 29.9 23.1 18.3 23.5 21.5 23.0 18.2 28.8 39.9 16.4 16.9 23.4 18.8 31.9 26.2 22.4 29.2 25.2 15.67
#5: 14.7 17.5 21.1 19.4 20.0 14.5 14.1 12.6 9.9 12.6 6.4 9.6 18.5 14.3 26.2 10.7 6.4 6.9 7.1 9.0 6.85
# 25% 50% 75% 100%
#1: NA NA NA NA
#2: 9.375 10.75 11.725 15.6
#3: 9.800 11.05 12.250 69.6
#4: 18.275 23.05 26.850 39.9
#5: 9.450 13.35 17.750 26.2
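As a cross-check on the values above, here is a minimal base R sketch that needs no extra packages. It is not part of the original answer: the matrix `m` and the names `probs` and `row_q` are made up for illustration, and `m` holds only the first two rows of Table. It assumes the default `quantile()` type (type 7), which matches the output printed above.

```r
# Hypothetical two-row example: row 1 is all NA, row 2 is the second row of Table.
m <- rbind(
  rep(NA_real_, 20),
  c(11.7, 10.7, 10.8, 11.8, 12.2, 13.8, 7.0, 10.2, 11.2, 6.8,
    7.4, 9.1, 9.5, 9.4, 9.3, 15.6, 11.3, 13.0, 10.9, 10.5)
)
colnames(m) <- as.character(2000:2019)

probs <- c(0.1, 0.25, 0.5, 0.75, 1.0)

# apply() over rows returns one column per input row, so transpose
# to get one row of quantiles per input row.
row_q <- t(apply(m, 1, quantile, probs = probs, na.rm = TRUE))
row_q
```

With `na.rm = TRUE`, a row that is entirely NA yields NA for every quantile rather than an error, which is why the first row of the output is all NA.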
data
Table <- structure(list(`2000` = c(NA, 11.7, NA, 9.1, 14.7), `2001` = c(NA,
10.7, NA, 8.7, 17.5), `2002` = c(NA, 10.8, 9.5, 29.9, 21.1),
`2003` = c(NA, 11.8, 11.3, 23.1, 19.4), `2004` = c(NA, 12.2,
16.6, 18.3, 20), `2005` = c(NA, 13.8, 12.2, 23.5, 14.5),
`2006` = c(NA, 7, NA, 21.5, 14.1), `2007` = c(NA, 10.2, NA,
23, 12.6), `2008` = c(NA, 11.2, 69.6, 18.2, 9.9), `2009` = c(NA,
6.8, NA, 28.8, 12.6), `2010` = c(NA, 7.4, NA, 39.9, 6.4),
`2011` = c(NA, 9.1, 12.4, 16.4, 9.6), `2012` = c(NA, 9.5,
10.8, 16.9, 18.5), `2013` = c(NA, 9.4, 10.5, 23.4, 14.3),
`2014` = c(NA, 9.3, 8.8, 18.8, 26.2), `2015` = c(NA, 15.6,
9.9, 31.9, 10.7), `2016` = c(NA, 11.3, NA, 26.2, 6.4), `2017` = c(NA,
13, 7.7, 22.4, 6.9), `2018` = c(NA, 10.9, 12.1, 29.2, 7.1
), `2019` = c(NA, 10.5, NA, 25.2, 9)), class = c("data.table",
"data.frame"), row.names = c(NA, -5L))