Assigning numbers and summing them in a sliding window in R

I have a character vector, df, that looks like this:

 df <- (c( "P", "S", "E", "G", "R", "Q", "P", "S", "P", "S", "P", "S", "P", "T", "E", "R", "A", "P", "A", 
"S", "E", "E", "E", "F", "Q", "F", "L", "R", "C", "Q", "Q", "C", 
"Q", "A", "E", "A", "K", "C", "P", "K", "L", "L", "P", "C", "L"))

and a matrix, df1, that looks like this:

df1
    1     2     3     4     5     
A   0.375 0.000 0.250 0.250 0.125 
C   0.200 0.000 0.600 0.000 0.000 
D   0.000 0.500 0.000 0.400 0.500 
E   0.225 0.250 0.125 0.125 0.000 
F   0.000 0.000 0.000 0.000 0.000 
G   0.000 0.400 0.250 0.000 0.125 
H   0.500 0.000 0.300 0.020 0.000 
I   0.000 0.000 0.000 0.000 0.300 
K   0.000 0.280 0.000 0.125 0.000 
L   0.000 0.000 0.125 0.125 0.125 
M   0.600 0.700 0.000 0.030 0.000 
N   0.000 0.000 0.030 0.000 0.500 
P   0.000 0.000 0.000 0.125 0.125 
Q   0.400 0.165 0.125 0.000 0.250 
R   0.030 0.000 0.125 0.500 0.125 
S   0.350 0.450 0.400 0.000 0.125 
T   0.000 0.000 0.000 0.125 0.000 
V   0.625 0.125 0.400 0.525 0.100 
W   0.400 0.300 0.000 0.000 0.000 
Y   0.125 0.000 0.000 0.000 0.000 
NIL    NA    NA    NA    NA    NA   

dput(df1)
    structure(c(0.375, 0.200, 0, 0.225, 0, 0, 0.5, 0, 0, 0, 0.6, 0, 0, 0.4, 
    0.03, 0.35, 0, 0.625, 0.4, 0.125, NA, 0, 0, 0.5, 0.25, 0, 0.4, 0, 0, 0.28, 
    0, 0.7, 0, 0, 0.165, 0, 0.45, 0, 0.125, 0.3, 0, NA, 0.25, 0.6, 0, 0.125, 
    0, 0.25, 0.3, 0, 0, 0.125, 0, 0.03, 0, 0.125, 0.125, 0.4, 0, 0.4, 0, 0, 
    NA, 0.25, 0, 0.4, 0.125, 0, 0, 0.02, 0, 0.125, 0.125, 0.03, 0, 0.125, 
    0, 0.5, 0, 0.125, 0.125, 0, 0, NA, 0.125, 0, 0.5, 0, 0, 0.125, 0, 
    0.3, 0, 0.125, 0, 0.5, 0.125, 0.25, 0.125, 0.125, 0, 0.1, 0, 0, NA), .Dim = c(21L, 5L), .Dimnames = list(
        c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L", "M", 
        "N", "P", "Q", "R", "S", "T", "V", "W", "Y", "NIL"), c("1", 
        "2", "3", "4", "5")))

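I want to assign the numbers from df1 to the letters in df. The column numbers of df1 (5 in total) refer to the letter's position within a window. I want to create a sliding window of width 5, look up the numbers in df1, sum the results, and slide the window across the whole of df.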

For example:

first 5 letters of `df`: PSEGR
assign numbers from `df1`: 0+0.45+0.125+0+0.125
sum of the first 5 numbers: 0.7
the next step:
letters from `df`: SEGRQ
assign numbers from `df1`: 0.35+0.25+0.25+0.5+0.25
sum: 1.6, etc.
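
The two hand calculations above can be reproduced with matrix indexing. This is only a minimal sketch: `scores` holds just the six rows of df1 needed for the first two windows (values copied from the table), and `window_score` is a hypothetical helper name used for illustration.

```r
# Rows of df1 needed for the first two windows (values copied from df1)
scores <- rbind(
  P = c(0.000, 0.000, 0.000, 0.125, 0.125),
  S = c(0.350, 0.450, 0.400, 0.000, 0.125),
  E = c(0.225, 0.250, 0.125, 0.125, 0.000),
  G = c(0.000, 0.400, 0.250, 0.000, 0.125),
  R = c(0.030, 0.000, 0.125, 0.500, 0.125),
  Q = c(0.400, 0.165, 0.125, 0.000, 0.250)
)
colnames(scores) <- 1:5

# Hypothetical helper: pair the i-th letter of a window with the i-th column
# of the matrix, then sum the selected cells
window_score <- function(window, m) {
  idx <- cbind(match(window, rownames(m)), seq_along(window))
  sum(m[idx])
}

window_score(c("P", "S", "E", "G", "R"), scores)  # 0.7
window_score(c("S", "E", "G", "R", "Q"), scores)  # 1.6
```

Indexing a matrix with a two-column matrix of (row, column) pairs picks one cell per pair, which is what makes the position-dependent lookup a one-liner.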

I tried the following code:

sliding_window_df <- rollapply(df, function(x) df1[cbind(match(x, rownames(df1)), 1:ncol(df1))],k=5, align="left", sum)

I get this error:

Error in trunc(width) : non-numeric argument to mathematical function

Would you suggest a more appropriate function than rollapply?

Try using sapply here instead of a rolling operation:

n <- 1:ncol(df1)
sapply(seq_along(df), function(x) 
       sum(df1[cbind(match(df[x:(x+4)], rownames(df1)),n)], na.rm = TRUE))

# [1] 0.700 1.600 0.875 0.375 0.320 1.050 0.575 1.000 0.575 0.875
#[11] 0.575 0.600 0.750 0.750 0.725 0.405 0.625 0.525 1.075 0.850
#[21] 0.850 0.475 0.475 0.415 1.025 0.375 0.850 0.155 0.740 1.290
#[31] 0.775 0.865 0.775 1.000 0.350 1.380 0.250 0.450 0.655 0.250
#[41] 0.125 0.725 0.125 0.200 0.000
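
Note that the result has 45 values, one per element of df. The last four come from windows that run past the end of df: the out-of-range letters become NA and `na.rm = TRUE` drops them, so those sums cover fewer than five letters. If only complete windows are wanted, one possible variant is sketched below. It assumes df and df1 exactly as given in the question; `k`, `starts` and `full_windows` are names introduced here for illustration.

```r
# Data copied from the question
df <- c("P", "S", "E", "G", "R", "Q", "P", "S", "P", "S", "P", "S", "P", "T", "E", "R", "A", "P", "A",
        "S", "E", "E", "E", "F", "Q", "F", "L", "R", "C", "Q", "Q", "C",
        "Q", "A", "E", "A", "K", "C", "P", "K", "L", "L", "P", "C", "L")

df1 <- structure(c(0.375, 0.200, 0, 0.225, 0, 0, 0.5, 0, 0, 0, 0.6, 0, 0, 0.4,
    0.03, 0.35, 0, 0.625, 0.4, 0.125, NA, 0, 0, 0.5, 0.25, 0, 0.4, 0, 0, 0.28,
    0, 0.7, 0, 0, 0.165, 0, 0.45, 0, 0.125, 0.3, 0, NA, 0.25, 0.6, 0, 0.125,
    0, 0.25, 0.3, 0, 0, 0.125, 0, 0.03, 0, 0.125, 0.125, 0.4, 0, 0.4, 0, 0,
    NA, 0.25, 0, 0.4, 0.125, 0, 0, 0.02, 0, 0.125, 0.125, 0.03, 0, 0.125,
    0, 0.5, 0, 0.125, 0.125, 0, 0, NA, 0.125, 0, 0.5, 0, 0, 0.125, 0,
    0.3, 0, 0.125, 0, 0.5, 0.125, 0.25, 0.125, 0.125, 0, 0.1, 0, 0, NA), .Dim = c(21L, 5L), .Dimnames = list(
        c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L", "M",
        "N", "P", "Q", "R", "S", "T", "V", "W", "Y", "NIL"), c("1",
        "2", "3", "4", "5")))

k <- ncol(df1)                         # window width (5 here)
starts <- seq_len(length(df) - k + 1)  # start positions of complete windows only

# For each start, pair the i-th letter of the window with column i and sum
full_windows <- vapply(starts, function(i) {
  window <- df[i:(i + k - 1)]
  sum(df1[cbind(match(window, rownames(df1)), seq_len(k))], na.rm = TRUE)
}, numeric(1))

length(full_windows)   # 41
head(full_windows, 3)  # 0.700 1.600 0.875, the same as the first values above
```

As for the error in the question: assuming `rollapply` is the one from the zoo package, its second positional argument is `width`, so the anonymous function was taken as the window width, which is what `trunc(width)` complains about. Passing `width = 5` and `FUN = ...` by name should avoid that particular error; whether `rollapply` then behaves as intended on a character vector is a separate matter that this sketch does not rely on.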