Counting the characters of each cell in a data.table

So I have this riddle for all R enthusiasts:

library(data.table)
set.seed(666)
res<-data.table(NULL)
for(i in 1:10){
  res<-rbind(res,data.table(a=i,b=paste0(letters[sample(1:i)],collapse = "")))
}
res<-res[sample(10)]

which results in:

>res
       a          b
   1:  1          a
   2:  9  dhgcbeifa
   3:  3        cba
   4:  7    gcafdeb
   5:  6     eacdfb
   6:  8   dacbfehg
   7: 10 fehjaigcbd
   8:  4       dacb
   9:  5      daecb
  10:  2         ba

But in Case A:

 >t(apply(res,1,nchar))
      a  b
 [1,] 2  1
 [2,] 2  9
 [3,] 2  3
 [4,] 2  7
 [5,] 2  6
 [6,] 2  8
 [7,] 2 10
 [8,] 2  4
 [9,] 2  5
[10,] 2  2

whereas in Case B:

  >res[,lapply(.SD, nchar)]

     a  b
  1: 1  1
  2: 1  9
  3: 1  3
  4: 1  7
  5: 1  6
  6: 1  8
  7: 2 10
  8: 1  4
  9: 1  5
 10: 1  2

My question is: why does Case A report 2 for every value in column a, which is wrong?

When you coerce res to a matrix (which is the first thing apply does), you get:

as.matrix(res)
#-------------------
      a    b           
 [1,] " 7" "eafdgcb"   
 [2,] " 2" "ab"        
 [3,] " 8" "efcbdhga"  
 [4,] " 1" "a"         
 [5,] "10" "hdeifajgbc"
 [6,] " 4" "dbac"      
 [7,] " 5" "daecb"     
 [8,] " 6" "eadbfc"    
 [9,] " 9" "chfdbiaeg" 
[10,] " 3" "acb" 

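This is a conversion issue in how as.matrix handles res$a. Because column b is character, the whole matrix has to be character, so the numbers in a are turned into strings that are padded with spaces to the display width of the widest value ("10"). Every entry in a therefore has 2 characters, and that is what nchar counts.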
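The padding can be reproduced on a small scale. The following is only a sketch: it assumes a two-row base data.frame named df as a stand-in for res (df, padded, via_apply and per_column are illustrative names, not from the question), and it relies on format(), which pads numbers to a common width when the mixed-type frame is converted to a character matrix.

```r
# Assumption: a small base data.frame stands in for `res`.
df <- data.frame(a = c(1L, 10L), b = c("a", "ba"), stringsAsFactors = FALSE)

# format() pads numbers to a common width.
padded <- format(df$a)   # " 1" "10"
nchar(padded)            # 2 2

# apply() goes through as.matrix(), so nchar sees the padded strings.
via_apply <- t(apply(df, 1, nchar))

# Converting each column on its own avoids the padding.
per_column <- sapply(df, function(col) nchar(as.character(col)))
```

Working column by column, as in Case B, never builds the character matrix, so the counts for a come out as 1 and 2 rather than 2 and 2.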

You can find a detailed explanation of this behavior here.