Extract data in every chunk of data.frame depending on variable

Question

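I am trying to extract the first record for each chunk (layer) of my data. I want to extract the first occurrence of a negative value (Mag) in each chunk, together with the corresponding time. Then I want to compare the "times" across the chunks and find the minimum and maximum. (This is the first part.)

I got part of the way but then got stuck. Any help, including shortening the code, would be appreciated. Thanks!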

# to make sample data
data_neg<-seq(-0.98,-1,length=300)
data_pos<-seq(0.98,1,length=300)
time<-seq(1,54,length=600)

# binding those neg and pos numbers together
tot_num<- data.frame(c(rep(time, times=4)),c(rep(cbind(data_pos,data_neg),times=4)))    
colnames(tot_num)=c("time","Mag")

# split data into chunks
n <- 1:4  
dfchunk<- split(tot_num, factor(sort(rank(row.names(tot_num))%%n)))
ext_fsw<-lapply(dfchunk[],function(x)with(x,x[Mag<0,,drop=TRUE])) 
# here I want to extract the first appearance of a negative value of Mag in each chunk, together with the corresponding time.

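As the second part of my question: following @zx8754's suggestion, I tried to read my real data, select the first occurrence of a negative value inside a loop, and plot the result. However, I realized that my real data produces NA values like the ones below (I read 11 data files from my folder; you can see the code further down):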

     X1      X2
1 27.45 -0.0111
2 43.29 -0.9746
3 32.49 -0.9807
4 28.08 -0.0538
5 28.44 -0.0669
     X1      X2
1 28.71 -0.0834
2 43.29 -0.9736
3 32.49 -0.9521
4 29.16 -0.0032
5 29.70 -0.0469
     X1      X2
1 30.06 -0.0112
2 43.29 -0.9724
3 35.37 -0.0448
4 33.03 -0.0308
5 31.59 -0.0055
     X1      X2
1 35.19 -0.0476
2 43.29 -0.9712
3 39.42 -0.0171
4 40.50 -0.0143
5 36.18 -0.0395
     X1      X2
1    NA      NA
2    NA      NA
3    NA      NA
4 50.85 -0.0371
5    NA      NA
     X1      X2
1    NA      NA
2    NA      NA
3    NA      NA
4    NA      NA
5    NA      NA
     X1      X2
1    NA      NA
2    NA      NA
3    NA      NA
4    NA      NA
5    NA      NA
     X1     X2
1    NA     NA
2    NA     NA
3 49.77 -3e-04
4    NA     NA
5    NA     NA
     X1      X2
1    NA      NA
2    NA      NA
3    NA      NA
4 43.02 -0.0465
5 45.99 -0.9793
     X1      X2
1    NA      NA
2 37.98 -0.0005
3 45.18 -0.9784
4    NA      NA
5 45.09 -0.0551
     X1      X2
1    NA      NA
2    NA      NA
3 36.90 -0.0148
4 46.17 -0.9813
5    NA      NA

Here is the loop that reads my data:

data.list <- dir(pattern = "*.avgm", full.names = FALSE) # lists all the .avgm files in the directory

a <- 1:length(data.list)
for (k in 1:length(data.list)) {
  data1_stt <- read.table(data.list[k], colClasses = "numeric", skip = 0, fill = FALSE,
                          sep = "", quote = "\"'", dec = ".", as.is = TRUE, strip.white = FALSE)
  StrL1 <- data1_stt[, 10]
  time <- data1_stt[, 1] * 10^-3
  tot_num <- data.frame(time, StrL1)
  colnames(tot_num) <- c("time", "Mag")
  n <- 5  # split data into chunks
  dfchunk <- split(tot_num, factor(sort(rank(row.names(tot_num)) %% n)))
  # which() gives the indices where the condition is TRUE; [1] takes the first one,
  # which is then passed to x as the row index
  ext_fsw <- lapply(dfchunk, function(x) x[which(x$Mag < 0)[1], ])
  x.n <- data.frame(matrix(unlist(ext_fsw), nrow = 5, byrow = T))
  print(x.n)
  curr <- rep(c(8, 7, 6, 5, 4, 3.6, 3.8, 4.2, 4.4, 4.6, 4.8), each = 5)
  plot(curr, x.n, pch = 20)
}

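In short, the second step of my task is to read all the data and plot it against each current value, but I could not get this to work. Sorry that I cannot provide a reproducible example here. Because there are NA values in the data, the total length differs in terms of negative values.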

Answer 1

Try this:

ext_fsw<-lapply(dfchunk,function(x)
  x[which(x$Mag<0)[1],]
  )

which() gives the indices where the condition is TRUE; [1] then takes the first of them, and that index is passed to x as the row number.
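
To make this easy to check by hand, here is a minimal sketch on toy data. The values and the names chunk_id, first_neg and time_range are made up for illustration, and the chunks are assumed to be equal-sized and consecutive. It also shows one likely source of the NA rows in the question: when a chunk contains no negative Mag, which() returns nothing, [1] becomes NA, and the extracted row is filled with NA.

```r
# Toy data (hypothetical values): 3 chunks of 4 rows each.
# Chunk "c" deliberately contains no negative Mag.
tot_num <- data.frame(
  time = c(1, 2, 3, 4,   5, 6, 7, 8,   9, 10, 11, 12),
  Mag  = c(0.5, -0.1, -0.9, 0.2,   -0.3, 0.4, -0.7, 0.1,   0.6, 0.8, 0.3, 0.9)
)

# Assumed chunking: consecutive blocks of equal size
chunk_id <- rep(c("a", "b", "c"), each = 4)
dfchunk <- split(tot_num, chunk_id)

# First row with Mag < 0 in each chunk.
# With no match, which() returns integer(0), so [1] is NA
# and the indexed row comes back filled with NA.
ext_fsw <- lapply(dfchunk, function(x) x[which(x$Mag < 0)[1], ])

# Stack the one-row results into a single data.frame, one row per chunk
first_neg <- do.call(rbind, ext_fsw)
print(first_neg)

# Minimum and maximum of the extracted times,
# skipping chunks that had no negative value
time_range <- range(first_neg$time, na.rm = TRUE)
print(time_range)  # 2 and 5 for this toy data
```

Keeping the NA rows (rather than dropping them) means every chunk still contributes exactly one row, so the result keeps a fixed length even when some chunks have no negative value.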
