R: Create a new column from a header and reorder a table using a loop
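I would like some help extracting information from a header.
I have a table in a file (example below) with hundreds of rows and 1,000 columns, all of equal length. I want to write a loop that takes the date from each column header, puts it in a new column, and stacks the values from the rows next to it.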
R2n_19970919__105056604_2_BF.MER_A123_DAY_00.nc <- c(0.09,0.09,0.08,0.08,0.06,0.07,0.09,0.08,0.08,"NA")
R2n_19970920__105056604_2_BF.MER_A123_DAY_00.nc <- c(0.08,0.08,0.08,0.07,"NA",0.05,0.08,0.08,0.08,"NA")
R2n_19970921__105056604_2_BF.MER_A123_DAY_00.nc <- c(0.07,"NA",0.08,"NA","NA",0.07,0.06,"NA",0.08,"NA")
data <- data.frame(R2n_19970919__105056604_2_BF.MER_A123_DAY_00.nc,R2n_19970920__105056604_2_BF.MER_A123_DAY_00.nc,R2n_19970921__105056604_2_BF.MER_A123_DAY_00.nc)
What is the best way to do this? Any help would be much appreciated.
Here is my expected result, where
R2n_19970919__105056604_2_BF.MER_A123_DAY_00.nc = 1997/09/19:
Date R2n.nc
1997/09/19 0.09
1997/09/19 0.09
1997/09/19 0.08
1997/09/19 0.08
1997/09/19 0.06
1997/09/19 0.07
1997/09/19 0.09
1997/09/19 0.08
1997/09/19 0.08
1997/09/19 NA
1997/09/20 0.08
1997/09/20 0.08
1997/09/20 0.08
1997/09/20 0.07
1997/09/20 NA
1997/09/20 0.05
1997/09/20 0.08
1997/09/20 0.08
1997/09/20 0.08
1997/09/20 NA
1997/09/21 ...
.
.
.
Here is a solution, using the trick suggested by @Roman Luštrik:
library(stringr) # str_sub() function
library(reshape2) # melt() function
# Modify columns names (if date information is always at the same position)
names(data) = paste0(str_sub(names(data), 5,8), "-", str_sub(names(data), 9,10), "-",str_sub(names(data), 11, 12))
data$id = seq(1,nrow(data))
# Melt the data
data_melt = melt(data, id = "id")
> data_melt
id variable value
1 1 1997-09-19 0.09
2 2 1997-09-19 0.09
3 3 1997-09-19 0.08
4 4 1997-09-19 0.08
5 5 1997-09-19 0.06
...
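The renaming step above depends on the date sitting at characters 5 to 12 of every header. If that is not guaranteed, a regular expression could locate the date instead. This is only a minimal base R sketch, assuming each header contains a run of exactly eight digits between two underscores; `extract_date` and `headers` are hypothetical names used for illustration.

```r
# Hypothetical helper: find the first "_YYYYMMDD_" run in each header.
# Note: regmatches() drops headers without a match, so the result can be
# shorter than the input if some headers do not follow the pattern.
extract_date <- function(headers) {
  # Eight digits preceded by an underscore and followed by an underscore
  m <- regmatches(headers, regexpr("(?<=_)[0-9]{8}(?=_)", headers, perl = TRUE))
  as.Date(m, format = "%Y%m%d")
}

headers <- c(
  "R2n_19970919__105056604_2_BF.MER_A123_DAY_00.nc",
  "R2n_19970920__105056604_2_BF.MER_A123_DAY_00.nc"
)
extract_date(headers)
```

The result is a Date vector, so it can be formatted as 1997/09/19 with format(x, "%Y/%m/%d") if that layout is needed.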
library(anytime)
df <- stack(data)
df$ind <- anydate(substr(df$ind, 5, 12))
head(df)
## values ind
## 1 0.09 1997-09-19
## 2 0.09 1997-09-19
## 3 0.08 1997-09-19
## 4 0.08 1997-09-19
## 5 0.06 1997-09-19
## 6 0.07 1997-09-19
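The same reshaping can probably be done without the anytime package, using as.Date() with an explicit format. The sketch below also converts the values to numeric, because the quoted "NA" in the sample data turns every column into character. It assumes the columns are plain character vectors (stringsAsFactors = FALSE), and `long` is a hypothetical name.

```r
# Sample data from the question, rebuilt with short temporary names
data <- data.frame(
  a = c(0.09, 0.09, 0.08, 0.08, 0.06, 0.07, 0.09, 0.08, 0.08, "NA"),
  b = c(0.08, 0.08, 0.08, 0.07, "NA", 0.05, 0.08, 0.08, 0.08, "NA"),
  c = c(0.07, "NA", 0.08, "NA", "NA", 0.07, 0.06, "NA", 0.08, "NA"),
  stringsAsFactors = FALSE
)
names(data) <- paste0("R2n_", c("19970919", "19970920", "19970921"),
                      "__105056604_2_BF.MER_A123_DAY_00.nc")

# Stack all columns into two columns: values and ind (the original header)
long <- stack(data)

# Characters 5 to 12 of the header hold the date as YYYYMMDD
long$ind <- as.Date(substr(as.character(long$ind), 5, 12), format = "%Y%m%d")

# Turn the "NA" strings into real missing values before converting to numeric
long$values <- as.numeric(replace(long$values, long$values == "NA", NA))

# Rename and reorder to match the layout asked for in the question
names(long) <- c("R2n.nc", "Date")
long <- long[, c("Date", "R2n.nc")]
head(long)
```

Replacing the "NA" strings first avoids the coercion warning that as.numeric() would otherwise raise.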
Though I would probably do this:
library(anytime)
library(dplyr)
tbl_df(data) %>%
stack() %>%
mutate(ind=anydate(substr(ind, 5, 12)))
## # A tibble: 30 × 2
## values ind
## <chr> <date>
## 1 0.09 1997-09-19
## 2 0.09 1997-09-19
## 3 0.08 1997-09-19
## 4 0.08 1997-09-19
## 5 0.06 1997-09-19
## 6 0.07 1997-09-19
## 7 0.09 1997-09-19
## 8 0.08 1997-09-19
## 9 0.08 1997-09-19
## 10 NA 1997-09-19
## # ... with 20 more rows
instead.
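Since the question asks for a loop, here is a sketch of what an explicit loop over the columns might look like in base R. It assumes the date always occupies characters 5 to 12 of the header, as in the answers above; `pieces` and `long` are hypothetical names.

```r
# Sample data from the question, rebuilt with short temporary names
data <- data.frame(
  a = c(0.09, 0.09, 0.08, 0.08, 0.06, 0.07, 0.09, 0.08, 0.08, "NA"),
  b = c(0.08, 0.08, 0.08, 0.07, "NA", 0.05, 0.08, 0.08, 0.08, "NA"),
  c = c(0.07, "NA", 0.08, "NA", "NA", 0.07, 0.06, "NA", 0.08, "NA"),
  stringsAsFactors = FALSE
)
names(data) <- paste0("R2n_", c("19970919", "19970920", "19970921"),
                      "__105056604_2_BF.MER_A123_DAY_00.nc")

# Preallocate one list slot per column instead of growing a data frame
pieces <- vector("list", ncol(data))

for (i in seq_along(data)) {
  header <- names(data)[i]
  pieces[[i]] <- data.frame(
    # The single date is recycled to the length of the column
    Date = as.Date(substr(header, 5, 12), format = "%Y%m%d"),
    R2n.nc = data[[i]],
    stringsAsFactors = FALSE
  )
}

# Bind the per-column pieces into one long table
long <- do.call(rbind, pieces)
head(long)
```

With 1,000 columns, the stack() approach shown above is shorter and does the same job without a loop, so the loop is mainly useful if each column needs extra per-header processing.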