How to subset two different periods in a text file?
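I have a text file containing two years of data, and I want to extract two different periods from it (July to September of each year).
Reading the file: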
wg = read.table("C:/Users/ERIE.txt", sep = "", header = TRUE)
head(wg)
Year day hour mint valu1 valu2 date
105169 2008 1 7 30 0.045 0.014 2008-01-01
105217 2008 2 7 30 0.046 0.015 2008-01-02
105265 2008 3 7 30 0.043 0.013 2008-01-03
Now the subset:
wg= subset(wg, wg$date >= "2008-07-01" & wg$date <= "2008-09-30" & wg$date >= "2009-07-01" & wg$date <= "2009-09-30")
> wg
[1] Year day hour mint valu1 valu2 date
<0 rows> (or 0-length row.names)
wg= subset(wg, date >= "2008-07-01" & date <= "2008-09-30" & date >= "2009-07-01" & date <= "2009-09-30")
> wg
[1] Year day hour mint valu1 valu2 date
<0 rows> (or 0-length row.names)
Any idea why it doesn't work?
# One way is to use `filter` from the `dplyr` package (assuming `date` is already of class Date).
# Note that months() returns month names in the current locale.
library(dplyr)
wg %>%
  filter(Year %in% c(2008, 2009) & months(date) %in% c("July", "August", "September"))
# If you want to stick with subset, replace the second & with |:
subset(wg, (date >= "2008-07-01" & date <= "2008-09-30") | (date >= "2009-07-01" & date <= "2009-09-30"))
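To see why the original condition returns zero rows: `&` requires every comparison to hold at the same time, and no date is both on or before 2008-09-30 and on or after 2009-07-01. The following is a minimal sketch rather than the asker's actual data: `toy` is a made-up data frame with a `date` column of class Date. It also shows a month filter based on `format(date, "%m")`, which may be preferable when locale-dependent month names are a concern.

```r
# Hypothetical toy data standing in for wg: one row per day across 2008-2009.
toy <- data.frame(date = seq(as.Date("2008-01-01"), as.Date("2009-12-31"), by = "day"))

# All four conditions joined with &: no date can lie in both ranges at once.
both <- subset(toy, date >= "2008-07-01" & date <= "2008-09-30" &
                    date >= "2009-07-01" & date <= "2009-09-30")
nrow(both)    # 0

# The two ranges joined with |: a row is kept if it falls in either period.
either <- subset(toy, (date >= "2008-07-01" & date <= "2008-09-30") |
                      (date >= "2009-07-01" & date <= "2009-09-30"))
nrow(either)  # 184 (92 days per year)

# Same selection by month number, independent of the locale.
by_month <- subset(toy, format(date, "%m") %in% c("07", "08", "09"))
identical(either$date, by_month$date)  # TRUE
```

The parentheses around each range are not strictly required, since `&` binds more tightly than `|` in R, but they make the intent easier to read.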
Edit: As Metrics pointed out, your code has an & where it needs a |. Here is a reproducible example showing how to select or exclude dates within ranges using subset:
> mydat <- data.frame(dat = seq(as.Date("2015-01-01"), as.Date("2015-01-15"), by = "days"), x = "x")
> mydat
dat x
1 2015-01-01 x
2 2015-01-02 x
3 2015-01-03 x
4 2015-01-04 x
5 2015-01-05 x
6 2015-01-06 x
7 2015-01-07 x
8 2015-01-08 x
9 2015-01-09 x
10 2015-01-10 x
11 2015-01-11 x
12 2015-01-12 x
13 2015-01-13 x
14 2015-01-14 x
15 2015-01-15 x
> subset(mydat, (dat >= "2015-01-05" & dat <= "2015-01-08") | (dat >= "2015-01-11" & dat <= "2015-01-13"))
dat x
5 2015-01-05 x
6 2015-01-06 x
7 2015-01-07 x
8 2015-01-08 x
11 2015-01-11 x
12 2015-01-12 x
13 2015-01-13 x
> subset(mydat, !((dat >= "2015-01-05" & dat <= "2015-01-08") | (dat >= "2015-01-11" & dat <= "2015-01-13")))
dat x
1 2015-01-01 x
2 2015-01-02 x
3 2015-01-03 x
4 2015-01-04 x
9 2015-01-09 x
10 2015-01-10 x
14 2015-01-14 x
15 2015-01-15 x
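If there are more than two periods, chaining conditions by hand gets tedious. One possible approach, sketched here under assumptions, is a small helper that checks each date against a list of ranges. The function name `in_any_range` and the vectors `starts` and `ends` are hypothetical and do not come from any package.

```r
# Hypothetical helper: TRUE where x falls inside any [start, end] range (inclusive).
in_any_range <- function(x, starts, ends) {
  stopifnot(length(starts) == length(ends))
  keep <- rep(FALSE, length(x))
  for (i in seq_along(starts)) {
    # OR the result for this range into the running mask.
    keep <- keep | (x >= starts[i] & x <= ends[i])
  }
  keep
}

# Same example data as above.
mydat <- data.frame(dat = seq(as.Date("2015-01-01"), as.Date("2015-01-15"), by = "days"), x = "x")
starts <- as.Date(c("2015-01-05", "2015-01-11"))
ends   <- as.Date(c("2015-01-08", "2015-01-13"))

selected <- mydat[in_any_range(mydat$dat, starts, ends), ]   # rows 5-8 and 11-13
excluded <- mydat[!in_any_range(mydat$dat, starts, ends), ]  # the remaining 8 rows
```

For the July to September question, `starts` and `ends` would hold the first and last day of each year's period.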