Create a new column conditioned on repeated values and break points
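My data are telemetry locations for about 40 animals (ids), and I have defined three areas: AR, the breeding area; AM, migration; and AA, the feeding area. The first location of every animal is in AR. Sometimes an animal that is still in the breeding period (in AR) goes out to AM a few times and then comes back to AR. Migration only starts once the animal has nothing but AM locations, and it lasts until the animal reaches the feeding area AA. So the animals leave AR, migrate through AM, and arrive at the feeding area AA.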
I am trying to create a new column based on some conditions that I don't know how to implement. For example, I have this data frame:
id area
2304 AR
2304 AR
2304 AR
2304 AM # this AM run, for example, can repeat up to 20 times before returning to AR
2304 AM
2304 AR
2304 AR
2304 AR
2304 AM
2304 AM
2304 AM
2304 AM
2304 ...
2304 AM
2304 AM
2304 AM
2304 AA
2304 AA
2304 ...
2304 AA
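So, when there are x AR rows followed by between 1 and 20 AM rows and then AR again, I want the new column to say AR for those AM rows. From the moment there are x AM rows and only AM, with no return to AR, I want the new column to say AM.
AA is not a problem: AA always stays AA.
I expected this: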
id area fixed_area
2304 AR AR
2304 AR AR
2304 AR AR
2304 AM AR # this AM run, for example, can repeat up to 20 times before returning to AR
2304 AM AR
2304 AR AR
2304 AR AR
2304 AR AR
2304 AM AM
2304 AM AM
2304 AM AM
2304 AM AM
2304 ... ...
2304 AM AM
2304 AM AM
2304 AM AM
2304 AA AA
2304 AA AA
2304 ... ...
2304 AA AA
I tried the following, but AA disappears. Maybe the problem is that this needs to be done separately for each animal (id):
> table(df$area)
AA AM AR
31460 39101 28820
> class(df$area)
[1] "character"
> idx <- with(rle(as.character(df$area)), rep(seq_along(lengths), lengths))
> df$fixed_area <- with(df, replace(area, idx < max(idx[area == 'AM']), 'AR'))
> table(df$fixed_area)
AM AR
145 99236
>
Below is the dput of my data frame. It has more than 90,000 rows, so I only copied the head:
> dput(head(df))
structure(list(DeployID = c("111868_16", "111868_16", "111868_16",
"111868_16", "111868_16", "111868_16"), Start = structure(c(1477323868,
1477323946, 1477324002, 1477324044, 1477324260, 1477324480), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), End = structure(c(1477323944, 1477324000,
1477324042, 1477324170, 1477324458, 1477324542), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), What = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = c("Dive", "Message", "Surface"), class = "factor"),
Shape = structure(c(2L, 4L, 3L, 2L, 2L, 2L), .Label = c("",
"Square", "U", "V"), class = "factor"), DepthMean = c(14.5,
16.5, 13, 14.5, 11, 12.5), DurationMean = c(76, 54, 40, 126,
198, 62), DepthMin = c(14.5, 16.5, 13, 14.5, 11, 12.5), DepthMax = c(14.5,
16.5, 13, 14.5, 11, 12.5), depth_range = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("shallow", "deep"), class = c("ordered",
"factor")), MidTime = structure(c(1477323906, 1477323973,
1477324022, 1477324107, 1477324359, 1477324511), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), year = c(2016, 2016, 2016, 2016,
2016, 2016), id = c("111868_16", "111868_16", "111868_16",
"111868_16", "111868_16", "111868_16"), segmentid = c("111868_16",
"111868_16", "111868_16", "111868_16", "111868_16", "111868_16"
), mu.x = c(-4446545.25191192, -4446557.10576816, -4446565.77504969,
-4446580.81370994, -4446625.40007808, -4446652.29459533),
mu.y = c(-2305423.86124176, -2305461.88537725, -2305489.69364377,
-2305537.93137917, -2305680.93056743, -2305767.17264774),
lon = c(-39.9439956132156, -39.944102098218, -39.944179975699,
-39.9443150702825, -39.9447155964422, -39.9449571940013),
lat = c(-20.3985940756941, -20.3989161274532, -20.3991516537744,
-20.3995602097098, -20.4007713539709, -20.4015017842338),
lq_closest_filt = c(7L, 7L, 7L, 7L, 7L, 7L), dt_closest_filt = c(0.0516666666666667,
0.0702777777777778, 0.0838888888888889, 0.1075, 0.1775, 0.219722222222222
), dist_closest_filt = c(0.103680210832692, 0.141026573116106,
0.168339162761167, 0.215717097671267, 0.356168027785347,
0.440874049523752), rel.angle = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), speed = c(NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), depth_bin = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("(0,50]", "(50,100]", "(100,150]",
"(150,200]", "(200,250]", "(250,300]", "(300,350]", "(350,400]",
"(400,450]", "(450,500]", "(500,550]", "(550,600]", "(600,650]",
"(650,700]"), class = "factor"), bat = structure(list(depth = c(-59L,
-59L, -59L, -59L, -59L, -59L)), row.names = c(NA, 6L), class = "data.frame"),
area = c("AR", "AR", "AR", "AR", "AR", "AR")), row.names = c(NA,
6L), class = "data.frame")
Does anyone know how to solve this? Thanks!
It sounds like you need some rules to decide which AM rows become AR:
- the number of consecutive AM rows is < 20
- the next destination is not AA
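One approach is to use rle to add columns related to these two rules. One column holds the lengths, i.e. the number of consecutive values in each run. Another column holds the "next" area, which determines whether the destination is back to the breeding area or onward to the feeding area.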
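To make the two helper columns concrete, here is a small sketch on a toy vector. The vector is made up for illustration and is not taken from the question's data; it only shows how the run length and the "next" area are expanded back to one value per row.

```r
# Toy vector standing in for df$area (hypothetical data)
area <- c("AR", "AR", "AM", "AM", "AR", "AM", "AM", "AM", "AA", "AA")

# rle() collapses the vector into runs of identical values
runs <- rle(area)
runs$values   # the area of each run
runs$lengths  # how many rows each run covers

# Length of the run each row belongs to
count <- rep(runs$lengths, runs$lengths)

# Area of the run that follows; NA for the final run, which has no successor
next_area <- rep(c(runs$values[-1], NA), runs$lengths)

data.frame(area, count, next_area)
```

Note that the final run has no following area, so its next_area is NA; any condition built on it has to deal with that.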
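Finally, you can use a conditional statement to change AM to AR for the rows that meet these conditions:
- the current area is AM
- the next area is not AA
- the run length is less than 20
The code: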
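# Run-length encode the area column (rows must already be in time order)
df_rle <- rle(df$area)
# next_area: area of the following run (NA for the last run); count: length of the current run
df2 <- cbind(df, next_area = with(df_rle, rep(c(values[-1], NA), lengths)),
             count = with(df_rle, rep(lengths, lengths)))
# Relabel short AM runs that are not followed by AA; the is.na() guard keeps a trailing AM run from becoming NA
df2$area <- ifelse(with(df2, area == "AM" & !is.na(next_area) & next_area != "AA" & count < 20),
                   "AR", df2$area)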
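The question mentions about 40 animals and suspects the fix has to be done per id. A single rle over the whole column would let a run continue from the last rows of one animal into the first rows of the next. The following is a minimal sketch of one way to apply the same rules within each id and write the result to a separate fixed_area column. The helper name fix_area, its max_run argument, and the toy data are all hypothetical, and the sketch assumes rows are already in time order within each id. It tests for "next area is AR" rather than "next area is not AA", which is the same thing when only the three areas occur, and it leaves a trailing AM run unchanged.

```r
# Hypothetical helper: relabel short AM excursions as AR within one animal's track
fix_area <- function(area, max_run = 20) {
  area <- as.character(area)
  runs <- rle(area)
  count <- rep(runs$lengths, runs$lengths)
  next_area <- rep(c(runs$values[-1], NA), runs$lengths)
  # %in% yields FALSE (not NA) for the last run, where next_area is NA
  back_to_ar <- area == "AM" & next_area %in% "AR" & count < max_run
  ifelse(back_to_ar, "AR", area)
}

# Toy data with two animals (made up for illustration)
df <- data.frame(
  id = rep(c("a", "b"), each = 6),
  area = c("AR", "AM", "AR", "AM", "AM", "AA",
           "AR", "AR", "AM", "AR", "AM", "AA"),
  stringsAsFactors = FALSE
)

# ave() applies fix_area to each id separately and keeps the original row order
df$fixed_area <- ave(df$area, df$id, FUN = fix_area)
df
```

Keeping the original area column untouched makes it easy to cross-tabulate area against fixed_area and check how many rows were relabelled.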