R: only keep rows whose values differ from values in another column
I want to keep only the rows where the last two letters of column 1 (the state abbreviation) differ from the last two letters of column 3.
countyname fipscounty neighborname fipsneighbor
1 Archuleta County, CO 8007 Rio Grande County, CO 8105
2 Archuleta County, CO 8007 Rio Arriba County, NM 35039
3 Archuleta County, CO 8007 San Juan County, NM 35045
In row 1, both counties are in Colorado. In rows 2 and 3, the first county is in CO and the second is in NM. I want to keep only rows 2 and 3, so that the result looks like this:
countyname fipscounty neighborname fipsneighbor
2 Archuleta County, CO 8007 Rio Arriba County, NM 35039
3 Archuleta County, CO 8007 San Juan County, NM 35045
How can I do this?
We can use str_sub to compare the last 2 characters of each column and return only the rows where the state abbreviations do not match.
library(tidyverse)
df %>%
filter(str_sub(countyname, start= -2) != (str_sub(neighborname, start= -2)))
Output
countyname fipscounty neighborname fipsneighbor
1 Archuleta County, CO 8007 Rio Arriba County, NM 35039
2 Archuleta County, CO 8007 San Juan County, NM 35045
Or, in base R, we can use sub to extract the last 2 characters of each column and then filter the data frame.
df[sub('.*(?=.{2}$)', '', df$countyname, perl = TRUE) !=
   sub('.*(?=.{2}$)', '', df$neighborname, perl = TRUE), ]
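To see why the pattern above isolates the state abbreviation, here is a small sketch on two sample strings taken from the data. The greedy `.*` consumes everything up to the point where the lookahead `(?=.{2}$)` can still match exactly two characters before the end of the string, and sub replaces that consumed part with an empty string. The vector name `x` is only illustrative.

```r
# Two sample values from the question's data
x <- c("Archuleta County, CO", "Rio Arriba County, NM")

# Remove everything except the final two characters of each string
state <- sub('.*(?=.{2}$)', '', x, perl = TRUE)

print(state)
# [1] "CO" "NM"
```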
Another option, using substr (though more verbose):
df[substr(df$countyname, nchar(df$countyname)-1, nchar(df$countyname)) !=
substr(df$neighborname, nchar(df$neighborname)-1, nchar(df$neighborname)),]
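If the substr version feels too repetitive, one possible variation is to wrap it in a small helper. The sketch below is only an illustration: `last_chars` is a hypothetical name, not a base R function, and the `trimws` call is an added assumption to guard against trailing whitespace, which the sample data does not actually contain.

```r
# Hypothetical helper: last n characters of each string,
# after trimming surrounding whitespace (an added safeguard)
last_chars <- function(x, n = 2) {
  x <- trimws(x)
  substr(x, nchar(x) - n + 1, nchar(x))
}

# Same sample data as in the question
df <- data.frame(
  countyname   = rep("Archuleta County, CO", 3),
  fipscounty   = c(8007L, 8007L, 8007L),
  neighborname = c("Rio Grande County, CO", "Rio Arriba County, NM",
                   "San Juan County, NM"),
  fipsneighbor = c(8105L, 35039L, 35045L),
  stringsAsFactors = FALSE
)

# Keep only rows where the two state abbreviations differ
cross_state <- df[last_chars(df$countyname) != last_chars(df$neighborname), ]
print(cross_state)
```

Note that base R subsetting keeps the original row names (2 and 3 here), whereas the dplyr version renumbers them from 1.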
Data
df <- structure(list(countyname = c("Archuleta County, CO", "Archuleta County, CO",
"Archuleta County, CO"), fipscounty = c(8007L, 8007L, 8007L),
neighborname = c("Rio Grande County, CO", "Rio Arriba County, NM",
"San Juan County, NM"), fipsneighbor = c(8105L, 35039L, 35045L
)), class = "data.frame", row.names = c(NA, -3L))