Remove duplicate rows whose values match the column headers
My data looks something like this:
+--------+--------+--------+
| region | name | salary |
+--------+--------+--------+
| west | raj | 100 |
| north | simran | 150 |
| region | name | salary |
| east | prem | 250 |
| region | name | salary |
| south | preeti | 200 |
+--------+--------+--------+
The column header names are repeated in rows 3 and 5. How can I remove rows 3 and 5 in R while keeping the column header? My desired output looks like this:
+--------+--------+--------+
| region | name | salary |
+--------+--------+--------+
| west | raj | 100 |
| north | simran | 150 |
| east | prem | 250 |
| south | preeti | 200 |
+--------+--------+--------+
Assume my real data has far too many rows, so I don't want to pick out the row numbers by hand and delete them with Data[-c(3, 5), ].
Use str_detect() inside filter() to remove those rows.
library(tidyverse)
df <- tibble(
region = c("west", "north", "region", "east","region","south"),
name = c("raj", "simran","name","prem", "name","preeti"),
salary = c("100","150","salary","250","salary","200")
)
df_2 <- df %>%
  # drop every row whose salary contains a letter; "[A-Za-z]" covers both cases
  filter(!str_detect(salary, "[A-Za-z]"))
df_2
Or you can use base R:
df_2 <- df[!grepl("[A-Za-z]", df$salary), ]
df_2
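A note on indexing with pattern matches: negative indexing on grep() output breaks down when nothing matches, because grep() then returns integer(0) and the subset keeps zero rows. Negating grepl(), which returns a logical vector, avoids that. A minimal sketch, assuming a made-up data frame `clean` that contains no repeated header rows:

```r
# Hypothetical data with no repeated header rows at all
clean <- data.frame(
  region = c("west", "north"),
  salary = c("100", "150"),
  stringsAsFactors = FALSE
)

# grep() finds nothing and returns integer(0); clean[-integer(0), ] keeps zero rows
dropped_everything <- clean[-grep("[A-Za-z]", clean$salary), ]

# grepl() returns a logical vector, so negating it keeps every non-matching row
kept_everything <- clean[!grepl("[A-Za-z]", clean$salary), ]

nrow(dropped_everything)
nrow(kept_everything)
```

The first subset comes back empty, while the second still has both rows.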
Here is a simple solution:
x <- data.frame(x =c("a", "b", "c", "x"), z = c("a", "b", "c", "z"))
## identify rows which match colnames
matched <- apply(x,1, function(i) i[1] %in% colnames(x) && i[2] %in% colnames(x))
## Take the inverse of the match
x[!matched,]
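The same idea can be sketched for any number of columns, applied to the data from the question. This assumes every column was read in as character (which is usually the case when header text is mixed into the rows); `is_header` and `df_clean` are names chosen for illustration.

```r
df <- data.frame(
  region = c("west", "north", "region", "east", "region", "south"),
  name   = c("raj", "simran", "name", "prem", "name", "preeti"),
  salary = c("100", "150", "salary", "250", "salary", "200"),
  stringsAsFactors = FALSE
)

# TRUE for rows where every cell equals the name of its own column
is_header <- apply(df, 1, function(row) all(row == names(df)))

# Keep only the rows that are not copies of the header
df_clean <- df[!is_header, ]
df_clean
```

Comparing each cell with its own column name, rather than checking membership in colnames(), means a row is dropped only when it is an exact copy of the header.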
Assuming salary is a numeric field, you can simply do this:
# assuming df is your dataframe
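# as.numeric() turns the repeated header text into NA; the coercion warning is silenced
clean_df <- df[!is.na(suppressWarnings(as.numeric(df$salary))), ]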
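After the header rows are gone, salary is still stored as character, since the stray text forced the whole column to that type on import. A minimal sketch of the follow-up conversion, assuming the data frame from the question and that every remaining salary value parses as a number:

```r
df <- data.frame(
  region = c("west", "north", "region", "east", "region", "south"),
  name   = c("raj", "simran", "name", "prem", "name", "preeti"),
  salary = c("100", "150", "salary", "250", "salary", "200"),
  stringsAsFactors = FALSE
)

# Rows whose salary parses as a number are real data rows
keep <- !is.na(suppressWarnings(as.numeric(df$salary)))
clean_df <- df[keep, ]

# Convert the column now that only numeric text remains, and renumber the rows
clean_df$salary <- as.numeric(clean_df$salary)
rownames(clean_df) <- NULL

str(clean_df)
```

Note that this filter also drops any row whose salary is genuinely missing, so it fits best when the header text is the only non-numeric content in the column.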