How to swap the values in a two-column subset in R?
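I have a data frame df in R with a Sex column and an Age column. While cleaning the data, I noticed that in some rows the age and sex values were flipped, so the data look like this: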
SequenceNo Sex Age
1. sequence1 Male 65
2. sequence2 Female 45
3. sequence3 21 Male
4. sequence4 Female 12
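I know I could fix the data by manually assigning the correct values row by row, but is there a simple, uniform way to flip only the rows where the values are mismatched?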
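We can create a logical index for the flipped rows, swap those rows by swapping the column names, and then use type.convert to change the column types. To build the index, use grepl to check that Sex contains only digits (\d+) from the start (^) to the end ($) of the string; for floating-point numbers use [0-9.]+, and prefix -? to allow negative values. Conversely, \D matches any non-digit. Alternatively, convert with as.numeric/as.integer and check for NA elements with is.na.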
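# TRUE where Sex holds a number, i.e. the row's values are flipped
i1 <- grepl("^-?[0-9.]+$", df$Sex)
# swap Sex and Age in those rows only
df[i1, c("Sex", "Age")] <- df[i1, c("Age", "Sex")]
# re-detect the column types (Age becomes integer)
df <- type.convert(df, as.is = TRUE)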
-output
> df
SequenceNo Sex Age
1. sequence1 Male 65
2. sequence2 Female 45
3. sequence3 Male 21
4. sequence4 Female 12
> str(df)
'data.frame': 4 obs. of 3 variables:
$ SequenceNo: chr "sequence1" "sequence2" "sequence3" "sequence4"
$ Sex : chr "Male" "Female" "Male" "Female"
$ Age : int 65 45 21 12
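The two ways of building the index mentioned above do not always agree. As a sketch (the sample strings in x and the name i2 are illustrative, not from the original answer), the regex and the as.numeric()/is.na() check can be compared side by side, and the latter used for the swap on the question's data:

```r
# Sample strings: numbers, a sex label and two malformed values
x <- c("21", "-3", "4.5", "Male", "21a", "1.2.3")

# Regex index: optional minus sign, then only digits and dots, anchored at both ends
by_regex <- grepl("^-?[0-9.]+$", x)

# Coercion index: as.numeric() returns NA (with a warning, suppressed here)
# for anything it cannot parse as a number
by_coercion <- !is.na(suppressWarnings(as.numeric(x)))

print(data.frame(x, by_regex, by_coercion))

# Apply the coercion-based index to the example data from the question
df <- data.frame(
  SequenceNo = c("sequence1", "sequence2", "sequence3", "sequence4"),
  Sex = c("Male", "Female", "21", "Female"),
  Age = c("65", "45", "Male", "12"),
  row.names = c("1.", "2.", "3.", "4."),
  stringsAsFactors = FALSE  # keep character columns on R < 4.0
)
i2 <- !is.na(suppressWarnings(as.numeric(df$Sex)))  # TRUE only for row 3
df[i2, c("Sex", "Age")] <- df[i2, c("Age", "Sex")]
df <- type.convert(df, as.is = TRUE)
print(df)
```

Note that [0-9.]+ also accepts a string such as "1.2.3", which as.numeric() rejects, so the coercion check is the stricter of the two.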
data
df <- structure(list(SequenceNo = c("sequence1", "sequence2", "sequence3",
"sequence4"), Sex = c("Male", "Female", "21", "Female"), Age = c("65",
"45", "Male", "12")), class = "data.frame", row.names = c("1.",
"2.", "3.", "4."))
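Update: to avoid the NA (thanks to rjen). In the first answer, Sex is overwritten before Age is computed, so for row 3 the second paste0(Sex, Age) gives "MaleMale" and parse_number() returns NA. Instead, we create a helper column once and derive both columns from it: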
library(tidyverse)
df %>%
  mutate(helper = paste0(Sex, Age),                          # e.g. "21Male" for row 3
         Age = parse_number(helper),                         # keep only the number
         Sex = str_replace_all(helper, "[:digit:]", "")) %>% # remove the digits
  select(-helper)                                            # drop the helper column
SequenceNo Sex Age
1. sequence1 Male 65
2. sequence2 Female 45
3. sequence3 Male 21
4. sequence4 Female 12
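For readers who prefer to avoid the tidyverse, the same helper-column idea can be sketched in base R with gsub(). This sketch assumes whole-number ages and sex labels without digits; unlike parse_number(), the pattern [^0-9] would also strip a decimal point or a minus sign.

```r
df <- data.frame(
  SequenceNo = c("sequence1", "sequence2", "sequence3", "sequence4"),
  Sex = c("Male", "Female", "21", "Female"),
  Age = c("65", "45", "Male", "12"),
  row.names = c("1.", "2.", "3.", "4."),
  stringsAsFactors = FALSE
)

# Combine both columns once, so neither result depends on an already overwritten column
helper <- paste0(df$Sex, df$Age)  # e.g. "21Male" for row 3

# Age: delete every non-digit character, then convert to integer
df$Age <- as.integer(gsub("[^0-9]", "", helper))

# Sex: delete every digit, leaving only the label
df$Sex <- gsub("[0-9]", "", helper)

print(df)
```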
First answer:
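With paste0(Sex, Age) we merge the two columns, then:
- for Sex, we replace all digits with nothing, i.e. remove all digits
- for Age, we extract only the number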
library(tidyverse)
df %>%
mutate(Sex = str_replace_all(paste0(Sex, Age), "[:digit:]", "")) %>%
mutate(Age = parse_number(paste0(Sex, Age)))
SequenceNo Sex Age
1 sequence1 Male 65
2 sequence2 Female 45
3 sequence3 Male NA
4 sequence4 Female 12
An approach using if_else():
library(dplyr)
df %>%
mutate(SexNew = if_else(Sex %in% c('Male', 'Female'), Sex, Age),
Age = if_else(Age %in% 1:120, Age, Sex)) %>%
select(-Sex, Sex = SexNew)
# SequenceNo Age Sex
# 1 sequence1 65 Male
# 2 sequence2 45 Female
# 3 sequence3 21 Male
# 4 sequence4 12 Female
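The if_else() approach decides which value is misplaced by checking it against what each column should contain. The same membership test can drive a plain swap; the base R sketch below uses a hypothetical helper swap_where() (not part of base R or dplyr) and assumes every row is either entirely correct or entirely flipped. Converting Age with as.integer() at the end addresses the fact that if_else() leaves Age as a character column.

```r
# Hypothetical helper (not part of base R or dplyr): swap the values of
# columns a and b in the rows where `rows` is TRUE
swap_where <- function(data, rows, a, b) {
  rows <- rows & !is.na(rows)  # treat an NA condition as "leave the row alone"
  tmp <- data[[a]][rows]
  data[[a]][rows] <- data[[b]][rows]
  data[[b]][rows] <- tmp
  data
}

df <- data.frame(
  SequenceNo = c("sequence1", "sequence2", "sequence3", "sequence4"),
  Sex = c("Male", "Female", "21", "Female"),
  Age = c("65", "45", "Male", "12"),
  row.names = c("1.", "2.", "3.", "4."),
  stringsAsFactors = FALSE
)

# A row is flipped when Sex is not one of the expected labels
df <- swap_where(df, !df$Sex %in% c("Male", "Female"), "Sex", "Age")
df$Age <- as.integer(df$Age)
print(df)
```

Because the swap is done in place, the original column order (SequenceNo, Sex, Age) is kept without a select() step.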