Issue with ifelse() command, where 2 columns in 2 different data frames that look the same are not identified as identical
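I'm having a problem with the ifelse() command: 2 columns in 2 different data frames look the same but are not identified as identical. I'd appreciate any guidance on fixing this, so that the code compares the data frames with each other and produces the appropriate output without my having to type the text in by hand.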
Here are my 2 starting datasets, df_1 and df_2:
> df_1
DV_name
1 submission_time_minutes
2 submission_time_minutes
3 WC
4 WC
5 Analytic_z_score
6 Analytic_z_score
7 Clout_z_score
8 Clout_z_score
9 Authentic_z_score
10 Authentic_z_score
11 Tone_z_score
12 Tone_z_score
13 submission_time_minutes
14 submission_time_minutes
15 WC
16 WC
17 Analytic_z_score
18 Analytic_z_score
19 Clout_z_score
20 Clout_z_score
21 Authentic_z_score
22 Authentic_z_score
23 Tone_z_score
24 Tone_z_score
25 submission_time_minutes
26 submission_time_minutes
27 WC
28 WC
29 Analytic_z_score
30 Analytic_z_score
31 Clout_z_score
32 Clout_z_score
33 Authentic_z_score
34 Authentic_z_score
35 Tone_z_score
36 Tone_z_score
37 submission_time_minutes
38 submission_time_minutes
39 WC
40 WC
41 Analytic_z_score
42 Analytic_z_score
43 Clout_z_score
44 Clout_z_score
45 Authentic_z_score
46 Authentic_z_score
47 Tone_z_score
48 Tone_z_score
> df_2
Variable_analyses Variable_label
1 submission_time_minutes Submission time in minutes
2 WC Word count
3 Analytic_z_score Analytic score
4 Clout_z_score Clout score
5 Authentic_z_score Authentic score
6 Tone_z_score Tone score
I want to create the column df_1$Variable_label, taking its values from df_2$Variable_label, based on matching entries between df_1$DV_name and df_2$Variable_analyses.
Here is the long way, which works:
> ## long way
>
> ### creates Variable_label
> # ---- NOTE: does not directly extract Variable_label from df_2 and insert it into df_1
> # ---- NOTE: based on df_1$DV_name
> df_1$Variable_label <-
+ ifelse(df_1$DV_name == "submission_time_minutes", "Submission time in minutes",
+ ifelse(df_1$DV_name == "WC", "Word count",
+ ifelse(df_1$DV_name == "Analytic_z_score", "Analytic score",
+ ifelse(df_1$DV_name == "Clout_z_score", "Clout score",
+ ifelse(df_1$DV_name == "Authentic_z_score", "Authentic score",
+ ifelse(df_1$DV_name == "Tone_z_score", "Tone score", NA
+ ))))))
>
> ### displays df
> # ---- NOTE: displays df with created variable in desired output form
> df_1
DV_name Variable_label
1 submission_time_minutes Submission time in minutes
2 submission_time_minutes Submission time in minutes
3 WC Word count
4 WC Word count
5 Analytic_z_score Analytic score
6 Analytic_z_score Analytic score
7 Clout_z_score Clout score
8 Clout_z_score Clout score
9 Authentic_z_score Authentic score
10 Authentic_z_score Authentic score
11 Tone_z_score Tone score
12 Tone_z_score Tone score
13 submission_time_minutes Submission time in minutes
14 submission_time_minutes Submission time in minutes
15 WC Word count
16 WC Word count
17 Analytic_z_score Analytic score
18 Analytic_z_score Analytic score
19 Clout_z_score Clout score
20 Clout_z_score Clout score
21 Authentic_z_score Authentic score
22 Authentic_z_score Authentic score
23 Tone_z_score Tone score
24 Tone_z_score Tone score
25 submission_time_minutes Submission time in minutes
26 submission_time_minutes Submission time in minutes
27 WC Word count
28 WC Word count
29 Analytic_z_score Analytic score
30 Analytic_z_score Analytic score
31 Clout_z_score Clout score
32 Clout_z_score Clout score
33 Authentic_z_score Authentic score
34 Authentic_z_score Authentic score
35 Tone_z_score Tone score
36 Tone_z_score Tone score
37 submission_time_minutes Submission time in minutes
38 submission_time_minutes Submission time in minutes
39 WC Word count
40 WC Word count
41 Analytic_z_score Analytic score
42 Analytic_z_score Analytic score
43 Clout_z_score Clout score
44 Clout_z_score Clout score
45 Authentic_z_score Authentic score
46 Authentic_z_score Authentic score
47 Tone_z_score Tone score
48 Tone_z_score Tone score
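The nested ifelse() above hard-codes every name/label pair. As a minimal sketch, assuming both key columns are plain character vectors and rebuilding the example data from the listings above, the same mapping can be read directly from df_2 by turning it into a named lookup vector. The helper names `names_vec`, `labels_vec` and `lookup` are illustrative only.

```r
# Rebuild the example data shown above (48 rows in df_1, 6 rows in df_2)
names_vec  <- c("submission_time_minutes", "WC", "Analytic_z_score",
                "Clout_z_score", "Authentic_z_score", "Tone_z_score")
labels_vec <- c("Submission time in minutes", "Word count", "Analytic score",
                "Clout score", "Authentic score", "Tone score")
df_1 <- data.frame(DV_name = rep(rep(names_vec, each = 2), times = 4),
                   stringsAsFactors = FALSE)
df_2 <- data.frame(Variable_analyses = names_vec, Variable_label = labels_vec,
                   stringsAsFactors = FALSE)

# Named vector: each label is stored under its Variable_analyses key
lookup <- setNames(df_2$Variable_label, df_2$Variable_analyses)

# Indexing by name returns one label per row of df_1 (NA for unknown keys);
# unname() drops the names that the lookup result carries along
df_1$Variable_label <- unname(lookup[df_1$DV_name])
head(df_1, 4)
```

With this approach, adding a row to df_2 updates the labels without editing any code, which the nested ifelse() cannot do.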
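I would like to do this task faster with the ifelse() command by referencing the data sets instead of typing the labels; this is what I call the quick way. But when I do this, it doesn't work and produces an undesired result.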
I first created new variables that strip invisible characters from the columns df_1$DV_name and df_2$Variable_analyses:
### creates matching variables, which removes some invisible characters from data
library(stringr)  # provides str_remove_all()
# ---- NOTE: for df_1$DV_name, creating df_1$DV_name_for_matching
df_1$DV_name_for_matching <-
as.character(str_remove_all(df_1$DV_name, "[^A-z|0-9|[:punct:]|_|\\s]"))
# ---- NOTE: for df_2$Variable_analyses, creating df_2$Variable_analyses_for_matching
df_2$Variable_analyses_for_matching <-
as.character(str_remove_all(df_2$Variable_analyses, "[^A-z|0-9|[:punct:]|_|\\s]"))
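Before stripping characters, it may be worth checking whether the keys actually differ at all. A minimal sketch, assuming the example data above: setdiff() lists the DV_name values that have no exact counterpart in df_2$Variable_analyses, so an empty result suggests invisible characters are not the problem. The string containing a zero-width space (`\u200b`) is a made-up example of what a hidden character would look like.

```r
# Rebuild only the key columns of the example data
names_vec <- c("submission_time_minutes", "WC", "Analytic_z_score",
               "Clout_z_score", "Authentic_z_score", "Tone_z_score")
df_1 <- data.frame(DV_name = rep(rep(names_vec, each = 2), times = 4),
                   stringsAsFactors = FALSE)
df_2 <- data.frame(Variable_analyses = names_vec, stringsAsFactors = FALSE)

# Keys in df_1 with no exact match in df_2; character(0) means every key matches
missing_keys <- setdiff(unique(df_1$DV_name), df_2$Variable_analyses)
print(missing_keys)

# Hypothetical "dirty" key: "WC" followed by an invisible zero-width space
dirty <- c("WC\u200b", "Tone_z_score")
setdiff(dirty, df_2$Variable_analyses)  # flags the dirty key
nchar(dirty)                            # the hidden character adds to the length
```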
I then used the new variables df_1$DV_name_for_matching and df_2$Variable_analyses_for_matching as the basis for the ifelse() command:
### uses ifelse to complete matching task
df_1[["Variable_label"]] <-
ifelse(((df_1[["DV_name_for_matching"]]) == (df_2[["Variable_analyses_for_matching"]])), df_2[["Variable_label"]], NA)
This does not produce the desired output (shown above). Instead, I get this output:
### displays df
# ---- NOTE: displays df, quick way does not work, not desired output
df_1
I'm not sure why the quick way doesn't work. Please tell me how I can get it to work.
FYI, I'm using RStudio on a 2013 Intel MacBook Pro.
Thanks.
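A likely explanation, sketched under the assumption that the data are exactly as listed above: `==` compares two vectors position by position; it does not search one vector for the values of the other. df_2 has 6 rows and df_1 has 48, so R silently recycles the 6 values of df_2$Variable_analyses 8 times, and row i of df_1 is only ever compared with row ((i - 1) %% 6) + 1 of df_2. Only rows where the two cycles happen to line up receive a label.

```r
# Rebuild the example data shown above
names_vec  <- c("submission_time_minutes", "WC", "Analytic_z_score",
                "Clout_z_score", "Authentic_z_score", "Tone_z_score")
labels_vec <- c("Submission time in minutes", "Word count", "Analytic score",
                "Clout score", "Authentic score", "Tone score")
df_1 <- data.frame(DV_name = rep(rep(names_vec, each = 2), times = 4),
                   stringsAsFactors = FALSE)
df_2 <- data.frame(Variable_analyses = names_vec, Variable_label = labels_vec,
                   stringsAsFactors = FALSE)

# Element-wise comparison; the length-6 column is recycled to length 48
same_pos <- df_1$DV_name == df_2$Variable_analyses
which(same_pos)   # → 1 12 13 24 25 36 37 48

# ifelse() recycles `yes` the same way, so only those rows get a label
quick <- ifelse(same_pos, df_2$Variable_label, NA)
sum(!is.na(quick))   # → 8
```

No warning is raised because 48 is a multiple of 6; R only warns about recycling when the longer length is not a multiple of the shorter one.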
Here is the code I used to create the post:
# creates df_1$Variable_label
# ---- NOTE: column(s) with values to be transferred - df_2$Variable_label
# ---- NOTE: column(s) for matching - df_1$DV_name, df_2$Variable_analyses
library(stringr)  # provides str_remove_all()
## displays data frames
df_1
df_2
## quick way
# ---- NOTE: quick way does not work
### creates matching variables, which removes some invisible characters from data
# ---- NOTE: for df_1$DV_name, creating df_1$DV_name_for_matching
df_1$DV_name_for_matching <-
as.character(str_remove_all(df_1$DV_name, "[^A-z|0-9|[:punct:]|_|\\s]"))
# ---- NOTE: for df_2$Variable_analyses, creating df_2$Variable_analyses_for_matching
df_2$Variable_analyses_for_matching <-
as.character(str_remove_all(df_2$Variable_analyses, "[^A-z|0-9|[:punct:]|_|\\s]"))
### uses ifelse to complete matching task
df_1[["Variable_label"]] <-
ifelse(((df_1[["DV_name_for_matching"]]) == (df_2[["Variable_analyses_for_matching"]])), df_2[["Variable_label"]], NA)
### displays df
# ---- NOTE: displays df, quick way does not work, not desired output
df_1
## long way
### creates Variable_label
# ---- NOTE: does not directly extract Variable_label from df_2 and insert it into df_1
# ---- NOTE: based on df_1$DV_name
df_1$Variable_label <-
ifelse(df_1$DV_name == "submission_time_minutes", "Submission time in minutes",
ifelse(df_1$DV_name == "WC", "Word count",
ifelse(df_1$DV_name == "Analytic_z_score", "Analytic score",
ifelse(df_1$DV_name == "Clout_z_score", "Clout score",
ifelse(df_1$DV_name == "Authentic_z_score", "Authentic score",
ifelse(df_1$DV_name == "Tone_z_score", "Tone score", NA
))))))
### displays df
# ---- NOTE: displays df with created variable in desired output form
df_1
I believe you can do this with left_join():
library(tidyverse)
left_join(df_1, df_2, by = c("DV_name" = "Variable_analyses"))
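If you'd rather not load the tidyverse, base R's match() can perform the same lookup: for each DV_name it returns the row position of the first equal value in df_2$Variable_analyses (NA if there is none), and that position vector then indexes the labels. A minimal sketch on the example data; unlike merge() with its default `sort = TRUE`, this keeps df_1's original row order.

```r
# Rebuild the example data shown above
names_vec  <- c("submission_time_minutes", "WC", "Analytic_z_score",
                "Clout_z_score", "Authentic_z_score", "Tone_z_score")
labels_vec <- c("Submission time in minutes", "Word count", "Analytic score",
                "Clout score", "Authentic score", "Tone score")
df_1 <- data.frame(DV_name = rep(rep(names_vec, each = 2), times = 4),
                   stringsAsFactors = FALSE)
df_2 <- data.frame(Variable_analyses = names_vec, Variable_label = labels_vec,
                   stringsAsFactors = FALSE)

# Row of df_2 whose Variable_analyses equals each DV_name (first match, NA if absent)
idx <- match(df_1$DV_name, df_2$Variable_analyses)

# Pull the label from that row; rows of df_1 stay in their original order
df_1$Variable_label <- df_2$Variable_label[idx]
head(df_1)
```

One design difference: if df_2 contained the same key twice, match() would silently use the first occurrence, whereas left_join() returns one row per matching pair and would therefore lengthen df_1.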