Add results of e1071 matchControls back to the original data by row number
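I would like to do 1:1 matching on a subset of my data and then add the output codes to my original data as a new column. Here is a working example using sample data: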
mydata <- iris
dfrm <- subset(mydata, mydata$Petal.Length>4)
library(e1071)
m <- matchControls(Species~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width,
data = dfrm, caselabel = "versicolor", contlabel = "virginica")
The output contains the original row numbers, which I would like to use when appending it to the original data.
m$factor
# 51 52 53 55 56 57 59 62 64 66 67 68 69 71 73 74 75 76 77
# case case case case case case case case case case case case case case case case case case case
# 78 79 84 85 86 87 88 89 91 92 95 96 97 98 100 101 102 103 104
# case case case case case case case case case case case case case case case <NA> cont <NA> cont
# 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123
# cont <NA> cont <NA> cont <NA> cont cont cont cont cont cont cont <NA> <NA> cont <NA> cont <NA>
# 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142
# cont cont <NA> cont cont cont <NA> <NA> <NA> cont cont cont <NA> cont cont cont cont cont cont
# 143 144 145 146 147 148 149 150
# cont <NA> <NA> cont cont cont cont cont
When I try to add it directly to the original data as a new column, I get an error because the number of rows differs:
mydata$output <- m$factor
# Error in `$<-.data.frame`(`*tmp*`, output, value = c(1L, 1L, 1L, 1L, 1L, :
# replacement has 84 rows, data has 150
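My search attempts failed, perhaps because I do not know the right terms to describe my problem. I tried "merge dataframes by rows" and similar queries, but the results did not seem relevant. Some auto-suggested duplicates like this one are about adding aggregate results back to the original data, which is not the case here. I tried using join based on this answer, but I do not know how to define the by argument as the row number rather than an actual variable.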
library(dplyr)
left_join(mydata, as.data.frame(m$factor), by=NULL)
# Error: `by` required, because the data sources have no common variables
I also tried cbind, but it throws an error too because the number of rows differs.
cbind(mydata, m$factor)
cbind(mydata, as.data.frame(m$factor))
# Error in data.frame(..., check.names = FALSE) :
# arguments imply differing number of rows: 150, 84
What am I missing? Thanks.
You have to create a variable to join on. Below I use the row names:
library(dplyr)
left_join(mydata %>% mutate( rownumber = rownames(.) ),
as.data.frame(m$factor) %>% mutate( rownumber = rownames(.) ),
by = "rownumber" )
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species rownumber m$factor
# 1 5.1 3.5 1.4 0.2 setosa 1 <NA>
# 2 4.9 3.0 1.4 0.2 setosa 2 <NA>
# 3 4.7 3.2 1.3 0.2 setosa 3 <NA>
# ...
# 96 5.7 3.0 4.2 1.2 versicolor 96 case
# 97 5.7 2.9 4.2 1.3 versicolor 97 case
# 98 6.2 2.9 4.3 1.3 versicolor 98 case
# 99 5.1 2.5 3.0 1.1 versicolor 99 <NA>
# 100 5.7 2.8 4.1 1.3 versicolor 100 case
# 101 6.3 3.3 6.0 2.5 virginica 101 <NA>
# 102 5.8 2.7 5.1 1.9 virginica 102 cont
# 103 7.1 3.0 5.9 2.1 virginica 103 <NA>
# 104 6.3 2.9 5.6 1.8 virginica 104 cont
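As a base R alternative that avoids the join, the named factor can probably be indexed by row name with match(). This is only a minimal sketch and was not run against e1071 itself: fac is a hypothetical stand-in for m$factor, built by hand so the snippet runs without the package, and it assumes (as the printed output in the question suggests) that m$factor is a factor whose names are the original row names.

```r
mydata <- iris
dfrm <- subset(mydata, Petal.Length > 4)

# Hypothetical stand-in for m$factor: a factor named by the original
# row names of the subset. The labels here are made up, not real matches.
fac <- factor(ifelse(dfrm$Species == "versicolor", "case", "cont"),
              levels = c("case", "cont"))
names(fac) <- rownames(dfrm)

# For each row of mydata, match() gives the position of its row name in
# names(fac), or NA when that row was not part of the subset.
idx <- match(rownames(mydata), names(fac))

# Indexing with NA yields NA, so rows outside the subset get <NA>.
mydata$output <- unname(fac[idx])

table(mydata$output, useNA = "ifany")
```

With the real object, the same idea would presumably read mydata$output <- m$factor[match(rownames(mydata), names(m$factor))].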