Issue: appending a column as a factor to a data frame creates NA in the appended column in R
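I am new to R and am trying to learn ML using caret.
Problem: after creating dummies and removing the NZV variables, when I add Y (the predicted variable) back to the df as a factor, it creates NA in that column (steps 5-6 below). So how can I keep the Y variable as a factor in the final df?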
1. data (Bank Marketing response data from UCI / Kaggle)
str(data)
'data.frame': 4119 obs. of 21 variables:
$ age : int 30 39 25 38 47 32 32 41 31 35 ...
$ job : Factor w/ 12 levels "admin.","blue-collar",..: 2 8 8 8 1 8 1 3 8 2 ...
$ marital : Factor w/ 4 levels "divorced","married",..: 2 3 2 2 2 3 3 2 1 2 ...
$ education : Factor w/ 8 levels "basic.4y","basic.6y",..: 3 4 4 3 7 7 7 7 6 3 ...
$ default : Factor w/ 3 levels "no","unknown",..: 1 1 1 1 1 1 1 2 1 2 ...
$ housing : Factor w/ 3 levels "no","unknown",..: 3 1 3 2 3 1 3 3 1 1 ...
$ loan : Factor w/ 3 levels "no","unknown",..: 1 1 1 2 1 1 1 1 1 1 ...
$ contact : Factor w/ 2 levels "cellular","telephone": 1 2 2 2 1 1 1 1 1 2 ...
$ month : Factor w/ 10 levels "apr","aug","dec",..: 7 7 5 5 8 10 10 8 8 7 ...
$ day_of_week : Factor w/ 5 levels "fri","mon","thu",..: 1 1 5 1 2 3 2 2 4 3 ...
$ duration : int 487 346 227 17 58 128 290 44 68 170 ...
$ campaign : int 2 4 1 3 1 3 4 2 1 1 ...
$ pdays : int 999 999 999 999 999 999 999 999 999 999 ...
$ previous : int 0 0 0 0 0 2 0 0 1 0 ...
$ poutcome : Factor w/ 3 levels "failure","nonexistent",..: 2 2 2 2 2 1 2 2 1 2 ...
$ emp.var.rate : num -1.8 1.1 1.4 1.4 -0.1 -1.1 -1.1 -0.1 -0.1 1.1 ...
$ cons.price.idx: num 92.9 94 94.5 94.5 93.2 ...
$ cons.conf.idx : num -46.2 -36.4 -41.8 -41.8 -42 -37.5 -37.5 -42 -42 -36.4 ...
$ euribor3m : num 1.31 4.86 4.96 4.96 4.19 ...
$ nr.employed : num 5099 5191 5228 5228 5196 ...
$ y : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
2. Saved the X and Y variables
Y = subset(data, select = y)
X = subset(data, select = -y)
dim(X)
dim(Y)
[1] 4119 20
[1] 4119 1
3. Created dummies
pp_dummy <- dummyVars(y ~ ., data = data)
data <- predict(pp_dummy, newdata = data)
data <- data.frame(data)
4. Removed variables using near-zero variance
nzv_list <- nearZeroVar(data) %>%
as.vector()
data <- data[, -nzv_list ]
str(data)
'data.frame': 4119 obs. of 44 variables:
$ age : num 30 39 25 38 47 32 32 41 31 35 ...
$ job.admin. : num 0 0 0 0 1 0 1 0 0 0 ...
$ job.blue.collar : num 1 0 0 0 0 0 0 0 0 1 ...
$ job.management : num 0 0 0 0 0 0 0 0 0 0 ...
$ job.services : num 0 1 1 1 0 1 0 0 1 0 ...
$ job.technician : num 0 0 0 0 0 0 0 0 0 0 ...
$ marital.divorced : num 0 0 0 0 0 0 0 0 1 0 ...
$ marital.married : num 1 0 1 1 1 0 0 1 0 1 ...
$ marital.single : num 0 1 0 0 0 1 1 0 0 0 ...
$ education.basic.4y : num 0 0 0 0 0 0 0 0 0 0 ...
$ education.basic.6y : num 0 0 0 0 0 0 0 0 0 0 ...
$ education.basic.9y : num 1 0 0 1 0 0 0 0 0 1 ...
$ education.high.school : num 0 1 1 0 0 0 0 0 0 0 ...
$ education.professional.course: num 0 0 0 0 0 0 0 0 1 0 ...
$ education.university.degree : num 0 0 0 0 1 1 1 1 0 0 ...
$ default.no : num 1 1 1 1 1 1 1 0 1 0 ...
$ default.unknown : num 0 0 0 0 0 0 0 1 0 1 ...
$ housing.no : num 0 1 0 0 0 1 0 0 1 1 ...
$ housing.yes : num 1 0 1 0 1 0 1 1 0 0 ...
$ loan.no : num 1 1 1 0 1 1 1 1 1 1 ...
$ loan.yes : num 0 0 0 0 0 0 0 0 0 0 ...
$ contact.cellular : num 1 0 0 0 1 1 1 1 1 0 ...
$ contact.telephone : num 0 1 1 1 0 0 0 0 0 1 ...
$ month.apr : num 0 0 0 0 0 0 0 0 0 0 ...
$ month.aug : num 0 0 0 0 0 0 0 0 0 0 ...
$ month.jul : num 0 0 0 0 0 0 0 0 0 0 ...
$ month.jun : num 0 0 1 1 0 0 0 0 0 0 ...
$ month.may : num 1 1 0 0 0 0 0 0 0 1 ...
$ month.nov : num 0 0 0 0 1 0 0 1 1 0 ...
$ day_of_week.fri : num 1 1 0 1 0 0 0 0 0 0 ...
$ day_of_week.mon : num 0 0 0 0 1 0 1 1 0 0 ...
$ day_of_week.thu : num 0 0 0 0 0 1 0 0 0 1 ...
$ day_of_week.tue : num 0 0 0 0 0 0 0 0 1 0 ...
$ day_of_week.wed : num 0 0 1 0 0 0 0 0 0 0 ...
$ duration : num 487 346 227 17 58 128 290 44 68 170 ...
$ campaign : num 2 4 1 3 1 3 4 2 1 1 ...
$ previous : num 0 0 0 0 0 2 0 0 1 0 ...
$ poutcome.failure : num 0 0 0 0 0 1 0 0 1 0 ...
$ poutcome.nonexistent : num 1 1 1 1 1 0 1 1 0 1 ...
$ emp.var.rate : num -1.8 1.1 1.4 1.4 -0.1 -1.1 -1.1 -0.1 -0.1 1.1 ...
$ cons.price.idx : num 92.9 94 94.5 94.5 93.2 ...
$ cons.conf.idx : num -46.2 -36.4 -41.8 -41.8 -42 -37.5 -37.5 -42 -42 -36.4 ...
$ euribor3m : num 1.31 4.86 4.96 4.96 4.19 ...
$ nr.employed : num 5099 5191 5228 5228 5196 ...
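A side note on step 4, as a hedged sketch: if the near-zero-variance check finds nothing, the index vector is empty (as far as I know this is what `nearZeroVar()` returns with its default arguments), and negative indexing with an empty vector keeps zero columns rather than all of them. The toy data frame, the `nzv_list` value, and the helper `drop_cols` below are made up for illustration; only base R is used.

```r
# Toy data frame standing in for the dummy-encoded predictors (made up).
toy <- data.frame(a = c(1, 2, 3), b = c(0, 0, 1), c = c(5, 6, 7))

# Pretend no near-zero-variance columns were found.
nzv_list <- integer(0)

# Pitfall: -integer(0) is still integer(0), so this keeps zero columns.
dropped_all <- toy[, -nzv_list]
ncol(dropped_all)

# Guarded version (hypothetical helper): only drop when there is something to drop.
drop_cols <- function(df, idx) {
  if (length(idx) > 0) df[, -idx, drop = FALSE] else df
}

kept <- drop_cols(toy, nzv_list)
ncol(kept)

# With a non-empty index the selected column is removed as expected.
names(drop_cols(toy, 2L))
```

This does not affect the data in the question, where columns were actually removed, but it may matter when the same code is reused on other data.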
5. ISSUE: on appending y to data as a factor, it produces NA in the column
data$y <- as.factor(Y)
str(data)
'data.frame': 4119 obs. of 45 variables:
$ age : num 30 39 25 38 47 32 32 41 31 35 ...
$ job.admin. : num 0 0 0 0 1 0 1 0 0 0 ...
$ job.blue.collar : num 1 0 0 0 0 0 0 0 0 1 ...
$ job.management : num 0 0 0 0 0 0 0 0 0 0 ...
$ job.services : num 0 1 1 1 0 1 0 0 1 0 ...
$ job.technician : num 0 0 0 0 0 0 0 0 0 0 ...
$ marital.divorced : num 0 0 0 0 0 0 0 0 1 0 ...
$ marital.married : num 1 0 1 1 1 0 0 1 0 1 ...
$ marital.single : num 0 1 0 0 0 1 1 0 0 0 ...
$ education.basic.4y : num 0 0 0 0 0 0 0 0 0 0 ...
$ education.basic.6y : num 0 0 0 0 0 0 0 0 0 0 ...
$ education.basic.9y : num 1 0 0 1 0 0 0 0 0 1 ...
$ education.high.school : num 0 1 1 0 0 0 0 0 0 0 ...
$ education.professional.course: num 0 0 0 0 0 0 0 0 1 0 ...
$ education.university.degree : num 0 0 0 0 1 1 1 1 0 0 ...
$ default.no : num 1 1 1 1 1 1 1 0 1 0 ...
$ default.unknown : num 0 0 0 0 0 0 0 1 0 1 ...
$ housing.no : num 0 1 0 0 0 1 0 0 1 1 ...
$ housing.yes : num 1 0 1 0 1 0 1 1 0 0 ...
$ loan.no : num 1 1 1 0 1 1 1 1 1 1 ...
$ loan.yes : num 0 0 0 0 0 0 0 0 0 0 ...
$ contact.cellular : num 1 0 0 0 1 1 1 1 1 0 ...
$ contact.telephone : num 0 1 1 1 0 0 0 0 0 1 ...
$ month.apr : num 0 0 0 0 0 0 0 0 0 0 ...
$ month.aug : num 0 0 0 0 0 0 0 0 0 0 ...
$ month.jul : num 0 0 0 0 0 0 0 0 0 0 ...
$ month.jun : num 0 0 1 1 0 0 0 0 0 0 ...
$ month.may : num 1 1 0 0 0 0 0 0 0 1 ...
$ month.nov : num 0 0 0 0 1 0 0 1 1 0 ...
$ day_of_week.fri : num 1 1 0 1 0 0 0 0 0 0 ...
$ day_of_week.mon : num 0 0 0 0 1 0 1 1 0 0 ...
$ day_of_week.thu : num 0 0 0 0 0 1 0 0 0 1 ...
$ day_of_week.tue : num 0 0 0 0 0 0 0 0 1 0 ...
$ day_of_week.wed : num 0 0 1 0 0 0 0 0 0 0 ...
$ duration : num 487 346 227 17 58 128 290 44 68 170 ...
$ campaign : num 2 4 1 3 1 3 4 2 1 1 ...
$ previous : num 0 0 0 0 0 2 0 0 1 0 ...
$ poutcome.failure : num 0 0 0 0 0 1 0 0 1 0 ...
$ poutcome.nonexistent : num 1 1 1 1 1 0 1 1 0 1 ...
$ emp.var.rate : num -1.8 1.1 1.4 1.4 -0.1 -1.1 -1.1 -0.1 -0.1 1.1 ...
$ cons.price.idx : num 92.9 94 94.5 94.5 93.2 ...
$ cons.conf.idx : num -46.2 -36.4 -41.8 -41.8 -42 -37.5 -37.5 -42 -42 -36.4 ...
$ euribor3m : num 1.31 4.86 4.96 4.96 4.19 ...
$ nr.employed : num 5099 5191 5228 5228 5196 ...
$ y : Factor w/ 1 level "1:2": NA NA NA NA NA NA NA NA NA NA ...
6. If I append Y as-is, it does not create NA immediately, but when I then convert it to a factor, it gives NA
data$y <- Y # as.factor(Y)
data <- data %>% mutate(y = as.factor(y))
str(data)
(Update)
7. If I don't convert it to a factor, then I always have to use pull(data$y) instead of just data$y. Example below:
subsets <- c(7, 10, 12, 15, 20)
control <- rfeControl(functions = rfFuncs, method = "cv", verbose = FALSE)
system.time(
RFE_res <- rfe(x = data[, 1:44], # subset(train, select = -y)
y = pull(data$y),
sizes = subsets,
rfeControl = control
)
)
How can I avoid pull(data$y) and just use data$y?
This has nothing to do with pull().
as.factor() does not treat a data.frame as a vector, even if it has only 1 column:
X = subset(iris,select=-Species)
Y = subset(iris,select=Species)
as.factor(Y)
Species
<NA>
Levels: 1:3
.valid.factor(Y)
[1] "factor levels must be \"character\""
levels(Y)
NULL
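To see why this happens, it can help to inspect what `subset(..., select = ...)` actually returns. A minimal sketch using the built-in `iris` data: the result is still a `data.frame`, and the factor only appears once the column itself is extracted. The name `y_vec` is just an illustrative choice.

```r
# subset() with select= keeps the data.frame wrapper, even for one column.
Y <- subset(iris, select = Species)

class(Y)            # "data.frame"
is.factor(Y)        # FALSE: Y is a list-like container, not a vector
dim(Y)              # 150 rows, 1 column

# Extracting the column gives the underlying factor vector.
y_vec <- Y$Species  # equivalently Y[["Species"]] or Y[[1]]
is.factor(y_vec)    # TRUE
levels(y_vec)       # "setosa" "versicolor" "virginica"
anyNA(y_vec)        # FALSE
```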
You need to extract the column from the data.frame:
X$y = as.factor(Y$Species)
# or X %>% mutate(y = as.factor(Y$Species))
str(X)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ y : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
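Applied to the workflow in the question, the same idea might look like the sketch below. The small `data` frame is a made-up stand-in for the Bank Marketing data, and `X_only`, `nested`, and `fixed` are hypothetical names. The point is that storing `y` as a plain vector makes `data$y` a factor directly, so `pull()` should no longer be needed.

```r
# Made-up stand-in for the Bank Marketing data (not the real data set).
data <- data.frame(
  age = c(30, 39, 25, 38),
  y   = factor(c("no", "yes", "no", "no"))
)

# Step 2 of the question: Y is a one-column data.frame, not a vector.
Y <- subset(data, select = y)

# Hypothetical predictor-only frame, standing in for the dummy-encoded data.
X_only <- subset(data, select = -y)

# Assigning the data.frame as-is nests a data.frame inside the column,
# which is why an extra extraction step was needed in the question.
nested <- X_only
nested$y <- Y
is.data.frame(nested$y)   # TRUE

# Assigning the extracted column gives an ordinary factor column.
fixed <- X_only
fixed$y <- Y$y            # or: fixed$y <- Y[[1]]
is.factor(fixed$y)        # TRUE
levels(fixed$y)           # "no" "yes"
anyNA(fixed$y)            # FALSE
```

With a column like `fixed$y`, the `rfe()` call from step 7 should be able to take `y = data$y` directly. Extracting `Y <- data$y` in step 2, instead of using `subset()`, avoids the issue from the start.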