data_GAN Logistic Regression in R
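I have been reading about logistic regression in R. It makes sense when the columns/variables actually mean something. My columns are A, B, and C. Column C contains only 1s and 0s. How do I run a regression on such a limited dataset? Any guidance or reading resources would be greatly appreciated.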
> library(Amelia)
> library(mlbench)
> library(dplyr)
> my_data<-read.csv("/Users/morenikeirving/GAN/data_GAN.csv")
> names(my_data)
[1] "A" "B" "C"
> head(my_data)
A B C
1 4.4189 69.580 NA
2 13.2019 61.250 NA
3 25.6290 56.740 1
4 22.2943 68.860 1
5 0.2163 57.690 NA
6 0.2875 72.914 NA
> summary(my_data)
A B C
Min. : 0.000 Min. :33.00 Min. :1
1st Qu.: 1.226 1st Qu.:59.69 1st Qu.:1
Median : 5.897 Median :61.87 Median :1
Mean : 7.450 Mean :65.40 Mean :1
3rd Qu.:12.600 3rd Qu.:69.58 3rd Qu.:1
Max. :25.800 Max. :95.00 Max. :1
NA's :2923
> missmap(my_data, col=c("blue", "red"), legend=FALSE)
> my_data<-my_data %>% mutate(C = ifelse(is.na(C),0,C))
> missmap(my_data, col=c("blue", "red"), legend=FALSE)
> model <-glm(x~., data=my_data, family= binomial)
Error in eval(predvars, data, env) : object 'x' not found
> #Library to read in xls file
> library(Amelia)
> library(mlbench)
> library(dplyr)
>
> #Read in csv file
> my_data<-read.csv("/Users/GAN/data_GAN.csv")
>
> #Exploring Data
> #see what's on the data frame
> names(my_data)
[1] "A" "B" "C"
>
> #Look at first few rows of the data
> head(my_data)
A B C
1 4.4189 69.580 NA
2 13.2019 61.250 NA
3 25.6290 56.740 1
4 22.2943 68.860 1
5 0.2163 57.690 NA
6 0.2875 72.914 NA
>
> #Overall picture of data; looking at first few rows revealed missing data
> summary(my_data)
A B C
Min. : 0.000 Min. :33.00 Min. :1
1st Qu.: 1.226 1st Qu.:59.69 1st Qu.:1
Median : 5.897 Median :61.87 Median :1
Mean : 7.450 Mean :65.40 Mean :1
3rd Qu.:12.600 3rd Qu.:69.58 3rd Qu.:1
Max. :25.800 Max. :95.00 Max. :1
NA's :2923
> #lots of NAs
>
> #Examine missing data
>
> missmap(my_data, col=c("blue", "red"), legend=FALSE)
>
> #Replace N/A
>
> my_data<-my_data %>% mutate(C = ifelse(is.na(C),0,C))
>
> #Check to make sure missing values are resolved
> missmap(my_data, col=c("blue", "red"), legend=FALSE)
Are you asking (1) how to write the logistic regression code, or (2) how to improve the quality of your dataset?
(1) https://stats.idre.ucla.edu/r/dae/logit-regression/
model <- glm(C ~ A + B, data = my_data, family = "binomial")
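In a real-world setting, your data should be meaningful. In a practice dataset, though, the names of the variables/columns do not matter. What matters is that your data suits the model you want to fit (for example, linear regression requires a continuous outcome, while logistic regression is typically used with a binary outcome, such as your column C).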
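As a minimal sketch of the call above: the `glm(x~., ...)` attempt in the question fails because the data frame has no column named `x`; the response column (`C` here) has to go on the left of `~`. The data below is simulated and purely hypothetical, standing in for `data_GAN.csv`, which is not available here; the column ranges and the assumed relationship between `A` and `C` are illustration only.

```r
# Simulated stand-in for data_GAN.csv (hypothetical values, not the asker's data)
set.seed(42)
n <- 500
my_data <- data.frame(
  A = runif(n, 0, 25),
  B = runif(n, 33, 95)
)

# Assumed relationship, chosen only so the example has some signal
p <- plogis(-3 + 0.2 * my_data$A)
my_data$C <- rbinom(n, size = 1, prob = p)

# The response goes on the left of ~; "C ~ ." would mean "C on all other columns"
model <- glm(C ~ A + B, data = my_data, family = binomial)
summary(model)

# Predicted probabilities that C == 1, one per row
pred <- predict(model, type = "response")
head(pred)
```

With only A and B as predictors, `C ~ .` and `C ~ A + B` fit the same model.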
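(2) If your dataset is small and the data quality is low, there is little you can do other than get a new dataset or collect more data.
You could consider resampling, but it is not always applicable and comes with its own set of problems.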
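One possible reading of "resampling" here is the bootstrap: refit the model on rows drawn with replacement to see how stable a coefficient is. The sketch below assumes that reading and again uses simulated, hypothetical data rather than the asker's file; it does not create new information, which is one of the caveats mentioned above.

```r
# Simulated stand-in data (hypothetical, for illustration only)
set.seed(1)
n <- 200
d <- data.frame(A = runif(n, 0, 25), B = runif(n, 33, 95))
d$C <- rbinom(n, size = 1, prob = plogis(-3 + 0.2 * d$A))

# Refit the model on rows sampled with replacement and keep the A coefficient
boot_coef <- replicate(200, {
  idx <- sample(nrow(d), replace = TRUE)
  coef(glm(C ~ A + B, data = d[idx, ], family = binomial))["A"]
})

# Percentile interval for the A coefficient across bootstrap refits
quantile(boot_coef, c(0.025, 0.975))
```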