Why can't I import the following dataset from UCI?
Good afternoon,
Suppose we have the following function:
data_preprocessing <- function(link, drop_last_column = TRUE) {
  link = as.character(link)
  DT <- data.table::fread(link,
                          fill = TRUE,
                          na.strings = "?")
  DT = DT[-1, ]
  DT = as.data.frame(DT)
  if (drop_last_column == TRUE) {
    DT = as.data.frame(DT)[, -ncol(DT)]
  }
  return(DT)
}
When I try to import the acute dataset from UCI, I get the following error:
acute=data_preprocessing("https://archive.ics.uci.edu/ml/machine-learning-databases/acute/diagnosis.data")
[100%] Downloaded 7276 bytes...
Error in data.table::fread(link, fill = TRUE, na.strings = "?") :
File is encoded in UTF-16, this encoding is not supported by fread(). Please recode the file to UTF-8.
I also tried:
acute=read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/acute/diagnosis.data")
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 2 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 3 appears to contain embedded nulls
4: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 4 appears to contain embedded nulls
5: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 5 appears to contain embedded nulls
6: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
embedded nul(s) found in input
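Thank you for your help!
Use read.table with the appropriate encoding instead. As the error message says, fread() does not support UTF-16, whereas read.table accepts a fileEncoding argument.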
data_preprocessing <- function(link, drop_last_column = TRUE) {
  link <- as.character(link)
  # Tell read.table that the file is UTF-16 so it is decoded while reading.
  DT <- read.table(link,
                   fileEncoding = "UTF-16",
                   fill = TRUE,
                   na.strings = "?")
  DT <- DT[-1, ]
  DT <- as.data.frame(DT)
  if (drop_last_column == TRUE) {
    DT <- as.data.frame(DT)[, -ncol(DT)]
  }
  return(DT)
}
acute=data_preprocessing("https://archive.ics.uci.edu/ml/machine-learning-databases/acute/diagnosis.data")
head(acute)
    V1 V2  V3  V4  V5  V6  V7
2 35,9 no  no yes yes yes yes
3 35,9 no yes  no  no  no  no
4 36,0 no  no yes yes yes yes
5 36,0 no yes  no  no  no  no
6 36,0 no yes  no  no  no  no
7 36,2 no  no yes yes yes yes
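Note that the temperatures in the output above use a decimal comma (35,9), so read.table keeps V1 as text rather than as a number. If you want a numeric column, one option is to pass dec = "," as well. The following is only a minimal sketch: the two sample rows and the temporary file are made up for illustration (they mimic the layout of diagnosis.data), so that the example runs without network access.

```r
# Hypothetical sample rows in the same layout as diagnosis.data (tab-separated).
rows <- "35,5\tno\tyes\tno\tno\tno\tno\tno\n35,9\tno\tno\tyes\tyes\tyes\tyes\tno\n"

# Write them as UTF-16 (little-endian, with a byte-order mark) to a temp file.
tmp <- tempfile(fileext = ".data")
bom <- as.raw(c(0xff, 0xfe))
utf16 <- iconv(rows, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(bom, utf16), tmp)

# fileEncoding decodes the UTF-16 input; dec = "," parses 35,5 as 35.5.
DT <- read.table(tmp,
                 fileEncoding = "UTF-16",
                 dec = ",",
                 na.strings = "?")
str(DT$V1)
```

The same dec = "," argument can be added to the read.table call inside data_preprocessing.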
Edit:
To detect the encoding used in the data file automatically, you can use the guess_encoding function from the readr package.
data_preprocessing <- function(link, drop_last_column = TRUE) {
  link <- as.character(link)
  enc_guess <- readr::guess_encoding(link)
  # Take the single most confident guess; which.max() returns one index
  # even when several encodings tie, so fileEncoding gets exactly one value.
  enc <- enc_guess$encoding[which.max(enc_guess$confidence)]
  DT <- read.table(link,
                   fileEncoding = enc,
                   fill = TRUE,
                   na.strings = "?")
  DT <- DT[-1, ]
  DT <- as.data.frame(DT)
  if (drop_last_column == TRUE) {
    DT <- as.data.frame(DT)[, -ncol(DT)]
  }
  return(DT)
}
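If you would rather keep data.table::fread, the error message in the question suggests another route: recode the file to UTF-8 first. Below is a rough sketch of that idea, not a tested drop-in replacement. The helper name recode_to_utf8 is hypothetical, and the demo builds a small made-up UTF-16 file locally instead of downloading diagnosis.data.

```r
# Hypothetical helper: recode a text file to UTF-8 and return the new path.
recode_to_utf8 <- function(path, from = "UTF-16") {
  con <- file(path, open = "r", encoding = from)
  on.exit(close(con))
  lines <- readLines(con)
  out <- tempfile(fileext = ".data")
  # Write the bytes as UTF-8 regardless of the session's native encoding.
  writeLines(enc2utf8(lines), out, useBytes = TRUE)
  out
}

# Demo input: two made-up rows stored as UTF-16LE with a byte-order mark.
rows <- "35,9\tno\tno\tyes\tyes\tyes\tyes\tno\n36,0\tno\tyes\tno\tno\tno\tno\tno\n"
src <- tempfile(fileext = ".data")
bom <- as.raw(c(0xff, 0xfe))
writeBin(c(bom, iconv(rows, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]), src)

utf8_path <- recode_to_utf8(src)

# The recoded file no longer contains the NUL bytes that read.csv warned about.
bytes <- readBin(utf8_path, what = "raw", n = file.size(utf8_path))
has_nul <- any(bytes == as.raw(0))

# fread can now parse the file (only attempted if data.table is installed).
if (requireNamespace("data.table", quietly = TRUE)) {
  DT <- data.table::fread(utf8_path, sep = "\t", header = FALSE, na.strings = "?")
}
```

For the real dataset you would first save the URL to a local file, for example with download.file(url, destfile, mode = "wb"), and pass that path to the helper.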