Importing a long character field containing digits with read.table()
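I am trying to import a large data set in which one column holds a document number. The field contains a 25-digit number with leading zeros.
I tried to import the data with read.table(), but this particular field always comes out as "1e+19", even when I assign "character" as the class during the import.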
# import elyte
colnames<-c("patnr","name","birthday","sex","casenr","Bew","Art","docnr","date","time","none","Na","K","Cl","Ca","corCa")
classes <- rep("character",length(colnames))
ELYTE <- read.table(file="ELYTE.TXT",skip=3,comment.char="",sep="|",col.names=colnames, header=FALSE, colClasses=classes)
The raw data looks like this:
0010000005|Weber|19091220|1|0000337340|00000|LAB|000001000000000000011524|20000127|084800||140|3.7|100|2.1|
0010000005|Weber|19091220|1|0000337340|00000|LAB|000001000000000000011541|20000127|080200||||||
0010000005|Weber|19091220|1|0000337340|00000|LAB|000001000000000000011562|20000127|101800||140|4.6|101|2.2|
0010000005|Weber|19091220|1|0000337340|00000|LAB|000001000000000000011579|20000127|134500||138|4.0||2.2|
0010000005|Weber|19091220|1|0000337340|00000|LAB|000001000000000000011591|20000128|084200||138|3.6|98|2.1|
0010000005|Weber|19091220|1|0000337340|00000|LAB|000001000000000000011593|20000128|085900||||||
0010000005|Weber|19091220|1|0000337340|00000|LAB|000001000000000000011653|20000129|093400||140|4.2|99|2.2|
0010000005|Weber|19091220|1|0000337340|00000|LAB|000001000000000000011717|20000129|094100||||||
What I get is:
patnr name birthday sex casenr Bew Art docnr date time none Na K Cl Ca corCa
1 0010000005 Weber 19091220 1 0000337340 00000 LAB 1e+19 20000127 084800 140 3.7 100 2.1
2 0010000005 Weber 19091220 1 0000337340 00000 LAB 1e+19 20000127 080200
3 0010000005 Weber 19091220 1 0000337340 00000 LAB 1e+19 20000127 101800 140 4.6 101 2.2
4 0010000005 Weber 19091220 1 0000337340 00000 LAB 1e+19 20000127 134500 138 4.0 2.2
5 0010000005 Weber 19091220 1 0000337340 00000 LAB 1e+19 20000128 084200 138 3.6 98 2.1
6 0010000005 Weber 19091220 1 0000337340 00000 LAB 1e+19 20000128 085900
How can I prevent "docnr" from being converted to "1e+19"?
...for example, by setting the column to type character, just as you did:
txt <- "0010000005|Weber|19091220|1|0000337340|00000|LAB|0000010000000000000011524|20000127|084800||140|3.7|100|2.1| 0010000005|Weber|19091220|1|0000337340|00000|LAB|0000010000000000000011541|20000127|080200|||||| 0010000005|Weber|19091220|1|0000337340|00000|LAB|0000010000000000000011562|20000127|101800||140|4.6|101|2.2| 0010000005|Weber|19091220|1|0000337340|00000|LAB|0000010000000000000011579|20000127|134500||138|4.0||2.2| 0010000005|Weber|19091220|1|0000337340|00000|LAB|0000010000000000000011591|20000128|084200||138|3.6|98|2.1| 0010000005|Weber|19091220|1|0000337340|00000|LAB|0000010000000000000011593|20000128|085900|||||| 0010000005|Weber|19091220|1|0000337340|00000|LAB|0000010000000000000011653|20000129|093400||140|4.2|99|2.2| 0010000005|Weber|19091220|1|0000337340|00000|LAB|0000010000000000000011717|20000129|094100||||||"
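# Turn the space-separated records into one record per line.
txt <- gsub(" ", "\n", txt)
colnames <- c("patnr","name","birthday","sex","casenr","Bew","Art","docnr","date","time","none","Na","K","Cl","Ca","corCa")
# Read every column as character so that leading zeros survive.
classes <- rep("character", length(colnames))
# skip = 3 is carried over from the question, so the first three records are dropped.
ELYTE <- read.table(text = txt, skip = 3, comment.char = "", sep = "|", col.names = colnames, header = FALSE, colClasses = classes)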
ELYTE
# patnr name birthday sex casenr Bew Art docnr date time none Na K Cl Ca corCa
# 1 0010000005 Weber 19091220 1 0000337340 00000 LAB 0000010000000000000011579 20000127 134500 138 4.0 2.2
# 2 0010000005 Weber 19091220 1 0000337340 00000 LAB 0000010000000000000011591 20000128 084200 138 3.6 98 2.1
# 3 0010000005 Weber 19091220 1 0000337340 00000 LAB 0000010000000000000011593 20000128 085900
# 4 0010000005 Weber 19091220 1 0000337340 00000 LAB 0000010000000000000011653 20000129 093400 140 4.2 99 2.2
# 5 0010000005 Weber 19091220 1 0000337340 00000 LAB 0000010000000000000011717 20000129 094100
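To check this on your own data, it may help to look at the class and the length of the imported field. The following is only a minimal sketch: it uses one record copied from the example above, and the names line, cols, as_chr and num are made up for illustration. The last step shows one way a "1e+19" could arise, namely if the value is converted to a number somewhere along the way. Whether that is what happened in the original session is an assumption, not something the question confirms.

```r
# One sample record copied from the example above (docnr is the 8th field).
line <- "0010000005|Weber|19091220|1|0000337340|00000|LAB|0000010000000000000011524|20000127|084800||140|3.7|100|2.1|"

# Hypothetical name for the column vector; same columns as in the question.
cols <- c("patnr", "name", "birthday", "sex", "casenr", "Bew", "Art", "docnr",
          "date", "time", "none", "Na", "K", "Cl", "Ca", "corCa")

# A single "character" is recycled across all columns.
as_chr <- read.table(text = line, sep = "|", comment.char = "",
                     col.names = cols, header = FALSE,
                     colClasses = "character")

class(as_chr$docnr)   # the column is stored as text
nchar(as_chr$docnr)   # all 25 characters, including the leading zeros

# Assumption for illustration: the same value converted to a double.
# A double cannot hold 25 digits, and the leading zeros carry no numeric meaning.
num <- as.numeric(as_chr$docnr)
format(num, digits = 7, scientific = TRUE)   # "1e+19"
```

If class() reports "character" and nchar() reports the full length, the import itself is fine, and any "1e+19" would have to come from a later conversion or from a different input file.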