Reading a text file in R
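This sounds silly, but I can't find the right way to read this text file.
I tried both read.table and fread, but neither worked; the columns do not line up with the data: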
m = fread(meq,fill = T,sep = " ")
m = read.table(meq,fill = T,comment.char="-",sep = "")
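The file has some metadata at the top and is not in a standard format that parses easily into a data frame. One solution is to read it in as a character vector, do some manipulation, and then read in the resulting file: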
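meq <- "LS_FLS_YUN_IECED3_Tower_100pctAv.meq"
lines <- readLines(meq)
lines <- lines[-(1:5)]            # drop the first five lines (metadata)
lines <- gsub("\\|", "", lines)   # remove pipes; "|" is a regex metacharacter, so escape it as "\\|"
lines <- gsub(" +", " ", lines)   # collapse runs of spaces into a single space
file <- tempfile()
writeLines(lines, file)           # write the cleaned lines to a temporary file
data.table::fread(file, sep = " ", fill = TRUE)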
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
# 1: 1 t 279.1 307.1 351.1 429.2 524.8 539.1 540.5 550.3 558.5 NA
# 2: 2 v_hor_size 6.6 6.8 7.3 9.6 11.8 13.3 13.4 14.9 16.4 NA
# 3: 3 Elevation 4.1 4.1 4.3 5.1 6.2 6.7 6.7 7.3 7.9 NA
# 4: 4 Mf_x e.1n1 43.8 61.4 106.6 270.0 330.2 447.4 461.3 573.9 689.9
# 5: 5 Mf_y e.1n1 107.1 148.0 236.8 493.8 603.8 746.3 762.0 881.7 994.3
# ---
#766: 766 Mt_030 e13n2 5694.5 5524.4 5559.4 6850.9 8377.6 9381.7 9510.6 10592.5 11771.5
#767: 767 Mt_060 e13n2 9223.2 8757.3 8448.3 9210.8 11263.4 11821.5 11901.7 12606.0 13398.2
#768: 768 Mt_090 e13n2 11582.0 10912.8 10380.1 10898.0 13326.5 13686.1 13745.4 14298.0 14960.4
#769: 769 Mt_120 e13n2 11658.8 11015.8 10529.9 11142.4 13625.4 13989.1 14046.3 14564.8 15166.2
#770: 770 Mt_150 e13n2 9386.4 8973.3 8741.8 9551.2 11679.6 12116.0 12177.5 12712.0 13312.7
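The same clean-then-parse idea can be sketched without a temporary file, since base R's `read.table` accepts the cleaned lines directly through its `text` argument. The sample lines below are hypothetical stand-ins for the .meq file (the real layout may differ), and selecting data rows by a leading row number is an assumption, not something the original answer does.

```r
# Hypothetical sample standing in for the .meq file; the real layout may differ.
raw <- c(
  "Some metadata line",
  "---------------------------",
  "1 | t          | 279.1  307.1  351.1",
  "2 | v_hor_size |   6.6    6.8    7.3",
  "3 | Elevation  |   4.1    4.1    4.3"
)

# Assumption: data rows start with a row number; everything else is metadata.
rows <- grep("^[[:space:]]*[0-9]+[[:space:]]", raw, value = TRUE)

# Replace the pipe separators with spaces (fixed = TRUE avoids regex escaping).
rows <- gsub("|", " ", rows, fixed = TRUE)

# Parse the cleaned lines in memory; sep = "" splits on any run of whitespace.
df <- read.table(text = rows, sep = "", stringsAsFactors = FALSE)
print(df)
```

Rows that carry an extra token, such as the `Mf_x e.1n1` rows in the output above, would still need `fill = TRUE` or a further substitution so that every row has the same number of fields.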
Here is my solution:
read.table(text = mgsub::mgsub(readLines(meq),c("\\|", " e"," e"),c("","e","e")),fill = T,comment.char = "-",sep = "",na.strings ="", stringsAsFactors= F,skip = 2)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1 W<f6>hler slope: 3.5 4.0 5.0 8.0 8.0 10.0 10.25 12.4 14.95
2 Reference cycles: 10000000.0 10000000.0 10000000.0 10000000.0 2000000.0 2000000.0 2000000.00 2000000.0 2000000.00
3 1 t 279.1 307.1 351.1 429.2 524.8 539.1 540.50 550.3 558.50
4 2 v_hor_size 6.6 6.8 7.3 9.6 11.8 13.3 13.40 14.9 16.40
5 3 Elevation 4.1 4.1 4.3 5.1 6.2 6.7 6.70 7.3 7.90
6 4 Mf_xe.1n1 43.8 61.4 106.6 270.0 330.2 447.4 461.30 573.9 689.90
7 5 Mf_ye.1n1 107.1 148.0 236.8 493.8 603.8 746.3 762.00 881.7 994.30
8 6 Mf_xye.1n1 71.7 98.6 156.9 324.7 397.0 488.6 498.60 574.4 644.80
9 7 Mf_000e.1n1 107.1 148.0 236.8 493.8 603.8 746.3 762.00 881.7 994.30
10 8 Mf_030e.1n1 99.6 137.2 219.7 460.4 563.0 698.3 713.30 828.3 938.10
11 9 Mf_060e.1n1 72.4 100.6 165.6 371.9 454.8 581.1 595.50 706.6 814.60
12 10 Mf_090e.1n1 43.8 61.4 106.6 270.0 330.2 447.4 461.30 573.9 689.90
13 11 Mf_120e.1n1 56.1 78.5 130.2 302.2 369.5 486.4 500.10 610.4 722.30
14 12 Mf_150e.1n1 90.6 125.8 202.9 429.6 525.4 653.7 668.00 777.4 882.00
15 13 Mf_xe.2n1 2591.1 2521.6 2510.2 2854.9 3491.1 3777.4 3819.20 4199.4 4652.10
16 14 Mf_ye.2n1 1407.2 1385.4 1606.3 2993.9 3661.1 4509.0 4603.60 5327.9 6016.10
17 15 Mf_xye.2n1 2337.5 2239.9 2157.8 2262.2 2766.3 3002.7 3045.10 3432.6 3845.10
18 16 Mf_000e.2n1 1407.2 1385.4 1606.3 2993.9 3661.1 4509.0 4603.60 5327.9 6016.10
19 17 Mf_030e.2n1 1692.9 1668.5 1798.8 2846.6 3480.9 4238.0 4325.60 5008.6 5674.50
20 18 Mf_060e.2n1 2298.1 2247.6 2275.5 2781.8 3401.7 3849.5 3909.90 4430.1 5003.00
21 19 Mf_090e.2n1 2591.1 2521.6 2510.2 2854.9 3491.1 3777.4 3819.20 4199.4 4652.10
22 20 Mf_120e.2n1 2410.8 2334.5 2300.9 2561.2 3132.0 3411.9 3457.50 3902.0 4461.90
23 21 Mf_150e.2n1 1862.8 1799.9 1810.7 2631.1 3217.5 3955.5 4040.70 4700.3 5339.30
24 22 Mf_xe.3n1 6414.2 6261.0 6252.6 7146.9 8739.6 9466.1 9570.50 10506.0 11580.60
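The `W<f6>hler` in the first row of this output suggests the file contains a byte (0xF6) that R could not interpret in the session's encoding. In Latin-1 that byte is "ö", so the label is presumably "Wöhler". A minimal sketch, assuming the file really is Latin-1 encoded (the source does not confirm this), converts the text to UTF-8 before parsing. The string below is built by hand as a stand-in and is not read from the actual file.

```r
# Stand-in for one label from the file: the bytes of "Wöhler" in Latin-1.
# Assumption: the real file is Latin-1 encoded; this is not confirmed.
label_raw <- rawToChar(as.raw(c(0x57, 0xf6, 0x68, 0x6c, 0x65, 0x72)))

# Convert from Latin-1 to UTF-8 so the umlaut is represented correctly.
label <- iconv(label_raw, from = "latin1", to = "UTF-8")
print(label)
```

With the real file, the same conversion could be applied to the result of `readLines(meq)` before the substitutions. Alternatively, `readLines(meq, encoding = "latin1")` marks the strings as Latin-1 when they are read.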