Copy-pasting large Excel files directly into R (with many columns) - "Error in scan"
I can use read.delim("clipboard") on content copied from Microsoft Excel, then paste the resulting console output into the text = argument as shown below. This works fine.
df1 <- read.table(header = TRUE, text =
" a b
1 0.2267953 -0.25450740
2 -1.4967091 -0.90682792
3 -1.3156086 -0.08949872
4 0.2720266 -1.01155805
5 1.1755608 -1.73036765
6 0.5024211 -0.01226299
7 0.2806160 0.33141502
8 -1.8631702 0.35364807
9 0.2669309 0.90964756
10 -1.9147608 0.18394934")
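If my Excel file has too many columns, things start to break down. I think this is because my console output gets split into several blocks. If I copy the 'too many columns' console output from read.delim("clipboard") and paste it into the text = argument below, I end up with the following error: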
df2 <- read.table(header = TRUE, text =
" a b c
1 0.6604331 -0.09190024 -1.30400419
2 0.5114487 0.29496370 -1.25137557
3 0.1955764 0.30972257 0.00478639
4 -1.0400516 -1.08210784 -0.14906742
5 -0.5022574 -0.12988141 0.93325264
6 1.6502558 0.01255227 -0.58192138
7 -0.5359307 -0.92271576 0.43877026
8 -1.1947015 -1.05887833 0.89072608
9 1.0664275 -1.12816603 1.97051795
10 0.2466212 -0.78481492 -0.69115265
d e f
1 0.46968125 1.13310269 0.90007897
2 1.41915478 -0.15813081 -1.07687043
3 2.57197248 0.08487282 0.82166321
4 0.18698150 0.23860853 -0.04076551
5 1.20221764 -0.97671366 -0.13799642
6 0.64680778 -0.77625578 -1.01934201
7 0.25143965 -0.13433564 -2.11476517
8 -0.04562408 -0.41225541 -1.34095833
9 0.77567374 -0.53714819 1.12345455
10 -0.76428423 -0.22667688 -0.18617513
g h
1 0.3160803 0.6623033
2 0.6979845 1.3685583
3 -1.5598213 -0.6806526
4 -0.3178346 0.4211778
5 0.8634450 -1.5223605
6 0.4252802 0.1312011
7 -0.6166845 1.6632878
8 -0.2589889 -0.1199479
9 -0.7146200 0.7655468
10 -0.6124751 -0.6891370
")
#> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
#> line 11 did not have 4 elements
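Is there a way around this 'Error in scan' error? I know about @MilesMcBain's excellent datapasta package, but I would like a solution that does not require RStudio. Base R and tidyverse solutions are welcome.
Also note that I need to keep the data directly in my script rather than importing it from a *.csv, *.tsv, or *.xls file, hence the motivation for this question.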
x <- readClipboard()
reads the clipboard contents (readClipboard() is available on Windows only).
Or: copy from Excel and use "clipboard" as the input file:
read.table(file = "clipboard", sep = "\t")
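As a rough illustration of what the two calls above work with: cells copied from Excel arrive on the clipboard as tab-separated text, one line per row. The sketch below simulates that with a hard-coded character vector (the name clip_lines and its values are made up for illustration, standing in for what readClipboard() would return on Windows) and parses it with read.delim(text = ...).

```r
# Hypothetical stand-in for the result of readClipboard() after copying
# a small range from Excel: one string per row, cells separated by tabs.
clip_lines <- c("a\tb\tc", "0.5\t1.5\t2.5", "-1\t0\t1")

# read.delim() defaults to sep = "\t" and header = TRUE.
df <- read.delim(text = clip_lines)
print(df)
```

Because the tab-separated form has exactly one line per row, it is not affected by how wide the console is.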
One way to handle this is to write the data out as a compact data structure that is easy to reconstruct:
library(jsonlite)
toJSON(read.table('clipboard', header = TRUE))
The full JSON string is printed to the console. Copy and paste it into your code and assign it to an object, say data. Note that you do need to quote the JSON string:
data <- '[{"a":0.0978,"b":0.1704,"c":0.469,"d":0.0919,"e":0.4881,"f":0.414,"g":0.865,"h":0.6461},{"a":0.4975,"b":0.3762,"c":0.5015,"d":0.8096,"e":0.1041,"f":0.8868,"g":0.7983,"h":0.072},{"a":0.2335,"b":0.1997,"c":0.7992,"d":0.3203,"e":0.694,"f":0.2838,"g":0.3469,"h":0.4552},{"a":0.8392,"b":0.2544,"c":0.6384,"d":0.9021,"e":0.7761,"f":0.806,"g":0.431,"h":0.9182},{"a":0.2685,"b":0.2624,"c":0.8339,"d":0.1081,"e":0.3896,"f":0.6784,"g":0.7051,"h":0.2658},{"a":0.4708,"b":0.3424,"c":0.505,"d":0.2119,"e":0.3758,"f":0.1155,"g":0.0585,"h":0.2035},{"a":0.1734,"b":0.9656,"c":0.2278,"d":0.6977,"e":0.7876,"f":0.0204,"g":0.7441,"h":0.626},{"a":0.0751,"b":0.0729,"c":0.3399,"d":0.9851,"e":0.2846,"f":0.0652,"g":0.6614,"h":0.7401},{"a":0.9651,"b":0.9437,"c":0.8807,"d":0.2687,"e":0.6538,"f":0.3907,"g":0.8816,"h":0.5983}]'
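This gives you a nice compact one-liner for storing the data. Unlike read.table(text = ...), it has no problems with too many columns or with row/line spacing, at least as long as you are not trying to load a massive dataset this way.
You can easily rebuild the data frame with: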
fromJSON(data)
a b c d e f g h
1 0.0978 0.1704 0.4690 0.0919 0.4881 0.4140 0.8650 0.6461
2 0.4975 0.3762 0.5015 0.8096 0.1041 0.8868 0.7983 0.0720
3 0.2335 0.1997 0.7992 0.3203 0.6940 0.2838 0.3469 0.4552
4 0.8392 0.2544 0.6384 0.9021 0.7761 0.8060 0.4310 0.9182
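One detail worth knowing about this round trip: by default toJSON() rounds numbers to 4 decimal digits, which is why the values in the JSON string are shorter than the originals. A minimal sketch, assuming jsonlite is installed and using a small made-up data frame (df, json_text, and the values are illustrative only), that keeps full precision with digits = NA:

```r
library(jsonlite)

# Small made-up data frame standing in for the Excel data.
df <- data.frame(a = c(0.097767305, 0.497482762),
                 b = c(0.17043808, 0.376181817))

# digits = NA asks toJSON() for maximum precision instead of the default 4.
json_text <- as.character(toJSON(df, digits = NA))

# json_text is a single string that can be pasted into a script in quotes.
df_back <- fromJSON(json_text)
print(all.equal(df, df_back))
```

If 4 digits are enough for your purposes, the default keeps the pasted string shorter.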
If you are committed to staying in base R and do not want to load jsonlite, you can still do this with write.csv, just less cleanly:
write.csv(df2)
This prints df2 as .csv to the console. You can then copy and paste it back into your code (the first two rows are shown here as an example):
"","a","b","c","d","e","f","g","h"
"1",0.097767305,0.17043808,0.469039979,0.091881245,0.488090975,0.41400278,0.865041585,0.646119496
"2",0.497482762,0.376181817,0.50152601,0.809582305,0.104101727,0.8868107,0.798329506,0.072007646
Then read it back like this. Again, note that the write.csv output is wrapped in single quotes:
read.csv(text = '"","a","b","c","d","e","f","g","h"
"1",0.097767305,0.17043808,0.469039979,0.091881245,0.488090975,0.41400278,0.865041585,0.646119496
"2",0.497482762,0.376181817,0.50152601,0.809582305,0.104101727,0.8868107,0.798329506,0.072007646', header = TRUE)
The downside of using .csv is that the data structure in your code is messier, but functionally it still works fine.
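The empty "" column in the header above holds the row names, and it can be left out. A minimal base-R sketch, under the assumption that the row names carry no information; df and csv_text are illustrative names, and capture.output() is used here only to simulate the copy-and-paste step:

```r
# Small made-up data frame standing in for df2.
df <- data.frame(a = c(0.097767305, 0.497482762),
                 b = c(0.17043808, 0.376181817))

# Print the CSV without row names; this is what you would copy from the console.
write.csv(df, row.names = FALSE)

# Simulate pasting that output into the script as one quoted string.
csv_text <- paste(capture.output(write.csv(df, row.names = FALSE)),
                  collapse = "\n")
df_back <- read.csv(text = csv_text)
print(all.equal(df, df_back))
```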
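So I ran similar code, and the problem seems to be not how many columns there are but whether they get broken up into blocks. I ran it twice, and when I made my window wider, R printed everything together, so it worked. I'll link the screenshot I took and the code.
https://puu.sh/CEE61.png # this is what you want to do for it to work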
df2 <- read.table(header = TRUE, text = "
 a b c d e f
1 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456
2 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456
3 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456
4 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456
5 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456
6 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456
7 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456
8 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456
9 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456
10 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456
 g h
1 456456456456 456456456456
2 456456456456 456456456456
3 456456456456 456456456456
4 456456456456 456456456456
5 456456456456 456456456456
6 456456456456 456456456456
7 456456456456 456456456456
8 456456456456 456456456456
9 456456456456 456456456456
10 456456456456 456456456456
")
# running this gave a similar error, but running the next one does not
df2 <- read.table(header = TRUE, text = " a b c d e f g h
1 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456
2 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456
3 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456
4 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456
5 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456
6 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456
7 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456
8 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456
9 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456
10 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456 456456456456")
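Widening the window works because R wraps printed data frames at the console width, which is stored in options("width"). A minimal sketch of setting it from code rather than by resizing the window (df and the value 1000 are arbitrary choices for illustration):

```r
# Made-up data frame with eight wide columns, like the example above.
df <- as.data.frame(matrix(456456456456, nrow = 10, ncol = 8,
                           dimnames = list(NULL, letters[1:8])))

old <- options(width = 1000)      # allow very long lines; returns the old setting
printed <- capture.output(print(df))
options(old)                      # restore the previous width

# One header line plus one line per row: nothing was wrapped into blocks.
length(printed)

# The unwrapped output can be parsed back with read.table().
df_back <- read.table(header = TRUE, text = printed)
```

In an interactive session you would set the width, print the data frame, and copy the unwrapped output into the text = argument.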