How do I read a file that has no file extension in R?

Question

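I am working with a climate dataset in R. I downloaded global Temp/Precip observation data for each year from here: climate data archive. Example datasets are the yearly temperature data for all countries and the yearly precipitation data for all countries. However, the files have no file extension, and whatever extension they originally had has been forgotten or lost. I tried base::scan() to load them into R, but the output is not what I want: each file should have 14 fixed columns, whereas scan() appears to read only 7 columns. Is there a better function for reading files that have no specific file extension? Any ideas?

Here is what the list of climate data files looks like: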

list.files("stella/data/air_temp_1980_2014/", recursive = TRUE)

 [1] "air_temp.1980" "air_temp.1981" "air_temp.1982" "air_temp.1983"
 [5] "air_temp.1984" "air_temp.1985" "air_temp.1986" "air_temp.1987"
 [9] "air_temp.1988" "air_temp.1989" "air_temp.1990" "air_temp.1991"
[13] "air_temp.1992" "air_temp.1993" "air_temp.1994" "air_temp.1995"
[17] "air_temp.1996" "air_temp.1997" "air_temp.1998" "air_temp.1999"
[21] "air_temp.2000" "air_temp.2001" "air_temp.2002" "air_temp.2003"
[25] "air_temp.2004" "air_temp.2005" "air_temp.2006" "air_temp.2007"
[29] "air_temp.2008" "air_temp.2009" "air_temp.2010" "air_temp.2011"
[33] "air_temp.2012" "air_temp.2013" "air_temp.2014"

Here is the output that scan() produces:

> scan(file = "stella/data/air_temp_1980_2014/air_temp.1980", sep = "", skip = 1)

 [1] -179.75   68.75  -27.00  -28.20  -27.20  -21.60   -9.00
 [8]    0.60    2.80    1.90   -0.20  -11.90  -22.70  -25.10
[15] -179.75   68.25  -27.80  -28.50  -27.50  -22.00   -9.50
[22]    0.40    3.00    1.80   -0.80  -12.70  -23.60  -26.80
[29] -179.75   67.75  -26.80  -26.60  -25.70  -20.50   -8.00
[36]    2.70    6.00    4.00    0.50  -12.20  -23.20  -27.30
[43] -179.75   67.25  -29.10  -28.40  -27.50  -22.30   -9.70
[50]    2.20    6.20    3.30   -1.30  -15.40  -26.40  -31.10
[57] -179.75   66.75  -25.40  -23.80  -22.90  -18.20   -6.10
[64]    3.80    8.60    6.00    1.10  -11.50  -22.30  -27.20

Desired output:

> desired output
         Long    Lat   Jan   Feb   Mar April   May   Jun   Jul
1     -179.75  68.75 -27.0 -28.2 -27.2 -21.6  -9.0   0.6   2.8
2     -179.75  68.25 -27.8 -28.5 -27.5 -22.0  -9.5   0.4   3.0
3     -179.75  67.75 -26.8 -26.6 -25.7 -20.5  -8.0   2.7   6.0
4     -179.75  67.25 -29.1 -28.4 -27.5 -22.3  -9.7   2.2   6.2
5     -179.75  66.75 -25.4 -23.8 -22.9 -18.2  -6.1   3.8   8.6
6     -179.75  66.25 -21.5 -18.9 -17.2 -14.0  -2.3   3.4   9.2
7     -179.75  65.75 -20.2 -17.9 -17.1 -13.2  -2.2   4.3  10.1
8     -179.75  65.25 -20.0 -18.7 -17.4 -14.1  -2.4   4.3  10.5
9     -179.75 -16.75  27.4  28.3  27.9  27.2  25.7  24.9  24.7
10    -179.75 -84.75 -18.9 -27.9 -38.6 -41.5 -41.2 -44.4 -45.2
11    -179.75 -85.25 -23.9 -33.8 -45.1 -47.9 -47.7 -50.4 -51.5
12    -179.75 -85.75 -22.8 -33.5 -45.2 -48.1 -47.7 -49.9 -51.4
13    -179.75 -86.25 -24.3 -35.5 -47.7 -50.6 -50.2 -52.1 -53.8
14    -179.75 -86.75 -25.5 -37.1 -49.6 -52.6 -52.1 -53.8 -55.7
15    -179.75 -87.25 -26.2 -38.1 -50.9 -53.8 -53.2 -54.8 -56.8
16    -179.75 -87.75 -26.7 -39.0 -51.9 -54.8 -54.3 -55.7 -57.9

I want to read every file in this list into R. How can I read the data above correctly so that it comes out as expected? Any ideas?

Answer 1

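It appears to be a tab-delimited file (confirmed by adding a .txt extension). If you add a .csv extension to each file and then read them with whitespace given explicitly as the separator, it should work fine. This may be tedious, but it is probably your best option, since files without a proper extension are confusing in their own right.

Be careful with the column names, though. read.csv() treats the first line as a header by default, so you will want to pass a vector of names to the function through col.names. If the files have no header line, also set header = FALSE so that the first data row is not consumed.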

name_vector <- c("Long", "Lat", ... )

x <- read.csv("path/precip.1980.csv", sep = "", col.names = name_vector)
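A minimal sketch of how read.csv() treats the first line, using a hypothetical four-column sample written to a temporary file (the file name, the sample values' layout and demo_names are assumptions for illustration; the real files have 14 columns and may or may not start with a header line):

```r
# Hypothetical sample: two whitespace-separated rows, no header line,
# written to a temporary file whose name carries no usual extension.
path <- file.path(tempdir(), "air_temp.demo")
writeLines(c("-179.75 68.75 -27.0 -28.2",
             "-179.75 68.25 -27.8 -28.5"), path)

# Illustrative names only; the real data needs 14 names.
demo_names <- c("Long", "Lat", "Jan", "Feb")

# header = FALSE keeps the first line as data; sep = "" splits on any whitespace.
x <- read.csv(path, header = FALSE, sep = "", col.names = demo_names)

# With the default header = TRUE the first line is consumed as a header,
# even though col.names then replaces the names taken from it.
y <- read.csv(path, sep = "", col.names = demo_names)

print(nrow(x))
print(nrow(y))
```

Comparing nrow(x) with nrow(y) on one of the real files is a quick way to check whether a data row is being lost to the header.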

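Edit:

Since you already have the file list, you can append ".csv" to every file name programmatically instead of doing it by hand. Note that pasting the extension onto the vector only changes the strings, so the files on disk have to be renamed as well for the new paths to exist. Strictly speaking, read.csv() does not check the extension, so this step is a matter of convenience rather than a requirement.

# store file list (full.names = TRUE keeps the directory in each path)
filelist <- list.files("stella/data/air_temp_1980_2014/", recursive = TRUE, full.names = TRUE)

# rename the files on disk so each gains a .csv extension, then keep the new paths
newlist <- paste0(filelist, ".csv")
file.rename(filelist, newlist)
filelist <- newlist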

You can then read the files in iteratively with the code above. Here is an example of a solution that should do that for you.

dat <- lapply(filelist, function (x) {
   read.csv(x, sep = "", col.names = name_vector)
})

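I have not explicitly tested this solution, and it may still throw errors because of the column-name issue. If you can provide a proper reprex, it will be much easier to sort these problems out for you.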
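For comparison, here is a hedged sketch of the same loop that reads the files under their existing names, labels each result with its year and stacks everything into one data frame. The temporary directory, the two tiny sample files and demo_names are hypothetical stand-ins; with the real data the directory would be "stella/data/air_temp_1980_2014/" and the name vector would have 14 entries.

```r
# Hypothetical stand-in for the real directory: two tiny extensionless files.
dir_path <- file.path(tempdir(), "air_temp_demo")
dir.create(dir_path, showWarnings = FALSE)
for (yr in c(1980, 1981)) {
  writeLines(c("-179.75 68.75 -27.0 -28.2",
               "-179.75 68.25 -27.8 -28.5"),
             file.path(dir_path, paste0("air_temp.", yr)))
}

# Illustrative names only; the real data needs 14 names.
demo_names <- c("Long", "Lat", "Jan", "Feb")

# full.names = TRUE returns paths that can be passed straight to read.table().
files <- list.files(dir_path, pattern = "^air_temp\\.[0-9]{4}$", full.names = TRUE)

# Read every file; read.table() splits on whitespace by default.
dat <- lapply(files, read.table, header = FALSE, col.names = demo_names)

# Label each data frame with the year taken from the file name.
names(dat) <- sub("^air_temp\\.", "", basename(files))

# Stack everything into one data frame with a Year column.
all_years <- do.call(
  rbind,
  Map(function(d, yr) cbind(Year = as.integer(yr), d), dat, names(dat))
)
```

Keeping the per-year list (dat) as well as the stacked frame (all_years) leaves both shapes available; which one is more convenient depends on the later analysis.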

Answer 2

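The file extension itself means very little. It is only there to indicate how the data in the file is organized. You should open the file in a text editor to figure out how it is laid out.

From the look of it, it is probably a tab-delimited file. So the way to import it into R is with one of the CSV-related input functions, such as read.csv or data.table::fread.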
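The same inspection can be done from within R. The following is a small sketch under stated assumptions: the tab-separated sample file is hypothetical, and the real files may use spaces instead of tabs, which is exactly what this check is meant to reveal.

```r
# Hypothetical tab-separated sample written to a temporary, extensionless file.
path <- file.path(tempdir(), "air_temp.peek")
writeLines(c("-179.75\t68.75\t-27.0",
             "-179.75\t68.25\t-27.8"), path)

# Look at the raw text of the first lines; print() shows tabs as \t.
first_lines <- readLines(path, n = 2)
print(first_lines)

# Check whether the first line contains a tab character.
has_tab <- grepl("\t", first_lines[1], fixed = TRUE)

# count.fields() reports how many fields each line has for a given separator;
# sep = "" means any run of whitespace.
n_fields <- count.fields(path, sep = if (has_tab) "\t" else "")
print(n_fields)
```

If every line reports the same number of fields, that number is the column count to expect from read.csv or read.table with the same separator.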
