read.csv error: more columns than column names?
I am trying to use read.csv to import the CSV data from https://data.worldbank.org/indicator/IS.AIR.PSGR
However, read.csv returns:
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
  more columns than column names
I searched previous posts, but the answers seem to depend on the particular data table, so what is wrong with this one?
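The problem is that the first 4 lines of the file contain extra text that is not part of the table. read.csv takes the first line as the header, so the rows that follow have more columns than there are column names. You need to use skip = 4. It is better to use read_csv from the readr package, because it keeps the original column names.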
library(readr)
dat <- read_csv("API_IS.AIR.PSGR_DS2_en_csv_v2.csv", skip = 4)
#> Warning: Missing column names filled in: 'X63' [63]
#> Parsed with column specification:
#> cols(
#> .default = col_integer(),
#> `Country Name` = col_character(),
#> `Country Code` = col_character(),
#> `Indicator Name` = col_character(),
#> `Indicator Code` = col_character(),
#> `1960` = col_character(),
#> `1961` = col_character(),
#> `1962` = col_character(),
#> `1963` = col_character(),
#> `1964` = col_character(),
#> `1965` = col_character(),
#> `1966` = col_character(),
#> `1967` = col_character(),
#> `1968` = col_character(),
#> `1969` = col_character(),
#> `1995` = col_double(),
#> `2007` = col_double(),
#> `2008` = col_double(),
#> `2009` = col_double(),
#> `2010` = col_double(),
#> `2011` = col_double()
#> # ... with 7 more columns
#> )
#> See spec(...) for full column specifications.
head(dat)
#> # A tibble: 6 x 63
#> `Country Name` `Country Code` `Indicator Name` `Indicator Code` `1960`
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Aruba ABW Air transport, pa~ IS.AIR.PSGR <NA>
#> 2 Afghanistan AFG Air transport, pa~ IS.AIR.PSGR <NA>
#> 3 Angola AGO Air transport, pa~ IS.AIR.PSGR <NA>
#> 4 Albania ALB Air transport, pa~ IS.AIR.PSGR <NA>
#> 5 Andorra AND Air transport, pa~ IS.AIR.PSGR <NA>
#> 6 Arab World ARB Air transport, pa~ IS.AIR.PSGR <NA>
#> # ... with 58 more variables: `1961` <chr>, `1962` <chr>, `1963` <chr>,
#> # `1964` <chr>, `1965` <chr>, `1966` <chr>, `1967` <chr>, `1968` <chr>,
#> # `1969` <chr>, `1970` <int>, `1971` <int>, `1972` <int>, `1973` <int>,
#> # `1974` <int>, `1975` <int>, `1976` <int>, `1977` <int>, `1978` <int>,
#> # `1979` <int>, `1980` <int>, `1981` <int>, `1982` <int>, `1983` <int>,
#> # `1984` <int>, `1985` <int>, `1986` <int>, `1987` <int>, `1988` <int>,
#> # `1989` <int>, `1990` <int>, `1991` <int>, `1992` <int>, `1993` <int>,
#> # `1994` <int>, `1995` <dbl>, `1996` <int>, `1997` <int>, `1998` <int>,
#> # `1999` <int>, `2000` <int>, `2001` <int>, `2002` <int>, `2003` <int>,
#> # `2004` <int>, `2005` <int>, `2006` <int>, `2007` <dbl>, `2008` <dbl>,
#> # `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>, `2013` <dbl>,
#> # `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <chr>, X63 <chr>
Created on 2018-03-05 by the reprex package (v0.2.0).
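For readers who prefer base R, here is a minimal sketch of the same fix with read.csv. It does not use the real World Bank file: it writes a small invented file whose layout (four preamble lines, then the header) is assumed to mirror the download, reproduces the error, and then reads the file with skip = 4. Setting check.names = FALSE is an addition here, so that names such as "Country Name" and "1970" are left unchanged.

```r
# Invented stand-in for the downloaded file: 4 preamble lines, then the header.
# Only the layout is meant to resemble the real file; the values are made up.
path <- tempfile(fileext = ".csv")
writeLines(c(
  '"Data Source","World Development Indicators"',
  '',
  '"Last Updated Date","2018-03-01"',
  '',
  '"Country Name","Country Code","1970","1971"',
  '"Aruba","ABW",10,20',
  '"Angola","AGO",30,40'
), path)

# Without skip, line 1 (2 fields) is taken as the header while later
# lines have 4 fields, so read.csv stops with an error.
err <- tryCatch(read.csv(path), error = function(e) conditionMessage(e))
print(err)

# Skipping the 4 preamble lines makes line 5 the header row.
# check.names = FALSE keeps the column names exactly as written in the file.
dat <- read.csv(path, skip = 4, check.names = FALSE, stringsAsFactors = FALSE)
str(dat)
```

With the default check.names = TRUE, base R would rewrite these names into syntactically valid ones (for example by replacing the space), which is the behaviour the answer above avoids by using read_csv.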
I ran into a similar problem while running inside Docker, so I had to download the file first and then read the CSV file.
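# download data (this assumes the URL serves the CSV file itself;
# the indicator page may return HTML instead)
download.file("https://data.worldbank.org/indicator/IS.AIR.PSGR", destfile = "file.csv")
# load data, skipping the 4 lines of text above the header row
gm <- read.csv("file.csv", skip = 4, stringsAsFactors = FALSE)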
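After downloading, it can help to look at the top of the file before parsing it, both to confirm that it really is CSV and to see how many lines come before the header. The sketch below works on an invented stand-in file rather than a real download, and header_skip() is a hypothetical helper written for this example, not part of base R or readr.

```r
# Invented stand-in for a downloaded file; the layout is assumed to
# resemble the World Bank export, the values are made up.
path <- tempfile(fileext = ".csv")
writeLines(c(
  '"Data Source","World Development Indicators"',
  '',
  '"Last Updated Date","2018-03-01"',
  '',
  '"Country Name","Country Code","1970","1971"',
  '"Aruba","ABW",10,20'
), path)

# Peek at the first lines of the file before parsing it.
top <- readLines(path, n = 10)
print(top)

# header_skip() is a hypothetical helper: it returns how many lines come
# before the first line that starts with the given header text.
header_skip <- function(lines, header_start) {
  hit <- which(startsWith(lines, header_start))
  if (length(hit) == 0) {
    stop("header line not found; is this really the CSV file?")
  }
  hit[1] - 1L
}

# Use the computed value instead of hard-coding skip = 4.
n_skip <- header_skip(top, '"Country Name"')
gm <- read.csv(path, skip = n_skip, check.names = FALSE, stringsAsFactors = FALSE)
```

Computing the skip value this way is a design choice for robustness: if the number of preamble lines ever differs, the read fails with a clear message instead of a confusing column-count error.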