在 R 中导入 txt 文件,忽略前几行
Import txt file in R ignoring first few lines
从 MET 办公室下载了苏格兰降雨量数据。
前几行:
Scotland Rainfall (mm)
Areal series, starting from 1910
Allowances have been made for topographic, coastal and urban effects where relationships are found to exist.
Seasons: Winter=Dec-Feb, Spring=Mar-May, Summer=June-Aug, Autumn=Sept-Nov. (Winter: Year refers to Jan/Feb).
Values are ranked and displayed to 1 dp. Where values are equal, rankings are based in order of year descending.
Data are provisional from February 2015 & Winter 2015. Last updated 26/11/2015
JAN Year FEB Year MAR Year APR Year MAY Year JUN Year JUL Year AUG Year SEP Year OCT Year NOV Year DEC Year WIN Year SPR Year SUM Year AUT Year ANN Year
293.8 1993 278.1 1990 238.5 1994 191.1 1947 191.4 2011 155.0 1938 185.6 1940 216.5 1985 267.6 1950 258.1 1935 262.0 2009 300.7 2013 743.6 2014 409.5 1986 455.6 1985 661.2 1981 1886.4 2011
292.2 1928 258.8 1997 233.4 1990 149.0 1910 168.7 1986 137.9 2002 181.4 1988 211.9 1992 221.2 1981 254.0 1954 244.8 1938 268.5 1986 649.5 1995 401.3 2015 435.6 1948 633.8 1954 1828.1 1990
275.6 2008 244.7 2002 201.3 1992 146.8 1934 155.9 1925 137.8 1948 170.1 1939 202.3 2009 193.9 1982 248.8 2014 242.2 2006 267.2 1929 645.4 2000 393.7 1994 427.8 2009 615.8 1938 1756.8 2014
我正在尝试将此 txt 文件读入 R 并尝试了以下操作:
fileURL <- "http://www.metoffice.gov.uk/pub/data/weather/uk/climate/datasets/Rainfall/ranked/Scotland.txt"
if(!file.exists("scotland_rainfall.txt")){
#this will download the file in the current working directory
download.file(fileURL,destfile = "scotland_rainfall.txt")
dateDownload <- Sys.Date() #30-11-2015
}
scotland_weather <- read.table("scotland_rainfall.txt",skip = 8,header = F,sep = "\t",na.strings = "")
它以各种级别解释所有因素:
> head(scotland_weather)
V1
1 293.8 1993 278.1 1990 238.5 1994 191.1 1947 191.4 2011 155.0 1938 185.6 1940 216.5 1985 267.6 1950 258.1 1935 262.0 2009 300.7 2013 743.6 2014 409.5 1986 455.6 1985 661.2 1981 1886.4 2011
2 292.2 1928 258.8 1997 233.4 1990 149.0 1910 168.7 1986 137.9 2002 181.4 1988 211.9 1992 221.2 1981 254.0 1954 244.8 1938 268.5 1986 649.5 1995 401.3 2015 435.6 1948 633.8 1954 1828.1 1990
3 275.6 2008 244.7 2002 201.3 1992 146.8 1934 155.9 1925 137.8 1948 170.1 1939 202.3 2009 193.9 1982 248.8 2014 242.2 2006 267.2 1929 645.4 2000 393.7 1994 427.8 2009 615.8 1938 1756.8 2014
4 252.3 2015 227.9 1989 200.2 1967 142.1 1949 149.5 2015 137.7 1931 165.8 2010 191.4 1962 189.7 2011 247.7 1938 231.3 1917 265.4 2011 638.3 2007 393.2 1967 422.6 1956 594.5 1935 1735.8 1938
5 246.2 1974 224.9 2014 180.2 1979 133.5 1950 137.4 2003 135.0 1966 162.9 1956 190.3 2014 189.7 1927 242.3 1983 229.9 1981 264.0 2006 608.9 1990 391.7 1992 397.0 2004 590.6 1982 1720.0 2008
6 245.0 1975 195.6 1995 180.0 1989 132.9 1932 129.7 2007 131.7 2004 159.9 1985 189.1 2004 189.6 1985 240.9 2001 224.9 1951 261.0 1912 592.8 2015 389.1 1913 390.1 1938 589.2 2006 1716.5 1954
> str(scotland_weather)
'data.frame': 106 obs. of 1 variable:
$ V1: Factor w/ 106 levels " 38.6 1963 10.3 1932 28.7 1929 14.0 1974 22.5 1984 30.1 1988 32.7 1913 5.1 1947 31.7 1972 "| __truncated__,..: 106 105 104 103 102 101 100 99 98 97 ...
也尝试过Header=T
> scotland_weather <- read.table("scotland_rainfall.txt",skip = 8,header = T,sep = "\t",na.strings = "")
> head(scotland_weather)
X293.8..1993...278.1..1990...238.5..1994...191.1..1947...191.4..2011...155.0..1938...185.6..1940...216.5..1985...267.6..1950...258.1..1935...262.0..2009...300.7..2013...743.6..2014...409.5..1986...455.6..1985...661.2..1981..1886.4..2011
1 292.2 1928 258.8 1997 233.4 1990 149.0 1910 168.7 1986 137.9 2002 181.4 1988 211.9 1992 221.2 1981 254.0 1954 244.8 1938 268.5 1986 649.5 1995 401.3 2015 435.6 1948 633.8 1954 1828.1 1990
2 275.6 2008 244.7 2002 201.3 1992 146.8 1934 155.9 1925 137.8 1948 170.1 1939 202.3 2009 193.9 1982 248.8 2014 242.2 2006 267.2 1929 645.4 2000 393.7 1994 427.8 2009 615.8 1938 1756.8 2014
3 252.3 2015 227.9 1989 200.2 1967 142.1 1949 149.5 2015 137.7 1931 165.8 2010 191.4 1962 189.7 2011 247.7 1938 231.3 1917 265.4 2011 638.3 2007 393.2 1967 422.6 1956 594.5 1935 1735.8 1938
4 246.2 1974 224.9 2014 180.2 1979 133.5 1950 137.4 2003 135.0 1966 162.9 1956 190.3 2014 189.7 1927 242.3 1983 229.9 1981 264.0 2006 608.9 1990 391.7 1992 397.0 2004 590.6 1982 1720.0 2008
5 245.0 1975 195.6 1995 180.0 1989 132.9 1932 129.7 2007 131.7 2004 159.9 1985 189.1 2004 189.6 1985 240.9 2001 224.9 1951 261.0 1912 592.8 2015 389.1 1913 390.1 1938 589.2 2006 1716.5 1954
6 241.9 2005 194.8 1998 179.6 1921 132.3 1927 129.6 1920 130.4 1980 158.0 1953 188.8 1948 187.5 1935 238.1 2008 223.2 1986 260.8 1949 580.6 1920 386.5 1947 387.5 2012 587.8 1984 1696.7 2004
> str(scotland_weather)
'data.frame': 105 obs. of 1 variable:
$ X293.8..1993...278.1..1990...238.5..1994...191.1..1947...191.4..2011...155.0..1938...185.6..1940...216.5..1985...267.6..1950...258.1..1935...262.0..2009...300.7..2013...743.6..2014...409.5..1986...455.6..1985...661.2..1981..1886.4..2011: Factor w/ 105 levels " 38.6 1963 10.3 1932 28.7 1929 14.0 1974 22.5 1984 30.1 1988 32.7 1913 5.1 1947 31.7 1972 "| __truncated__,..: 105 104 103 102 101 100 99 98 97 96 ...
我希望保留与 txt 文件相同的列名。
任何其他想法将不胜感激。
谢谢
好像文件确实有固定宽度的字段,但是表头和数据行不一致,所以像这样分别读取表头和数据。不需要软件包。
hdr <- read.table(fileURL, skip = 7, nrow = 1, as.is = TRUE)
widths <- rep(c(8, 6), times = 17) # 8, 6, 8, 6, ..., 8, 6
dd <- read.fwf(fileURL, widths, skip = 8, col.names = hdr, check.names = FALSE)
注意: 可以从数据的第一行计算出 widths
,如下所示:
one.line <- readLines(fileURL, n = 9)[9] # char string with 1st line of data
widths <- diff(c(0, gregexpr("\S(?=\s)", paste(one.line, ""), perl = TRUE)[[1]]))
从 MET 办公室下载了苏格兰降雨量数据。
前几行:
Scotland Rainfall (mm)
Areal series, starting from 1910
Allowances have been made for topographic, coastal and urban effects where relationships are found to exist.
Seasons: Winter=Dec-Feb, Spring=Mar-May, Summer=June-Aug, Autumn=Sept-Nov. (Winter: Year refers to Jan/Feb).
Values are ranked and displayed to 1 dp. Where values are equal, rankings are based in order of year descending.
Data are provisional from February 2015 & Winter 2015. Last updated 26/11/2015
JAN Year FEB Year MAR Year APR Year MAY Year JUN Year JUL Year AUG Year SEP Year OCT Year NOV Year DEC Year WIN Year SPR Year SUM Year AUT Year ANN Year
293.8 1993 278.1 1990 238.5 1994 191.1 1947 191.4 2011 155.0 1938 185.6 1940 216.5 1985 267.6 1950 258.1 1935 262.0 2009 300.7 2013 743.6 2014 409.5 1986 455.6 1985 661.2 1981 1886.4 2011
292.2 1928 258.8 1997 233.4 1990 149.0 1910 168.7 1986 137.9 2002 181.4 1988 211.9 1992 221.2 1981 254.0 1954 244.8 1938 268.5 1986 649.5 1995 401.3 2015 435.6 1948 633.8 1954 1828.1 1990
275.6 2008 244.7 2002 201.3 1992 146.8 1934 155.9 1925 137.8 1948 170.1 1939 202.3 2009 193.9 1982 248.8 2014 242.2 2006 267.2 1929 645.4 2000 393.7 1994 427.8 2009 615.8 1938 1756.8 2014
我正在尝试将此 txt 文件读入 R 并尝试了以下操作:
fileURL <- "http://www.metoffice.gov.uk/pub/data/weather/uk/climate/datasets/Rainfall/ranked/Scotland.txt"
if(!file.exists("scotland_rainfall.txt")){
#this will download the file in the current working directory
download.file(fileURL,destfile = "scotland_rainfall.txt")
dateDownload <- Sys.Date() #30-11-2015
}
scotland_weather <- read.table("scotland_rainfall.txt",skip = 8,header = F,sep = "\t",na.strings = "")
它以各种级别解释所有因素:
> head(scotland_weather)
V1
1 293.8 1993 278.1 1990 238.5 1994 191.1 1947 191.4 2011 155.0 1938 185.6 1940 216.5 1985 267.6 1950 258.1 1935 262.0 2009 300.7 2013 743.6 2014 409.5 1986 455.6 1985 661.2 1981 1886.4 2011
2 292.2 1928 258.8 1997 233.4 1990 149.0 1910 168.7 1986 137.9 2002 181.4 1988 211.9 1992 221.2 1981 254.0 1954 244.8 1938 268.5 1986 649.5 1995 401.3 2015 435.6 1948 633.8 1954 1828.1 1990
3 275.6 2008 244.7 2002 201.3 1992 146.8 1934 155.9 1925 137.8 1948 170.1 1939 202.3 2009 193.9 1982 248.8 2014 242.2 2006 267.2 1929 645.4 2000 393.7 1994 427.8 2009 615.8 1938 1756.8 2014
4 252.3 2015 227.9 1989 200.2 1967 142.1 1949 149.5 2015 137.7 1931 165.8 2010 191.4 1962 189.7 2011 247.7 1938 231.3 1917 265.4 2011 638.3 2007 393.2 1967 422.6 1956 594.5 1935 1735.8 1938
5 246.2 1974 224.9 2014 180.2 1979 133.5 1950 137.4 2003 135.0 1966 162.9 1956 190.3 2014 189.7 1927 242.3 1983 229.9 1981 264.0 2006 608.9 1990 391.7 1992 397.0 2004 590.6 1982 1720.0 2008
6 245.0 1975 195.6 1995 180.0 1989 132.9 1932 129.7 2007 131.7 2004 159.9 1985 189.1 2004 189.6 1985 240.9 2001 224.9 1951 261.0 1912 592.8 2015 389.1 1913 390.1 1938 589.2 2006 1716.5 1954
> str(scotland_weather)
'data.frame': 106 obs. of 1 variable:
$ V1: Factor w/ 106 levels " 38.6 1963 10.3 1932 28.7 1929 14.0 1974 22.5 1984 30.1 1988 32.7 1913 5.1 1947 31.7 1972 "| __truncated__,..: 106 105 104 103 102 101 100 99 98 97 ...
也尝试过Header=T
> scotland_weather <- read.table("scotland_rainfall.txt",skip = 8,header = T,sep = "\t",na.strings = "")
> head(scotland_weather)
X293.8..1993...278.1..1990...238.5..1994...191.1..1947...191.4..2011...155.0..1938...185.6..1940...216.5..1985...267.6..1950...258.1..1935...262.0..2009...300.7..2013...743.6..2014...409.5..1986...455.6..1985...661.2..1981..1886.4..2011
1 292.2 1928 258.8 1997 233.4 1990 149.0 1910 168.7 1986 137.9 2002 181.4 1988 211.9 1992 221.2 1981 254.0 1954 244.8 1938 268.5 1986 649.5 1995 401.3 2015 435.6 1948 633.8 1954 1828.1 1990
2 275.6 2008 244.7 2002 201.3 1992 146.8 1934 155.9 1925 137.8 1948 170.1 1939 202.3 2009 193.9 1982 248.8 2014 242.2 2006 267.2 1929 645.4 2000 393.7 1994 427.8 2009 615.8 1938 1756.8 2014
3 252.3 2015 227.9 1989 200.2 1967 142.1 1949 149.5 2015 137.7 1931 165.8 2010 191.4 1962 189.7 2011 247.7 1938 231.3 1917 265.4 2011 638.3 2007 393.2 1967 422.6 1956 594.5 1935 1735.8 1938
4 246.2 1974 224.9 2014 180.2 1979 133.5 1950 137.4 2003 135.0 1966 162.9 1956 190.3 2014 189.7 1927 242.3 1983 229.9 1981 264.0 2006 608.9 1990 391.7 1992 397.0 2004 590.6 1982 1720.0 2008
5 245.0 1975 195.6 1995 180.0 1989 132.9 1932 129.7 2007 131.7 2004 159.9 1985 189.1 2004 189.6 1985 240.9 2001 224.9 1951 261.0 1912 592.8 2015 389.1 1913 390.1 1938 589.2 2006 1716.5 1954
6 241.9 2005 194.8 1998 179.6 1921 132.3 1927 129.6 1920 130.4 1980 158.0 1953 188.8 1948 187.5 1935 238.1 2008 223.2 1986 260.8 1949 580.6 1920 386.5 1947 387.5 2012 587.8 1984 1696.7 2004
> str(scotland_weather)
'data.frame': 105 obs. of 1 variable:
$ X293.8..1993...278.1..1990...238.5..1994...191.1..1947...191.4..2011...155.0..1938...185.6..1940...216.5..1985...267.6..1950...258.1..1935...262.0..2009...300.7..2013...743.6..2014...409.5..1986...455.6..1985...661.2..1981..1886.4..2011: Factor w/ 105 levels " 38.6 1963 10.3 1932 28.7 1929 14.0 1974 22.5 1984 30.1 1988 32.7 1913 5.1 1947 31.7 1972 "| __truncated__,..: 105 104 103 102 101 100 99 98 97 96 ...
我希望保留与 txt 文件相同的列名。
任何其他想法将不胜感激。
谢谢
好像文件确实有固定宽度的字段,但是表头和数据行不一致,所以像这样分别读取表头和数据。不需要软件包。
hdr <- read.table(fileURL, skip = 7, nrow = 1, as.is = TRUE)
widths <- rep(c(8, 6), times = 17) # 8, 6, 8, 6, ..., 8, 6
dd <- read.fwf(fileURL, widths, skip = 8, col.names = hdr, check.names = FALSE)
注意: 可以从数据的第一行计算出 widths
,如下所示:
one.line <- readLines(fileURL, n = 9)[9] # char string with 1st line of data
widths <- diff(c(0, gregexpr("\S(?=\s)", paste(one.line, ""), perl = TRUE)[[1]]))