Set ... to NA when using read_excel()
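I am trying to read Excel files that use three horizontal dots to mark missing values, for example ...
Is it possible to set these to NA with read_excel()? I tried different options for the na argument (see below), but none of them seems to work.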
library(readxl)
d0 <- read_excel(path = "WPP2019_FERT_F02_SEX_RATIO_AT_BIRTH.xlsx",
# na = "...", # does not work
# na = "…", # copying the output does not work
# na = "U+2026", # unicode character does not work
sheet = 2, skip = 16)
d0
# # A tibble: 255 x 21
# Index Variant `Region, subreg~ Notes `Country code` Type `Parent code` `1950-1955`
# <dbl> <chr> <chr> <chr> <dbl> <chr> <dbl> <chr>
# 1 1 Estima~ WORLD NA 900 World 0 1.06
# 2 2 Estima~ UN development ~ a 1803 Labe~ 900 …
# 3 3 Estima~ More developed ~ b 901 Deve~ 1803 1.06
# 4 4 Estima~ Less developed ~ c 902 Deve~ 1803 1.06
# 5 5 Estima~ Least developed~ d 941 Deve~ 902 1.04
# 6 6 Estima~ Less developed ~ e 934 Deve~ 902 1.06
# 7 7 Estima~ Less developed ~ NA 948 Deve~ 1803 1.05
# 8 8 Estima~ Land-locked Dev~ f 1636 Spec~ 1803 1.04
# 9 9 Estima~ Small Island De~ g 1637 Spec~ 1803 1.05
# 10 10 Estima~ World Bank inco~ NA 1802 Labe~ 900 …
# # ... with 245 more rows, and 13 more variables: `1955-1960` <chr>, `1960-1965` <chr>,
# # `1965-1970` <chr>, `1970-1975` <chr>, `1975-1980` <chr>, `1980-1985` <chr>,
# # `1985-1990` <chr>, `1990-1995` <chr>, `1995-2000` <chr>, `2000-2005` <chr>,
# # `2005-2010` <chr>, `2010-2015` <chr>, `2015-2020` <chr>
No NA values are created, and the values in the example column are not numeric...
d0 %>% select(`1950-1955`) %>% pull()
# [1] "1.06" "…" "1.06" "1.06"
# [5] "1.04" "1.06" "1.05" "1.04"
# [9] "1.05" "…" "1.06" "1.06"
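Simple and effective: just convert the column to numeric after importing. as.numeric() turns every entry it cannot parse, including "…", into NA. It throws a warning ("NAs introduced by coercion"), but who cares.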
head(df)
# something v
# 1 -0.2168503 1.06
# 2 0.9863558 …
# 3 1.8623381 1.06
# 4 -1.0441477 1.06
# 5 0.4244308 1.04
# 6 1.5825152 1.06
df <- transform(df, v=as.numeric(v))
head(df)
# something v
# 1 -0.2168503 1.06
# 2 0.9863558 NA
# 3 1.8623381 1.06
# 4 -1.0441477 1.06
# 5 0.4244308 1.04
# 6 1.5825152 1.06
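If the warning is unwelcome, or if other malformed strings should not silently become NA, one option is to replace only the ellipsis before converting. The following is a minimal sketch in base R and is not part of the original answer; the helper name `ellipsis_to_na` and the three-row data frame are made up for illustration.

```r
# Hypothetical helper: turn only the ellipsis marker into NA,
# then convert the remaining strings to numeric.
ellipsis_to_na <- function(x, marker = "\u2026") {
  # %in% never returns NA, so existing NA entries are left alone
  x[x %in% marker] <- NA_character_
  as.numeric(x)
}

# Small illustrative data frame (not the original data)
df <- data.frame(
  something = c(0.34, 1.13, 0.26),
  v = c("1.06", "\u2026", "1.04"),
  stringsAsFactors = FALSE
)

df$v <- ellipsis_to_na(df$v)
df
```

Any value that is neither a number nor the marker would still trigger the usual coercion warning, so unexpected entries stay visible.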
Data
df <- structure(list(something = c(0.344600422686915, 1.12754949114835,
0.264102711671497, -0.588052830551214, 0.916134405190614, 0.118418825652515,
-1.5711759894206, 0.561452729377526, -0.27524305006459, -0.611306705421411,
0.614179300117269, 0.765082495652037), v = c("1.06", "…", "1.06",
"1.06", "1.04", "1.06", "1.05", "1.04", "1.05", "…", "1.06",
"1.06")), class = "data.frame", row.names = c(NA, -12L))
As you can see in "Print unicode character string in R", the problem is how you define the Unicode character.
Try this:
readxl::read_xlsx("C:/Stack/WPP2019_FERT_F02_SEX_RATIO_AT_BIRTH.xlsx", sheet = 2, skip = 16, na="\U2026")
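To see why the escape sequence matters, it can help to compare the candidate strings directly. This is a small base R sketch added for illustration; the variable names are arbitrary and nothing here depends on the Excel file.

```r
# "\u2026" is an escape sequence that R expands to the single
# character HORIZONTAL ELLIPSIS (code point U+2026).
ellipsis <- "\u2026"
nchar(ellipsis)       # 1
utf8ToInt(ellipsis)   # 8230, i.e. 0x2026

# "U+2026" is just six ordinary characters: U, +, 2, 0, 2, 6.
literal <- "U+2026"
nchar(literal)        # 6

# Three ASCII full stops are a different string as well.
dots <- "..."
ellipsis == dots      # FALSE
ellipsis == literal   # FALSE
```

Since the na argument is compared against cell contents as a string, only a value that really is the one-character ellipsis can match the "…" cells.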