R - Warning: "argument is not an atomic vector" when attempting to remove whitespace

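I am in the final stages of tidying my data before analysis, and I have run into a problem I don't really understand when removing whitespace from my data table. See the full code below for an explanation of each step.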

I started from the following page (How to remove all whitespace from a string?) and tried to troubleshoot using other pages that discuss atomic-vector errors/warnings, but without luck.

At Step 6 I get the following warning:

In stri_replace_all_fixed(allData, " ", "") :
  argument is not an atomic vector; coercing

and at Step 7 I get the following warnings:

> #Change sold and taxed columns from character to numerical
> allData$SoldAmount <- as.numeric(allData$SoldAmount)
Warning message:
NAs introduced by coercion 
> allData$Tax <- as.numeric(allData$Tax)
Warning message:
NAs introduced by coercion

Steps 6 and 7 both appear to run, but the values in the two columns end up as NA (see image).

[Image: result after whitespace is removed]

The full code is listed below. I would appreciate some advice on how to get Steps 6 and 7 to give me columns that contain no whitespace and are numeric.

#Step 1: Load needed library 
library(tidyverse) 
library(rvest) 
library(jsonlite)
library(stringi)

#Step 2: Access the URL 
url <- "https://www.forsvarsbygg.no/ListApi/ListContent/78635/SoldEstates/0/10/" 

#Step 3: Direct JSON as format of data in URL 
data <- jsonlite::fromJSON(url, flatten = TRUE) 

#Step 4: Access all items in API 
totalItems <- data$TotalNumberOfItems 

#Step 5: Summarize all data from API 
allData <- paste0('https://www.forsvarsbygg.no/ListApi/ListContent/78635/SoldEstates/0/', totalItems,'/') %>% 
  jsonlite::fromJSON(., flatten = TRUE) %>% 
  .[1] %>% 
  as.data.frame() %>% 
  rename_with(~str_replace(., "ListItems.", ""), everything())

#Step 6: removing columns not needed
allData <- allData[, -c(1,4,8,9,11,12,13,14,15)]

#Step 6: remove whitespace in all columns
stri_replace_all_fixed(allData, " ", "")

#Step 7: Change sold and taxed columns from character to numerical
allData$SoldAmount <- as.numeric(allData$SoldAmount)
allData$Tax <- as.numeric(allData$Tax)

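You are calling stri_replace_all_fixed(allData, " ", "") but ignoring/discarding its output, so allData never changes and Step 7 still sees the spaces. Save the result somewhere. Note also that the function expects a character vector rather than a whole data frame, which is what the "argument is not an atomic vector; coercing" warning is telling you, so apply the replacement column by column: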

#Step 6: remove whitespace in all columns
allData[] <- lapply(allData, gsub, pattern = " ", replacement = "")

#Step 7: Change sold and taxed columns from character to numerical
allData$SoldAmount <- as.numeric(allData$SoldAmount)
allData$Tax <- as.numeric(allData$Tax)
head(allData)
#     County Municipality      Tax SoldAmount           Type Date
# 1 Akershus        FROGN  2400000    2550000          Bolig 2004
# 2 Akershus        FROGN  2225000    2100000          Bolig 2004
# 3 Akershus          SKI  7600000   18000000    Næringstomt 2006
# 4  Østfold    SARPSBORG  3000000    3815000           Tomt 2004
# 5  Østfold        RYGGE 10000000   16000000 Næringseiendom 2006
# 6 Vestfold       LARVIK    61950      61950           Tomt 2013
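
To make the cause of the NAs concrete, here is a minimal sketch on a toy data frame. The data frame `df` and its values are invented for illustration and merely stand in for `allData`; only base R is used. It shows that an unassigned replacement leaves the data untouched, that `as.numeric()` cannot parse values with embedded spaces, and that assigning into `df[]` keeps the data frame structure.

```r
# Toy data frame standing in for allData (values invented for illustration)
df <- data.frame(
  County = c("Akershus", "Vestfold"),
  Tax    = c(" 2 400 000", " 61 950"),
  stringsAsFactors = FALSE
)

# 1. Result not assigned: df is left exactly as it was
invisible(lapply(df, gsub, pattern = " ", replacement = ""))
unchanged <- identical(df$Tax, c(" 2 400 000", " 61 950"))

# 2. Values with embedded spaces cannot be parsed, so as.numeric() gives NA
na_result <- suppressWarnings(as.numeric(df$Tax))

# 3. Assigning into df[] replaces the columns but keeps the data.frame class;
#    a plain `df <- lapply(...)` would turn df into a list instead
df[] <- lapply(df, gsub, pattern = " ", replacement = "")
df$Tax <- as.numeric(df$Tax)
```

After the last two lines `df` is still a data frame and `df$Tax` is numeric, which is the state Step 7 in the question was expecting.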

Alternatively, do it in a single step, for only the columns you need:

# allData <- paste0(...) %>% ...
allData <- allData[, -c(1,4,8,9,11,12,13,14,15)]
allData[c("Tax", "SoldAmount")] <- lapply(allData[c("Tax", "SoldAmount")], function(z) as.numeric(gsub(" ", "", z)))
head(allData)
#     County Municipality      Tax SoldAmount           Type Date
# 1 Akershus        FROGN  2400000    2550000          Bolig 2004
# 2 Akershus        FROGN  2225000    2100000          Bolig 2004
# 3 Akershus          SKI  7600000   18000000    Næringstomt 2006
# 4  Østfold    SARPSBORG  3000000    3815000           Tomt 2004
# 5  Østfold        RYGGE 10000000   16000000 Næringseiendom 2006
# 6 Vestfold       LARVIK    61950      61950           Tomt 2013

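Being specific about replacing only those two columns matters, because many values in the other columns contain spaces, and I don't know whether you intend to squash all of them: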

str(sapply(allData, function(z) unique(grep(" ", z, value = TRUE)), simplify = FALSE))
# List of 6
#  $ County      : chr [1:2] "Møre og Romsdal" "Sogn- og fjordane"
#  $ Municipality: chr [1:4] "EVJE OG HORNNES" "VESTRE TOTEN" "ØSTRE TOTEN" "NORDRE LAND"
#  $ Tax         : chr [1:414] " 2 400 000" " 2 225 000" " 7 600 000" " 3 000 000" ...
#  $ SoldAmount  : chr [1:538] " 2 550 000" " 2 100 000" " 18 000 000" " 3 815 000" ...
#  $ Type        : chr "Annen kategori"
#  $ Date        : chr(0)
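
The difference between the two approaches can be sketched on a small example. The data below is invented (one municipality name is borrowed from the output above), and `num_cols` is a hypothetical helper name, not something from the original code. Blanket removal squashes spaces inside names, while the targeted version only touches the numeric-as-text columns.

```r
# Toy data (invented values): one text column, one numeric-as-text column
df <- data.frame(
  Municipality = c("EVJE OG HORNNES", "LARVIK"),
  Tax          = c(" 2 400 000", " 61 950"),
  stringsAsFactors = FALSE
)

# Blanket removal: every column loses its spaces, including the names
blanket <- df
blanket[] <- lapply(blanket, gsub, pattern = " ", replacement = "")

# Targeted removal: only the columns listed in num_cols (hypothetical name)
num_cols <- c("Tax")
targeted <- df
targeted[num_cols] <- lapply(
  targeted[num_cols],
  function(z) as.numeric(gsub(" ", "", z, fixed = TRUE))
)

# Optional sanity check: the conversion introduced no NA values
stopifnot(!anyNA(targeted$Tax))
```

In `blanket` the first municipality becomes "EVJEOGHORNNES", whereas `targeted` keeps "EVJE OG HORNNES" intact and still ends up with a numeric `Tax` column.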