Error in nchar(Tony.raw$neighborhood_overview): 'nchar()' requires a character vector

Error in nchar(Tony.raw$neighborhood_overview) : 'nchar()' requires a character vector

Link to the .CSV data: https://drive.google.com/open?id=1mGsy52nZtRNpAFEWiWaJHB2nsm2hnvsU

I don't know why nchar can't read the neighborhood_overview column.

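I have an assignment that provides a CSV file of questionnaire data on social statistics for Denver neighborhoods. I need to compute the character length of certain data columns and then plot them to show some of the views expressed in the data.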

I'll try the same code on a different data column and see what I get.

#Load up the .CSV data and explore in RStudio
Tony.raw <- read.csv("denver_listings.csv", stringsAsFactors = FALSE)
View(Tony.raw)

# Clean up the data frame and view our handiwork.
Tony.raw <- Tony.raw[, c("description", "neighborhood_overview")]
View(Tony.raw)

# Check data to see if there are missing values.
length(which(!complete.cases(Tony.raw)))

#Convert our class label into a factor.
Tony.raw$neighborhood_overview <- 
as.factor(which(complete.cases(Tony.raw$neighborhood_overview)))

# The first step, as always, is to explore the data.
# First, let's take a look at the distribution of the class labels (i.e., ham vs. spam).
prop.table(table(Tony.raw$neighborhood_overview))

# Next up, let's get a feel for the distribution of text lengths of the SMS
# messages by adding a new feature for the length of each message.
Tony.raw$TextLength <- nchar(Tony.raw$neighborhood_overview)
summary(Tony.raw$TextLength)

#Visualize distribution with ggplot2, adding segmentation for ham/spam
library(ggplot2)

ggplot(Tony.raw, aes(x=TextLength, fill = neighborhood_overview)) +
  theme_bw() +
  geom_histogram(binwidth = 5) +
  labs(y = "Text Count", x = "Length of Text",
       title = "Distribution of Text Lengths with class Labels")

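By setting Tony.raw$TextLength to the nchar of Tony.raw$neighborhood_overview, I should be able to count the characters and then plot them with ggplot2. But it says nchar requires a character vector. Is it the description data that isn't character, or the column label? I have no idea.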
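One way to answer that question is to check the column's class before and after the conversion step. The sketch below uses a hypothetical two-row data frame with made-up values in place of denver_listings.csv. It shows that the text column starts out as character (so nchar() works) and becomes a factor after as.factor().

```r
# Hypothetical toy data frame standing in for denver_listings.csv
toy <- data.frame(
  description = c("Cozy loft", "Quiet bungalow"),
  neighborhood_overview = c("Close to downtown", "Near City Park"),
  stringsAsFactors = FALSE
)

# With stringsAsFactors = FALSE the text column stays character, so nchar() works
print(class(toy$neighborhood_overview))   # "character"
lens <- nchar(toy$neighborhood_overview)
print(lens)                               # 17 14

# After as.factor() the column is a factor, not a character vector
toy$neighborhood_overview <- as.factor(toy$neighborhood_overview)
print(class(toy$neighborhood_overview))   # "factor"
print(is.character(toy$neighborhood_overview))  # FALSE

# str() shows the type of every column at once
str(toy)
```

It is the column's values, not its label, that stop being character.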

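In the fourth block of your code, you turned Tony.raw$neighborhood_overview into a factor. To get the nchar of the factor's labels, you need

nchar(levels(Tony.raw$neighborhood_overview)[Tony.raw$neighborhood_overview])

(or, equivalently, nchar(as.character(Tony.raw$neighborhood_overview))) instead of nchar(Tony.raw$neighborhood_overview).

Note also that which(complete.cases(...)) returns the positions of the non-missing entries, so that line fills the column with row numbers rather than the overview text. Calling nchar() on the original character column, without that conversion, gives the lengths of the actual text.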
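The levels(f)[f] idiom works because indexing a vector by a factor uses the factor's integer codes. Each code therefore picks its own label out of the levels vector. A minimal sketch on a hypothetical factor built from made-up text shows that it matches as.character():

```r
# Hypothetical factor built from made-up overview text
f <- factor(c("Near City Park", "Close to downtown", "Near City Park"))

# Levels are stored once, in sorted order; each element holds an integer code
print(levels(f))   # "Close to downtown" "Near City Park"

# Indexing the levels vector by the factor maps every code back to its text
txt_levels <- levels(f)[f]
txt_char <- as.character(f)

print(identical(txt_levels, txt_char))  # TRUE
print(nchar(txt_levels))                # 14 17 14
```

as.character() is usually the more readable choice. levels(f)[f] can be faster on long factors with few levels, because it converts each level only once.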

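When you write nchar(Tony.raw$neighborhood_overview), you are passing it a factor. A factor is stored as integer codes running from 1 to the number of levels, with the text kept separately as the levels attribute. It is therefore not a character vector, and nchar() throws this error when it receives a factor instead of strings.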
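The factor internals described above can be inspected directly. This sketch uses a hypothetical two-element factor. It shows the integer storage, catches the same error message that nchar() raised in the question, and then gets the lengths after converting to character.

```r
f <- factor(c("Near City Park", "Close to downtown"))

# Under the hood a factor is an integer vector of codes plus a "levels" attribute
print(typeof(f))       # "integer"
print(as.integer(f))   # 2 1
print(attributes(f))

# nchar() rejects factors; capture the error message instead of stopping
msg <- tryCatch(nchar(f), error = function(e) conditionMessage(e))
print(msg)

# Converting to character first gives the lengths of the labels
print(nchar(as.character(f)))  # 14 17
```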