Is there a way to determine how many rows in a dataset have the same categorical value across multiple conditions (columns)?

Question

For example, I have the dataset below, where 1 = yes and 0 = no. I need to count how many calls were made from a landline and lasted under 10 minutes.

Image of example dataset
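Because the dataset is only available as an image, here is a hypothetical stand-in so the answers below can be run. All values are invented, and the columns `mobile` and `over 10 minutes` are assumptions; the order was chosen so that `landline` is column 1 and `under 10 minutes` is column 4, matching the indices used in Answer 2.

```r
# Hypothetical stand-in for the dataset in the image (values are made up).
# 1 = yes, 0 = no. check.names = FALSE keeps the spaces in the column names.
df1 <- data.frame(
  landline           = c(1, 1, 0, 1, 0),
  mobile             = c(0, 0, 1, 0, 1),   # hypothetical column
  `over 10 minutes`  = c(0, 1, 0, 0, 1),   # hypothetical column
  `under 10 minutes` = c(1, 0, 1, 1, 0),
  check.names = FALSE
)
str(df1)
```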

Answer 1

To count the 1s in a single column, we can use sum:

sum(df1[, "under 10 minutes"])

If you need the counts for two columns separately, use colSums:

colSums(df1[, c("landline", "under 10 minutes")])

If we need rows where both columns are 1 at the same time, use rowSums and count the rows whose sum equals 2:

sum(rowSums(df1[, c("landline", "under 10 minutes")], na.rm = TRUE) == 2)
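A minimal sketch of all three calls from this answer on a small hypothetical data frame. The `NA` in `landline` is an assumption added to show the effect of `na.rm = TRUE`: the missing value is dropped from the row sum, so that row cannot reach 2 and is not counted.

```r
# Hypothetical data; the NA shows why na.rm = TRUE matters in rowSums()
df1 <- data.frame(
  landline           = c(1, 1, 0, 1, NA),
  `under 10 minutes` = c(1, 0, 1, 1, 1),
  check.names = FALSE
)
cols <- c("landline", "under 10 minutes")

n_under10 <- sum(df1[, "under 10 minutes"])               # 1s in one column
per_col   <- colSums(df1[, cols], na.rm = TRUE)           # 1s per column
n_both    <- sum(rowSums(df1[, cols], na.rm = TRUE) == 2) # rows where both are 1

n_under10  # 4
per_col    # landline 3, under 10 minutes 4
n_both     # 2
```

The `== 2` test relies on the 0/1 coding: it equals the number of columns only when every column in the row is 1.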

Answer 2

The grep function finds the rows where landline = 1 (column 1). We then select only those rows and sum the under-10-minutes column (column 4).

# column 1 = landline, column 4 = under 10 minutes
sum(df[grep(1, df[, 1]), 4])
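One caveat worth checking: `grep()` does regular-expression matching on the values converted to text, so the pattern `1` also matches values such as `10` or `21`. With strict 0/1 data this makes no difference, but the sketch below (hypothetical data, including a deliberately invalid code `10`; the names `mobile`, `over10` and `under10` are invented) compares it with an exact test using `which(... == 1)`.

```r
# Hypothetical data: column 1 = landline, column 4 = under 10 minutes
df <- data.frame(
  landline = c(1, 0, 10, 1),   # 10 is a deliberately invalid code
  mobile   = c(0, 1, 0, 0),
  over10   = c(0, 0, 0, 1),
  under10  = c(1, 1, 1, 0)
)

# grep() treats 1 as a pattern: any value whose text contains "1" matches
rows_grep  <- grep(1, df[, 1])      # rows 1, 3, 4 ("10" also matches)
# which() with == tests for the exact value 1
rows_exact <- which(df[, 1] == 1)   # rows 1, 4

sum(df[rows_grep, 4])   # 2
sum(df[rows_exact, 4])  # 1
```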

Answer 3

R conveniently treats 1 and 0 as TRUE and FALSE, so we can apply logical Boolean operators such as AND (&) and OR (|) directly and sum the result.

df <- data.frame(x = c(1, 0, 1, 0), 
                 y = c(0, 0, 1, 1))

> sum(df$x & df$y)
[1] 1
> sum(df$x | df$y)
[1] 3
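The same idea extends to any number of columns. As a sketch (the third column `z` is hypothetical), `Reduce()` can fold `&` or `|` across every column of the data frame, since a data frame is a list of columns.

```r
df <- data.frame(x = c(1, 0, 1, 0),
                 y = c(0, 0, 1, 1),
                 z = c(1, 1, 1, 0))   # z is a hypothetical third column

# Fold `&` across all columns: TRUE only where every column is non-zero
all_yes <- Reduce(`&`, df)
sum(all_yes)   # 1 (only row 3)

# Fold `|` across all columns: TRUE where at least one column is non-zero
any_yes <- Reduce(`|`, df)
sum(any_yes)   # 4
```

If the data can contain `NA`, `NA & TRUE` is `NA`, so `sum(..., na.rm = TRUE)` may be needed.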

For future questions, look into providing your example dataset as text, for example with a function like dput, rather than as an image.
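A short sketch of what that looks like, using a small hypothetical data frame: `dput()` prints R code that rebuilds the object, which can be pasted into a question. Evaluating that printed code gives back the same data frame.

```r
# Hypothetical example data
df <- data.frame(landline = c(1, 0, 1),
                 `under 10 minutes` = c(1, 1, 0),
                 check.names = FALSE)

dput(df)   # prints R code that rebuilds df; paste this into the question

# Round trip: evaluating the printed code recreates the data frame
code    <- paste(capture.output(dput(df)), collapse = "\n")
df_copy <- eval(parse(text = code))
identical(df, df_copy)   # TRUE
```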

Answer 4

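You can also state explicitly which value to look for in each column when computing the sum. (This helps if you need to count rows where a column's value is something other than 1.)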

sum(df$landline == 1 & df$`under 10 minutes` == 1)
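A minimal sketch on hypothetical data showing why the explicit comparison helps: changing `== 1` to `== 0` in one column counts landline calls that did not last under 10 minutes, which the plain `&` form above cannot express without negation.

```r
# Hypothetical data
df <- data.frame(landline = c(1, 1, 0, 1, 0),
                 `under 10 minutes` = c(1, 0, 1, 1, 0),
                 check.names = FALSE)

# Landline calls that lasted under 10 minutes
n_short <- sum(df$landline == 1 & df$`under 10 minutes` == 1)   # 2
# Landline calls that did NOT last under 10 minutes (0 in that column)
n_long  <- sum(df$landline == 1 & df$`under 10 minutes` == 0)   # 1
```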
