Count of a value in a new column in R

Question

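I have a data frame with transaction ID and product name as columns. I am trying to create a third column that gives the number of rows for each transaction ID. The final data frame should look like this.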

    TID       Product        Orders         
    100       iPhone           2  
    100       Samsung          2  
    101       Lenovo           3  
    101       iPad             3  
    101       Galaxy           3  
    102       iPhone           1  
    103       HTC              1

I tried the length function, but it returns the length of the entire column rather than the count for each individual TID.

df$Orders <- length(df$TID)

I also tried the sqldf function as shown below, but it returns only one row per distinct TID.

test <- sqldf("Select TID, count(TID) as Orders, Product from df Group By TID")

Answer 1

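We can use one of the group-by aggregation functions. With dplyr, we group by the 'TID' column and use mutate to create a new column 'Orders' holding the number of observations within each group (n()).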

library(dplyr)
df1 %>%
  group_by(TID)%>%
  mutate(Orders=n())
#    TID Product Orders
#1 100  iPhone      2
#2 100 Samsung      2
#3 101  Lenovo      3
#4 101    iPad      3
#5 101  Galaxy      3
#6 102  iPhone      1
#7 103     HTC      1
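As a shorter variant of the same idea, dplyr also provides add_count(), which appends the group size as a new column without collapsing rows. A minimal sketch, assuming a reasonably recent dplyr release that supports the name argument; the data frame is rebuilt inline only so the snippet runs on its own:

```r
library(dplyr)

# Sample data matching the example in the question
df1 <- data.frame(
  TID = c(100L, 100L, 101L, 101L, 101L, 102L, 103L),
  Product = c("iPhone", "Samsung", "Lenovo", "iPad", "Galaxy", "iPhone", "HTC"),
  stringsAsFactors = FALSE
)

# add_count() groups by TID internally and keeps every original row;
# name = "Orders" sets the name of the new count column
res <- add_count(df1, TID, name = "Orders")
print(res)
```

This should behave like group_by(TID) followed by mutate(Orders = n()), except that the result is not left grouped.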

Or using data.table, we convert the 'data.frame' to a 'data.table' (setDT(df1)). Grouped by 'TID', we create a new column ('Orders') as the number of observations in each group (.N).

library(data.table)
setDT(df1)[, Orders := .N, by = TID]

Or an option with sqldf, where we left join the original dataset with the per-TID counts computed in a subquery.

library(sqldf)
sqldf('select * from df1
       left join (select TID,
                         count(TID) as Orders
                  from df1
                  group by TID)
       using (TID)')
#  TID Product Orders
#1 100  iPhone      2
#2 100 Samsung      2
#3 101  Lenovo      3
#4 101    iPad      3
#5 101  Galaxy      3
#6 102  iPhone      1
#7 103     HTC      1

Data

df1 <- structure(list(TID = c(100L, 100L, 101L, 101L, 101L, 102L, 103L
), Product = c("iPhone", "Samsung", "Lenovo", "iPad", "Galaxy", 
"iPhone", "HTC")), .Names = c("TID", "Product"), row.names = c(NA, 
-7L), class = "data.frame")

Answer 2

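You can use the data.table package. Using := adds the count as a new column by reference, grouped by TID, instead of returning one summary row per group:

library(data.table)
setDT(df)
df[, Orders := .N, by = TID]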

Answer 3

Base R:

df1$count <- ave(df1$TID, df1$TID, FUN=length)

Output:

  TID Product count
1 100  iPhone     2
2 100 Samsung     2
3 101  Lenovo     3
4 101    iPad     3
5 101  Galaxy     3
6 102  iPhone     1
7 103     HTC     1
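ave() applies FUN within each group and returns a vector as long as its input, which is why the result can be assigned directly as a column. For comparison, here is a sketch of another base R route that tabulates the IDs once and then looks each row up by name. The variable name counts is only illustrative, and the data frame is rebuilt inline so the snippet runs on its own:

```r
# Sample data matching the example in the question
df1 <- data.frame(
  TID = c(100L, 100L, 101L, 101L, 101L, 102L, 103L),
  Product = c("iPhone", "Samsung", "Lenovo", "iPad", "Galaxy", "iPhone", "HTC"),
  stringsAsFactors = FALSE
)

# Frequency of each TID; the table is named by the TID values
counts <- table(df1$TID)

# Look up each row's TID by name and drop the table attributes
df1$count <- as.integer(counts[as.character(df1$TID)])
print(df1)
```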
