Item count descriptive statistics
I have a data frame:
x userId bookId rating
1 1 412 6
2 1 454 5
3 2 412 4
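etc.
Basically, one user has rated many books, and one book has many ratings.
I need to extract some descriptive statistics about userId, such as the average number of ratings given and the average rating given.
Can anyone point me in the right direction?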
You can use data.table for these calculations.
If your data.frame is called books:
library(data.table)
# convert the data.frame to a data.table by reference (no copy)
setDT(books)
# average rating by user
books[, mean(rating), by=userId]
# userId V1
#1: 1 5.5
#2: 2 4.0
# average number of ratings given per user
books[, .N, by=userId][, mean(N)]
#[1] 1.5
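If several per-user statistics are wanted at once, the two calls above can be folded into a single grouped expression. The following is only a sketch: it rebuilds the three sample rows from the question, and the column names nRatings, meanRating, minRating and maxRating are made up for illustration.

```r
library(data.table)

# Sample rows from the question (the real data would have more)
books <- data.table(
  userId = c(1, 1, 2),
  bookId = c(412, 454, 412),
  rating = c(6, 5, 4)
)

# One row per user; .N is the number of rows (ratings) in each group
userStats <- books[, .(
  nRatings   = .N,
  meanRating = mean(rating),
  minRating  = min(rating),
  maxRating  = max(rating)
), by = userId]

# Average number of ratings given per user
avgRatingsPerUser <- userStats[, mean(nRatings)]
```

Keeping the per-user table in userStats means further summaries (for example the distribution of nRatings) can be computed from it without regrouping.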
I'm not sure I understand your exact question/task, but the following may provide some insight:
data = read.table(header = TRUE, stringsAsFactors = FALSE, text = "x userId bookId rating
1 1 412 6
2 1 454 5
3 2 412 4")
# Number of ratings per user
userFreq = data.frame(table(data$userId))
# Var1 Freq
# 1 1 2
# 2 2 1
# mean rating per userID
meanRatingPerUser = aggregate(data$rating, by=list(data$userId), FUN = mean )
# Group.1 x
# 1 1 5.5
# 2 2 4.0
# mean rating per book
meanRatingPerBook = aggregate(data$rating, by=list(data$bookId), FUN = mean )
# Group.1 x
# 1 412 5
# 2 454 5
# "Summary" function, applied per bookID
moreStats = aggregate(data$rating, by=list(data$bookId), FUN = summary )
# Group.1 x.Min. x.1st Qu. x.Median x.Mean x.3rd Qu. x.Max.
# 1 412 4.0 4.5 5.0 5.0 5.5 6.0
# 2 454 5.0 5.0 5.0 5.0 5.0 5.0
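The per-user count and mean computed above can also be collected into one table in base R, which makes it easy to get the average number of ratings per user. This is a rough sketch using tapply on the same sample data; the names userStats and avgRatingsPerUser are illustrative, not part of the answer above.

```r
data <- read.table(header = TRUE, text = "x userId bookId rating
1 1 412 6
2 1 454 5
3 2 412 4")

# Count and mean of ratings for each user (groups are ordered by userId)
nRatings   <- tapply(data$rating, data$userId, length)
meanRating <- tapply(data$rating, data$userId, mean)

# Combine both statistics into one data frame, one row per user
userStats <- data.frame(
  userId     = as.integer(names(nRatings)),
  nRatings   = as.vector(nRatings),
  meanRating = as.vector(meanRating)
)

# Average number of ratings given per user
avgRatingsPerUser <- mean(userStats$nRatings)
```

The same pattern works per book by grouping on data$bookId instead of data$userId.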