R make new data frame from current one
I'm trying to find the best goal difference in the group stage of the 2014 World Cup.
football <- read.csv(
file="http://pastebin.com/raw.php?i=iTXdPvGf",
header = TRUE,
strip.white = TRUE
)
football <- head(football,n=48L)
football[which(max(abs(football$home_score - football$away_score)) == abs(football$home_score - football$away_score)),]
Result:
home home_continent home_score away away_continent away_score result
4 Cameroon Africa 0 Croatia Europe 4 l
7 Spain Europe 1 Netherlands Europe 5 l
37 Germany Europe 4 Portugal Europe 0 w
These are the matches with the largest goal difference, but now I need to make a new data frame that contains the team name and abs(football$home_score - football$away_score).
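A side note on the subsetting line above: comparing against max() (with or without which()) keeps every tied match, which is why three rows come back, whereas which.max() silently returns only the first one. The sketch below is base R on a small hand-typed subset of the matches shown in this thread, plus the real 2014 group-stage draw Brazil 0-0 Mexico so that a "d" result is present. The reduced column set is an assumption for illustration, not the full CSV.

```r
# Hand-typed subset of the match data; column names follow the question's CSV
football <- data.frame(
  home       = c("Cameroon", "Spain", "Germany", "Brazil"),
  home_score = c(0, 1, 4, 0),
  away       = c("Croatia", "Netherlands", "Portugal", "Mexico"),
  away_score = c(4, 5, 0, 0),
  result     = c("l", "l", "w", "d"),
  stringsAsFactors = FALSE
)

goal_diff <- abs(football$home_score - football$away_score)

# Comparing against max() keeps every tied match (three rows here)
all_max <- football[goal_diff == max(goal_diff), ]

# which.max() returns only the index of the first maximum (one row)
first_max <- football[which.max(goal_diff), ]

nrow(all_max)    # 3
nrow(first_max)  # 1
```

A plain logical index and which() give the same rows here because goal_diff has no NAs; with missing scores, which() drops the NA rows while the logical index returns rows of NAs.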
# Absolute goal difference for every match
football$score_diff <- abs(football$home_score - football$away_score)
# Home team if it scored more, NA for a draw, otherwise the away team
football$winner <- ifelse(football$home_score > football$away_score, as.character(football$home),
  ifelse(football$result == "d", NA, as.character(football$away)))
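The ifelse() answer can be finished off by subsetting on the maximum and keeping only the columns the question asks for. Here is a sketch on a hand-typed subset (an assumption, since the real CSV is read from pastebin). as.character() is dropped because the columns here are already character; it is only needed when they are factors, which was the read.csv() default before R 4.0.0.

```r
# Hand-typed subset: three biggest-margin matches plus the draw Brazil 0-0 Mexico
football <- data.frame(
  home       = c("Cameroon", "Spain", "Germany", "Brazil"),
  home_score = c(0, 1, 4, 0),
  away       = c("Croatia", "Netherlands", "Portugal", "Mexico"),
  away_score = c(4, 5, 0, 0),
  result     = c("l", "l", "w", "d"),
  stringsAsFactors = FALSE
)

football$score_diff <- abs(football$home_score - football$away_score)
# Home team if it scored more, NA for a draw, otherwise the away team
football$winner <- ifelse(football$home_score > football$away_score,
                          football$home,
                          ifelse(football$result == "d", NA, football$away))

# New data frame: only the biggest-margin matches, winner and margin only
best <- football[football$score_diff == max(football$score_diff),
                 c("winner", "score_diff")]
rownames(best) <- NULL  # renumber rows 1, 2, 3 for tidier printing
best
```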
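You can save some typing this way. First compute the score difference and the winner. When result is "w", the home team won (and "l" means the away team won), so you don't need to look at the scores at all. Once the score difference and winner columns are added, you can subset the data using max().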
mydf <- read.csv(file="http://pastebin.com/raw.php?i=iTXdPvGf",
header = TRUE, strip.white = TRUE)
mydf <- head(mydf,n = 48L)
library(dplyr)
mutate(mydf, scorediff = abs(home_score - away_score),
winner = ifelse(result == "w", as.character(home),
ifelse(result == "l", as.character(away), "draw"))) %>%
filter(scorediff == max(scorediff))
# home home_continent home_score away away_continent away_score result scorediff winner
#1 Cameroon Africa 0 Croatia Europe 4 l 4 Croatia
#2 Spain Europe 1 Netherlands Europe 5 l 4 Netherlands
#3 Germany Europe 4 Portugal Europe 0 w 4 Germany
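The answer above reads the winner off the result column alone. If you want to confirm that result agrees with the scores before relying on it, one possible base-R check (an addition, not part of the answer) maps the sign of the goal difference to "l", "d" or "w". It uses the same hypothetical hand-typed subset as a stand-in for the CSV.

```r
# Hand-typed subset standing in for the pastebin CSV
football <- data.frame(
  home       = c("Cameroon", "Spain", "Germany", "Brazil"),
  home_score = c(0, 1, 4, 0),
  away       = c("Croatia", "Netherlands", "Portugal", "Mexico"),
  away_score = c(4, 5, 0, 0),
  result     = c("l", "l", "w", "d"),
  stringsAsFactors = FALSE
)

# sign() gives -1, 0 or 1; adding 2 turns that into an index 1, 2 or 3
expected <- c("l", "d", "w")[sign(football$home_score - football$away_score) + 2]

# TRUE when every result code matches the scoreline
all(expected == football$result)
```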
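Here is another option that creates the "winner" column without ifelse, using row/column indexing. The numeric column index is created by matching the 'result' column against its possible values (match(football$result, c('w', 'l', 'd'))), and the row index is just 1:nrow(football). Subset the 'home' and 'away' columns of the "football" dataset and cbind them with an additional column 'draw' filled with NA, so that the elements of "result" that are 'd' get NA instead.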
football$score_diff <- abs(football$home_score - football$away_score)
# Lookup columns line up with 'w' (home), 'l' (away) and 'd' (NA)
football$winner <- cbind(football[c('home', 'away')], draw = NA)[
  cbind(1:nrow(football), match(football$result, c('w', 'l', 'd')))]
football[with(football, score_diff == max(score_diff)), ]
# home home_continent home_score away away_continent away_score result
#60 Brazil South America 1 Germany Europe 7 l
# score_diff winner
#60 6 Germany
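To see the indexing step in isolation: a two-column matrix of (row, column) pairs picks exactly one cell per row. When a data frame is indexed with a single matrix argument, R converts it with as.matrix() first, so the result is a plain character vector. Below is a sketch on a hypothetical hand-typed subset that includes a draw.

```r
# Hand-typed subset standing in for the pastebin CSV
football <- data.frame(
  home       = c("Cameroon", "Spain", "Germany", "Brazil"),
  home_score = c(0, 1, 4, 0),
  away       = c("Croatia", "Netherlands", "Portugal", "Mexico"),
  away_score = c(4, 5, 0, 0),
  result     = c("l", "l", "w", "d"),
  stringsAsFactors = FALSE
)

# Columns 1, 2, 3 of the lookup table correspond to results "w", "l", "d"
lookup  <- cbind(football[c("home", "away")], draw = NA)
col_idx <- match(football$result, c("w", "l", "d"))  # 2 2 1 3
row_idx <- seq_len(nrow(football))                    # 1 2 3 4

# Each (row, column) pair selects one cell of the lookup table
football$winner <- lookup[cbind(row_idx, col_idx)]
football$winner
```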
If the dataset is very large, you can use chmatch from library(data.table) to speed up match:
library(data.table)
chmatch(as.character(football$result), c('w', 'l', 'd'))
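Note: I used the full dataset in the link rather than only the first 48 rows, which is why row 60 (Brazil 1-7 Germany) shows up in the output.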
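chmatch() expects character vectors, which is why the answer wraps football$result in as.character() (read.csv() returned factors by default before R 4.0.0). Here is a quick way to confirm it returns the same indices as match(). The data.table part is guarded because the package may not be installed; the short results vector is a hypothetical stand-in for the real column.

```r
# Stand-in for football$result
results <- c("l", "l", "w", "d")
codes   <- c("w", "l", "d")

base_idx <- match(results, codes)  # 2 2 1 3

# Only run the data.table version when the package is available
if (requireNamespace("data.table", quietly = TRUE)) {
  dt_idx <- data.table::chmatch(results, codes)
  print(identical(base_idx, dt_idx))
}
```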