Referencing column labels when creating new variables in R
I scraped a data frame into R like this:
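library(rvest)  # provides read_html(), html_nodes(), html_table() and the %>% pipe
page.201702050atl = read_html("http://www.pro-football-reference.com/boxscores/201702050atl.htm")
# Select every HTML comment node on the page; the table sits inside one of them
comments.201702050atl = page.201702050atl %>% html_nodes(xpath = "//comment()")
# Re-parse the 27th comment as HTML and extract the #team_stats table
team.stats.201702050atl = comments.201702050atl[27] %>% html_text() %>% read_html() %>% html_node("#team_stats") %>% html_table()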
> team.stats.201702050atl
                                NWE           ATL
1         First Downs            37            17
2        Rush-Yds-TDs      25-104-2      18-104-1
3   Cmp-Att-Yd-TD-INT 43-63-466-2-1 17-23-284-2-0
4        Sacked-Yards          5-24          5-44
5      Net Pass Yards           442           240
6         Total Yards           546           344
7        Fumbles-Lost           1-1           1-1
8           Turnovers             2             1
9     Penalties-Yards          4-23          9-65
10   Third Down Conv.          7-14           1-8
11  Fourth Down Conv.           1-1           0-0
12 Time of Possession         40:31         23:27
> str(team.stats.201702050atl)
'data.frame': 12 obs. of 3 variables:
$ : chr "First Downs" "Rush-Yds-TDs" "Cmp-Att-Yd-TD-INT" "Sacked-Yards" ...
$ NWE: chr "37" "25-104-2" "43-63-466-2-1" "5-24" ...
$ ATL: chr "17" "18-104-1" "17-23-284-2-0" "5-44" ...
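As you can see, R scraped this table with the 2nd and 3rd columns already labeled. I want to give these columns generic labels and move c("", "NWE", "ATL") into the table itself so that I can work with it. Also, when I move that row into the table, I want to fill the empty cell with my own text. In other words, I want to end up with this: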
> team.stats.201702050atl.a
                   V1            V2            V3
1                  Tm           NWE           ATL
2         First Downs            37            17
3        Rush-Yds-TDs      25-104-2      18-104-1
4   Cmp-Att-Yd-TD-INT 43-63-466-2-1 17-23-284-2-0
5        Sacked-Yards          5-24          5-44
6      Net Pass Yards           442           240
7         Total Yards           546           344
8        Fumbles-Lost           1-1           1-1
9           Turnovers             2             1
10    Penalties-Yards          4-23          9-65
11   Third Down Conv.          7-14           1-8
12  Fourth Down Conv.           1-1           0-0
13 Time of Possession         40:31         23:27
I know I can do something like:
team.stats.201702050atl.a = as.data.frame(t(team.stats.201702050atl))
team.stats.201702050atl.a$r1 = c("Tm", "NWE", "ATL")
team.stats.201702050atl = as.data.frame(t(team.stats.201702050atl.a))
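...but how do I get R to directly reference the column labels in team.stats.201702050atl$V2 and team.stats.201702050atl$V3 without typing them in explicitly? And how do I insert my own text into the first column of that row?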
There is no need to transpose. You can use rbind to add a vector of column names as a row, for example:
team.stats.201702050atl2 <- rbind(c("Tm", "NWE", "ATL"), team.stats.201702050atl)
Or rbind the column names directly using colnames, then fill in the missing "Tm" value:
team.stats.201702050atl2 <- rbind(colnames(team.stats.201702050atl), team.stats.201702050atl)
team.stats.201702050atl2[1,1] <- "Tm"
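See ?colnames and ?rownames for how to reference column and row names. For example, you can pick out specific column names by index, such as colnames(team.stats.201702050atl2)[1] or colnames(team.stats.201702050atl2)[2:3], which gives another approach: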
team.stats.201702050atl2 <- rbind(c("Tm", colnames(team.stats.201702050atl)[2:3]), team.stats.201702050atl)
Or a variant that works for any number of columns:
team.stats.201702050atl2 <- rbind(c("Tm", colnames(team.stats.201702050atl)[2:ncol(team.stats.201702050atl)]), team.stats.201702050atl)
Finally, assign the new column names using colnames:
colnames(team.stats.201702050atl2) <- c("V1", "V2", "V3")
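If you want to try the steps above without hitting the website, here is a minimal sketch on a small stand-in data frame. The name team_stats, the helper promote_header and its corner argument are made up for illustration, and the three rows are copied from the output in the question; the toy frame is assumed to mimic what html_table() returned (all character columns, empty first column name).

```r
# Toy stand-in for the scraped table (assumption: all columns are character,
# and the first column has an empty name, as in the str() output above).
team_stats <- data.frame(
  c("First Downs", "Total Yards", "Turnovers"),
  NWE = c("37", "546", "2"),
  ATL = c("17", "344", "1"),
  stringsAsFactors = FALSE,
  check.names = FALSE
)
names(team_stats)[1] <- ""

# Hypothetical helper: push the column names into the table as row 1,
# put `corner` in the first cell, then assign generic names V1, V2, ...
promote_header <- function(df, corner = "Tm") {
  header <- c(corner, colnames(df)[-1])   # drop the empty first label
  out <- rbind(header, df)                # header becomes the first row
  colnames(out) <- paste0("V", seq_len(ncol(out)))
  rownames(out) <- NULL                   # renumber rows 1..n
  out
}

result <- promote_header(team_stats)
print(result)
```

Using colnames(df)[-1] instead of [2:3] keeps the helper independent of the number of team columns, which is the same idea as the 2:ncol(...) variant above.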