Combining Rows in R with Pivot or Spread?

Question

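I'm working with election data that is currently in the format below. I've included both a visual and a code example (the visual is somewhat condensed). The values have also been edited from their originals.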

# Representative Example
library(tidyverse)
test.df <- tibble(yr=rep(1956),mn=rep(11),
             sub=rep("Alabama"),
             unit_type=rep("County"),
             unit_name=c("Autauga","Baldwin","Barbour"),
             TotalVotes=c(1000,2000,3000),
             RepVotes=c(500,1000,1500),
             RepCandidate=rep("Eisenhower"),
             DemVotes=c(500,1000,1500),
             DemCandidate=rep("Stevenson"),
             ThirdVotes=c(0,0,0),
             ThirdCandidate=rep("Uncommitted"),
             RepVotesTotalPerc=rep(50.00),
             DemVotesTotalPerc=rep(50.00),
             ThirdVotesTotalPerc=rep(0.00)
             )
---------------------------------------------------------------------------------------------------------
yr   | mn | sub     | unit_type | unit_name | TotalVotes | RepVotes | RepCandidate | DemVotes | DemCandidate
---------------------------------------------------------------------------------------------------------
1956 | 11 | Alabama | County    | Autauga   | 1000       | 500      | Eisenhower   | 500      | Stevenson
1956 | 11 | Alabama | County    | Baldwin   | 2000       | 1000     | Eisenhower   | 1000     | Stevenson
1956 | 11 | Alabama | County    | Barbour   | 3000       | 1500     | Eisenhower   | 1500     | Stevenson
---------------------------------------------------------------------------------------------------------

I'm trying to get a table that looks like this:

-----------------------------------------------------------------------------------------------
yr   | mn | sub     | unit_type | unit_name | pty_n       | can         | TotalVotes | CanVotes
-----------------------------------------------------------------------------------------------
1956 | 11 | Alabama | County    | Autauga   | Republican  | Eisenhower  | 1000       | 500
1956 | 11 | Alabama | County    | Autauga   | Democrat    | Stevenson   | 1000       | 500
1956 | 11 | Alabama | County    | Autauga   | Independent | Uncommitted | 1000       | 0
-----------------------------------------------------------------------------------------------
# and so on for the other counties in the example (Baldwin, Barbour, etc.)

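As you can see, I'd ideally like three observations per county, with all the candidates in one column and their respective votes in another (CanVotes, etc.).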

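I've tried things like pivot_longer() and spread(), but I'm having a hard time working out how to express this in code. Any help reshaping my data to get a candidate column, while carrying the rest of the data along, would be much appreciated!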

Answer 1

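Here's a solution that first uses pivot_longer() to convert the vote columns to long format. I then use mutate() and case_when() to pick each row's candidate from the matching *Candidate column, recode the former column names (RepVotes, etc.) into party names, and drop the individual candidate columns: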

long_table <- pivot_longer(test.df,
                           cols = c(RepVotes, DemVotes, ThirdVotes),
                           names_to = "pty_n",
                           values_to = "CanVotes") %>% 
  mutate(can = case_when(
    pty_n == "RepVotes" ~ RepCandidate,
    pty_n == "DemVotes" ~ DemCandidate,
    pty_n == "ThirdVotes" ~ ThirdCandidate
  ),
  pty_n = case_when(
    pty_n == "RepVotes" ~ "Republican",
    pty_n == "DemVotes" ~ "Democrat",
    pty_n == "ThirdVotes" ~ "Independent"
  )) %>% 
  select(-c(RepCandidate, DemCandidate, ThirdCandidate))
# A tibble: 9 x 12
     yr    mn sub     unit_type unit_name TotalVotes RepVotesTotalPerc DemVotesTotalPerc ThirdVotesTotalPe~ pty_n      CanVotes can       
  <dbl> <dbl> <chr>   <chr>     <chr>          <dbl>             <dbl>             <dbl>              <dbl> <chr>         <dbl> <chr>     
1  1956    11 Alabama County    Autauga         1000                50                50                  0 Republican      500 Eisenhower
2  1956    11 Alabama County    Autauga         1000                50                50                  0 Democrat        500 Stevenson 
3  1956    11 Alabama County    Autauga         1000                50                50                  0 Independe~        0 Uncommitt~
4  1956    11 Alabama County    Baldwin         2000                50                50                  0 Republican     1000 Eisenhower
5  1956    11 Alabama County    Baldwin         2000                50                50                  0 Democrat       1000 Stevenson 
6  1956    11 Alabama County    Baldwin         2000                50                50                  0 Independe~        0 Uncommitt~
7  1956    11 Alabama County    Barbour         3000                50                50                  0 Republican     1500 Eisenhower
8  1956    11 Alabama County    Barbour         3000                50                50                  0 Democrat       1500 Stevenson 
9  1956    11 Alabama County    Barbour         3000                50                50                  0 Independe~        0 Uncommitt~

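I tried building a custom spec, but the names seem to have to be derived from the column names; they can't be conditioned directly on the values of other columns.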
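One possible workaround for the spec limitation, offered as a sketch rather than a tested recipe: rename each party's columns first so that the party label and the target column name can both be read off the column name, then let pivot_longer()'s ".value" sentinel split them. The new column names (Republican_CanVotes, Republican_can, and so on) are illustrative choices of mine, not something from the question.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question (length-1 values are recycled)
test.df <- tibble(yr = 1956, mn = 11,
                  sub = "Alabama",
                  unit_type = "County",
                  unit_name = c("Autauga", "Baldwin", "Barbour"),
                  TotalVotes = c(1000, 2000, 3000),
                  RepVotes = c(500, 1000, 1500),
                  RepCandidate = "Eisenhower",
                  DemVotes = c(500, 1000, 1500),
                  DemCandidate = "Stevenson",
                  ThirdVotes = c(0, 0, 0),
                  ThirdCandidate = "Uncommitted",
                  RepVotesTotalPerc = 50,
                  DemVotesTotalPerc = 50,
                  ThirdVotesTotalPerc = 0)

long_table <- test.df %>%
  # Hypothetical renaming: give every party column the shape "<party>_<field>"
  rename(Republican_CanVotes = RepVotes, Republican_can = RepCandidate,
         Democrat_CanVotes = DemVotes, Democrat_can = DemCandidate,
         Independent_CanVotes = ThirdVotes, Independent_can = ThirdCandidate) %>%
  # The prefix becomes the pty_n value; ".value" keeps "CanVotes" and "can"
  # as separate output columns, so their different types never get combined
  pivot_longer(cols = matches("^(Republican|Democrat|Independent)_"),
               names_to = c("pty_n", ".value"),
               names_sep = "_") %>%
  select(yr, mn, sub, unit_type, unit_name, pty_n, can, TotalVotes, CanVotes)

print(long_table)
```

The final select() drops the *TotalPerc columns to match the desired layout; add them to that call if you still need them.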

Answer 2

Here's a data.table way of doing it:

library( data.table )
#convert data to the data.table-format
setDT( test.df )
# get the party prefixes (Rep, Dem, Third), used later to relabel the pty_n column
parties <- gsub( "Candidate", "", grep( "^.*Candidate$", names( test.df ), value = TRUE ) )
#melt to each candidate and his/her votes
DT.melt <- melt(test.df, 
                id.vars = c("yr", "mn", "sub", "unit_type", "unit_name"),
                measure.vars = patterns( can = "^.*Candidate$",
                                         canVotes = "^(Rep|Dem|Third)Votes$" ),
                variable.name = "pty_n"
                )
# get the totals from the original data (by unit_name) through a join
DT.melt[ test.df, TotalVotes := i.TotalVotes, on = .(unit_name)]
#and pass the correct party name to the pty_n column
DT.melt[, pty_n := parties[ pty_n ] ][]

#      yr mn     sub unit_type unit_name pty_n         can canVotes TotalVotes
# 1: 1956 11 Alabama    County   Autauga   Rep  Eisenhower      500       1000
# 2: 1956 11 Alabama    County   Baldwin   Rep  Eisenhower     1000       2000
# 3: 1956 11 Alabama    County   Barbour   Rep  Eisenhower     1500       3000
# 4: 1956 11 Alabama    County   Autauga   Dem   Stevenson      500       1000
# 5: 1956 11 Alabama    County   Baldwin   Dem   Stevenson     1000       2000
# 6: 1956 11 Alabama    County   Barbour   Dem   Stevenson     1500       3000
# 7: 1956 11 Alabama    County   Autauga Third Uncommitted        0       1000
# 8: 1956 11 Alabama    County   Baldwin Third Uncommitted        0       2000
# 9: 1956 11 Alabama    County   Barbour Third Uncommitted        0       3000
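A variant of the same melt, sketched under the assumption that you want the full party names from the question's desired output. It lists the measure columns explicitly, in the same party order, together with a value.name vector (instead of named patterns), which keeps candidates and votes aligned. It also keeps TotalVotes in id.vars so no join is needed. The object names test.dt and DT.long and the party-name lookup vector are illustrative.

```r
library(data.table)

# Same example data as in the question, built directly as a data.table
test.dt <- data.table(yr = 1956, mn = 11, sub = "Alabama", unit_type = "County",
                      unit_name = c("Autauga", "Baldwin", "Barbour"),
                      TotalVotes = c(1000, 2000, 3000),
                      RepVotes = c(500, 1000, 1500), RepCandidate = "Eisenhower",
                      DemVotes = c(500, 1000, 1500), DemCandidate = "Stevenson",
                      ThirdVotes = c(0, 0, 0), ThirdCandidate = "Uncommitted")

DT.long <- melt(test.dt,
                # TotalVotes as an id column is carried along, so no join is needed
                id.vars = c("yr", "mn", "sub", "unit_type", "unit_name", "TotalVotes"),
                # one measure group per output column, listed in the same party order
                measure.vars = list(c("RepCandidate", "DemCandidate", "ThirdCandidate"),
                                    c("RepVotes", "DemVotes", "ThirdVotes")),
                value.name = c("can", "CanVotes"),
                variable.name = "pty_n")

# pty_n holds the measure-group index (1, 2, 3); map it to full party names
DT.long[, pty_n := c("Republican", "Democrat", "Independent")[as.integer(pty_n)]]

# group the three rows of each county together
setorder(DT.long, unit_name)
print(DT.long)
```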

Tags: pivot, pivot-table, r, data-manipulation, spread