How do I get the aggregate number of a variable against another variable in R? Both these variables are non-numeric

Question

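I have this dataset and I am trying to create a new variable (n_commitments) that gives me the total number of paragraphs for each country. I know this is super basic, but I have somehow been stuck on it for an hour. I think it has to do with both variables being of class character while I want a number as output.

Please help me so I can finally move on. Thanks.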

      structure(list(country = c("Afghanistan", "Afghanistan"), paragraphs = c("The representative of Afghanistan confirmed that his Government would ensure the transparency of its ongoing privatization programme. He stated that his Government would provide reports to WTO Members on developments in its privatisation programme, periodically and upon request, as long as the programme would be in existence, and along the lines of the information already provided to the Working Party during the accession process. The Working Party took note of this commitment. ", 
"The representative of Afghanistan confirmed that from the date of accession, State-trading enterprises (including State-owned and State-controlled enterprises, enterprises with special or exclusive privileges, and unitary enterprises) in Afghanistan would make any purchases or sales, which were not for the Government's own use or consumption, solely in accordance with commercial considerations, including price, quality, availability, marketability, transportation and other conditions of purchase or sale. He further confirmed that these State trading enterprises would afford the enterprises of other Members adequate opportunity, in accordance with customary business practice, to compete for participation in purchases from or sales to Afghanistan's State enterprises. The Working Party took note of these commitments.  "
)), row.names = 1:2, class = "data.frame")

    Columns: 8
$ country            <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanis…
$ category           <chr> "State Ownership and Privatization; State-Trading Entities", "State Ownership and Pr…
$ paragraphs         <chr> "The representative of Afghanistan confirmed that his Government would ensure the tr…
$ year_complete      <int> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, …
$ year_start         <int> 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, …
$ accession_duration <int> 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, …
$ wto                <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ n_commitments      <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", …

Answer 1

Here is how to count the unique paragraphs per country:

library(dplyr)

# One row per country, counting each distinct paragraph once
df %>% 
  group_by(country) %>%
  summarize(n_unique_paragraphs = n_distinct(paragraphs))
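
To illustrate how the distinct count can differ from the plain row count, here is a small sketch. The data frame `toy` and its contents are made up for illustration and are not taken from the question's dataset.

```r
library(dplyr)

# Hypothetical toy data: one Afghanistan paragraph appears twice
toy <- data.frame(
  country    = c("Afghanistan", "Afghanistan", "Afghanistan", "Albania"),
  paragraphs = c("Paragraph A", "Paragraph B", "Paragraph B", "Paragraph C"),
  stringsAsFactors = FALSE
)

counts <- toy %>%
  group_by(country) %>%
  summarize(
    n_rows              = n(),                    # every row counts
    n_unique_paragraphs = n_distinct(paragraphs)  # duplicates count once
  )
```

In this toy data the two counts differ only for the country with the repeated paragraph, so which one to use depends on whether duplicates can occur.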

If, as you say, "each row of the data is a unique paragraph", then we can simplify and just count the rows:

df %>% group_by(country) %>%
  summarize(n = n())

dplyr also has a built-in helper function for this:

df %>% count(country)
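
The snippets above return one row per country. Since the question asks for a new variable (n_commitments) in the dataset itself, one possible approach is to use `mutate()` instead of `summarize()`, which keeps every row. The sketch below assumes a made-up stand-in data frame, `df_toy`, rather than the real data.

```r
library(dplyr)

# Hypothetical stand-in for the question's data, including the empty
# character column n_commitments that is to be filled in
df_toy <- data.frame(
  country       = c("Afghanistan", "Afghanistan", "Albania"),
  paragraphs    = c("Paragraph A", "Paragraph B", "Paragraph C"),
  n_commitments = "",
  stringsAsFactors = FALSE
)

df_counted <- df_toy %>%
  group_by(country) %>%
  mutate(n_commitments = n()) %>%  # overwrite with the per-country row count
  ungroup()
```

Because `mutate()` replaces the existing column, n_commitments ends up as an integer rather than a character vector, and each row carries the total for its country.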

aggregate

r

dplyr