Problems creating a key-value store in R

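I am trying to create a key-value store in which the keys are entities and the values are each entity's average sentiment score across news articles.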

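I have a data frame of news articles and a list called organization1 that holds the entities a classifier identified in those articles. The first element of organization1 contains the entities identified in the article in the first row of the news_us data frame. I am trying to iterate over organization1 and build a key-value store whose keys are the entities in organization1 and whose values are the sentiment scores of the news descriptions that mention each entity. The code I have does not change the scores in the sentiment list, and I don't know why. My first guess was that I had to use the $ operator on the sentiment list to add values, but that did not change anything either. Here is my code so far: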

library(syuzhet)
sentiment <- list()
organization1 <- list(NULL, "US", "Bath", "Animal Crossing", "World Health Organization", 
    NULL, c("Microsoft", "Facebook"))
news_us <- structure(list(title = c("Stocks making the biggest moves after hours: Bed Bath & Beyond, JC Penney, United Airlines and more - CNBC", 
"Los Angeles mayor says 'very difficult to see' large gatherings like concerts and sporting events until 2021 - CNN", 
"Bed Bath & Beyond shares rise as earnings top estimates, retailer plans to maintain some key investments - CNBC", 
"6 weeks with Animal Crossing: New Horizons reveals many frustrations - VentureBeat", 
"Timeline: How Trump And WHO Reacted At Key Moments During The Coronavirus Crisis : Goats and Soda - NPR", 
"Michigan protesters turn out against Whitmer’s strict stay-at-home order - POLITICO"
), description = c("Check out the companies making headlines after the bell.", 
"Los Angeles Mayor Eric Garcetti said Wednesday large gatherings like sporting events or concerts may not resume in the city before 2021 as the US grapples with mitigating the novel coronavirus pandemic.", 
"Bed Bath & Beyond said that its results in 2020 \"will be unfavorably impacted\" by the crisis, and so it will not be offering a first-quarter nor full-year outlook.", 
"Six weeks with Animal Crossing: New Horizons has helped to illuminate some of the game's shortcomings that weren't obvious in our first review.", 
"How did the president respond to key moments during the pandemic? And how did representatives of the World Health Organization respond during the same period?", 
"Many demonstrators, some waving Trump campaign flags, ignored organizers‘ pleas to stay in their cars and flooded the streets of Lansing, the state capital."
), name = c("CNBC", "CNN", "CNBC", "Venturebeat.com", "Npr.org", 
"Politico")), na.action = structure(c(`35` = 35L, `95` = 95L, 
`137` = 137L, `154` = 154L, `213` = 213L, `214` = 214L, `232` = 232L, 
`276` = 276L, `321` = 321L), class = "omit"), row.names = c(NA, 
6L), class = "data.frame")
i = as.integer(0)
for(index in organization1){
  i <- i+1
   if(is.character(index)) { #if entity is not null/NA
     val <- get_sentiment(news_us$description[i], method = "afinn")
     #print(val)
     print(sentiment[[index[1]]])
     sentiment[[index[1]]] <- sentiment[[index[1]]]+val
   }
}

This is the sentiment list after running the code block above:

$US
integer(0)

$Bath
integer(0)

$`Animal Crossing`
integer(0)

$`World Health Organization`
integer(0)

$`Apple TV`
integer(0)

$`Pittsburgh Steelers`
integer(0)

whereas I would like it to look like this:

$US
1.3

$Bath
0.3

$`Animal Crossing`
2.4

$`World Health Organization`
1.2

$`Apple TV`
-0.7

$`Pittsburgh Steelers`
0.3

The value column can contain multiple values when multiple entities are identified in an article.

I'm not sure how organization1 and news_us$description relate to each other, but perhaps you meant to use them like this?

library(syuzhet)

setNames(lapply(news_us$description, get_sentiment), unlist(organization1))

#$US
#[1] 0

#$Bath
#[1] -0.4

#$`Animal Crossing`
#[1] -0.1

#$`World Health Organization`
#[1] 1.1

#$Microsoft
#[1] -0.6

#$Facebook
#[1] -1.9
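
As for why the loop in the question never stores a score: looking up a key that does not exist yet in a list returns NULL, and adding a number to NULL gives a zero-length vector, which is what ends up stored under every key. Below is a minimal sketch of one way around that, assuming each element of the entity list lines up with one row of descriptions. The names entities, scores, collected and avg_sentiment are hypothetical, and the scores are made-up illustrative numbers, not real syuzhet output; in practice they would come from something like get_sentiment(news_us$description, method = "afinn").

```r
# Root cause: a missing key yields NULL, and NULL + number has length zero.
sentiment <- list()
is.null(sentiment[["US"]])   # TRUE: the key does not exist yet
length(NULL + 1.3)           # 0: nothing is ever accumulated

# Hypothetical stand-ins for organization1 and the per-article scores.
# The scores are invented for illustration only.
entities <- list(NULL, "US", "Bath", c("Microsoft", "Facebook"), "US")
scores   <- c(0, 2, -1, 3, 4)

collected <- list()
for (i in seq_along(entities)) {
  # Iterating over NULL runs zero times, so empty entries are skipped.
  for (entity in entities[[i]]) {
    # c(NULL, x) is just x, so the first score for a new key is kept.
    collected[[entity]] <- c(collected[[entity]], scores[i])
  }
}

# Average the collected scores per entity.
avg_sentiment <- lapply(collected, mean)
```

Unlike the loop in the question, which only looks at index[1], this sketch credits the article's score to every entity found in that article. Whether that is the intended behaviour is an assumption on my part.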