R中多行的文本摘要

Text summary in R for multiple rows

我有一组短文本文件,我可以将它们组合成一个数据测试,以便每个文件都排成一行。

我正在尝试使用通用函数参数使用 LSAfun 包来总结内容 genericSummary(text,k,split=c(".","!","?"),min=5,breakdown=FALSE,...)

这对于单个文本输入非常有效,但在我的情况下却不行。在包说明中,它说文本输入应该是“长度(文本)= 1 的字符向量,指定要汇总的文本”。

请看这个例子

# Generate a dataset example (text examples were copied from wikipedia): 
 
dd = structure(list(text = structure(1:2, .Label = c("Forest gardening, a forest-based food production system, is the world's oldest form of gardening.[1] Forest gardens originated in prehistoric times along jungle-clad river banks and in the wet foothills of monsoon regions. In the gradual process of families improving their immediate environment, useful tree and vine species were identified, protected and improved while undesirable species were eliminated. Eventually foreign species were also selected and incorporated into the gardens.[2]\n\nAfter the emergence of the first civilizations, wealthy individuals began to create gardens for aesthetic purposes. Ancient Egyptian tomb paintings from the New Kingdom (around 1500 BC) provide some of the earliest physical evidence of ornamental horticulture and landscape design; they depict lotus ponds surrounded by symmetrical rows of acacias and palms. A notable example of ancient ornamental gardens were the Hanging Gardens of Babylon—one of the Seven Wonders of the Ancient World —while ancient Rome had dozens of gardens.\n\nWealthy ancient Egyptians used gardens for providing shade. Egyptians associated trees and gardens with gods, believing that their deities were pleased by gardens. Gardens in ancient Egypt were often surrounded by walls with trees planted in rows. Among the most popular species planted were date palms, sycamores, fir trees, nut trees, and willows. These gardens were a sign of higher socioeconomic status. In addition, wealthy ancient Egyptians grew vineyards, as wine was a sign of the higher social classes. Roses, poppies, daisies and irises could all also be found in the gardens of the Egyptians.\n\nAssyria was also renowned for its beautiful gardens. These tended to be wide and large, some of them used for hunting game—rather like a game reserve today—and others as leisure gardens. Cypresses and palms were some of the most frequently planted types of trees.\n\nGardens were also available in Kush. In Musawwarat es-Sufra, the Great Enclosure dated to the 3rd century BC included splendid gardens. [3]\n\nAncient Roman gardens were laid out with hedges and vines and contained a wide variety of flowers—acanthus, cornflowers, crocus, cyclamen, hyacinth, iris, ivy, lavender, lilies, myrtle, narcissus, poppy, rosemary and violets[4]—as well as statues and sculptures. Flower beds were popular in the courtyards of rich Romans.", 
"The Middle Ages represent a period of decline in gardens for aesthetic purposes. After the fall of Rome, gardening was done for the purpose of growing medicinal herbs and/or decorating church altars. Monasteries carried on a tradition of garden design and intense horticultural techniques during the medieval period in Europe. Generally, monastic garden types consisted of kitchen gardens, infirmary gardens, cemetery orchards, cloister garths and vineyards. Individual monasteries might also have had a \"green court\", a plot of grass and trees where horses could graze, as well as a cellarer's garden or private gardens for obedientiaries, monks who held specific posts within the monastery.\n\nIslamic gardens were built after the model of Persian gardens and they were usually enclosed by walls and divided in four by watercourses. Commonly, the centre of the garden would have a reflecting pool or pavilion. Specific to the Islamic gardens are the mosaics and glazed tiles used to decorate the rills and fountains that were built in these gardens.\n\nBy the late 13th century, rich Europeans began to grow gardens for leisure and for medicinal herbs and vegetables.[4] They surrounded the gardens by walls to protect them from animals and to provide seclusion. During the next two centuries, Europeans started planting lawns and raising flowerbeds and trellises of roses. Fruit trees were common in these gardens and also in some, there were turf seats. At the same time, the gardens in the monasteries were a place to grow flowers and medicinal herbs but they were also a space where the monks could enjoy nature and relax.\n\nThe gardens in the 16th and 17th century were symmetric, proportioned and balanced with a more classical appearance. Most of these gardens were built around a central axis and they were divided into different parts by hedges. Commonly, gardens had flowerbeds laid out in squares and separated by gravel paths.\n\nGardens in Renaissance were adorned with sculptures, topiary and fountains. In the 17th century, knot gardens became popular along with the hedge mazes. By this time, Europeans started planting new flowers such as tulips, marigolds and sunflowers."
), class = "factor")), class = "data.frame", row.names = c(NA, 
-2L))


# This code is trying to generate the summary into another column:

dd$sum = genericSummary(dd$text,k=1) 


这会报错Error in strsplit(text, split = split, fixed = T) : non-character argument

我认为这是因为使用了变量而不是单个文本

我的预期输出 是为每一行生成的摘要位于名为 dd$sum

的相应第二列中

我尝试使用 as.vector(dd$text) 但这不起作用。 (感觉还是把输出合并成一行)

我试图从 purrr 中阅读一些关于 map 函数的信息,但无法在这种情况下应用它,想知道是否有 r 编程经验的人可以提供帮助。

此外,如果您知道使用文本摘要包(例如 lexrankr 完成此部分的方法,这也将有效。我从这里尝试了他们的代码,但仍然无法正常工作。

谢谢

勾选class(dd$text)。这是一个因素,不是一个字符。

以下作品:

library(dplyr)
library(purrr)
dd %>% 
  mutate(text = as.character(text)) %>%
  mutate(sum = map(text, genericSummary, k = 1))