Building sentences from a data frame in R
I am trying to generate sentences from a data frame.
Below is the data frame:
# Code
mycode <- c("AAABBB", "AAABBB", "AAACCC", "AAABBD")
mycode <- sample(mycode, 20, replace = TRUE)
# Date
mydate <- c("2016-10-17", "2016-10-18", "2016-10-19", "2016-10-20")
mydate <- sample(mydate, 20, replace = TRUE)
# Resort
myresort <- c("GB", "IE", "GR", "DK")
myresort <- sample(myresort, 20, replace = TRUE)
# Number of holidaymakers
HolidayMakers <- sample(1000, 20, replace = TRUE)
mydf <- data.frame(mycode,
                   mydate,
                   myresort,
                   HolidayMakers)
So, taking mycode as an example, I would like to create a sentence like "For the code mycode, the biggest destinations are myresorts where the top days of visiting were mydate with a total of HolidayMakers".
Assume each code has multiple rows. I want a single sentence per code, not one sentence per mydate and myresort. For example, something like:
"For the code AAABBB, the biggest destinations are GB,GR,DK,IE where the top days of visiting were 2016-10-17,2016-10-18,2016-10-19 with a total of 650"
Here 650 is simply the sum, per mycode, of all holidaymakers across all of those countries on those days.
Can anyone help?
Thanks for your time.
You could try:
library(dplyr)
res <- mydf %>%
  group_by(mycode) %>%
  summarise(d = toString(unique(mydate)),
            r = toString(unique(myresort)),
            h = sum(HolidayMakers)) %>%
  mutate(s = paste("For the code", mycode,
                   "the biggest destinations are", r,
                   "where the top days of visiting were", d,
                   "with a total of", h))
Which gives:
> res$s
#[1] "For the code AAABBB the biggest destinations are GB, GR, IE, DK
# where the top days of visiting were 2016-10-17, 2016-10-18,
# 2016-10-20, 2016-10-19 with a total of 6577"
#[2] "For the code AAABBD the biggest destinations are IE
# where the top days of visiting were 2016-10-17, 2016-10-18
# with a total of 1925"
#[3] "For the code AAACCC the biggest destinations are IE, GR, DK
# where the top days of visiting were 2016-10-20, 2016-10-17,
# 2016-10-19, 2016-10-18 with a total of 2878"
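Note: since you did not give any guidance on how you intend to compute the "top visiting days", I simply included all days. You can easily edit the above to suit your actual situation.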
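If "top" is taken to mean the days (and resorts) with the largest summed HolidayMakers within each code, the pipeline above could be adapted roughly as follows. That definition is an assumption, since the question does not define it. The helper top_by_total() and its n argument are hypothetical names introduced only for this sketch, and the small fixed data frame replaces the random sample so the result is reproducible. paste0() is used here only to put the comma after the code, as in the sentence the question asks for.

```r
library(dplyr)

# Small fixed data frame (illustrative values, not the random sample above)
mydf <- data.frame(
  mycode        = c("AAABBB", "AAABBB", "AAABBB", "AAACCC", "AAACCC"),
  mydate        = c("2016-10-17", "2016-10-18", "2016-10-17",
                    "2016-10-19", "2016-10-20"),
  myresort      = c("GB", "IE", "GR", "DK", "DK"),
  HolidayMakers = c(100, 300, 50, 200, 400),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the (up to) n values of `key` with the largest
# summed `value`, in decreasing order of that sum.
top_by_total <- function(key, value, n = 2) {
  totals <- sort(tapply(value, key, sum), decreasing = TRUE)
  names(totals)[seq_len(min(n, length(totals)))]
}

res <- mydf %>%
  group_by(mycode) %>%
  summarise(d = toString(top_by_total(mydate, HolidayMakers, n = 1)),
            r = toString(top_by_total(myresort, HolidayMakers, n = 2)),
            h = sum(HolidayMakers)) %>%
  mutate(s = paste0("For the code ", mycode,
                    ", the biggest destinations are ", r,
                    " where the top days of visiting were ", d,
                    " with a total of ", h))

res$s
```

Note that h is still the total over all rows for the code, not only over the top days. If the total should cover just the selected days, the rows would need to be filtered before summing.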