R: summarize, drop columns, change the names of data frames in a list and save the results to the environment
This question is a mix of this one and …
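My trouble is that I cannot apply functions/code to every element of a list of tibbles. I know how to get the result I want one tibble at a time, but not for the whole list.
For the sake of the question, let's take two small tibbles whose structure is very similar to my real case.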
MyRes_tw <- structure(list(text = c("follow @SmartRE_Info and get your token in waves t.co/g3q4XelPaK #SmartRE",
"RT @investFeed: Make your FEED work for you - check out this blog on the power of the FEED token: t.co/JOHSCeitGc",
"RT @investFeed: WE HAVE NOW PASSED 8,000 $ETH IN OUR TOKEN SALE PURCHASED! t.co/bx7s1xWyXI #ICO #Tokensale t.co/ZFndFhUfVT"
), Tweet.id = c("889602043249254400", "889589518159945729", "889573909405679616"
), created.date = structure(c(17371, 17371, 17371), class = "Date"),
created.week = c(30, 30, 31), retweet = c(0, 0, 0), custom = c(0,
0, 0)), .Names = c("text", "Tweet.id", "created.date", "created.week",
"retweet", "custom"), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"))
MyRes1_tw <- structure(list(text = c("RT @AmbrosusAMB: We are on the front page of #NASDAQ / #Editorial Choice, Proud #Ethereum #Blockchain #ICO #TGE @Nasdaq @gavofyork @jutta_s…",
"RT @MyBit_DApp: 10 minutes left in #mybit #tokensale over 10,000 #ethereum contributed! Check it out t.co/AgyRCcyyzD",
"RT @MyBit_DApp: only 23 ETH left now", "RT @MyBit_DApp: #MyBit #tokensale ends in ~1 hour. 9k+ $ETH raised so far. Only 125 #ethereum left at 25% discount. t.co/AgyRCcyyzD",
"RT @MyBit_DApp: ~12 hours left in the t.co/AgyRCcyyzD #TokenSale #ICO 25% Bonus activated for #ethereum $ether #bitcoin $BTC $xbt"
), Tweet.id = c("897499492219445252", "897487635442274305", "897487621714305024",
"897487610494558208", "897487593117450244"), created.date = structure(c(17393,
17393, 17393, 17393, 17393), class = "Date"), created.week = c(33,
33, 34, 34, 34), retweet = c(0, 0, 0, 0, 0), custom = c(0, 0,
0, 0, 0)), .Names = c("text", "Tweet.id", "created.date", "created.week",
"retweet", "custom"), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
These two dfs hold data from Twitter. I want to do some wrangling on them to get these results:
MyRes <- structure(list(created.week = c(33, 34, 35), retweet = c(12,
0, 8), custom = c(0, 0, 2), Twitter.name = c("MyRes", "MyRes",
"MyRes")), .Names = c("created.week", "retweet", "custom", "Twitter.name"
), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"
))
MyRes1 <- structure(list(created.week = c(33, 34, 35), retweet = c(12,
0, 8), custom = c(0, 0, 2), Twitter.name = c("MyRes1", "MyRes1",
"MyRes1")), .Names = c("created.week", "retweet", "custom", "Twitter.name"
), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"
))
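Note that names matter: the name of each resulting tibble is the name of the starting tibble with _tw removed.
Also note that in the final result the last column, $Twitter.name, should hold the tibble's name.
I list the tibbles in my environment with myUser.tw <- ls(pattern = "_tw$"), since they are the only objects whose names end in _tw.
I wrote this function to help: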
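The step that makes the rest possible is that mget() returns a list named after the objects, so the _tw names travel with the data. A minimal sketch, using a throwaway environment and hypothetical stand-in data frames instead of the real tibbles:

```r
# Hypothetical stand-ins for the question's tibbles, kept in a separate
# environment so the example does not touch the global workspace
env <- new.env()
assign("MyRes_tw",  data.frame(created.week = c(30, 30, 31), retweet = 0, custom = 0), envir = env)
assign("MyRes1_tw", data.frame(created.week = c(33, 33, 34), retweet = 0, custom = 0), envir = env)
assign("other_obj", 1:3, envir = env)

# "_tw$" anchors the match at the end of the object name
tw_names <- ls(envir = env, pattern = "_tw$")

# mget() returns a list whose element names are the object names
tw_list <- mget(tw_names, envir = env)

# Stripping the suffix from the list names gives the names wanted for the results
sub("_tw$", "", names(tw_list))
```

Because the names are already attached to the list, they can be reused later for both the new object names and the Twitter.name column.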
MySummarize <- function(x){
summarise(group_by(x, created.week, Retweet.count = sum(retweet), Custom.count = sum(custom)))
}
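As written, MySummarize() hands the two sums to group_by() instead of summarise(): they are computed over the whole tibble, become extra grouping columns, and summarise() is left with nothing to compute. A corrected sketch, assuming the intent is one row per created.week with the weekly totals (the toy input is hypothetical):

```r
library(dplyr)

# Corrected sketch: group first, then compute the per-week totals in summarise()
MySummarize <- function(x) {
  x %>%
    group_by(created.week) %>%
    summarise(Retweet.count = sum(retweet),
              Custom.count  = sum(custom)) %>%
    ungroup()
}

# Small hypothetical input: two rows in week 30, one in week 31
toy <- tibble(created.week = c(30, 30, 31),
              retweet      = c(1, 2, 5),
              custom       = c(0, 1, 0))

MySummarize(toy)
```

For this input, week 30 gets Retweet.count 3 and Custom.count 1, and week 31 gets 5 and 0; the result is returned ungrouped.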
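Here comes the pain! This is the code I am working with:
testLst <- mget(myUser.tw) %>%
  lapply(function(x) MySummarize(x))
# The list still carries the original names, so this overwrites MyRes_tw, MyRes1_tw
list2env(testLst, envir = .GlobalEnv)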
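Then I cannot find a way to:
- change the names of the dfs to get MyRes, MyRes1 as names
- add a column whose rows all contain that text (MyRes, MyRes1)
- save the results in my environment.
Believe it or not, I have been stuck on this for a long time. I would appreciate help completing my whole code. Thanks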
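The three steps can be sketched together with Map(), which walks the tibbles and their names in parallel. This is one possible approach, not the asker's code; the two stand-in tibbles are hypothetical and only carry the columns used here, so skip those lines if the real MyRes_tw and MyRes1_tw are already loaded:

```r
library(dplyr)

# Hypothetical stand-ins for MyRes_tw and MyRes1_tw (only the columns used below)
MyRes_tw  <- tibble(created.week = c(30, 30, 31), retweet = c(0, 0, 0), custom = c(0, 0, 0))
MyRes1_tw <- tibble(created.week = c(33, 33, 34, 34, 34), retweet = 0, custom = 0)

# Named list of every object whose name ends in "_tw"
tw_list <- mget(ls(pattern = "_tw$"))

# Step 1: drop the "_tw" suffix from the list names
names(tw_list) <- sub("_tw$", "", names(tw_list))

# Step 2: summarise each tibble (this also drops the unused columns) and add
# a column holding its new name; Map() keeps the names of its first argument
results <- Map(function(tbl, nm) {
  tbl %>%
    group_by(created.week) %>%
    summarise(retweet = sum(retweet), custom = sum(custom)) %>%
    mutate(Twitter.name = nm)
}, tw_list, names(tw_list))

# Step 3: create one object per list element (MyRes, MyRes1) in the global environment
invisible(list2env(results, envir = .GlobalEnv))
```

Renaming the list before summarising means the same names serve both as the Twitter.name values and as the object names written by list2env().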
It is not clear what "the df" refers to, but if the objective is to get a list of summaries with a source column attached:
library(dplyr)
myUser.tw %>%
mget(.GlobalEnv) %>%
lapply(MySummarize) %>%
bind_rows(.id = "source") %>%
mutate(source = sub("_tw$", "", source)) %>%
split(.$source)
giving:
$MyRes
# A tibble: 2 x 4
# Groups: created.week, Retweet.count [2]
source created.week Retweet.count Custom.count
<chr> <dbl> <dbl> <dbl>
1 MyRes 30 0 0
2 MyRes 31 0 0
$MyRes1
# A tibble: 2 x 4
# Groups: created.week, Retweet.count [2]
source created.week Retweet.count Custom.count
<chr> <dbl> <dbl> <dbl>
1 MyRes1 33 0 0
2 MyRes1 34 0 0
Or, if you want a single data frame, omit the split.
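The answer stops at a named list; the "save to the environment" part of the question can be covered with list2env(), which creates one object per list element. A sketch, where combined is a hypothetical stand-in for the pipeline's result before split():

```r
library(dplyr)

# Hypothetical stand-in for the combined tibble produced before split()
combined <- tibble(source        = c("MyRes", "MyRes", "MyRes1", "MyRes1"),
                   created.week  = c(30, 31, 33, 34),
                   Retweet.count = 0,
                   Custom.count  = 0)

# split() returns a list named by the values of `source`
by_source <- split(combined, combined$source)

# list2env() then creates MyRes and MyRes1 in the global environment
invisible(list2env(by_source, envir = .GlobalEnv))
```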
One possible approach:
# list of tibbles with tw
myUser.tw.list <- mget(myUser.tw)
# perform lapply over the sequence of positions rather than the list of elements
myUser <- lapply(seq_along(myUser.tw),
                 function(i){
                   myUser.tw.list[[i]] %>% group_by(created.week) %>%
                     summarise(retweet = sum(retweet), custom = sum(custom)) %>%
                     ungroup() %>%
                     mutate(Twitter.name = gsub("_tw$", "", names(myUser.tw.list)[i]))
                 })
names(myUser) <- gsub("_tw$", "", myUser.tw)
Result: a named list of tibbles
> myUser
$MyRes
# A tibble: 2 x 4
created.week retweet custom Twitter.name
<dbl> <dbl> <dbl> <chr>
1 30 0 0 MyRes
2 31 0 0 MyRes
$MyRes1
# A tibble: 2 x 4
created.week retweet custom Twitter.name
<dbl> <dbl> <dbl> <chr>
1 33 0 0 MyRes1
2 34 0 0 MyRes1