How to reduce a data frame into a single row with vectors
I have this DF:
email date user_ipaddress other data
1 x@bla.com 2020-03-24 177.95.75.230 xxxx
2 x@bla.com 2020-04-02 177.139.49.93 yyyy
3 x@bla.com 2020-04-02 177.139.49.93 zzzz
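and I want to transform this data into a shape I can store.
The real problem is a big data frame with many different emails, and I want to reduce all the data for each email into a single row, like this: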
email date user_ipaddress other data
1 x@bla.com 2020-04-02 c('177.95.75.230','177.139.49.93') c('xxxx','yyyy','zzzz')
Honestly, if someone can help me with the case of just one email address, that would save my life, but feel free to help with the whole problem.
Using
ipadreessVec<-Reduce(append,x =df$network_userid)
I can get my vector c('177.95.75.230','177.139.49.93')
but if I try
newdf$network_userid<-a
I get
Error in `$<-.data.frame`(`*tmp*`, network_userid, value = c("20562206-f557-48a3-861b-5d1e18524bbb", :
replacement has 3 rows, data has 1
Any answer that gets me further will be accepted, even if it doesn't solve everything.
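The error comes from $<- on a one-row data frame: it expects one value per row, so a character vector of length 3 is read as three rows. A minimal sketch for the single-address case, assuming the sample data with its columns named user_ipaddress and other_data (rather than network_userid): wrapping the vector in list() turns it into a single cell of a list column, and unique() drops the repeated address.

```r
# Sample data from the question (column names assumed: user_ipaddress, other_data)
df <- data.frame(
  email = c("x@bla.com", "x@bla.com", "x@bla.com"),
  date = c("2020-03-24", "2020-04-02", "2020-04-02"),
  user_ipaddress = c("177.95.75.230", "177.139.49.93", "177.139.49.93"),
  other_data = c("xxxx", "yyyy", "zzzz"),
  stringsAsFactors = FALSE
)

# A one-row data frame for the single email address
newdf <- data.frame(email = df$email[1], stringsAsFactors = FALSE)

# Assigning the bare vector fails (3 values for 1 row);
# list() wraps the whole vector so it fills exactly one cell
newdf$user_ipaddress <- list(unique(df$user_ipaddress))
newdf$other_data <- list(unique(df$other_data))

newdf$user_ipaddress[[1]]
#> [1] "177.95.75.230" "177.139.49.93"
```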
We can create a list column grouped by 'email' and 'date':
library(dplyr)
DF %>%
group_by(email, date) %>%
summarise_all(list)
# A tibble: 2 x 4
# Groups: email [1]
# email date user_ipaddress otherdata
# <chr> <chr> <list> <list>
#1 x@bla.com 2020-03-24 <chr [1]> <chr [1]>
#2 x@bla.com 2020-04-02 <chr [2]> <chr [2]>
Or use across with summarise (this was in the devel version at the time of writing; across() has since been released in dplyr 1.0.0):
DF %>%
group_by(email, date) %>%
summarise(across(everything(), list))
# A tibble: 2 x 4
# Groups: email [1]
# email date user_ipaddress otherdata
# <chr> <chr> <list> <list>
#1 x@bla.com 2020-03-24 <chr [1]> <chr [1]>
#2 x@bla.com 2020-04-02 <chr [2]> <chr [2]>
Data
DF <- structure(list(email = c("x@bla.com", "x@bla.com", "x@bla.com"
), date = c("2020-03-24", "2020-04-02", "2020-04-02"),
user_ipaddress = c("177.95.75.230",
"177.139.49.93", "177.139.49.93"),
otherdata = c("xxxx", "yyyy",
"zzzz")), class = "data.frame", row.names = c("1", "2", "3"))
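Grouping by email and date still leaves two rows for this one address. The question asks for one row per email; a base-R sketch of that shape with split() and list columns might look like this. Keeping only the latest date is an assumption read off the desired output, and the second address y@bla.com and its values are hypothetical, added only to show more than one group.

```r
# Sample data plus one hypothetical extra address (y@bla.com)
DF <- data.frame(
  email = c("x@bla.com", "x@bla.com", "x@bla.com", "y@bla.com"),
  date = c("2020-03-24", "2020-04-02", "2020-04-02", "2020-05-01"),
  user_ipaddress = c("177.95.75.230", "177.139.49.93", "177.139.49.93", "10.0.0.1"),
  otherdata = c("xxxx", "yyyy", "zzzz", "wwww"),
  stringsAsFactors = FALSE
)

# One sub-data-frame per email address (sorted by email)
groups <- split(DF, DF$email)

out <- data.frame(email = names(groups), stringsAsFactors = FALSE)
# YYYY-MM-DD strings sort chronologically, so max() picks the latest date
out$date <- vapply(groups, function(g) max(g$date), character(1), USE.NAMES = FALSE)
# Each group's unique values become one cell of a list column
out$user_ipaddress <- unname(lapply(groups, function(g) unique(g$user_ipaddress)))
out$otherdata <- unname(lapply(groups, function(g) unique(g$otherdata)))

out$user_ipaddress[[1]]
#> [1] "177.95.75.230" "177.139.49.93"
```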
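I may have misunderstood you, and something like what @akrun shows seems more likely to be what you want, but if I take your question literally, you may want to use dput: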
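# paste(collapse = "") keeps one string per column even if dput() wraps a long vector
as.data.frame(lapply(df, function(x) paste(capture.output(dput(unique(x))), collapse = "")))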
#> email date user_ipaddress
#> 1 "x@bla.com" c("2020-03-24", "2020-04-02") c("177.95.75.230", "177.139.49.93")
#> other
#> 1 c("xxxx", "yyyy", "zzzz")
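If the dput() strings are meant for storage, for example in a text column of a database or a CSV file, the vectors can be rebuilt by parsing the text again. A minimal round-trip sketch in base R; paste(..., collapse = "") guards against deparse() splitting a long vector over several lines, and eval(parse()) should only be run on text you produced yourself.

```r
ips <- c("177.95.75.230", "177.139.49.93")

# Serialise the vector into a single string for storage
stored <- paste(deparse(ips), collapse = "")
stored
#> [1] "c(\"177.95.75.230\", \"177.139.49.93\")"

# Rebuild the original vector from the stored text
restored <- eval(parse(text = stored))
identical(restored, ips)
#> [1] TRUE
```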
library('data.table')
By email and date:
setDT(df)[, .(user_ipaddress = paste0(user_ipaddress, collapse = ","),
other = paste0(other_data, collapse = ",")), by = .(email, date)]
# email date user_ipaddress other
# 1: x@bla.com 2020-03-24 177.95.75.230 xxxx
# 2: x@bla.com 2020-04-02 177.139.49.93,177.139.49.93 yyyy,zzzz
By email only:
setDT(df)[, .(date = paste0(date, collapse = ","),
user_ipaddress = paste0(user_ipaddress, collapse = ","),
other = paste0(other_data, collapse = ",")), by = .(email)]
# email date user_ipaddress other
# 1: x@bla.com 2020-03-24,2020-04-02,2020-04-02 177.95.75.230,177.139.49.93,177.139.49.93 xxxx,yyyy,zzzz
Data:
df <- read.table(text='email date user_ipaddress other_data
1 x@bla.com 2020-03-24 177.95.75.230 xxxx
2 x@bla.com 2020-04-02 177.139.49.93 yyyy
3 x@bla.com 2020-04-02 177.139.49.93 zzzz', header = TRUE, stringsAsFactors = FALSE)
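paste0(..., collapse = ",") stores each group as one comma-separated string. A base-R sketch of getting the vectors back with strsplit(), assuming no value itself contains a comma; calling unique() before pasting also drops the repeated 177.139.49.93.

```r
# A comma-separated cell like the user_ipaddress result of the "by email only" query
stored <- "177.95.75.230,177.139.49.93,177.139.49.93"

# strsplit() returns a list with one character vector per input string
parts <- strsplit(stored, ",", fixed = TRUE)
parts[[1]]
#> [1] "177.95.75.230" "177.139.49.93" "177.139.49.93"

# Removing duplicates gives the vector from the question
unique(parts[[1]])
#> [1] "177.95.75.230" "177.139.49.93"

# Collapsing only the unique values avoids the duplicate in the first place
dedup <- paste0(unique(c("177.95.75.230", "177.139.49.93", "177.139.49.93")), collapse = ",")
dedup
#> [1] "177.95.75.230,177.139.49.93"
```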
Maybe you can try aggregate in base R:
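# levels() is NULL for character columns, so take the unique values directly;
# list() keeps each email's values together as one cell
dfout <- aggregate(. ~ email, df, FUN = function(x) list(unique(as.character(x))))
which gives
> dfout
email date user_ipaddress other data
1 x@bla.com 2020-03-24, 2020-04-02 177.95.75.230, 177.139.49.93 xxxx, yyyy, zzzz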
Data
df <- structure(list(email = c("x@bla.com", "x@bla.com", "x@bla.com"
), date = c("2020-03-24", "2020-04-02", "2020-04-02"), user_ipaddress = c("177.95.75.230",
"177.139.49.93", "177.139.49.93"), `other data` = c("xxxx", "yyyy",
"zzzz")), class = "data.frame", row.names = c(NA, -3L))
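For the whole problem with many addresses, the same aggregate() formula returns one row per email. A sketch with a hypothetical second address y@bla.com; it applies unique() to the values directly, since levels() returns NULL for the character columns that R 4.0.0 and later create by default.

```r
# Sample data plus one hypothetical extra address (y@bla.com)
df <- data.frame(
  email = c("x@bla.com", "x@bla.com", "x@bla.com", "y@bla.com"),
  date = c("2020-03-24", "2020-04-02", "2020-04-02", "2020-05-01"),
  user_ipaddress = c("177.95.75.230", "177.139.49.93", "177.139.49.93", "10.0.0.1"),
  other_data = c("xxxx", "yyyy", "zzzz", "wwww"),
  stringsAsFactors = FALSE
)

# One row per email; every other column becomes a list column of unique values
dfout <- aggregate(. ~ email, data = df,
                   FUN = function(x) list(unique(as.character(x))))

dfout$user_ipaddress[[1]]
#> [1] "177.95.75.230" "177.139.49.93"
```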