Iterating through data and creating a new data frame

I'm working with a data frame that comes from a database and looks like this:

username    elements
username1   """interfaces"".""dual()"""
username1   """interfaces"".""f_capitalaccrualcurrentyear"""
username2   """interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""
username2   """interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""
username2   """interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""
username4   """interfaces"".""dnow_s_downtime_stat_with_lat_long"""
username3   """interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""

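So there are two columns, "username" and "elements". A user can use one or more elements in a single transaction; when there are several, they are separated by commas within that transaction. I need to split the elements out, one per row, while keeping each one tagged with its username. In the end I want it to look like this: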

username    elements
username1   """interfaces"".""dual()"""
username1   """interfaces"".""f_capitalaccrualcurrentyear"""
username2   """interfaces"".""dnow_completion""
username2   ""interfaces"".""dnow_s_daily_prod_ta"""
username2   """interfaces"".""dnow_completion""
username2   ""interfaces"".""dnow_s_daily_prod_ta"""
username2   """interfaces"".""dnow_completion""
username2   ""interfaces"".""dnow_s_daily_prod_ta"""
username4   """interfaces"".""dnow_s_downtime_stat_with_lat_long"""
username3   """interfaces"".""dnow_completion""
username3   ""interfaces"".""dnow_s_daily_prod_ta"""
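Splitting a single elements value on the comma shows where the unbalanced quotes in the desired output come from: the comma sits between two "" pairs, so each piece keeps one of them. A minimal sketch with base R's strsplit(), taking one value from the example data literally as shown above:

```r
# One 'elements' value from the example data (single quotes, so the
# embedded double quotes need no escaping)
x <- '"""interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""'

# fixed = TRUE treats "," as a literal string rather than a regular expression;
# strsplit() returns a list, so take its first (and only) element
pieces <- strsplit(x, ",", fixed = TRUE)[[1]]

# Print one piece per line
cat(pieces, sep = "\n")
# """interfaces"".""dnow_completion""
# ""interfaces"".""dnow_s_daily_prod_ta"""
```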

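I've been trying to loop over the data frame, split the elements that contain commas, and then put them back together with their respective usernames.

I've been trying the code below, but it is very inefficient. I'm new to R, so my guess is that there must be a more efficient way to do this.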

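# Start with an empty data frame that accumulates the results
interface.data <- data.frame(
    username = character(),
    elements = character()
)
for (row in seq_len(nrow(input))) { ## input is the frame that comes from the database
     myrowbrk <- as.character(input[row, "elements"])
     # Split the elements string on commas into a character vector
     myrowelements <- strsplit(myrowbrk, ",", fixed = TRUE)[[1]]
     user <- input[row, "username"]
     # One row per element, each tagged with the same username
     interface.newdata <- data.frame(
         username = user,
         elements = myrowelements
     )
     # Append to the accumulated frame (growing it with rbind() in a loop is slow)
     interface.data <- rbind(interface.data, interface.newdata)
}

output <- interface.data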
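The loop can be avoided entirely in base R: strsplit() splits every row in one call, lengths() counts the pieces per row, and rep() repeats each username that many times. A minimal sketch, assuming the database frame is called input as in the question; the two-row input below is a stand-in built from the example data:

```r
# Stand-in for the database frame, using two rows from the example data
input <- data.frame(
  username = c("username1", "username2"),
  elements = c('"""interfaces"".""dual()"""',
               '"""interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""'),
  stringsAsFactors = FALSE
)

# Split every 'elements' string on the comma in a single vectorized call;
# the result is a list holding one character vector per input row
pieces <- strsplit(as.character(input$elements), ",", fixed = TRUE)

# Repeat each username once per piece, then flatten the pieces into one vector
output <- data.frame(
  username = rep(input$username, lengths(pieces)),
  elements = unlist(pieces),
  stringsAsFactors = FALSE
)

print(output)
```

Because rep() and unlist() walk the rows in the same order, each element stays next to its own username, and the result is built once instead of being grown row by row.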

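You can do this with the tidyr package. My solution takes two steps to get the data into the desired format: 1) separate the elements column on the comma, and 2) reshape the data from wide to long format.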

library(tidyr)
library(dplyr)  # needed for select(); tidyr re-exports %>%

# Separate the 'elements' column of the 'df' data frame on the comma character.
# Name the new columns "1" up to the maximum number of elements expected per row (here 2).
df2 <- separate(data = df,
                col = elements,
                into = as.character(seq(1, 2, 1)),
                sep = ",")
# This gives a warning because not every row contains a comma;
# the missing pieces are filled with NA.

# Then pivot from wide to long format, dropping the NA entries, and drop the
# 'name' column that records which column (1 or 2) each element came from.
df2 <- df2 %>%
  pivot_longer(cols = c("1", "2"),
               values_to = "elements",
               values_drop_na = TRUE) %>%
  select(-name)
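tidyr also provides separate_rows(), which splits a delimited column into one row per piece in a single step, so the number of columns does not have to be guessed in advance and no NA rows are created. A minimal sketch, with a small stand-in for df built from the example rows (in tidyr 1.3.0 and later the documentation marks separate_rows() as superseded by separate_longer_delim(), which does the same job):

```r
library(tidyr)

# Stand-in for 'df', using three rows from the example data
df <- data.frame(
  username = c("username1", "username4", "username3"),
  elements = c('"""interfaces"".""dual()"""',
               '"""interfaces"".""dnow_s_downtime_stat_with_lat_long"""',
               '"""interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""'),
  stringsAsFactors = FALSE
)

# One row per comma-separated piece; username is repeated for each piece
df2 <- separate_rows(df, elements, sep = ",")
print(df2)
```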