Correctly convert "data.frame" to "transactions" for arules

I have the following data.frame:


    > str(noticias_json, list.len = 10)
    'data.frame':   1771 obs. of  3 variables:
     $ bairro:List of 1771
      ..$ : chr "icarai"
      ..$ : chr "nacoes"
      ..$ : chr  "danilo passos" "serra verde"
      ..$ : chr "icarai"
      ..$ : chr "centro"
      ..$ : chr  "itai" "manoel valinhas"
      ..$ : chr "anchieta"
      ..$ : chr "liberdade"
      ..$ : chr "nossa senhora das gracas"
      ..$ : chr "liberdade"
      .. [list output truncated]
     $ crime :List of 1771
      ..$ : chr "trafico de drogas"
      ..$ : chr "roubo de veiculo"
      ..$ : chr "roubo"
      ..$ : chr "trafico de drogas"
      ..$ : chr "falsidade ideologica"
      ..$ : chr  "trafico de drogas" "porte ilegal de armas" "roubo"
      ..$ : chr  "trafico de drogas" "porte ilegal de armas"
      ..$ : chr  "homicidio" "trafico de drogas" "porte ilegal de armas" "ocultacao de cadaver" ...
      ..$ : chr  "trafico de drogas" "roubo"
      ..$ : chr  "homicidio" "trafico de drogas" "porte ilegal de armas" "estupro"
      .. [list output truncated]
     $ data  : chr  "01-02-2016" "31-02-2016" "01-02-2017" "01-02-2017" ...

I need to prepare it for the "arules" package so that I can use the apriori() function. I tried:

    df_fact <- as.data.frame(unlist(noticias_json))

and then:

    df_trans <- as(df_fact, "transactions")

But when I inspect the result, I get the following output:


    > inspect(df_trans[1:5])
        items                                 transactionID
    [1] {unlist(noticias_json)=icarai}        bairro1      
    [2] {unlist(noticias_json)=nacoes}        bairro2      
    [3] {unlist(noticias_json)=danilo passos} bairro3      
    [4] {unlist(noticias_json)=serra verde}   bairro4      
    [5] {unlist(noticias_json)=icarai}        bairro5   
    

This is completely different from the Groceries dataset that ships with arules:

<pre>
> inspect(Groceries[1:5])
    items                                                                
[1] {citrus fruit,semi-finished bread,margarine,ready soups}             
[2] {tropical fruit,yogurt,coffee}                                       
[3] {whole milk}                                                         
[4] {pip fruit,yogurt,cream cheese ,meat spreads}                        
[5] {other vegetables,whole milk,condensed milk,long life bakery product}
</pre>

I don't know what I am doing wrong. I would be very grateful if someone could help. Thanks in advance.
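
A minimal sketch of why the attempt in the question yields one item per transaction: `unlist()` flattens every cell of every column into a single long vector, so the row structure is lost before the conversion to "transactions" happens. The data frame `noticias` below is a small stand-in built for illustration (same shape as `noticias_json`, not the real data), and it needs only base R.

```r
# Toy data with the same shape as noticias_json (illustrative, not the real data).
# I() keeps each list as a list column instead of expanding it.
noticias <- data.frame(
  bairro = I(list("icarai", c("danilo passos", "serra verde"))),
  crime  = I(list("trafico de drogas",
                  c("trafico de drogas", "porte ilegal de armas", "roubo"))),
  data   = c("01-02-2016", "31-02-2016"),
  stringsAsFactors = FALSE
)

# unlist() walks through all columns and all list elements and returns
# one flat named vector: 3 bairro values + 4 crime values + 2 dates.
flat <- unlist(noticias)
length(flat)
names(flat)

# as.data.frame() then produces one row per value, so a later
# as(df_fact, "transactions") sees one single-item row per value.
df_fact <- as.data.frame(flat)
dim(df_fact)
```

The names `bairro1`, `bairro2`, ... produced by `unlist()` are the same labels that show up in the `transactionID` column of the output in the question.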

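We may need to split the data.frame by the 'data' column and unlist each group, so that all bairro and crime values sharing a date end up in one transaction: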

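    library(arules)

    # Split the bairro and crime columns by date, flatten each group into
    # one character vector, and remove the date names from the resulting list.
    df_trans <- as(setNames(lapply(split(noticias_json[-3],
                  noticias_json$data), unlist), NULL), "transactions")

    inspect(df_trans)
    #    items                  
    #[1] {icarai,               
    #     trafico de drogas}    
    #[2] {danilo passos,        
    #     porte ilegal de armas,
    #     roubo,                
    #     serra verde,          
    #     trafico de drogas}    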
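
Splitting by 'data' merges all rows that share a date into one transaction. If each row (each news item) should instead be its own transaction, a possible variant is to build one item vector per row. This is only a sketch under that assumption; the toy data frame `noticias`, its third row, and the names `baskets` and `df_trans_rows` are made up for illustration. The arules step is guarded so the list-building part runs with base R alone.

```r
# Toy data (illustrative): rows 1 and 3 share the same date.
noticias <- data.frame(
  bairro = I(list("icarai", c("danilo passos", "serra verde"), "centro")),
  crime  = I(list("trafico de drogas",
                  c("trafico de drogas", "porte ilegal de armas", "roubo"),
                  "roubo")),
  data   = c("01-02-2016", "31-02-2016", "01-02-2016"),
  stringsAsFactors = FALSE
)

# One character vector per row: the row's bairro values followed by its
# crime values; unique() guards against repeated items within a row.
baskets <- lapply(seq_len(nrow(noticias)), function(i) {
  unique(c(noticias$bairro[[i]], noticias$crime[[i]]))
})
lengths(baskets)

# Convert to "transactions" only if arules is installed.
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  df_trans_rows <- as(baskets, "transactions")
  inspect(df_trans_rows)
}
```

Which grouping is appropriate depends on what a "basket" means for the analysis: per date (as in the code above this sketch) or per news item (as here).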

Data

    noticias_json <- structure(list(bairro = structure(list("icarai", 
       c("danilo passos", 
    "serra verde")), class = "AsIs"), crime = structure(list("trafico de drogas", 
        c("trafico de drogas", "porte ilegal de armas", "roubo")), class = "AsIs"), 
        data = c("01-02-2016", "31-02-2016")), .Names = c("bairro", 
    "crime", "data"), row.names = c(NA, -2L), class = "data.frame")