How to data wrangle and mutate at column containing specific string?

Question

This is hard to describe in words, so I made a reprex. The input, my attempt, and the expected output are below.

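How can we wrangle this data? 1. When we write a function and mutate as shown below, the column name string is different in every table, so the column cannot be referred to by a fixed name. 2. Once the columns have a common name, how do we bind the tables together?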

library(tidyverse)
# Here "." is a thousands separator (like ","), so remove "." and "PC"; ideally convert to numeric
df1 <- tribble(
  ~`ABC sales 01.01.2019 - 01.02.2019`, ~code,
 "1.019 PC",   2000, # actually 1019 (remove "." and "PC")
 "100 PC",   2101,
 "3.440 PC",   2002
)

df2 <- tribble(
  ~`ABC sales 01.03.2019 - 01.04.2019`, ~year,
  "6.019 PC",   2019, 
  "20 PC",   2001,
  "043.440 PC",   2002
)

df3 <- tribble(
  ~`ABC sales 01.05.2019 - 01.06.2019`, ~year,
  "1.019 PC",   2000, 
  "701 PC",   2101,
  "6.440 PC",   2002
)

# Input data
input_df = list(df1,df2,df3)

#### function to clean data
# str_replace is used twice:
# once to remove PC and once to remove the dot

data_read = function(file){

  df_ <- df %>% #glimpse()
    # Select the column to remove PC, spaces and .
    # Each time, column name differs so, `ABC sales 01.01.2019 - 01.02.2019` cannot be used
    mutate_at(sales_dot = str_replace(select(contains('ABC')), "PC",""),
              sales = str_replace(sales_dot, "\.",""), # name the new column so that rbind can be applied later
              sales_dot = NULL, # delete the old column
              vars(contains("ABC")) = NULL # delete the old column
    )
  df_
}

# attempt to resolve
# To clean the data from dots and PC
output_df1 <- map(input_df, data_read) # or lapply ?
# rbind 
output = map(output_df1, rbind) # or lapply ?

expected_output <- tribble(
  ~sales, ~year,
  "1019",   2000, 
  "100",   2101,
  "3440",   2002,
  "6019",   2019, 
  "20",   2001,
  "043440",   2002,
  "1019",   2000,
  "701",   2101,
  "6440",   2002
)

Answer 1

Using purrr, dplyr and stringr, you can do:

map_df(.x = input_df, ~ .x %>%
        set_names(., c("sales", "year"))) %>%
 mutate(sales = str_remove_all(sales, "[. PC]"))

  sales   year
  <chr>  <dbl>
1 1019    2000
2 100     2101
3 3440    2002
4 6019    2019
5 20      2001
6 043440  2002
7 1019    2000
8 701     2101
9 6440    2002
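
For readers who would rather keep a separate cleaning function, as in the question's `data_read`, the same idea can be sketched as follows. This is only an illustration under a few assumptions: `clean_sales`, `df_a` and `df_b` are hypothetical names that do not appear in the original post, each table is assumed to have exactly one column whose name contains "ABC", and the second column is assumed to hold the year regardless of what it is called (in `df1` it is named `code`).

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

# Two small tables shaped like df1 and df2 from the question
df_a <- tribble(
  ~`ABC sales 01.01.2019 - 01.02.2019`, ~code,
  "1.019 PC", 2000,
  "100 PC",   2101
)
df_b <- tribble(
  ~`ABC sales 01.03.2019 - 01.04.2019`, ~year,
  "043.440 PC", 2002
)

# Hypothetical cleaning function (not from the original post)
clean_sales <- function(df) {
  df %>%
    # Select the column whose name contains "ABC" and rename it to `sales`;
    # assumption: the second column is the year, whatever its name is
    select(sales = contains("ABC"), year = 2) %>%
    # "[. PC]" is a character class: every ".", space, "P" and "C" is removed
    mutate(sales = str_remove_all(sales, "[. PC]"))
}

# Clean every table, then stack the results row-wise
output <- list(df_a, df_b) %>%
  map(clean_sales) %>%
  bind_rows()
```

Selecting with `contains("ABC")` avoids hard-coding the date range in the column name, and `bind_rows()` can stack the tables once they share the names `sales` and `year`. The `sales` column stays character, as in the expected output; converting it with `as.numeric()` would turn "043440" into 43440.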


dictionary

r

dataframe

rbind

stringr