Create a new data frame from repeated exposures and participants, adding only new data
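Happy Tuesday.
I am currently collecting survey data. The surveys sometimes ask the same questions and sometimes do not. Why? Because there are 700+ questions, and asking participants to answer all of them (unpaid) is not realistic. So we are surveying subsets of the items. This is where the problem arises. Some participants take the survey multiple times (which is totally fine); however, when they answer the same question a second (or third, fourth, etc.) time, I do not want to record their response. But when they answer new questions, I do want to store that data. My idea is to build a growing master dataset containing all participant information; then, when new data comes in, check whether each participant has already responded to a previous survey, and add only their new information to the data file to be analyzed. Then repeat the process when the next batch of survey results arrives. In my mind, once the data frame to be analyzed has been updated, it can be used to query any new incoming data.
So let me try to demonstrate the workflow to help guide the discussion, or even help someone identify a solution.
*note2: dplyr may also be a relevant package. Again, I tagged it, but I can remove the tag if it is not relevant.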
library(dplyr)
survey1 = structure(list(ip = c(111, 222, 333, 444, 555, 666, 777, 888,
999, 1110), gender = c("Female", "Female", "Male", "Female",
"Female", "Female", "Female", "Female", "Male", "Male"), age = c(23,
26, 23, 60, 30, 35, 27, 61, 49, 33), education = c(16, 18, 4,
18, 16, 19, 19, 14, 22, 16), race = c("White", "White", "Asian",
"White", "White", "White", "White", "White", "White", "White"
), Q4 = c("Dresser", "dresser", "drawers", "Dresser", "Dresser",
"Dresser", "Dresser", "dresser", "dresser", "dresser"), Q4a = c("Dresser",
"dresser", "drawers", "Dresser", "Dresser", "Dresser", "Dresser",
"dresser", "dresser", "dresser"), Q417 = c("Crib", "crib", "crib",
"Baby crib", "Crib", "Crib", "Crib", "crib", "crib", "crib"),
Q417a = c("Crib", "crib", "crib", "Baby crib", "Crib", "Crib",
"Crib", "crib", "crib", "crib"), Q536 = c("Couch", "couch",
"couch", "Couch or sofa", "Couch", "Couch", "Leather couch",
"sofa", "couch", "sofa"), Q536a = c("Sofa", "couch", "couch",
"Couch or sofa", "Couch", "Couch", "Couch", "sofa", "couch",
"sofe"), Q491 = c("Roof", "roof", "house", "Roof", "Roof",
"Roof", "Roof", "roof", "roof", "roof"), Q491a = c("Roof tile",
"roof", "roof", "Roof", "Roof", "Roof", "Roof", "rooof",
"roof", "roof"), Q452 = c("Rug", "rug", "rug", "Oriental carpet",
"Rug", "Rug", "Rug", "rug", "rug", "rug"), Q452a = c("Rug",
"rug", "rug", "Carpet", "Rug", "Rug", "Rug", "carpet", "rug",
"rug")), row.names = c(NA, -10L), class = c("tbl_df", "tbl",
"data.frame"))#ready in survey 2
survey2= structure(list(ip = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), gender = c("Female",
"Female", "Male", "Female", "Female", "Female", "Female", "Female",
"Male", "Male"), age = c(23, 26, 23, 60, 30, 35, 27, 61, 49,
33), education = c(16, 18, 4, 18, 16, 19, 19, 14, 22, 16), race = c("White",
"White", "Asian", "White", "White", "White", "White", "White",
"White", "White"), Q4 = c("dog", "dog", "dog", "dog", "dog",
"dog", "dog", "dog", "dog", "dog"), Q4a = c("cat", "cat", "cat",
"cat", "cat", "cat", "cat", "cat", "cat", "cat"), Q417 = c("van",
"van", "van", "van", "van", "van", "van", "van", "van", "van"
), Q417a = c("chocolate", "chocolate", "chocolate", "chocolate",
"chocolate", "chocolate", "chocolate", "chocolate", "chocolate",
"chocolate"), Q536 = c("candy", "candy", "candy", "candy", "candy",
"candy", "candy", "candy", "candy", "candy"), Q536a = c("pizza",
"pizza", "pizza", "pizza", "pizza", "pizza", "pizza", "pizza",
"pizza", "pizza"), Q491 = c("ocotpus", "ocotpus", "ocotpus",
"ocotpus", "ocotpus", "ocotpus", "ocotpus", "ocotpus", "ocotpus",
"ocotpus"), Q491a = c("panther", "panther", "panther", "panther",
"panther", "panther", "panther", "panther", "panther", "panther"
), Q452 = c("checkers", "checkers", "checkers", "checkers", "checkers",
"checkers", "checkers", "checkers", "checkers", "checkers"),
Q452a = c("computer", "computer", "computer", "computer",
"computer", "computer", "computer", "computer", "computer",
"computer")), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L), spec = structure(list(cols = list(
ip = structure(list(), class = c("collector_double", "collector"
)), gender = structure(list(), class = c("collector_character",
"collector")), age = structure(list(), class = c("collector_double",
"collector")), education = structure(list(), class = c("collector_double",
"collector")), race = structure(list(), class = c("collector_character",
"collector")), Q4 = structure(list(), class = c("collector_character",
"collector")), Q4a = structure(list(), class = c("collector_character",
"collector")), Q417 = structure(list(), class = c("collector_character",
"collector")), Q417a = structure(list(), class = c("collector_character",
"collector")), Q536 = structure(list(), class = c("collector_character",
"collector")), Q536a = structure(list(), class = c("collector_character",
"collector")), Q491 = structure(list(), class = c("collector_character",
"collector")), Q491a = structure(list(), class = c("collector_character",
"collector")), Q452 = structure(list(), class = c("collector_character",
"collector")), Q452a = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
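As you can see from the data so far, none of the participants in survey 1 took survey 2. We know this from the differing IP (addresses). So adding them together is straightforward.
masterData = rbind(survey1, survey2)
str(masterData) #reveals tibble [20 x 15]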
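The "no overlap" observation can also be checked in code rather than by eye. The following is only a minimal sketch on made-up miniature data: batch_a and batch_b are hypothetical stand-ins for two survey batches, and ip is assumed to be a good enough participant key.

```r
library(dplyr)

# Hypothetical miniature stand-ins for two survey batches.
batch_a <- tibble(ip = c(111, 222, 333), Q4 = c("Dresser", "dresser", "drawers"))
batch_b <- tibble(ip = c(1, 2, 333), Q4 = c("dog", "dog", "dog"))

# IPs that appear in both batches (empty when nobody took both surveys).
shared_ips <- intersect(batch_a$ip, batch_b$ip)

# Rows of batch_b that belong to returning participants.
returning <- semi_join(batch_b, batch_a, by = "ip")
```

If shared_ips is empty, a plain row bind is safe; otherwise the returning participants need the de-duplication discussed in the rest of this thread.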
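Now suppose we run a new survey, survey 3, whose participants are the same as in survey 1. Four of the questions overlap, but we also have new data from these participants on five new questions. I want to create a new data frame that adds only the new questions for these participants. Example: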
survey3 =structure(list(X1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), ip = c(111,
222, 333, 444, 555, 666, 777, 888, 999, 1110), gender = c("Female",
"Female", "Male", "Female", "Female", "Female", "Female", "Female",
"Male", "Male"), age = c(23, 26, 23, 60, 30, 35, 27, 61, 49,
33), education = c(16, 18, 4, 18, 16, 19, 19, 14, 22, 16), race = c("White",
"White", "Asian", "White", "White", "White", "White", "White",
"White", "White"), Q4 = c("Dresser", "dresser", "drawers", "Dresser",
"Dresser", "Dresser", "Dresser", "dresser", "dresser", "dresser"
), Q4a = c("Dresser", "dresser", "drawers", "Dresser", "Dresser",
"Dresser", "Dresser", "dresser", "dresser", "dresser"), Q417 = c("Crib",
"crib", "crib", "Baby crib", "Crib", "Crib", "Crib", "crib",
"crib", "crib"), Q417a = c("Crib", "crib", "crib", "Baby crib",
"Crib", "Crib", "Crib", "crib", "crib", "crib"), Q15 = c("waffle",
"waffle", "waffle", "waffle", "waffle", "waffle", "waffle", "waffle",
"waffle", "waffle"), Q16 = c("egg", "egg", "egg", "egg", "egg",
"egg", "egg", "egg", "egg", "egg"), Q17 = c("bacon", "bacon",
"bacon", "bacon", "bacon", "bacon", "bacon", "bacon", "bacon",
"bacon"), Q18 = c("pancake", "pancake", "pancake", "pancake",
"pancake", "pancake", "pancake", "pancake", "pancake", "pancake"
), Q19 = c("smoothie", "smoothie", "smoothie", "smoothie", "smoothie",
"smoothie", "smoothie", "smoothie", "smoothie", "smoothie")), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L), spec = structure(list(
cols = list(X1 = structure(list(), class = c("collector_double",
"collector")), ip = structure(list(), class = c("collector_double",
"collector")), gender = structure(list(), class = c("collector_character",
"collector")), age = structure(list(), class = c("collector_double",
"collector")), education = structure(list(), class = c("collector_double",
"collector")), race = structure(list(), class = c("collector_character",
"collector")), Q4 = structure(list(), class = c("collector_character",
"collector")), Q4a = structure(list(), class = c("collector_character",
"collector")), Q417 = structure(list(), class = c("collector_character",
"collector")), Q417a = structure(list(), class = c("collector_character",
"collector")), Q15 = structure(list(), class = c("collector_character",
"collector")), Q16 = structure(list(), class = c("collector_character",
"collector")), Q17 = structure(list(), class = c("collector_character",
"collector")), Q18 = structure(list(), class = c("collector_character",
"collector")), Q19 = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
The desired output of this merge looks like this:
desiredoutput = structure(list(ip = c(111, 222, 333, 444, 555, 666, 777, 888,
999, 1110, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10), gender = c("Female",
"Female", "Male", "Female", "Female", "Female", "Female", "Female",
"Male", "Male", "Female", "Female", "Male", "Female", "Female",
"Female", "Female", "Female", "Male", "Male"), age = c(23, 26,
23, 60, 30, 35, 27, 61, 49, 33, 23, 26, 23, 60, 30, 35, 27, 61,
49, 33), education = c(16, 18, 4, 18, 16, 19, 19, 14, 22, 16,
16, 18, 4, 18, 16, 19, 19, 14, 22, 16), race = c("White", "White",
"Asian", "White", "White", "White", "White", "White", "White",
"White", "White", "White", "Asian", "White", "White", "White",
"White", "White", "White", "White"), Q4 = c("Dresser", "dresser",
"drawers", "Dresser", "Dresser", "Dresser", "Dresser", "dresser",
"dresser", "dresser", "dog", "dog", "dog", "dog", "dog", "dog",
"dog", "dog", "dog", "dog"), Q4a = c("Dresser", "dresser", "drawers",
"Dresser", "Dresser", "Dresser", "Dresser", "dresser", "dresser",
"dresser", "cat", "cat", "cat", "cat", "cat", "cat", "cat", "cat",
"cat", "cat"), Q417 = c("Crib", "crib", "crib", "Baby crib",
"Crib", "Crib", "Crib", "crib", "crib", "crib", "van", "van",
"van", "van", "van", "van", "van", "van", "van", "van"), Q417a = c("Crib",
"crib", "crib", "Baby crib", "Crib", "Crib", "Crib", "crib",
"crib", "crib", "chocolate", "chocolate", "chocolate", "chocolate",
"chocolate", "chocolate", "chocolate", "chocolate", "chocolate",
"chocolate"), Q536 = c("Couch", "couch", "couch", "Couch or sofa",
"Couch", "Couch", "Leather couch", "sofa", "couch", "sofa", "candy",
"candy", "candy", "candy", "candy", "candy", "candy", "candy",
"candy", "candy"), Q536a = c("Sofa", "couch", "couch", "Couch or sofa",
"Couch", "Couch", "Couch", "sofa", "couch", "sofe", "pizza",
"pizza", "pizza", "pizza", "pizza", "pizza", "pizza", "pizza",
"pizza", "pizza"), Q491 = c("Roof", "roof", "house", "Roof",
"Roof", "Roof", "Roof", "roof", "roof", "roof", "ocotpus", "ocotpus",
"ocotpus", "ocotpus", "ocotpus", "ocotpus", "ocotpus", "ocotpus",
"ocotpus", "ocotpus"), Q491a = c("Roof tile", "roof", "roof",
"Roof", "Roof", "Roof", "Roof", "rooof", "roof", "roof", "panther",
"panther", "panther", "panther", "panther", "panther", "panther",
"panther", "panther", "panther"), Q452 = c("Rug", "rug", "rug",
"Oriental carpet", "Rug", "Rug", "Rug", "rug", "rug", "rug",
"checkers", "checkers", "checkers", "checkers", "checkers", "checkers",
"checkers", "checkers", "checkers", "checkers"), Q452a = c("Rug",
"rug", "rug", "Carpet", "Rug", "Rug", "Rug", "carpet", "rug",
"rug", "computer", "computer", "computer", "computer", "computer",
"computer", "computer", "computer", "computer", "computer"),
Q15 = c("waffle", "waffle", "waffle", "waffle", "waffle",
"waffle", "waffle", "waffle", "waffle", "waffle", NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA), Q16 = c("egg", "egg", "egg",
"egg", "egg", "egg", "egg", "egg", "egg", "egg", NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA), Q17 = c("bacon", "bacon",
"bacon", "bacon", "bacon", "bacon", "bacon", "bacon", "bacon",
"bacon", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Q18 = c("pancake",
"pancake", "pancake", "pancake", "pancake", "pancake", "pancake",
"pancake", "pancake", "pancake", NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA), Q19 = c("smoothie", "smoothie", "smoothie",
"smoothie", "smoothie", "smoothie", "smoothie", "smoothie",
"smoothie", "smoothie", NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -20L), spec = structure(list(cols = list(
ip = structure(list(), class = c("collector_double", "collector"
)), gender = structure(list(), class = c("collector_character",
"collector")), age = structure(list(), class = c("collector_double",
"collector")), education = structure(list(), class = c("collector_double",
"collector")), race = structure(list(), class = c("collector_character",
"collector")), Q4 = structure(list(), class = c("collector_character",
"collector")), Q4a = structure(list(), class = c("collector_character",
"collector")), Q417 = structure(list(), class = c("collector_character",
"collector")), Q417a = structure(list(), class = c("collector_character",
"collector")), Q536 = structure(list(), class = c("collector_character",
"collector")), Q536a = structure(list(), class = c("collector_character",
"collector")), Q491 = structure(list(), class = c("collector_character",
"collector")), Q491a = structure(list(), class = c("collector_character",
"collector")), Q452 = structure(list(), class = c("collector_character",
"collector")), Q452a = structure(list(), class = c("collector_character",
"collector")), Q15 = structure(list(), class = c("collector_character",
"collector")), Q16 = structure(list(), class = c("collector_character",
"collector")), Q17 = structure(list(), class = c("collector_character",
"collector")), Q18 = structure(list(), class = c("collector_character",
"collector")), Q19 = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
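One thing I want to keep in mind is that this should become an iterative process as new surveys come in (e.g. survey4 - survey1000).
Any help or ideas would be greatly appreciated, as I am not sure how to approach this.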
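I would suggest keeping the data in long format, i.e. each row holds the answer to one question. Convert the new survey data to long format as well.
Let's assume masterData already contains survey1 and survey2, and you are now trying to add survey3 to it. You can combine survey3 with masterData and then keep only the unique rows for every participant and every question. Assuming each participant is uniquely identified by ip, gender, age, education and race, you can do: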
library(dplyr)
library(tidyr)
masterData <- masterData %>% pivot_longer(cols = starts_with('Q'))
new_survey <- survey3 %>% pivot_longer(cols = starts_with('Q'))
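get_new_master <- function(masterData, new_data) {
  # Existing rows come first, so distinct() keeps the earlier answer
  # whenever a participant/question pair appears more than once.
  bind_rows(masterData, new_data) %>%
    distinct(ip, gender, age, education, race, name, .keep_all = TRUE)
}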
Here the name column holds the question number. You can then call get_new_master as:
masterData <- get_new_master(masterData, new_survey)
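To see why repeated answers are dropped while new ones are kept, here is a small sketch on made-up long-format rows. The data and the names old_rows and new_rows are hypothetical, and the key is reduced to ip and name for brevity. distinct() with .keep_all = TRUE keeps the first row of each key combination, so whatever is bound first wins.

```r
library(dplyr)

# Hypothetical long-format rows: one row per participant and question.
old_rows <- tibble(ip = c(111, 111), name = c("Q4", "Q417"),
                   value = c("Dresser", "Crib"))
# The same participant answers Q4 again and also answers a new question, Q15.
new_rows <- tibble(ip = c(111, 111), name = c("Q4", "Q15"),
                   value = c("Cabinet", "waffle"))

# Old rows are bound first, so the first (original) Q4 answer is retained.
combined <- bind_rows(old_rows, new_rows) %>%
  distinct(ip, name, .keep_all = TRUE)

combined
```

The repeated Q4 answer ("Cabinet") is discarded, while the new Q15 answer is appended.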
Now masterData has the complete data, and when another survey arrives we can follow the same process.
new_survey <- survey4 %>% pivot_longer(cols = starts_with('Q'))
masterData <- get_new_master(masterData, new_survey)
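If many surveys arrive over time, the same step can be folded over a list, and the long table can be turned back into the wide layout shown in the question. This is only a sketch on made-up miniature surveys: s1, s2, s3, to_long and add_survey are hypothetical names, and ip alone is assumed to identify a participant.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature surveys standing in for survey1 ... surveyN.
s1 <- tibble(ip = c(111, 222), Q4 = c("Dresser", "dresser"), Q417 = c("Crib", "crib"))
s2 <- tibble(ip = c(1, 2), Q4 = c("dog", "dog"), Q417 = c("van", "van"))
s3 <- tibble(ip = c(111, 222), Q4 = c("Cabinet", "chest"), Q15 = c("waffle", "waffle"))

# Convert one wide survey to long format (columns: ip, name, value).
to_long <- function(survey) {
  pivot_longer(survey, cols = starts_with("Q"))
}

# Add one survey to the long master, keeping the earliest answer per ip/question.
add_survey <- function(master_long, survey) {
  bind_rows(master_long, to_long(survey)) %>%
    distinct(ip, name, .keep_all = TRUE)
}

# Fold over the surveys in arrival order, starting from the first one.
surveys <- list(s1, s2, s3)
master_long <- Reduce(add_survey, surveys[-1], to_long(surveys[[1]]))

# Back to one row per participant; unanswered questions become NA.
master_wide <- master_long %>%
  pivot_wider(names_from = name, values_from = value)
```

With the question's real data, the extra X1 row-index column in survey3 would probably need to be dropped first (for example with select(-X1)), because pivot_wider treats every column other than name and value as an identifying column.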