How to split a CSV into new files by date/day in R?

Hi, I have an 8 GB file that I need to analyze, but my machine does not have much RAM. To work efficiently, I decided to split the CSV file by rows with the following code:

library(tidyverse)

sample_df <- readr::read_csv("sample.csv") #Read in the csv file
dput(sample_df)

#break the large CSV so RAM and Rstudio doesn't crash

groups <- (split(sample_df, (seq(nrow(sample_df))-1) %/% 20)) #here I want 20 rows per file until last row is reached

for (i in seq_along(groups)) {
  write.csv(groups[[i]], paste0("sample_output_file", i, ".csv")) #iterate and write file
}

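This worked well until my senior advisor asked me to run the analysis per date/day. I ran into a problem: because I split by rows, a single date ends up spread across multiple CSVs. When I try to read 3-4 CSVs to analyze one day, I hit the same low-RAM and memory-management problems.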

The sample file is here: https://github.com/THsTestingGround/SO_splitbydate_question/blob/master/sample.csv

So can someone help me split the sample CSV I read in above by date? I want all of Apr 1 in one CSV file, Apr 2 in another file, and so on. I did try, but could not get it to work.

I would also like to know whether readr::read_csv_chunked could help here in any way. I could not find anything specific in the documentation.

Here is the dput of the CSV file:

dput(sample_df)
structure(list(createdAt = c("Fri Apr 01 04:04:32 +0000 2020", 
"Fri Apr 01 04:04:36 +0000 2020", "Fri Apr 01 04:04:37 +0000 2020", 
"Fri Apr 02 04:04:40 +0000 2020", "Fri Apr 02 04:04:44 +0000 2020", 
"Fri Apr 02 04:04:46 +0000 2020", "Fri Apr 02 04:04:54 +0000 2020", 
"Fri Apr 02 04:04:56 +0000 2020", "Fri Apr 02 04:05:07 +0000 2020", 
"Fri Apr 02 04:05:12 +0000 2020", "Fri Apr 03 04:05:12 +0000 2020", 
"Fri Apr 03 04:05:19 +0000 2020", "Fri Apr 03 04:05:27 +0000 2020", 
"Fri Apr 03 04:05:33 +0000 2020", "Fri Apr 03 04:05:36 +0000 2020", 
"Fri Apr 03 04:06:11 +0000 2020", "Fri Apr 03 04:07:08 +0000 2020", 
"Fri Apr 03 04:07:14 +0000 2020", "Fri Apr 03 04:07:15 +0000 2020", 
"Fri Apr 03 04:07:20 +0000 2020", "Fri Apr 03 04:07:30 +0000 2020", 
"Fri Apr 03 04:07:51 +0000 2020", "Fri Apr 03 04:08:04 +0000 2020", 
"Fri Apr 03 04:08:09 +0000 2020", "Fri Apr 03 04:08:15 +0000 2020", 
"Fri Apr 03 04:08:22 +0000 2020", "Fri Apr 03 04:08:36 +0000 2020", 
"Fri Apr 03 04:08:46 +0000 2020", "Fri Apr 03 04:08:46 +0000 2020", 
"Fri Apr 03 04:09:01 +0000 2020", "Fri Apr 03 04:09:08 +0000 2020", 
"Fri Apr 03 04:09:10 +0000 2020", "Fri Apr 03 04:09:15 +0000 2020", 
"Fri Apr 03 04:09:26 +0000 2020", "Fri Apr 03 04:09:27 +0000 2020", 
"Fri Apr 03 04:09:28 +0000 2020", "Fri Apr 03 04:09:28 +0000 2020", 
"Fri Apr 03 04:09:35 +0000 2020", "Fri Apr 03 04:09:36 +0000 2020", 
"Fri Apr 03 04:09:41 +0000 2020", "Fri Apr 03 04:09:45 +0000 2020", 
"Fri Apr 03 04:10:16 +0000 2020", "Fri Apr 03 04:10:19 +0000 2020", 
"Fri Apr 03 04:10:22 +0000 2020", "Fri Apr 03 04:10:26 +0000 2020", 
"Fri Apr 03 04:10:31 +0000 2020", "Fri Apr 03 04:10:48 +0000 2020", 
"Fri Apr 04 04:11:19 +0000 2020", "Fri Apr 04 04:11:32 +0000 2020", 
"Fri Apr 04:11:44 +0000 2020"), timestamp = c(1.58589e+12, 1.58589e+12, 
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 
1.58589e+12, 1.58589e+12, 1.58589e+12), id_str = c(1.24593e+18, 
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 
1.24593e+18, 1.25e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18), text = c("Finally. Make your own mask. Protect yourself and others. #coronavirus", 
"@ArvinderSoin do you feel the use of only masks for IPD rounds, in an environment where no patients have been teste…", 
"India, you actually deserve him for electing him.\n\nAb batti bhujao aur #corona bhagav.\n\nNo testing kits, no masks,…", 
"great picture to sum up everything\n#mask #maskefficiency #noclothmask #maskprotection #surgicalmask #N95 #FFP1…", 
"The greatest hazard to public health is official misinformation.\n\nAsian countries were wearing masks from the begin…", 
"#Florida official says @3M is selling face masks to foreign countries instead of his state amid #COVID19 crisis.\n", 
"Wearing masks is one of the protective measures preventing catching the novel #Coronavirus as the pandemic spreads…", 
"It took Americans two and a half months to start wearing masks. Think about why, maybe it could explain why the peo…", 
"#coronavirus watching me put on the same surgical mask 2 shifts in a row\n\n#COVID<U+30FC>19 #nurse", 
"Back in stock! NIOSH N95, go to our website.\nOnly 11,000 masks \n\n#facemask #facemasks #N95…", 
"Hence the vital importance of wearing masks when outside - #coronavirus #coronavirusindia #COVID2019india…", 
"@Read5000YrLeap @SenSchumer buy trump facemasks. support trump 2020 and be safe. ships from midwest. #Boycott3M… ", 
"When going out for essential activities, members of the public should wear reusable, non-medical cloth face coverin…", 
"@jmcmaccarr buy trump facemasks. support trump 2020 and be safe. ships from midwest. #Boycott3M @seanhannity…", 
"It took Americans two and a half months to start wearing masks. Think about why, maybe it could explain why the peo…", 
"@CNN Just #WearMask People    wearing a mask Nationwide ... SAVES…", 
"That is less than 4 million per week.  In Taiwan, everyone is allocated 3 surgical masks per week.  For Australia t…", 
"@Constitution999 @ChuckCallesto @realDonaldTrump buy trump facemasks. support trump 2020 and be safe. ships from mi…", 
"Regard the debate of face mask in general public, the evidence of effectiveness is quite clear #Covid19…", 
"Normalize putting on of masks. #COVID19 came to change the world order.", 
"@TwitterSafety the Honduran gov’t is lying on Twitter. Saying that they are making thousands of masks, protective v…", 
"Trump explaining that if you need a mask you can go to Walmart. Also that Costco has some great deals on caskets an…", 
"When lockdown is over... I just may add this to my “don’t forget..” along with my wallet, gloves, mask, hand saniti…", 
"Make your own mask: #covid19\n", "Please, everyone should wear a mask in public. Use whatever you can get hold of. Something is better than nothing (…", 
"@kittywuv1 So incredibly mesmerizing, even with the custom #covid19 mask!<U+0001F970><U+0001F60D><U+0001F618><U+0001F637><U+0001F497>", 
"@BeauTFC Happy to report that we’ve developed a 3-D printed mask. Passed N95 equivalent fit-test with Bitrex (surgi…", 
"On a lighter note. \n\nIt is questionable if these common surgical masks and cloth masks will protect us from…", 
"Medical workers face big mask shortage. This UF doctor came up with way to make many \n\n…", 
"Homemade face coverings. Well, I tried it didn't come out straight but it should work. <U+0001F637> #homemade #facecoverings…", 
"#covid19 In Africa, \"where are no masks, no treatment, no reanimation\", \"the same way experimental treatment for AI…", 
"@theblondeMD Happy to report that we’ve developed a 3-D printed mask. Passed N95 equivalent fit-test with Bitrex (s…", 
"I wouldn’t do a thing anyone from #China says to do. The masks they keep sending around the world are faulty, they…", 
"@TIME [covid19],important:\n1.from_air-&gt;mask-&gt;mask_reuse.\n2.from_touch-&gt;clean_hands.\n\nps1.20200328.…", 
"@3M stop selling masks to foreign companies. We WILL remember this!\n#COVID19Pandemic \n#covid19\n#N95masks", 
"Awareness for using mask by @WHO #recommendations @CMOTamilNadu #COVID19 #Corona @MoHFW_INDIA #TNHealth #CVB…", 
"@Rakshitwa @beingdumber @taapsee Nitish Kumar asked for 10 lakh N95 masks but got 50,000. Sought five lakh PPE kits…", 
"@CNN You mean the masks everyone was saying #Covit19 #COVID<U+30FC>19 #coronavirus can pass right through as per what was…", 
"2 BILLION masks = global production capacity in 2.5 MONTHS = quantity of what China imported in 5 WEEKS since Jan…", 
"@CDCgov @CDCDirector @SF_DPH Please remember those with #COPD #LungDisease #HeartDisease when requiring #masks for…", 
"If you have to go out and can’t avoid being around people, wear a mask.  Masks are a complement to social distancin…", 
"@CTVVancouver According to Dr \"doom\" Bonnie Henry, masks aren't of any use to the general public, in fact, she clai…", 
"@maddow Next time you talk about the government stating everyone needs to wear a mask ask a government official whe…", 
"Wear a mask in you are unwell or taking care of a person with suspected 2019-nCoV infection.\nInfo source: WHO…", 
"7/9 For those who need a #COVID19 mask ASAP and have no talent, time or materials to make a mask. We give you the e…", 
"jasminesade_art\nIs taking orders for masks (w/ filter pocket) \nMsg jasminesade_art if interested <U+0001F496> \n.\n.\n.\n.\n.\n.", 
"What China do to cut down the spread dramatically are only to make people stay at home and wear masks!!!!!@PHE_uk…", 
"@CNN hey i thought we were boycotting China\nthen why the Americans need Chinese masks?\ngo fuck yourself \n#BoycottChina #coronavirus", 
"@CNN @CillizzaCNN [covid19],important:\n1.from_air-&gt;mask-&gt;mask_reuse.\n2.from_touch-&gt;clean_hands.\n\nps1.20200328.…", 
"@kr3at #WearMask Everyone  !!!\n\n\nSimply  wearing a mask Nationwide ... SAVES #CZECHOSLOVAKIA…"
), retweetCount = c(1372, 9, NA, 8, 30, NA, NA, NA, NA, NA, 34, 
NA, NA, NA, NA, NA, 192, NA, NA, NA, 50, NA, 221, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, 17, 1948, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, 53, NA, 1948, NA), favorite_count = c(3488, 
23, NA, 7, 46, NA, NA, NA, NA, NA, 62, NA, NA, NA, NA, NA, 710, 
NA, NA, NA, 48, NA, 506, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
29, 4963, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 164, 
NA, 4963, NA), url = c("twitter.com/33617860/status/1245925124483809280", 
"twitter.com/1106803026/status/1245925141046935552", "twitter.com/421517829/status/1245925143479595008", 
"twitter.com/1245594213795778560/status/1245925159724171264", 
"twitter.com/2178012643/status/1245925173858975744", "twitter.com/1220529001241989120/status/1245925183010963456", 
"twitter.com/1115874631/status/1245925217790124032", "twitter.com/1243781317747077120/status/1245925225327235072", 
"twitter.com/2729830110/status/1245925273230438400", "twitter.com/1240114893178667008/status/1245925291374964736", 
"twitter.com/88875512/status/1245925292972969984", "twitter.com/1245907384993812480/status/1245925320282136576", 
"twitter.com/3431854829/status/1245925357116481536", "twitter.com/1245907384993812480/status/1245925380973871104", 
"twitter.com/1243781317747077120/status/1245925393095217152", 
"twitter.com/1230706447257751552/status/1245925541644992512", 
"twitter.com/4437322348/status/1245925779117985792", "twitter.com/1245907384993812480/status/1245925802442555392", 
"twitter.com/829633267942903808/status/1245925807211663360", 
"twitter.com/403961389/status/1245925829755969536", "twitter.com/17183161/status/1245925869010292736", 
"twitter.com/1408320152/status/1245925960550993920", "twitter.com/1245663286881902592/status/1245926011679600640", 
"twitter.com/244306637/status/1245926036321103872", "twitter.com/24327965/status/1245926059318448128", 
"twitter.com/1164222471639318528/status/1245926089068646400", 
"twitter.com/16328861/status/1245926148967727104", "twitter.com/6125082/status/1.24592618943e+18", 
"twitter.com/3685052935/status/1245926191850065920", "twitter.com/868528766355558400/status/1245926251455365120", 
"twitter.com/1223273206636851200/status/1245926283093012480", 
"twitter.com/16328861/status/1245926292274311168", "twitter.com/1160039103905390592/status/1245926310670565376", 
"twitter.com/1236738668905127936/status/1245926356468162560", 
"twitter.com/400431217/status/1245926363833532416", "twitter.com/1244269086088945664/status/1245926365116809216", 
"twitter.com/850227053139853312/status/1245926366781902848", 
"twitter.com/244314850/status/1245926393822605312", "twitter.com/1244446404178665472/status/1245926398578978816", 
"twitter.com/3184694718/status/1245926421601509376", "twitter.com/82208845/status/1245926438143807488", 
"twitter.com/1216588869530836992/status/1245926569303891968", 
"twitter.com/4770303330/status/1245926579936432128", "twitter.com/1245580876047499264/status/1245926591806361600", 
"twitter.com/904740870817120256/status/1245926610181574656", 
"twitter.com/934146138/status/1245926629022433280", "twitter.com/1223547711468777472/status/1245926703257366528", 
"twitter.com/840838036707393536/status/1245926832618131456", 
"twitter.com/1236738668905127936/status/1245926888087773184", 
"twitter.com/1230706447257751552/status/1245926935042994176"), 
    friendCount = c(1018, 326, 1205, 48, 3690, 1584, 55, 42, 
    580, 11, 3610, 13, 110, 13, 42, 382, 43, 13, 106, 4195, 599, 
    8, 89, 414, 280, 931, 5001, 1602, 1327, 227, 310, 5001, 26, 
    65, 2371, 31, 523, 228, 8, 671, 499, 1324, 333, 5, 852, 5457, 
    7, 48, 65, 382), screenNames = c("DayssiOK", "DrAmbrishMithal", 
    "LuvAminaKausar", "Sunnie09370280", "balajis", "World_In_Mins", 
    "CGTNOfficial", "a7BdaSSeyL4czNw", "ShellBell915", "remedair", 
    "RitasArtCafe", "trumpfacemasks", "SCC_OES", "trumpfacemasks", 
    "a7BdaSSeyL4czNw", "REX38225222", "e2p71828", "trumpfacemasks", 
    "lamsonlinshen", "SteveJumaaa", "patfloTO", "tenforadollar", 
    "sashir_milne", "rdesai711", "agrothey", "foreskinjim1", 
    "rover223", "scanman", "AlDubest2Evry1", "HurtadoMarleen", 
    "johnmik63542947", "rover223", "CowlSolomon", "spacetinyearth", 
    "jmegown52302", "DrPonnarasu", "pankajupa120", "JoaoNewman", 
    "LalalaHK1", "SaturniaC", "NYCMediaMix", "ToscasReturn", 
    "JamesDallas9175", "cornzal", "CEDRdigital", "NadraRae", 
    "SiluMa4", "1Wa49R41L3pVzQj", "spacetinyearth", "REX38225222"
    ), userID = c(33617860, 1106803026, 421517829, 1.24559e+18, 
    2178012643, 1.22e+18, 1115874631, 1.24e+18, 2729830110, 1.24e+18, 
    88875512, 1.24591e+18, 3431854829, 1.24591e+18, 1.24e+18, 
    1.23071e+18, 4437322348, 1.24591e+18, 8.29633e+17, 403961389, 
    17183161, 1408320152, 1.24566e+18, 244306637, 24327965, 1.16422e+18, 
    16328861, 6125082, 3685052935, 8.68529e+17, 1.22327e+18, 
    16328861, 1.16004e+18, 1.24e+18, 400431217, 1.24427e+18, 
    8.50227e+17, 244314850, 1.24445e+18, 3184694718, 82208845, 
    1.22e+18, 4770303330, 1.24558e+18, 9.04741e+17, 934146138, 
    1.22355e+18, 8.40838e+17, 1.24e+18, 1.23071e+18), language = c("en", 
    "en", "en", "en", "en", "en", "en", "en", "en", "en", "en", 
    "en", "en", "en", "en", "en", "en", "en", "en", "en", "en", 
    "en", "en", "en", "en", "en", "en", "en", "en", "en", "en", 
    "en", "en", "en", "en", "en", "en", "en", "en", "en", "en", 
    "en", "en", "en", "en", "en", "en", "en", "en", "en"), replyToScreenName = c("None", 
    "ArvinderSoin", "None", "None", "None", "World_In_Mins", 
    "None", "None", "None", "None", "None", "Read5000YrLeap", 
    "None", "jmcmaccarr", "None", "CNN", "None", "Constitution999", 
    "None", "None", "TwitterSafety", "None", "None", "None", 
    "None", "kittywuv1", "BeauTFC", "None", "None", "None", "None", 
    "theblondeMD", "None", "TIME", "3M", "None", "Rakshitwa", 
    "CNN", "None", "CDCgov", "None", "CTVVancouver", "maddow", 
    "None", "CEDRdigital", "None", "None", "CNN", "CNN", "kr3at"
    ), replyToID = c("None", "1.13442E+18", "None", "None", "None", 
    "1.22053E+18", "None", "None", "None", "None", "None", "154243839", 
    "None", "48150879", "None", "759251", "None", "1.04747E+18", 
    "None", "None", "95731075", "None", "None", "None", "None", 
    "1.21653E+18", "1.05676E+18", "None", "None", "None", "None", 
    "230792524", "None", "14293310", "378197959", "None", "9.81585E+17", 
    "759251", "None", "146569971", "None", "16313405", "16129920", 
    "None", "9.04741E+17", "None", "None", "759251", "759251", 
    "139283160"), retweetUserScreenName = c(NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
    ), retweetUserID = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), followersCount = c(1452, 
    3844, 2398, 1, 179896, 1283, 14036740, 24, 329, 3, 7133, 
    2, 1050, 2, 24, 121, 4, 2, 38, 2533, 235, 2, 5, 148, 2312, 
    265, 1572, 8067, 1265, 167, 13, 1574, 1, 2, 972, 1, 107, 
    7, 0, 73, 295, 1160, 849, 1, 7519, 1749, 0, 4, 2, 121), userMentions = c(NA, 
    "ArvinderSoin", NA, NA, NA, "3M", NA, NA, NA, NA, NA, "Read5000YrLeap", 
    NA, "jmcmaccarr", NA, "CNN", NA, "Constitution999", NA, NA, 
    "TwitterSafety", NA, NA, NA, NA, "kittywuv1", "BeauTFC", 
    NA, NA, NA, NA, "theblondeMD", NA, "TIME", "3M", "WHO", "Rakshitwa", 
    "CNN", NA, "CDCgov", NA, "CTVVancouver", "maddow", NA, NA, 
    NA, NA, "CNN", "CNN", "kr3at"), userMentionsID = c(NA, 1.13442e+18, 
    NA, NA, NA, 378197959, NA, NA, NA, NA, NA, 154243839, NA, 
    48150879, NA, 759251, NA, 1.05e+18, NA, NA, 95731075, NA, 
    NA, NA, NA, 1.21653e+18, 1.05676e+18, NA, NA, NA, NA, 230792524, 
    NA, 14293310, 378197959, 14499829, 9.81585e+17, 759251, NA, 
    146569971, NA, 16313405, 16129920, NA, NA, NA, NA, 759251, 
    759251, 139283160), hashtag1 = c("coronavirus", NA, "corona", 
    "mask", NA, "Florida", "Coronavirus", NA, "coronavirus", 
    "facemask", "coronavirus", "Boycott3M", NA, "Boycott3M", 
    NA, "WearMask", NA, NA, "Covid19", "COVID19", NA, NA, NA, 
    "covid19", NA, "covid19", NA, NA, NA, "homemade", "covid19", 
    NA, "China", NA, "COVID19Pandemic", "recommendations", NA, 
    "Covit19", NA, "COPD", NA, NA, NA, NA, "COVID19", NA, NA, 
    "BoycottChina", NA, "WearMask"), hashtag2 = c(NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA), mediatype = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), mediaURL = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA)), class = c("spec_tbl_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -50L), spec = structure(list(
    cols = list(createdAt = structure(list(), class = c("collector_character", 
    "collector")), timestamp = structure(list(), class = c("collector_double", 
    "collector")), id_str = structure(list(), class = c("collector_double", 
    "collector")), text = structure(list(), class = c("collector_character", 
    "collector")), retweetCount = structure(list(), class = c("collector_double", 
    "collector")), favorite_count = structure(list(), class = c("collector_double", 
    "collector")), url = structure(list(), class = c("collector_character", 
    "collector")), friendCount = structure(list(), class = c("collector_double", 
    "collector")), screenNames = structure(list(), class = c("collector_character", 
    "collector")), userID = structure(list(), class = c("collector_double", 
    "collector")), language = structure(list(), class = c("collector_character", 
    "collector")), replyToScreenName = structure(list(), class = c("collector_character", 
    "collector")), replyToID = structure(list(), class = c("collector_character", 
    "collector")), retweetUserScreenName = structure(list(), class = c("collector_logical", 
    "collector")), retweetUserID = structure(list(), class = c("collector_logical", 
    "collector")), followersCount = structure(list(), class = c("collector_double", 
    "collector")), userMentions = structure(list(), class = c("collector_character", 
    "collector")), userMentionsID = structure(list(), class = c("collector_double", 
    "collector")), hashtag1 = structure(list(), class = c("collector_character", 
    "collector")), hashtag2 = structure(list(), class = c("collector_logical", 
    "collector")), mediatype = structure(list(), class = c("collector_logical", 
    "collector")), mediaURL = structure(list(), class = c("collector_logical", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))

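We can create a variable from createdAt and then call group_split on it to get a list of data.frames. Here, str_replace extracts the relevant substring: the pattern matches the first word and the whitespace after it, captures the next word, the whitespace, and the digits that follow (for example "Apr 01"), and the replacement keeps only that captured group.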

library(dplyr)
library(stringr)
sample_df %>% 
  # backslashes must be doubled inside an R string literal
  mutate(month_day = str_replace(createdAt, 
           "^\\w+\\s+(\\w+\\s+\\d+).*", "\\1")) %>%
  group_split(month_day)
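
To go from that list to one CSV per day, which is what the question asks for, the pieces can be written out in a loop. The following is a minimal sketch under stated assumptions: toy_df is a made-up stand-in for sample_df, and the out_dir folder and the sample_<day>.csv file names are illustrative choices, not anything from the original post. It uses base split() rather than group_split() because split() names each list element after its group value, which makes building the file name straightforward.

```r
library(dplyr)
library(stringr)

# Toy stand-in for sample_df (assumption); only createdAt matters for the split.
toy_df <- data.frame(
  createdAt = c("Fri Apr 01 04:04:32 +0000 2020",
                "Fri Apr 02 04:04:40 +0000 2020",
                "Fri Apr 02 04:04:44 +0000 2020",
                "Fri Apr 03 04:05:12 +0000 2020"),
  text = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Same extraction as in the answer: keep only "<month> <day>".
by_day <- toy_df %>%
  mutate(month_day = str_replace(createdAt,
           "^\\w+\\s+(\\w+\\s+\\d+).*", "\\1"))

# split() returns a list whose names are the month_day values.
day_list <- split(by_day, by_day$month_day)

out_dir <- file.path(tempdir(), "by_day")  # hypothetical output folder
dir.create(out_dir, showWarnings = FALSE)

for (day in names(day_list)) {
  # "Apr 01" becomes "sample_Apr_01.csv" (file name pattern is an assumption)
  out_file <- file.path(out_dir,
                        paste0("sample_", gsub("\\s+", "_", day), ".csv"))
  write.csv(day_list[[day]], out_file, row.names = FALSE)
}

list.files(out_dir)
```

This still needs the whole data frame in memory once, so it suits the case where the file can be read in a single pass.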

Note: the mutate step is not required, since month_day can be created on the fly inside group_split itself.
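
On the readr::read_csv_chunked part of the question: it may help when the full 8 GB file cannot be loaded at once, since a callback can route each chunk's rows to per-day files. This is a hedged sketch rather than a tested solution for the real data: the tiny input file, the out_dir folder, the append_by_day helper, and the chunk size are all assumptions made for illustration.

```r
library(readr)
library(stringr)

# Tiny stand-in for the 8 GB file (assumption) so the sketch runs end to end.
in_file <- tempfile(fileext = ".csv")
write_csv(
  data.frame(
    createdAt = c("Fri Apr 01 04:04:32 +0000 2020",
                  "Fri Apr 02 04:04:40 +0000 2020",
                  "Fri Apr 02 04:04:44 +0000 2020",
                  "Fri Apr 03 04:05:12 +0000 2020",
                  "Fri Apr 03 04:05:19 +0000 2020"),
    text = c("a", "b", "c", "d", "e"),
    stringsAsFactors = FALSE
  ),
  in_file
)

out_dir <- file.path(tempdir(), "by_day_chunked")  # hypothetical output folder
dir.create(out_dir, showWarnings = FALSE)

# Hypothetical helper, called once per chunk: split the chunk by day and
# append each piece to that day's file, so a day that spans two chunks
# still ends up in a single file.
append_by_day <- function(chunk, pos) {
  day <- str_replace(chunk$createdAt, "^\\w+\\s+(\\w+\\s+\\d+).*", "\\1")
  pieces <- split(chunk, day)
  for (d in names(pieces)) {
    out_file <- file.path(out_dir,
                          paste0(str_replace_all(d, "\\s+", "_"), ".csv"))
    # The header is written only when the file is first created.
    write_csv(pieces[[d]], out_file, append = file.exists(out_file))
  }
}

read_csv_chunked(
  in_file,
  callback = SideEffectChunkCallback$new(append_by_day),
  chunk_size = 2,  # tiny on purpose; a much larger value suits a real file
  col_types = cols(.default = col_character())
)

list.files(out_dir)
```

Because rows are appended, the output folder should be empty before a rerun, otherwise each day's rows are duplicated. Reading every column as character keeps the column types identical across chunks and keeps long IDs such as id_str from being rounded to doubles.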