Append data to CSV using a nested loop

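I am trying to use the function append_to_csv to append data from the list json_response, which contains Twitter data, to a CSV file.

I understand the structure of json_response. It contains data on users who follow two politicians; 5 and 13 users respectively. 1) author_id, created_at, tweet_id and text are in data. 2) description/bio is in ['includes']['users']. 3) url/image_url is in ['includes']['media']. However, my nested loop does not append any data to sample_data.csv, and it does not throw any error. Does it have something to do with my indentation?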

print(json.dumps(json_response, indent=4, sort_keys=True))  # look at json_response object.
[
    {
        "data": [
            {
                "author_id": "2877379617",
                "created_at": "2021-03-25T12:11:14.000Z",
                "id": "1375057688355336195",
                "text": "@prettynobodyco She blocked me in 2015 - for pointing out that Tim Kaine enables sexual assault in the military and the evidence was his killing of the MJIA and publicly stated that Military commanders should remain in charge of military rape cases. She's Tanden level awful. Congrats!"
            },
            {
                "author_id": "1265018154444562440",
                "created_at": "2021-03-22T19:48:59.000Z",
                "id": "1374085719472361474",
                "text": "@MehcatCat @AlasscanIsBack @PattyArquette @timkaine Funny, they blocked me. \ud83e\udd23\ud83e\udd23"
            },
            {
                "author_id": "2378324935",
                "created_at": "2021-03-07T21:32:13.000Z",
                "id": "1368675879312887810",
                "text": "@DrWinarick @KatieOGrady4 I apologize for any drama. Katie O Grady blocked me because we had a disagreement about Tim Kaine on one of your older posts. I guess I can't please everyone haha. :/"
            },
            {
                "author_id": "821870502943817729",
                "created_at": "2021-02-12T23:53:59.000Z",
                "id": "1360376637385244673",
                "text": "She blocked me a long ass time ago when I asked her why we shoulf care about Tim Kaine's personal view on abortion if it didn't impact legislation"
            },
            {
                "attachments": {
                    "media_keys": [
                        "16_1341045032732770306"
                    ]
                },
                "author_id": "17232340",
                "created_at": "2020-12-21T15:37:07.000Z",
                "id": "1341045038420275205",
                "text": "@DSingh4Biden @moomintroll8 @timkaine @GovernorVA That's why I replied to you. She blocked me previously, for what silliness I can't remember. Tough being a troll AND a snowflake!"
            }
        ],
        "includes": {
            "media": [
                {
                    "media_key": "16_1341045032732770306",
                    "type": "animated_gif"
                }
            ],
            "users": [
                {
                    "created_at": "2014-11-15T02:23:57.000Z",
                    "description": "",
                    "id": "2877379617",
                    "name": "Laura Saylor",
                    "username": "lauraleesaylor"
                },
                {
                    "created_at": "2020-05-25T20:33:36.000Z",
                    "description": "Weird Writer & Lunatic Linguist\nWicked Witch of the East\nshe/her",
                    "id": "1265018154444562440",
                    "name": "Zauberkind",
                    "username": "Zauberkind2"
                },
                {
                    "created_at": "2014-03-08T07:22:31.000Z",
                    "description": "#Resist, #BLM, #Vaxxed, liberal, autistic, kidney transplant survivor, political nerd, mental health advocate, fighter for equality, truth, justice, etc.",
                    "id": "2378324935",
                    "name": "Trevor \"Trev\" McKee Achilles",
                    "username": "MrTAchilles"
                },
                {
                    "created_at": "2017-01-19T00:02:52.000Z",
                    "description": "statist /  Progressive Gun Nut/ Single and hating it\n\n / \n\nstraight????? /\n\npronouns / brain worm survivor\n\n",
                    "id": "821870502943817729",
                    "name": "Puppet Enthusiast",
                    "username": "nihilisticpillo"
                },
                {
                    "created_at": "2008-11-07T15:09:46.000Z",
                    "description": "Liberal-Veteran-Dog Lover | Taste for irony, but in moderation | Humor is reason gone mad. ~Groucho Marx | I follow & unfollow back #VeteransResist #Resist",
                    "id": "17232340",
                    "name": "anti-Fascist Jim",
                    "username": "JimnBL"
                }
            ]
        },
        "meta": {
            "newest_id": "1375057688355336195",
            "next_token": "b26v89c19zqg8o3fos5vyedr54ngvtx3nuqvnx6pglrb1",
            "oldest_id": "1341045038420275205",
            "result_count": 5
        }
    },
    {
        "data": [
            {
                "author_id": "737885223858384896",
                "created_at": "2021-03-26T21:56:02.000Z",
                "id": "1375567243082338314",
                "text": "@hogan_1969 @LindseyGrahamSC LOL She Blocked me.. could not admit the truth could she now. okay so where is her source for the shirts? and that is what he said. I (quote) We immediately surge the border all those seeking asylum. What about his lie about the cages? no Answer lol."
            },
            {
                "author_id": "847612931487416323",
                "created_at": "2021-03-26T21:55:24.000Z",
                "id": "1375567083791073283",
                "text": "@hogan_1969 @TeichTerry @thehill @LindseyGrahamSC @hogan_1969 just blocked me for showing her the actual numbers \ud83e\udd23\n\n#LiberalsHateFacts"
            },
            {
                "author_id": "18634205",
                "created_at": "2021-03-08T12:29:00.000Z",
                "id": "1368901564363051010",
                "text": "Huh.  Made me think if @LeaderMcConnell @LindseyGrahamSC @marcorubio @SenTedCruz feel trapped under the thumb of Trumpy.  And who else? @IvankaTrump? @MELANIATRUMP ? @DonaldJTrumpJr ? I\u2019d say Eric, but he blocked me."
            },
            {
                "author_id": "27327319",
                "created_at": "2021-03-02T11:53:16.000Z",
                "id": "1366718245521211393",
                "text": "@fedupinNHtoo @LindseyGrahamSC Exactly. I asked that question of a Republican on Facebook last night and she blocked me"
            },
            {
                "author_id": "917634626247647232",
                "created_at": "2021-02-28T18:16:45.000Z",
                "id": "1366089974907432961",
                "text": "@gop this is for you! @tedcruz @LindseyGrahamSC @MittRomney @mikepompeo\n#BitchyMcC blocked me!\ud83d\udc4d\nWatch \"Jack Off Jill - Hypocrite + lyrics\" on YouTube"
            },
            {
                "author_id": "1231059979844456448",
                "created_at": "2021-02-26T04:25:49.000Z",
                "id": "1365156089554067459",
                "text": "@KelleyALynch1 @marwilliamson @therecount @LindseyGrahamSC She's fine with that just as she's fine with Biden's Nazis in Ukraine. She wants war with Russia, too. She blocked me for this tweet because she couldn't even condemn Biden's Nazis in Ukraine. She's a fauxgressive warmonger, a wolf in sheep's clothing. \n"
            },
            {
                "author_id": "1315477593303310336",
                "created_at": "2021-02-23T00:00:41.000Z",
                "id": "1364002202843451399",
                "text": "@MistyKitty3 @BlairMurray83 @FrankAmari2 @LindseyGrahamSC \ud83e\udd23 Someone didn\u2019t like what I said and blocked me."
            },
            {
                "author_id": "1069115263671562240",
                "created_at": "2021-02-22T04:36:06.000Z",
                "id": "1363709124891070467",
                "text": "@trinkity88 @LindseyGrahamSC Apparently, @Trinkitty88 blocked me because FACTS are TOO HARD to handle!\ud83e\udd23\ud83e\udd23\ud83e\udd23\ud83e\udd23\ud83e\udd23\ud83e\udd23"
            },
            {
                "author_id": "1303321972227690496",
                "created_at": "2021-02-20T19:38:49.000Z",
                "id": "1363211526316969985",
                "text": "@horsin64 @GovMurphy @LindseyGrahamSC You blocked me because you\u2019re a nifkin. It\u2019s not cyber tough you Nancy I\u2019d say it to your face. American lives matter before anyone else. America first and you don\u2019t like it because you have trump derangement. You\u2019re a psycho"
            },
            {
                "author_id": "27943005",
                "created_at": "2021-02-19T20:00:38.000Z",
                "id": "1362854626924650497",
                "text": "@TonyRom31334975 @staceyabrams @AnnaForFlorida @LindseyGrahamSC The guy blocked me on Twitter and had to unblock me after the Knight First Amendment Institute sued him and won> I am certain It won't talk to me, but imagine..hehe?!"
            },
            {
                "attachments": {
                    "media_keys": [
                        "3_1361344652264280068"
                    ]
                },
                "author_id": "1126249378279297027",
                "created_at": "2021-02-15T16:00:32.000Z",
                "id": "1361344654395011079",
                "text": "@Jamie1074 @Breaking911 You know what\n\nIt's funny that they blocked me because I actually did agree with them on Lindsey Graham...\n\nCome on, man !"
            },
            {
                "author_id": "1207432044390699008",
                "created_at": "2021-02-14T07:58:21.000Z",
                "id": "1360860918687559681",
                "text": "@LindseyGrahamSC I really don't know why you haven't blocked me yet. Pile of human shit. I just read a letter that John McCain wrote me and for some reason it made me think about you and what he would think about your behavior. I guarantee you'd be in for an ass whippin'. Dick."
            },
            {
                "author_id": "926909484",
                "created_at": "2021-02-13T20:53:03.000Z",
                "id": "1360693490880032770",
                "text": "@LadyReverbs @themariefonseca @styvanswift @LindseyGrahamSC Lady, you might be able to see Marie\u2019s tweets. She blocked me. She may call this a victory for Trump. The reality is that seven members of the @GOP voted to convict. They are the true patriots of the Republican Party."
            }
        ],
        "includes": {
            "media": [
                {
                    "media_key": "3_1361344652264280068",
                    "type": "photo",
                    "url": ""
                }
            ],
            "users": [
                {
                    "created_at": "2016-06-01T05:55:21.000Z",
                    "description": "Biden Inflation the worst in 30 years. His Handlers trying to Rebrand Brandon is Hilarious.",
                    "id": "737885223858384896",
                    "name": "Biden is a complete mess and you know it.",
                    "username": "zelda3024"
                },
                {
                    "created_at": "2017-03-31T00:54:05.000Z",
                    "description": "Love God, Love Family, Love Country, Love Freedom - if we put those things first everything else will be great. MAGA",
                    "id": "847612931487416323",
                    "name": "Joey Bagadonuts",
                    "username": "AmericanGr8ness"
                },
                {
                    "created_at": "2009-01-05T15:25:55.000Z",
                    "description": "small & local garlic farmer; independent American; old surfer dude; working to find and speak truth to power; \ud83c\uddfa\ud83c\uddf8; mahalo and Maluhia",
                    "id": "18634205",
                    "name": "MacGregorGarlic",
                    "username": "MacGregorGarlic"
                },
                {
                    "created_at": "2009-03-28T22:53:28.000Z",
                    "description": "Let's Go Darwin!",
                    "id": "27327319",
                    "name": "Karen Kennedy",
                    "username": "KayKay68"
                },
                {
                    "created_at": "2017-10-10T06:15:18.000Z",
                    "description": "Mom\ud83d\udc95Cannactivist\ud83c\udf3fSecularHumanist\ud83c\udf10 BLM\u270a\ud83c\udfff\ud83c\udf08Ally\ud83e\udd8bCPTSD\u2695\ufe0f FTD\ud83e\udd14MeToo\ud83c\udf38ProChoice\ud83d\udc93CRPS\ud83d\ude23ClimateChange\ud83c\udf0e DACA\ud83c\uddfa\ud83c\uddf2AdoptDontShop\ud83d\udc3e#Steelers \ud83d\udda4\ud83d\udc9b #Vaxxed2TheMax\u270a\ud83d\udc9a",
                    "id": "917634626247647232",
                    "name": "Raven The Hemptress #LegalizeGlobally\ud83d\udc9a\ud83c\udf3f\u267f",
                    "username": "Kraven_Raven24"
                },
                {
                    "created_at": "2020-02-22T03:35:56.000Z",
                    "description": "Monetarism is the underlying cause of our disease; human progress and peace through development is the cure. Eurasian integration will benefit all of humanity!",
                    "id": "1231059979844456448",
                    "name": "\ud83c\udd70pocalypsis \ud83c\udd70pocalypseos \u2014 BRI Is The Future",
                    "username": "apocalypseos"
                },
                {
                    "created_at": "2020-10-12T02:21:21.000Z",
                    "description": "Father of two beautiful boys. Believer in the Constitution of the United States. Protector of my own rights. #Meatatarian",
                    "id": "1315477593303310336",
                    "name": "\ud83e\udd85 Steven Duggin \u2665\ufe0f \ud83c\uddfa\ud83c\uddf8\ud83d\uddfd",
                    "username": "itsStevenDuggin"
                },
                {
                    "created_at": "2018-12-02T06:25:16.000Z",
                    "description": "",
                    "id": "1069115263671562240",
                    "name": "Barhag",
                    "username": "TheBarhag"
                },
                {
                    "created_at": "2020-09-08T13:19:17.000Z",
                    "description": "Not the liberals cup of tea",
                    "id": "1303321972227690496",
                    "name": "Christy",
                    "username": "Christy54177764"
                },
                {
                    "created_at": "2009-03-31T19:34:24.000Z",
                    "description": "NY-grown, FL-tanned, scribe, word nerd, TV junkie, game show champ, yenta, wife, twin mama, hot sauce collector, Bloody Mary maven &, says @NYPost, savvy gadfly",
                    "id": "27943005",
                    "name": "Lesley Abravanel",
                    "username": "lesleyabravanel"
                },
                {
                    "created_at": "2019-05-08T22:15:51.000Z",
                    "description": "\u2600\ufe0f I post Yuuko Aioi pictures daily \u2600\ufe0f\n\nI also like being wholesome, making new friends, posting about games, my everyday life, cats, NASCAR, good vibes, fumos!",
                    "id": "1126249378279297027",
                    "name": "Vaxen #DailyYuuko \u2603\ufe0f",
                    "username": "YuukoEnjoyer"
                },
                {
                    "created_at": "2019-12-18T22:47:10.000Z",
                    "description": "The Republican party is bad for America. The Conservatives are Trump bootlickers who are afraid to stand up to him. This great nation is in serious trouble.",
                    "id": "1207432044390699008",
                    "name": "Angry Patriot",
                    "username": "AngryPatriot20"
                },
                {
                    "created_at": "2012-11-05T05:19:37.000Z",
                    "description": "Employment lawyer. Represent employers and employees. 30 years ago, my mentor told me to seek the truth as a lawyer. Still do that. Tweets are not legal advice.",
                    "id": "926909484",
                    "name": "Alfred Southerland",
                    "username": "TexasEEOLaw"
                }
            ]
        },
        "meta": {
            "newest_id": "1375567243082338314",
            "next_token": "b26v89c19zqg8o3fosnr8q7zstmzppg3jgd1cvynkb919",
            "oldest_id": "1360693490880032770",
            "result_count": 13
        }
    }
]

# Create file
csvFile = open("sample_data.csv", "a", newline="", encoding='utf-8')
csvWriter = csv.writer(csvFile)

# Create headers for the data I want to save. I only want to save these columns in my dataset
csvWriter.writerow(
    ["author_id", "created_at", "tweet_id", "text", "bio", "image_url"])
csvFile.close()



def append_to_csv(json_response, csvFile):
    # counter variable
    global author_id, created_at, tweet_id, text, bio, image_url

    # open CSV file
    csvFile = open(csvFile, "a", newline="", encoding='utf-8')
    csvWriter = csv.writer(csvFile)

    # loop through each tweet
    for each_dict in json_response:
        
        # loop 1. author ID, time created, tweet ID tweet text
        for tweet in each_dict['data']:

            # 1. Author ID
            author_id = tweet['author_id']

            # 2. Time created
            created_at = dateutil.parser.parse(tweet['created_at'])

            # 3. Tweet ID
            tweet_id = tweet['id']

            # 4. Tweet text
            text = tweet['text']
            
            # loop 2. description/bio loop
            for dic in each_dict['includes']['users']:

                # 5. description
                if 'description' in dic:
                    bio = dic['description']
                else:
                    bio = " "

                    # loop 3. image_url/url loop
                    for element in each_dict['includes']['media']:

                        # 6. image url
                        if 'url' in element:
                            image_url = element['url']
                        else:
                            image_url = " "

                    # assemble all data in a list
                    res = [author_id, created_at, tweet_id, text, bio, image_url]
                    csvWriter.writerow(res)

                    # close CSV file
                    csvFile.close()


append_to_csv(json_response, "sample_data.csv")

As can be seen, df contains only the predefined column names.

# import sample_data.csv as df
df = pd.read_csv(r'path...\sample_data.csv')

print(df)
Empty DataFrame
Columns: [author_id, created_at, tweet_id, text, bio, image_url]
Index: []

Edited: changed the indentation of loop 3 and of csvFile.close().

def append_to_csv(json_response, csvFile):
    # counter variable
    global author_id, created_at, tweet_id, text, bio, image_url

    # open CSV file
    csvFile = open(csvFile, "a", newline="", encoding='utf-8')
    csvWriter = csv.writer(csvFile)

    # loop through each tweet
    for each_dict in json_response:

        # loop 1. author ID, time created, tweet ID tweet text
        for tweet in each_dict['data']:

            # 1. Author ID
            author_id = tweet['author_id']

            # 2. Time created
            created_at = dateutil.parser.parse(tweet['created_at'])

            # 3. Tweet ID
            tweet_id = tweet['id']

            # 4. Tweet text
            text = tweet['text']

            # loop 2. description/bio loop
            for dic in each_dict['includes']['users']:

                # 5. description
                if 'description' in dic:
                    bio = dic['description']
                else:
                    bio = " "

                # loop 3. image_url/url loop
                for element in each_dict['includes']['media']:

                    # 6. image url
                    if 'url' in element:
                        image_url = element['url']
                    else:
                        image_url = " "

                    # assemble all data in a list
                    res = [author_id, created_at, tweet_id, text, bio, image_url]
                    csvWriter.writerow(res)

    # close CSV file
    csvFile.close()

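The problem now is that append_to_csv adds the same tweet 5 times for the 5 users following the first politician and 13 times for the 13 users following the second politician, so df has 194 rows instead of 18.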

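It looks like the else branch of if 'description' in dic: is never executed. With your code indented the way it is, the csvWriter.writerow part sits inside that branch, so it is never executed either.

That is why nothing is written to your file.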
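A minimal sketch of this effect, assuming a hypothetical two-user list in place of each_dict['includes']['users'] and a plain list named written in place of the CSV writer. Every user has a 'description' key (even when it is empty), so the else branch, and anything nested inside it, is skipped:

```python
# Hypothetical stand-in for each_dict['includes']['users'];
# every entry has a 'description' key, even when the value is empty.
users = [
    {"id": "1", "description": ""},
    {"id": "2", "description": "some bio"},
]

written = []  # stands in for the rows passed to csvWriter.writerow

for dic in users:
    if 'description' in dic:
        bio = dic['description']
    else:
        bio = " "
        # Anything indented here runs only when 'description' is missing.
        written.append(bio)

print(len(written))  # 0
```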


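Code style comment:

  • Use with open(file) as file_variable: instead of opening and closing the file manually. This saves you some trouble, such as the trouble you would run into if the else branch did execute and the file were closed multiple times :)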
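A short sketch of that suggestion, applied to the header-writing step from the question. The temporary path is an assumption made only so the example does not touch a real sample_data.csv:

```python
import csv
import os
import tempfile

# Hypothetical path in a temporary directory, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "sample_data.csv")

header = ["author_id", "created_at", "tweet_id", "text", "bio", "image_url"]

# The file is closed automatically when the with block ends,
# even if an exception is raised inside it.
with open(path, "a", newline="", encoding="utf-8") as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(header)

print(csv_file.closed)  # True
```

Because the close happens exactly once, at the end of the block, there is no csvFile.close() line whose indentation can go wrong.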

There are two each_dict objects in json_response. They contain 5 and 13 tweets respectively (each_dict['data']). In addition, each_dict['includes']['users'] contains 5 and 13 elements respectively.

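You get 194 rows because in the first iteration of for each_dict in json_response: you save the data 5x5=25 times (loop 2 runs 5 times for each tweet in loop 1). In the second iteration, you save the data 13x13=169 times (loop 2 runs 13 times for each tweet in loop 1). Each each_dict['includes']['media'] has a single element, so loop 3 does not multiply the count further, and 25 + 169 = 194.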
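The row count can be checked with a few lines of arithmetic. The sizes below are read off the json_response shown in the question; the variable names are only illustrative:

```python
# (tweets, users, media items) for each of the two each_dict objects,
# taken from the json_response in the question.
sizes = [(5, 5, 1), (13, 13, 1)]

total_rows = 0
for n_tweets, n_users, n_media in sizes:
    # writerow sits inside loop 3, which is nested in loop 2,
    # which is nested in loop 1, so the counts multiply.
    total_rows += n_tweets * n_users * n_media

print(total_rows)  # 194
```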

You should append the data to the csv outside loop 2. That is,

for each_dict in json_response:

    for tweet in each_dict['data']:
        # ...
        
        for dic in each_dict['includes']['users']:
            # ...
        
        res = [author_id, created_at, tweet_id, text, bio, image_url]
        csvWriter.writerow(res)
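For reference, a runnable sketch of the same idea. It assumes a shortened, hypothetical json_response with two tweets, writes to an in-memory buffer instead of a file, and leaves out image_url for brevity. The bios dictionary is not part of the original code; it is one possible way to pick the bio that belongs to each tweet's author:

```python
import csv
import io

# Hypothetical, shortened stand-in for json_response (one page, two tweets).
json_response = [
    {
        "data": [
            {"author_id": "1", "created_at": "2021-03-25T12:11:14.000Z",
             "id": "10", "text": "first"},
            {"author_id": "2", "created_at": "2021-03-22T19:48:59.000Z",
             "id": "11", "text": "second"},
        ],
        "includes": {
            "users": [
                {"id": "1", "description": ""},
                {"id": "2", "description": "some bio"},
            ]
        },
    }
]

buffer = io.StringIO()  # in-memory file, so the example has no side effects
csv_writer = csv.writer(buffer)

for each_dict in json_response:
    # Map user id -> bio once per page instead of looping per tweet.
    bios = {u["id"]: u.get("description", " ")
            for u in each_dict["includes"]["users"]}

    for tweet in each_dict["data"]:
        bio = bios.get(tweet["author_id"], " ")
        # One writerow per tweet, outside any inner loop.
        csv_writer.writerow([tweet["author_id"], tweet["created_at"],
                             tweet["id"], tweet["text"], bio])

rows = buffer.getvalue().splitlines()
print(len(rows))  # 2
```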

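In addition, I suggest using a pandas DataFrame to store the information you need and then saving it to csv. It makes the code more readable, and you don't have to worry about opening buffers. See my suggestion below, which also renames some variables: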

import dateutil.parser
import pandas as pd

rows = []

for each_dict in json_response:

    for tweet in each_dict['data']:
        row = {}
        row["author_id"] = tweet['author_id']
        row["created_at"] = dateutil.parser.parse(tweet['created_at'])
        row["tweet_id"] = tweet['id']
        row["text"] = tweet['text']

        # Pick the bio of the user who wrote this tweet
        for user in each_dict['includes']['users']:
            if user["id"] == row["author_id"]:
                row["bio"] = user['description']#.encode('utf-16','surrogatepass').decode('utf-16') # uncomment this if you get UnicodeError

        # Note: this keeps the url of the last media item in this each_dict,
        # which is not necessarily the media attached to this tweet
        for media in each_dict['includes']['media']:
            row['image_url'] = media.get('url', ' ')

        rows.append(row)

# DataFrame.append was removed in pandas 2.0, so the rows are collected in a list
# and the DataFrame is built once; the column names come from the dictionary keys.
df = pd.DataFrame(rows)

df.to_csv('path/to/file.csv')

Output

               tweet_id            author_id                created_at   ...
0   1375057688355336195           2877379617  2021-03-25T12:11:14.000Z   ...
1   1374085719472361474  1265018154444562440  2021-03-22T19:48:59.000Z   ...
...
17  1360693490880032770            926909484  2021-02-13T20:53:03.000Z   ...
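One caveat about the media loop in the pandas code above: it gives every tweet the url of the last item in each_dict['includes']['media'], whether or not the tweet has an attachment. A possible refinement, sketched here on a hypothetical shortened page, is to match each tweet's attachments.media_keys against the media_key of each media item. The names page, media_urls and image_urls, and the example URL, are assumptions for illustration only:

```python
# Hypothetical, shortened page of results with one attachment.
page = {
    "data": [
        {"id": "10", "text": "no media"},
        {"id": "11", "text": "with media",
         "attachments": {"media_keys": ["3_1"]}},
    ],
    "includes": {
        "media": [
            {"media_key": "3_1", "type": "photo",
             "url": "https://example.com/a.jpg"},
        ]
    },
}

# Map media_key -> url. Some media items (the animated_gif in the
# question's data, for example) have no 'url' key, hence the default.
media_urls = {m["media_key"]: m.get("url", " ")
              for m in page.get("includes", {}).get("media", [])}

image_urls = {}
for tweet in page["data"]:
    keys = tweet.get("attachments", {}).get("media_keys", [])
    # Use the first attached media item, or a blank when the tweet has none.
    image_urls[tweet["id"]] = media_urls.get(keys[0], " ") if keys else " "

print(image_urls)  # {'10': ' ', '11': 'https://example.com/a.jpg'}
```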