KeyError : 0 - When looping through a dataframe

Question

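Thanks in advance for any help with this. I'm very new to this and don't really know what I'm doing. I've tried several ways of doing it, but I keep getting errors. I've tried iterrows, iloc, loc and so on, with no success. I don't understand how to take each row of data and use the values in that row to send an email.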

Code:

email_list 
+---+-------------+-----------------+------+------------+---------------------------+---------------+----------------------------------+
|   | Client Name | Staff Name      | Role | Due Date   | Submission ID             | Staff Email   | Generated Due Docs ID            |
+---+-------------+-----------------+------+------------+---------------------------+---------------+----------------------------------+
| 1 | H.Pot       | JohannaNameLast | IP   | 2020-04-01 | H.POT-Johanna-IP-4/1/2020 | xyz@gmail.com | h.potjohannanamelastip2020-04-01 |
+---+-------------+-----------------+------+------------+---------------------------+---------------+----------------------------------+
| 2 | S.Man       | DaveSmith       | TS   | 2020-04-01 | S.MAN-David-TS-4/1/2020   | abc@gmail.com | s.mandabc2020-04-01              |
+---+-------------+-----------------+------+------------+---------------------------+---------------+----------------------------------+
| 3 | S.Man       | LouisLastName   | IP   | 2020-04-01 | S.MAN-Louis-IP-4/1/2020   | def@gmail.com | s.manlouislastnameip2020-04-01   |
+---+-------------+-----------------+------+------------+---------------------------+---------------+----------------------------------+
| 5 | T.Hul       | KellyDLastName  | IP   | 2020-04-01 | T.HUL-Kelly-IP-4/1/2020   | ghi@gmail.com | t.hulkelleydlastnameip2020-04-01 |
+---+-------------+-----------------+------+------------+---------------------------+---------------+----------------------------------+

# Get all the Names, Email Addresses, roles and due dates.
all_clients = email_list['Client Name']
all_staff = email_list['Staff Name']
all_roles = email_list['Role']
#all_types = email_list['Form Type']
all_due_dates = email_list['Due Date']
all_emails = email_list['Staff Email']

for idx in range(len(email_list)):
    # Get each record's name, email, subject and message
    client = all_clients[idx]
    staff = all_staff[idx]
    role = all_roles[idx]
    #form_type = all_types[idx]
    due_date = all_due_dates[idx]
    email_address = all_emails[idx]

    # Get all the Names, Email Addresses, roles and due dates.
    subject = f"Your monthly summary was Due on {due_date} Days For {client.upper()}"
    message = f"Hi {staff.title()}, \n\nThe {form_type} is due in {due_date} days for {client.upper()}.  Please turn it in before the due date. \n\nThanks, \n\nJudy"


    full_email = ("From: {0} <{1}>\n"
                  "To: {2} <{3}>\n"
                  "Subject: {4}\n\n"
                  "{5}"
                  .format(your_name, your_email, staff, email_address, subject, message))
    # In the email field, you can add multiple other emails if you want
    # all of them to receive the same text
    try:
        server.sendmail(your_email, [email_address], full_email)
        print('Email to {} successfully sent!\n\n'.format(email_address))
    except Exception as e:
        print('Email to {} could not be sent :( because {}\n\n'.format(email_address, str(e)))

# Close the smtp server
server.close()

Answer 1

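email_list.iterrows() returns an iterator that yields each index label together with the row at that label (as a Series). So the iteration can be written like this: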

for idx, row in email_list.iterrows():
    # Get each record's name, email, subject and message
    client = row['Client Name']
    staff = row['Staff Name']
    role = row['Role']
    #form_type = row['Form Type']
    due_date = row['Due Date']
    email_address = row['Staff Email']

    # Build the subject and message for this row.
    # Note: form_type is only defined if the 'Form Type' line above is uncommented.
    subject = f"Your monthly summary was Due on {due_date} Days For {client.upper()}"
    message = f"Hi {staff.title()}, \n\nThe {form_type} is due in {due_date} days for {client.upper()}.  Please turn it in before the due date. \n\nThanks, \n\nJudy"

You can read more in the pandas.DataFrame.iterrows() documentation.
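As a minimal, self-contained sketch of the loop above: the DataFrame is rebuilt by hand from two rows of the table in the question, and build_messages is a hypothetical helper name that does not appear in the original post. The table has no 'Form Type' column, so the sketch puts the Role value in the message body instead; that substitution is an assumption, as is the exact wording of the message.

```python
import pandas as pd

# Sample data copied from the table in the question (index starts at 1).
email_list = pd.DataFrame(
    {
        "Client Name": ["H.Pot", "S.Man"],
        "Staff Name": ["JohannaNameLast", "DaveSmith"],
        "Role": ["IP", "TS"],
        "Due Date": ["2020-04-01", "2020-04-01"],
        "Staff Email": ["xyz@gmail.com", "abc@gmail.com"],
    },
    index=[1, 2],
)


def build_messages(df):
    """Hypothetical helper: return (address, subject, body) for every row."""
    messages = []
    # iterrows() yields (index label, row as a Series); the label is unused here.
    for _, row in df.iterrows():
        client = row["Client Name"].upper()
        subject = f"Your monthly summary was due on {row['Due Date']} for {client}"
        body = (
            f"Hi {row['Staff Name'].title()},\n\n"
            f"The {row['Role']} summary for {client} was due on {row['Due Date']}."
        )
        messages.append((row["Staff Email"], subject, body))
    return messages


messages = build_messages(email_list)
```

Collecting the messages first and sending them in a second step keeps the row handling separate from the SMTP code, which makes it easier to see which part fails.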

Answer 2

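Extracting every column into its own variable and then fetching the matching element from each of those variables is a bad pattern.

Use the following pattern instead: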

for idx, row in email_list.iterrows():
    row.Role
    row['Staff Name']

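If you don't use idx, replace it with _.

This variant is much faster than yours. The code above performs just a single iteration (over rows), whereas your code performs:

also a single iteration, over row numbers,
but then n lookups of a single element, one in each column.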

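Let's return to my code sample. There are 2 ways to access an element of the current row:

row.Role - if the column name contains no "special" characters (e.g. spaces).
row['Staff Name'] - in other (more complex) cases.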
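A small sketch of the two access styles, using a two-row frame made up for illustration (only the column names come from the question):

```python
import pandas as pd

email_list = pd.DataFrame(
    {"Role": ["IP", "TS"], "Staff Name": ["JohannaNameLast", "DaveSmith"]},
    index=[1, 2],
)

roles = []
staff_names = []
for _, row in email_list.iterrows():
    # Attribute access works because "Role" is a valid Python identifier.
    roles.append(row.Role)
    # "Staff Name" contains a space, so bracket access is required.
    staff_names.append(row["Staff Name"])
```

Bracket access is also the safer choice when a column name matches an existing Series attribute such as name, because the attribute takes precedence over the column.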

And now the reason why you got KeyError: 0.

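Note that:

the index of your rows starts at 1 (the leftmost column, with no title),
but in your loop idx starts at 0,
each "column variable" is accessed by index value, not by the integer position of the "wanted" element.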

So the error occurs on the first pass of the loop, when:

you have idx == 0,
no column variable (actually a Series) contains an element with index == 0.
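The failure can be reproduced with a sketch like the one below. The Series stands in for one of the "column variables"; its values and index labels are taken from the table in the question.

```python
import pandas as pd

# A column taken from a DataFrame whose index labels are 1, 2, 3, 5.
all_clients = pd.Series(["H.Pot", "S.Man", "S.Man", "T.Hul"], index=[1, 2, 3, 5])

try:
    all_clients[0]  # looks up the *label* 0, which does not exist
    error = None
except KeyError as exc:
    error = exc  # this is the KeyError: 0 from the question

first_by_label = all_clients[1]          # label 1 is the first row
first_by_position = all_clients.iloc[0]  # position 0 is also the first row
```

With an integer index, plain square brackets on a Series are treated as a label lookup, which is why 0 is reported as a missing key.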

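Pandas actually uses 2 different names here (key and index value) for the same thing, so one can argue about how readable this message is. There is nothing you can do about it; you just have to know it.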

Alternatively, if for some reason you want to keep the current version of your code, change only the for instruction to:

for idx in range(1, len(email_list) + 1):
    ...

Then the loop will start from idx == 1 and should not fail, provided that your index consists of consecutive numbers, starting from 1.

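But as I noticed, your index:

starts with 1, 2 and 3 (so far so good),
but then has a "gap": there is no index value 4, so this loop would fail at idx == 4 and never reach the row with index 5.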
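To illustrate the gap, here is a sketch using the index labels from the question (1, 2, 3, 5). It also shows two position-based alternatives, .iloc and reset_index(drop=True); these are suggestions added for illustration and are not part of the original answer.

```python
import pandas as pd

email_list = pd.DataFrame(
    {"Client Name": ["H.Pot", "S.Man", "S.Man", "T.Hul"]},
    index=[1, 2, 3, 5],  # note the missing label 4
)
all_clients = email_list["Client Name"]

# Label-based loop: works for 1, 2, 3 and then fails on the missing label 4.
failed_label = None
for idx in range(1, len(email_list) + 1):
    try:
        all_clients[idx]
    except KeyError:
        failed_label = idx
        break

# Option 1: .iloc selects by integer position, so the labels do not matter.
by_position = [all_clients.iloc[pos] for pos in range(len(email_list))]

# Option 2: renumber the index 0..n-1, after which a range(len(...)) loop works.
renumbered = email_list.reset_index(drop=True)
by_new_label = [renumbered["Client Name"][idx] for idx in range(len(renumbered))]
```

Either option removes the dependence on the index labels; iterating with iterrows() as shown earlier avoids the problem as well.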

python

loops

pandas

keyerror