Using a list index to create a column in Pandas

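I've been stuck on this longer than I'd like to admit. I'm trying to use a list's index to create a new column based on the day column. I'm sure it's very simple. All I really want is to compute the difference in days between today and the other days.

There may even be a way to get my result with datetime, but I haven't found a solution along those lines.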

import pandas as pd
from datetime import datetime


today = datetime.today().strftime('%Y/%m/%d')
todays_week_day = datetime.today().strftime('%a').upper()

# Let's assume today is "THU" for this example

todays_week_day = "THU"

day_abrivs = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]

todays_week_day_num = day_abrivs.index(todays_week_day)


df=
    attendance          day
 0     1546             FRI 
 1     1978             SAT 
 2     2150             SUN

df['day_num'] = day_abrivs.index(df['day'])
df['day_diff'] = df['day_num'] - todays_week_day_num

# This raises the following error on the day_num column, so I never even get to day_diff

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Python_Projects\Shell-B\venv\lib\site-packages\pandas\core\generic.py", line 1537, in __nonzero__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The expected output is as follows:

df=
    attendance          day     day_num    day_diff
 0     1546             FRI        5          1
 1     1978             SAT        6          2
 2     2150             SUN        0         -4

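You get that error mainly because you are passing a Series to the list's index method rather than a single string value. So I recommend using Series.apply to look up the index for each day. Take a look: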

import io

# Your initial dataframe
df = pd.read_csv(io.StringIO("""
attendance,day
1546,FRI
1978,SAT
2150,SUN
"""))

df['day_num'] = df['day'].apply(lambda d: day_abrivs.index(d))
df['day_diff'] = df['day_num'] - todays_week_day_num
print(df)

Output:

   attendance  day  day_num  day_diff
0        1546  FRI        5         1
1        1978  SAT        6         2
2        2150  SUN        0        -4
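To make the cause of the error concrete, here is a minimal sketch that rebuilds the question's DataFrame inline (the inline construction is an assumption for illustration, not the asker's actual data source). list.index compares its argument against each element with ==. When the argument is a Series, that comparison yields a boolean Series, and asking for its truth value is what raises the ValueError.

```python
import pandas as pd

day_abrivs = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]
df = pd.DataFrame({"attendance": [1546, 1978, 2150],
                   "day": ["FRI", "SAT", "SUN"]})

# Passing the whole Series: each comparison produces a boolean Series,
# and taking the truth value of that Series raises ValueError.
try:
    day_abrivs.index(df["day"])
    failed = False
    message = ""
except ValueError as exc:
    failed = True
    message = str(exc)

# Passing one string at a time is what list.index expects.
day_nums = [day_abrivs.index(d) for d in df["day"]]
```

Series.apply does essentially what the last line does: it calls the function once per element, so each call receives a plain string.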

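You shouldn't use apply here. Instead, you can build a mapping dictionary, which replaces a linear search of the list for every row with a dictionary lookup: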

day_abrivs_dic = {k:v for v,k in enumerate(day_abrivs)}
# {'SUN': 0, 'MON': 1, 'TUE': 2, 'WED': 3, 'THU': 4, 'FRI': 5, 'SAT': 6}

df['day_num'] = df['day'].map(day_abrivs_dic)

df['day_diff'] = df['day_num'] - todays_week_day_num

Output:

   attendance  day  day_num  day_diff
0        1546  FRI        5         1
1        1978  SAT        6         2
2        2150  SUN        0        -4
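One practical difference between the two answers is how they treat a label that is not in the list. The sketch below assumes a made-up label "XYZ", which does not appear in the question. With a dictionary, Series.map leaves NaN for unknown keys, while the apply version raises because list.index cannot find the value.

```python
import pandas as pd

day_abrivs = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]
day_abrivs_dic = {day: num for num, day in enumerate(day_abrivs)}

# "XYZ" is a hypothetical bad label, used only to show the behavior.
days = pd.Series(["FRI", "XYZ", "SUN"])

# map: the unknown label becomes NaN and no exception is raised.
mapped = days.map(day_abrivs_dic)
n_missing = int(mapped.isna().sum())

# apply + list.index: the unknown label raises ValueError.
try:
    days.apply(lambda d: day_abrivs.index(d))
    apply_raised = False
except ValueError:
    apply_raised = True
```

Note that once a NaN is present, the mapped column is stored as floats, so it may be worth checking mapped.isna() before computing day_diff if the day column could contain unexpected values.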