Using a list index to create a column in Pandas
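I have been stuck on this longer than I would like to admit. I am trying to use the index of a list to create a new column based on the Day column. I'm sure it is very simple. All I really want to do is calculate the difference in days between today and the other days.
There might even be a way to get my result with datetime, but I have not found a solution that way either.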
import pandas as pd
from datetime import datetime
today = datetime.today().strftime('%Y/%m/%d')
todays_week_day = str.upper(str(datetime.today().strftime('%a')))
# Let's assume today is "THU" for this example
todays_week_day = "THU"
day_abrivs = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]
todays_week_day_num = day_abrivs.index(todays_week_day)
df=
attendance day
0 1546 FRI
1 1978 SAT
2 2150 SUN
df['day_num'] = day_abrivs.index(df['day'])
df['day_diff'] = df['day_num'] - todays_week_day_num
# This gives the following error on the Day_Num col so I don't even get to the Day_diff
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "C:\Python_Projects\Shell-B\venv\lib\site-packages\pandas\core\generic.py", line 1537, in __nonzero__
raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
The desired output is as follows:
df=
attendance day day_num day_diff
0 1546 FRI 5 1
1 1978 SAT 6 2
2 2150 SUN 0 -4
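You are getting that error mainly because you are passing a Series to the index method instead of a single string value. list.index compares each list element with its argument using ==, and comparing a string with a Series produces a boolean Series that has no single truth value. So I recommend using the Series.apply method to look up the number for each day. Take a look at this: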
import io

# Your initial dataframe
df = pd.read_csv(io.StringIO("""
attendance,day
1546,FRI
1978,SAT
2150,SUN
"""))
# Look up each day's position in day_abrivs, one value at a time
df['day_num'] = df['day'].apply(lambda d: day_abrivs.index(d))
df['day_diff'] = df['day_num'] - todays_week_day_num
print(df)
Output:
attendance day day_num day_diff
0 1546 FRI 5 1
1 1978 SAT 6 2
2 2150 SUN 0 -4
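To make the cause of the error concrete, here is a minimal sketch that reproduces it in isolation. The variable names `days`, `comparison`, and `error_message` are only illustrative and do not come from the question.

```python
import pandas as pd

day_abrivs = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]
days = pd.Series(["FRI", "SAT", "SUN"])

# Comparing one list element with the whole Series is element-wise,
# so the result is a boolean Series rather than a single True/False.
comparison = day_abrivs[0] == days
print(comparison.tolist())  # [False, False, True]

# list.index needs one truth value per comparison; pandas refuses to
# collapse a multi-element boolean Series into one, hence the ValueError.
try:
    day_abrivs.index(days)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

This is why the lookup has to happen per value (via apply or map) rather than on the column as a whole.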
You should not use apply here; instead, you can build a mapping dictionary:
day_abrivs_dic = {k:v for v,k in enumerate(day_abrivs)}
# {'SUN': 0, 'MON': 1, 'TUE': 2, 'WED': 3, 'THU': 4, 'FRI': 5, 'SAT': 6}
df['day_num'] = df['day'].map(day_abrivs_dic)
df['day_diff'] = df['day_num'] - todays_week_day_num
Output:
attendance day day_num day_diff
0 1546 FRI 5 1
1 1978 SAT 6 2
2 2150 SUN 0 -4
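For reference, the mapping approach can be put together as one self-contained sketch that also derives today's number from a date, since the question wondered about a datetime route. The helper names `sunday_based_weekday` and `add_day_diff` are hypothetical, and the fixed date is an assumption chosen only because it falls on a Thursday, matching the example.

```python
from datetime import date

import pandas as pd

DAY_ABBREVS = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]
DAY_TO_NUM = {name: num for num, name in enumerate(DAY_ABBREVS)}


def sunday_based_weekday(d: date) -> int:
    # date.weekday() returns Monday=0 ... Sunday=6; shift so that Sunday=0.
    return (d.weekday() + 1) % 7


def add_day_diff(df: pd.DataFrame, today: date) -> pd.DataFrame:
    # Work on a copy so the caller's DataFrame is left unchanged.
    out = df.copy()
    # Labels missing from DAY_TO_NUM become NaN instead of raising.
    out["day_num"] = out["day"].map(DAY_TO_NUM)
    out["day_diff"] = out["day_num"] - sunday_based_weekday(today)
    return out


df = pd.DataFrame({"attendance": [1546, 1978, 2150], "day": ["FRI", "SAT", "SUN"]})
result = add_day_diff(df, date(2024, 1, 4))  # 2024-01-04 is a Thursday
print(result)
```

One difference from the apply version worth keeping in mind: a day label that is not in the dictionary yields NaN with map, whereas list.index would raise a ValueError for it.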