Python 3: Creating a column based on variables from other columns
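I have a dataset that contains the year, the month, and the day of the week. However, it is missing the actual day of the month (that is, day 1, day 2, and so on up to day 30). The dataset looks like this: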
# Year Month Day_Of_Week
22024 2002 January Tuesday
22101 2002 January Wednesday
22146 2002 January Thursday
22201 2002 January Friday
22247 2002 January Saturday
22280 2002 January Sunday
22335 2002 January Monday
22383 2002 January Tuesday
22384 2002 January Wednesday
22424 2002 January Thursday
22459 2002 January Friday
22511 2002 January Saturday
22598 2002 January Sunday
22599 2002 January Monday
22686 2002 January Tuesday
22687 2002 January Wednesday
22688 2002 January Wednesday
22689 2002 January Wednesday
22761 2002 January Wednesday
22762 2002 January Wednesday
22763 2002 January Wednesday
22764 2002 January Wednesday
22765 2002 January Thursday
22766 2002 January Thursday
22767 2002 January Thursday
22768 2002 January Thursday
22814 2002 January Friday
22815 2002 January Friday
22816 2002 January Friday
22817 2002 January Friday
22818 2002 January Friday
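The logic for finding the day is simple. The first record in the table belongs to day 1. Each time "Day_Of_Week" changes from the previous record, we increase the day by one, so the next distinct weekday is day 2, and so on. Records with the same weekday as the previous record share its day number.
When the month is "January" we count 31 days, for "February" we count 28 days, and so on.
Using pandas, I want to create a new column called "Crash_Day". How can I iterate over the records following the logic above and populate the new column?
How do I construct a for loop (or something similar) that reads each record and fills in the new column accordingly?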
Here is my code so far:
import pandas as pd
crash_data = pd.read_csv('data.csv')
print('Length: {} rows.'.format(len(crash_data)))
print(crash_data.head())
If anyone is interested in looking at the data, it is at the link below:
Data
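As a literal reading of the rule described in the question, an explicit loop could look like the sketch below. The six-row frame is a made-up sample in the same shape as the question's data (it is not the linked file), and the variable names `crash_days`, `prev_key`, and `prev_weekday` are only illustrative. The counter restarts at 1 whenever the (Year, Month) pair changes and goes up by one whenever the weekday differs from the previous row.

```python
import pandas as pd

# Hypothetical sample shaped like the question's data (assumption, not the real file).
crash_data = pd.DataFrame({
    "Year": [2002] * 6,
    "Month": ["January"] * 4 + ["February"] * 2,
    "Day_Of_Week": ["Tuesday", "Wednesday", "Wednesday",
                    "Thursday", "Friday", "Friday"],
})

crash_days = []
prev_key = None       # (Year, Month) of the previous row
prev_weekday = None   # Day_Of_Week of the previous row
day = 0

for row in crash_data.itertuples(index=False):
    key = (row.Year, row.Month)
    if key != prev_key:
        # First row of a new month: restart the counter.
        day = 1
    elif row.Day_Of_Week != prev_weekday:
        # Same month, weekday changed: move on to the next day.
        day += 1
    crash_days.append(day)
    prev_key, prev_weekday = key, row.Day_Of_Week

crash_data["Crash_Day"] = crash_days
print(crash_data)
```

A loop like this is easy to follow but tends to be slow on large frames, which is the usual reason to prefer a vectorized groupby approach.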
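If all the days are consecutive and none are missing in between, you can use a lambda function that compares each value with its shifted value using ne (!=) to mark the start of each run of identical values, and then use cumsum as the counter: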
df['day'] = (df.groupby(['Year','Month'])['Day_Of_Week']
.transform(lambda x: x.ne(x.shift()).cumsum()))
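To see why the shift / ne / cumsum chain produces a day counter, it can help to look at the intermediate values. The sketch below uses a made-up five-row Series (an assumption for illustration, not the question's data) and the hypothetical names `previous`, `is_new_day`, and `steps`.

```python
import pandas as pd

weekdays = pd.Series(
    ["Tuesday", "Wednesday", "Wednesday", "Wednesday", "Thursday"]
)

previous = weekdays.shift()          # value from the row above; missing for the first row
is_new_day = weekdays.ne(previous)   # True where the weekday differs from the row above
day = is_new_day.cumsum()            # running count of the True values

# Put the intermediate steps side by side for inspection.
steps = pd.DataFrame({
    "weekday": weekdays,
    "previous": previous,
    "is_new_day": is_new_day,
    "day": day,
})
print(steps)
```

The first row is always flagged as a new day because it has no predecessor to compare against, so the counter starts at 1. Doing this inside groupby(['Year','Month']) restarts the count for every month.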
Alternative:
s = df['Day_Of_Week'].ne(df['Day_Of_Week'].shift())
df['day'] = s.groupby([df['Year'],df['Month']]).cumsum().astype(int)
print(df)
Year Month Day_Of_Week day
22024 2002 January Tuesday 1
22101 2002 January Wednesday 2
22146 2002 January Thursday 3
22201 2002 January Friday 4
22247 2002 January Saturday 5
22280 2002 January Sunday 6
22335 2002 January Monday 7
22383 2002 January Tuesday 8
22384 2002 January Wednesday 9
22424 2002 January Thursday 10
22459 2002 January Friday 11
22511 2002 January Saturday 12
22598 2002 January Sunday 13
22599 2002 January Monday 14
22686 2002 January Tuesday 15
22687 2002 January Wednesday 16
22688 2002 January Wednesday 16
22689 2002 January Wednesday 16
22761 2002 January Wednesday 16
22762 2002 January Wednesday 16
22763 2002 January Wednesday 16
22764 2002 January Wednesday 16
22765 2002 January Thursday 17
22766 2002 January Thursday 17
22767 2002 January Thursday 17
22768 2002 January Thursday 17
22814 2002 January Friday 18
22815 2002 January Friday 18
22816 2002 January Friday 18
22817 2002 January Friday 18
22818 2002 January Friday 18
22817 2002 February Wednesday 1
22818 2002 February Wednesday 1
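Because the counter only works when no days are missing, one possible sanity check is to build a real date from Year, Month, and the derived day, then compare its weekday name with Day_Of_Week. The sketch below is an illustration under assumptions: the four-row frame is made up, and the names `MONTHS`, `month_number`, `parts`, `date`, and `weekday_matches` are not from the original answer.

```python
import pandas as pd

# Hypothetical sample (assumption); 1 January 2002 was a Tuesday.
df = pd.DataFrame({
    "Year": [2002, 2002, 2002, 2002],
    "Month": ["January", "January", "January", "January"],
    "Day_Of_Week": ["Tuesday", "Wednesday", "Wednesday", "Thursday"],
})

# Day counter as in the answer above, cast to int to be explicit about the dtype.
df["day"] = (df.groupby(["Year", "Month"])["Day_Of_Week"]
               .transform(lambda x: x.ne(x.shift()).cumsum())
               .astype(int))

# Map English month names to month numbers without relying on the locale.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
month_number = df["Month"].map({name: i for i, name in enumerate(MONTHS, start=1)})

# Assemble dates from year/month/day columns; impossible dates become NaT.
parts = pd.DataFrame({"year": df["Year"], "month": month_number, "day": df["day"]})
df["date"] = pd.to_datetime(parts, errors="coerce")

# A row matches when the calendar weekday equals the recorded weekday.
df["weekday_matches"] = df["date"].dt.day_name() == df["Day_Of_Week"]
print(df)
```

Rows where `weekday_matches` is False, or where `date` is NaT (for example a derived day of 32), would suggest that some days are missing from the data and that the simple counter drifted.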