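Change datetime if duplicate found in dataframe
This is a tricky one: I have a dataframe with a column of Python datetimes. However, some datetimes can be duplicated even though the other values in the row differ. The reason is that the data was recorded with only 1 millisecond granularity, e.g.: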
DateTimes VWPfgbl
26541610 2014-12-04 20:59:04.553000 152.271875
26541611 2014-12-04 20:59:04.553000 152.271875
26541612 2014-12-04 20:59:04.553000 152.271875
26541613 2014-12-04 20:59:08.369000 152.272308
26541614 2014-12-04 20:59:09.321000 152.270476
26541615 2014-12-04 20:59:09.550000 152.261818
26541616 2014-12-04 20:59:09.550000 152.265714
26541617 2014-12-04 20:59:09.552000 152.268000
26541618 2014-12-04 20:59:09.552000 152.265714
26541619 2014-12-04 20:59:09.552000 152.240000
26541620 2014-12-04 20:59:09.552000 152.253333
26541621 2014-12-04 20:59:09.552000 152.251875
26541622 2014-12-04 20:59:09.552000 152.241538
26541623 2014-12-04 20:59:09.552000 152.245625
26541624 2014-12-04 20:59:09.552000 152.245714
26541625 2014-12-04 20:59:09.552000 152.233571
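What I would like is to add 1 microsecond to the next duplicate in the column, 2 microseconds to the one after that, and so on, to produce something like this: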
DateTimes VWPfgbl
26541610 2014-12-04 20:59:04.553000 152.271875
26541611 2014-12-04 20:59:04.553001 152.271875
26541612 2014-12-04 20:59:04.553002 152.271875
26541613 2014-12-04 20:59:08.369000 152.272308
26541614 2014-12-04 20:59:09.321000 152.270476
26541615 2014-12-04 20:59:09.550000 152.261818
26541616 2014-12-04 20:59:09.550001 152.265714
26541617 2014-12-04 20:59:09.552000 152.268000
26541618 2014-12-04 20:59:09.552001 152.265714
26541619 2014-12-04 20:59:09.552002 152.240000
26541620 2014-12-04 20:59:09.552003 152.253333
26541621 2014-12-04 20:59:09.552004 152.251875
26541622 2014-12-04 20:59:09.552005 152.241538
26541623 2014-12-04 20:59:09.552006 152.245625
26541624 2014-12-04 20:59:09.552007 152.245714
26541625 2014-12-04 20:59:09.552008 152.233571
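I'm not quite sure how to go about this. Perhaps a loop that keeps a dict of the datetimes seen so far and, when it hits a duplicate, shifts it to a new value and increments the count stored under that dict key.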
import datetime

seen = {}  # original timestamp -> number of times it has been seen so far
for i in df.index:
    ts = df.at[i, 'DateTimes']
    if ts in seen:
        # Duplicate: shift it by the number of earlier occurrences, in microseconds
        df.at[i, 'DateTimes'] = ts + datetime.timedelta(microseconds=seen[ts])
        seen[ts] += 1
    else:
        seen[ts] = 1
Any help would be appreciated.
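You could use groupby followed by cumcount to number the items within each group of identical timestamps. Then convert those numbers to a NumPy timedelta64 array with microsecond resolution. That NumPy array can then be added to df['DateTimes'] to produce the desired values.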
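import numpy as np
import pandas as pd

# 'data' is a text file whose columns are separated by two or more spaces
df = pd.read_table('data', sep=r'\s{2,}', engine='python')
df['DateTimes'] = pd.to_datetime(df['DateTimes'])
# Number each row within its group of identical timestamps: 0, 1, 2, ...
microseconds = df.groupby(['DateTimes']).cumcount()
# Treat those counts as microsecond offsets and add them to the timestamps
df['DateTimes'] += np.array(microseconds, dtype='m8[us]')
print(df)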
yields
DateTimes VWPfgbl
26541610 2014-12-04 20:59:04.553000 152.271875
26541611 2014-12-04 20:59:04.553001 152.271875
26541612 2014-12-04 20:59:04.553002 152.271875
26541613 2014-12-04 20:59:08.369000 152.272308
26541614 2014-12-04 20:59:09.321000 152.270476
26541615 2014-12-04 20:59:09.550000 152.261818
26541616 2014-12-04 20:59:09.550001 152.265714
26541617 2014-12-04 20:59:09.552000 152.268000
26541618 2014-12-04 20:59:09.552001 152.265714
26541619 2014-12-04 20:59:09.552002 152.240000
26541620 2014-12-04 20:59:09.552003 152.253333
26541621 2014-12-04 20:59:09.552004 152.251875
26541622 2014-12-04 20:59:09.552005 152.241538
26541623 2014-12-04 20:59:09.552006 152.245625
26541624 2014-12-04 20:59:09.552007 152.245714
26541625 2014-12-04 20:59:09.552008 152.233571
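For anyone who wants to try this without a data file, here is a minimal self-contained sketch of the same groupby/cumcount idea. It builds a small frame inline from a few of the rows in the question and uses pd.to_timedelta rather than a raw NumPy array to build the offsets, which is a substitution on my part. The helper name spread_duplicates is hypothetical, chosen only for illustration.

```python
import pandas as pd


def spread_duplicates(df, col="DateTimes"):
    """Return a copy of df in which repeated timestamps in `col`
    are offset by 0, 1, 2, ... microseconds in order of appearance."""
    out = df.copy()
    out[col] = pd.to_datetime(out[col])
    # Position of each row within its group of identical timestamps
    offsets = out.groupby(col).cumcount()
    # Interpret the positions as microseconds and add them
    out[col] = out[col] + pd.to_timedelta(offsets, unit="us")
    return out


# A few rows taken from the question, keeping the original index labels
df = pd.DataFrame(
    {
        "DateTimes": [
            "2014-12-04 20:59:04.553000",
            "2014-12-04 20:59:04.553000",
            "2014-12-04 20:59:04.553000",
            "2014-12-04 20:59:08.369000",
            "2014-12-04 20:59:09.550000",
            "2014-12-04 20:59:09.550000",
        ],
        "VWPfgbl": [152.271875, 152.271875, 152.271875,
                    152.272308, 152.261818, 152.265714],
    },
    index=[26541610, 26541611, 26541612, 26541613, 26541615, 26541616],
)

result = spread_duplicates(df)
print(result)
```

One caveat: the offsets only stay inside the original millisecond while a group has fewer than 1000 duplicates. Beyond that, a shifted value could collide with a timestamp from the next millisecond, so it may be worth checking result["DateTimes"].is_unique afterwards.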