Searching for duplicate values in rows of pandas dataframes Python
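I want to add a function to the pandas DataFrame below that shows the rows whose Open, High, Low and Close values are all identical. In this example, the last row of the DataFrame should be True. I also want code that counts how often a value repeats within the same column: if you look at rows 3 and 4 (index 2 and 3), the Low value 37350 repeats twice in a row. So I want a function that reports the maximum number of consecutive repeats and the indices of the rows where that run starts and ends.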
import pandas as pd
A =[[1645661520000, 37352.0, 37376.5, 37352.0, 37376.0, 15.56119087],
[1645661580000, 37376.0, 37414.0, 37376.0, 37414.0, 49.38248589],
[1645661640000, 37414.0, 37414.0, 37350.0, 37350.0, 45.70306699],
[1645661700000, 37350.0, 37374.0, 37350.0, 37373.5, 14.4306948],
[1645661760000, 37373.5, 37388.0, 37373.5, 37388.0, 3.59340947],
[1645661820000, 37388.0, 37388.0, 37388.0, 37388.0, 21.45525727]]
column_names = ["Unix","Open", "High","Low", "Close", "Volume"]
df = pd.DataFrame(A, columns=column_names)
# Convert the millisecond Unix timestamps into a "Date" column placed right after "Unix"
df.insert(1, "Date", pd.to_datetime(df["Unix"], unit="ms"))
Expected output:
Rows with all duplicate values: 6 # 1645661820000, 37388.0, 37388.0, 37388.0, 37388.0, 21.45525727
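A minimal sketch that prints output in this format, assuming exact float equality is acceptable for these prices and that the "6" is a 1-based row number (index 5): compare every price column with the Open column row by row and keep the rows where all comparisons hold. The names all_equal and row_numbers are illustrative only.

```python
import pandas as pd

A = [[1645661520000, 37352.0, 37376.5, 37352.0, 37376.0, 15.56119087],
     [1645661580000, 37376.0, 37414.0, 37376.0, 37414.0, 49.38248589],
     [1645661640000, 37414.0, 37414.0, 37350.0, 37350.0, 45.70306699],
     [1645661700000, 37350.0, 37374.0, 37350.0, 37373.5, 14.4306948],
     [1645661760000, 37373.5, 37388.0, 37373.5, 37388.0, 3.59340947],
     [1645661820000, 37388.0, 37388.0, 37388.0, 37388.0, 21.45525727]]
df = pd.DataFrame(A, columns=["Unix", "Open", "High", "Low", "Close", "Volume"])

prices = ["Open", "High", "Low", "Close"]
# True where every price in the row equals that row's Open value (exact equality)
all_equal = df[prices].eq(df["Open"], axis=0).all(axis=1)

# Assumption: the expected output counts rows from 1, so shift the 0-based index by one
row_numbers = (df.index[all_equal.to_numpy()] + 1).tolist()
print("Rows with all duplicate values:", *row_numbers)
print(df[all_equal])
```

For this data the first line printed is "Rows with all duplicate values: 6".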
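We can use nunique: a row whose Open, High, Low and Close columns contain exactly one distinct value has all four prices equal.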
cond = df[["Open", "High", "Low", "Close"]].nunique(axis=1).eq(1)
print(cond)
0 False
1 False
2 False
3 False
4 False
5 True
dtype: bool
rows = df[cond]  # boolean mask: keep only the rows where all four prices are equal
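The answer above covers only the first part of the question. For the second part (the longest run of one value repeating in consecutive rows of a column, plus the rows where it starts and ends), a common pandas idiom labels each run with s.ne(s.shift()).cumsum(): the label increases by one every time a value differs from the one directly above it. The following is a sketch under that approach; longest_run is a hypothetical helper name, ties are resolved in favour of the earliest run, and the reported rows are the DataFrame's 0-based index labels.

```python
import pandas as pd

A = [[1645661520000, 37352.0, 37376.5, 37352.0, 37376.0, 15.56119087],
     [1645661580000, 37376.0, 37414.0, 37376.0, 37414.0, 49.38248589],
     [1645661640000, 37414.0, 37414.0, 37350.0, 37350.0, 45.70306699],
     [1645661700000, 37350.0, 37374.0, 37350.0, 37373.5, 14.4306948],
     [1645661760000, 37373.5, 37388.0, 37373.5, 37388.0, 3.59340947],
     [1645661820000, 37388.0, 37388.0, 37388.0, 37388.0, 21.45525727]]
df = pd.DataFrame(A, columns=["Unix", "Open", "High", "Low", "Close", "Volume"])


def longest_run(s: pd.Series) -> tuple:
    """Return (length, start_label, end_label) of the longest run of equal consecutive values."""
    # A new run starts wherever a value differs from the previous row's value
    run_id = s.ne(s.shift()).cumsum()
    # Length of each run; groupby sorts run ids, so idxmax picks the earliest longest run
    sizes = run_id.groupby(run_id).size()
    best = sizes.idxmax()
    labels = s.index[run_id.to_numpy() == best]
    return int(sizes[best]), labels[0], labels[-1]


prices = ["Open", "High", "Low", "Close"]
results = {col: longest_run(df[col]) for col in prices}
for col, (length, start, end) in results.items():
    print(f"{col}: {length} consecutive equal values, index {start} to {end}")

# Column with the longest run overall (first one wins on ties)
best_col = max(results, key=lambda c: results[c][0])
print("Longest run overall:", best_col, results[best_col])
```

For the Low column this reports a run of length 2 from index 2 to index 3, which corresponds to rows 3 and 4 in the question's 1-based counting. High and Close also contain runs of length 2 (index 1 to 2 and index 4 to 5), so on this data the "longest overall" line simply reports the first of several tied columns.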