while loop to constantly recheck changes in a Pandas dataframe
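I have two dataframes with the same structure, new and old. The new dataframe is updated at random times throughout the day. The code below checks whether anything has changed.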
import pandas as pd
import numpy as np
new = {'name': ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Howard'],
       'episodes': [42, 24, 31, 29, 37, 40],
       'gender': ['male', 'female', 'female', 'female', 'male', 'male']}
old = {'name': ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Howard'],
       'episodes': [12, 32, 31, 32, 37, 40],
       'gender': ['male', 'female', 'female', 'female', 'male', 'male']}
df1 = pd.DataFrame(new, columns = ['name','episodes', 'gender'])
df = pd.DataFrame(old, columns = ['name','episodes', 'gender'])
while True:
    df1 = pd.DataFrame(new, columns = ['name','episodes', 'gender'])
    print(df[~df.episodes.eq(df1.episodes)])
    df1 = df
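I need to write a condition in the while loop so that df[~df.episodes.eq(df1.episodes)] is printed only when a change is detected. After printing the new data, the loop should set both dataframes to the same value (the old data is no longer needed) and recheck for changes. The code above prints: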
Empty DataFrame
Columns: [name, episodes, gender]
Index: []
Empty DataFrame
Columns: [name, episodes, gender]
Index: []
Empty DataFrame
Columns: [name, episodes, gender]
Index: []
So if a change does actually occur, I will miss it among the repeated output. Could you suggest a more efficient way to do this?
== EDIT ==
Following @BENY's answer, if I do this:
import pandas as pd
import numpy as np
new = {'name': ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Sheldon'],
       'episodes': [42, 24, 31, 29, 37, 40],
       'gender': ['male', 'female', 'female', 'female', 'male', 'male']}
old = {'name': ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Sheldon'],
       'episodes': [12, 32, 31, 32, 37, 40],
       'gender': ['male', 'female', 'female', 'female', 'male', 'male']}
df1 = pd.DataFrame(new, columns = ['name','episodes', 'gender'])
df = pd.DataFrame(old, columns = ['name','episodes', 'gender'])
while True:
    df1 = pd.DataFrame(new, columns = ['name','episodes', 'gender'])
    out = df.merge(df1[['name','episodes']],on=['name','episodes'],how='left',indicator=True).loc[lambda x : x['_merge']=='left_only']
    print(out)
    df = df1
it prints the following on every pass through the while loop:
         name  episodes  gender     _merge
0     Sheldon        12    male  left_only
1       Penny        32  female  left_only
3  Bernadette        32  female  left_only
         name  episodes  gender     _merge
0     Sheldon        12    male  left_only
1       Penny        32  female  left_only
3  Bernadette        32  female  left_only
         name  episodes  gender     _merge
0     Sheldon        12    male  left_only
1       Penny        32  female  left_only
3  Bernadette        32  female  left_only
Is there any way to print it only once, until there is another change? If I set df = df1, it prints the following and I miss the change:
Empty DataFrame
Columns: [name, episodes, gender, _merge]
Index: []
I need to get this data cleanly, and only at the point where a change is detected.
Let's try merge:
out = df.merge(df1[['name','episodes']],on=['name','episodes'],how='left',indicator=True).loc[lambda x : x['_merge']=='left_only']
         name  episodes  gender     _merge
0     Sheldon        12    male  left_only
1       Penny        32  female  left_only
3  Bernadette        32  female  left_only
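The merge above reports the rows of the old frame that no longer match. If the updated values are what you want to see, one possible variation is to run the same anti-join from the new frame's side and print only when the result is non-empty. This is a minimal sketch on a shortened copy of the sample data; old_df, new_df, merged and changed are names chosen for illustration only.

```python
import pandas as pd

# Shortened, hypothetical copies of the sample data.
old_df = pd.DataFrame({'name': ['Sheldon', 'Penny', 'Amy'],
                       'episodes': [12, 32, 31]})
new_df = pd.DataFrame({'name': ['Sheldon', 'Penny', 'Amy'],
                       'episodes': [42, 32, 31]})

# Anti-join: keep the rows of new_df whose (name, episodes) pair
# does not appear anywhere in old_df.
merged = new_df.merge(old_df, on=['name', 'episodes'],
                      how='left', indicator=True)
changed = merged.loc[merged['_merge'] == 'left_only'].drop(columns='_merge')

# Print only when at least one row changed.
if not changed.empty:
    print(changed)
```

Note that this matches rows by value rather than by position, so it assumes the (name, episodes) pairs are enough to identify a row.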
If you want to compare two dataframes and check for any changes or differences, why not use the DataFrame.compare() function?
Here is the output for your sample data:
df.compare(df1)
Output:
  episodes
      self other
0     12.0  42.0
1     32.0  24.0
3     32.0  29.0
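By default, it shows only the differences. Here it shows that only the episodes column differs.
self corresponds to the values of df, and other corresponds to the values of df1.
The index on the left, i.e. 0, 1 and 3, gives the row indices of the rows that differ.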
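Since the index of the compare() result holds the labels of the rows that differ, it can be used to pull the full updated rows out of the newer frame. A small sketch, assuming both frames share the same index and columns (which compare() requires anyway); old_df, new_df, changed_idx and updated_rows are illustrative names, not part of the question's code.

```python
import pandas as pd

# Shortened, hypothetical copies of the sample data.
old_df = pd.DataFrame({'name': ['Sheldon', 'Penny', 'Amy', 'Bernadette'],
                       'episodes': [12, 32, 31, 32]})
new_df = pd.DataFrame({'name': ['Sheldon', 'Penny', 'Amy', 'Bernadette'],
                       'episodes': [42, 24, 31, 29]})

diff = old_df.compare(new_df)

# Row labels where at least one cell differs between the two frames.
changed_idx = diff.index.tolist()

# Full rows from the newer frame for those labels, names included.
updated_rows = new_df.loc[changed_idx]

print(changed_idx)
print(updated_rows)
```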
If you want to keep the full original shape, you can also use the keep_shape= parameter, as follows:
df.compare(df1, keep_shape=True)
Output:
  name       episodes       gender
  self other     self other   self other
0  NaN   NaN     12.0  42.0    NaN   NaN
1  NaN   NaN     32.0  24.0    NaN   NaN
2  NaN   NaN      NaN   NaN    NaN   NaN
3  NaN   NaN     32.0  29.0    NaN   NaN
4  NaN   NaN      NaN   NaN    NaN   NaN
5  NaN   NaN      NaN   NaN    NaN   NaN
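Only the values that differ are shown. NaN marks the cells where there is no difference.
Of course, if you prefer, you can also choose to show all values, including the equal ones, as follows: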
df.compare(df1, keep_shape=True, keep_equal=True)
Output:
         name             episodes        gender
         self       other     self other    self   other
0     Sheldon     Sheldon       12    42    male    male
1       Penny       Penny       32    24  female  female
2         Amy         Amy       31    31  female  female
3  Bernadette  Bernadette       32    29  female  female
4         Raj         Raj       37    37    male    male
5      Howard      Howard       40    40    male    male
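This option lets you compare the two frames side by side. However, the differences are not easy to spot in this view.
I suggest using the default option first to show only the differences (perhaps noting the indices of the rows that differ), and using the other two options only when you want to inspect the values in detail, including the equal ones.
To use it inside a while loop, you can write: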
while True:
    df1 = pd.DataFrame(new, columns = ['name','episodes', 'gender'])
    out = df.compare(df1)
    print(out)
    df = df1
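The loop above still prints on every pass, including when out is empty. One possible way to print only when something changed is to test out.empty before printing, and to replace the stored frame only after a change has been reported. The sketch below is not the answer's own code: it uses a finite list of snapshots in place of the live data source so that it terminates, and make_frame, poll and snapshots are hypothetical names.

```python
import pandas as pd

cols = ['name', 'episodes', 'gender']

def make_frame(episodes):
    # Hypothetical helper: a snapshot with fixed names and genders.
    return pd.DataFrame({'name': ['Sheldon', 'Penny', 'Amy'],
                         'episodes': episodes,
                         'gender': ['male', 'female', 'female']},
                        columns=cols)

def poll(snapshots, current):
    """Compare each snapshot with the last reported frame; keep non-empty diffs."""
    reports = []
    for latest in snapshots:
        diff = current.compare(latest)
        if not diff.empty:       # report only when something changed
            reports.append(diff)
            current = latest     # the old data is no longer needed
    return reports

# Stand-in for the data source that changes at random times.
snapshots = [make_frame([12, 32, 31]),   # unchanged
             make_frame([42, 32, 31]),   # first change
             make_frame([42, 32, 31]),   # unchanged again
             make_frame([42, 24, 31])]   # second change

reports = poll(snapshots, make_frame([12, 32, 31]))
for report in reports:
    print(report)
```

In a long-running script the for loop would become while True with a fresh read of the data on each pass, and a time.sleep() call between passes would keep the loop from spinning at full speed.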
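Edit
If you also want to see name while still showing only the differences in the other columns, you can add it to the index with append=True, as follows: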
df.set_index('name', append=True).compare(df1.set_index('name', append=True))
Output:
             episodes
                 self other
  name
0 Sheldon        12.0  42.0
1 Penny          32.0  24.0
3 Bernadette     32.0  29.0
This way, you can see both the name and the row index of each row that differs.