Python: count number of uninterrupted intervals
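Consider an array Y consisting of 0s and 1s. For example: Y = (0,1,1,0). I want to count the number of uninterrupted intervals of 0s and of 1s. In our example, n0 = 2 and n1 = 1. I have a script that does what is needed, but it is not very elegant. Does anyone know a smoother or more pythonic version?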
import pandas as pd
import numpy as np
# storage
counter = {}
# number of random draws
n = 10
# dataframe of random draw between 0 and 1
Y = pd.DataFrame(np.random.choice(2, n))
# where are the 0s and 1s
idx_0 = Y[Y[0] == 0].index
idx_1 = Y[Y[0] == 1].index
# count intervals of uninterrupted 0s
j = 0
for i in idx_0:
    if i+1 < n:
        if Y.loc[i+1, 0] == 1:
            j += 1
    else:
        continue
if Y.loc[n-1, 0] == 0:
    j += 1
counter['n_0'] = j
# count intervals of uninterrupted 1s
j = 0
for i in idx_1:
    if i+1 < n:
        if Y.loc[i+1, 0] == 0:
            j += 1
    else:
        continue
if Y.loc[n-1, 0] == 1:
    j += 1
counter['n_1'] = j
numbers = [0, 1, 1, 0]
def runs(x, numbers):
    number_string = ''.join([str(n) for n in numbers])
    return len([r for r in number_string.split('1' if x == 0 else '0') if r])
print(runs(0, numbers))
print(runs(1, numbers))
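The string-splitting function above counts the non-empty chunks left after splitting on the other digit. As an alternative sketch (not part of the answer above), the same counts can be obtained with itertools.groupby from the standard library, which yields one group per run of equal consecutive items. The helper name count_runs is hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import groupby


def count_runs(values):
    """Count the uninterrupted runs of each distinct value in a sequence."""
    # groupby yields one (key, group) pair per run of equal consecutive
    # items, so counting the keys counts the runs of each value.
    return Counter(key for key, _ in groupby(values))


result = count_runs([0, 1, 1, 0])
print(result[0])  # 2 runs of 0
print(result[1])  # 1 run of 1
```

Unlike the string-based version, this sketch does not assume that the values are the single digits 0 and 1; it works for any sequence of hashable items.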
Update, using a DataFrame:
import pandas as pd
import numpy as np
# storage
counter = {}
# number of random draws
n = 10
# dataframe of random draw between 0 and 1
Y = pd.DataFrame(np.random.choice(2, n))
print([v[0] for v in Y.values.tolist()])
def runs(x, numbers):
    number_string = ''.join([str(n) for n in numbers])
    return len([len(r) for r in number_string.split('1' if x == 0 else '0') if r])
values = [v[0] for v in Y.values.tolist()]
print(values)
print('Runs of 0: {}'.format(runs(0, values)))
print('Runs of 1: {}'.format(runs(1, values)))
A more concise solution that makes use of pandas methods:
counter = Y[0][Y[0].diff() != 0].value_counts()
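Y[0].diff(): computes the difference between consecutive elements
diff != 0: marks the indices where the value changes (the first element's difference is NaN, which also compares as != 0, so the first run is counted too)
Y[idx].value_counts(): counts how often each value occurs among the marked positions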
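The three steps above can be traced one at a time. The following is a minimal sketch that assumes the fixed input (0, 1, 1, 0) from the question instead of random draws, so that the intermediate values are reproducible. The variable names diff and starts are introduced here for illustration only.

```python
import pandas as pd

# Fixed input from the question instead of random draws.
Y = pd.DataFrame([0, 1, 1, 0])

# Step 1: difference between consecutive elements; the first entry is NaN.
diff = Y[0].diff()

# Step 2: True wherever a new run starts. NaN != 0 evaluates to True,
# so the first element is treated as the start of a run.
starts = diff != 0
print(starts.tolist())  # [True, True, False, True]

# Step 3: keep only the first element of each run and count per value.
counter = Y[0][starts].value_counts()
print(int(counter[0]), int(counter[1]))  # 2 1
```

This matches the expected n0 = 2 and n1 = 1 from the question.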
Example result for the 10 random elements [0, 1, 1, 0, 1, 1, 1, 1, 1, 1]:
1 2
0 2
Name: 0, dtype: int64
If you insist on having the 'n_0' and 'n_1' keys, you can rename the index with
counter = counter.rename(index={i: f'n_{i}' for i in range(2)})
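You can also convert it to a dictionary with dict(counter), although the pandas object already offers the same functionality: counter[key] gives you the corresponding value.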
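Putting the renaming and the dictionary conversion together, a short sketch on the example elements shown above might look as follows. The name as_dict and the explicit int() conversion are additions for illustration, used here so that the dictionary holds plain Python integers.

```python
import pandas as pd

# The 10 example elements from the sample result above.
Y = pd.DataFrame([0, 1, 1, 0, 1, 1, 1, 1, 1, 1])

# Count the runs of each value, then rename the index labels 0 and 1.
counter = Y[0][Y[0].diff() != 0].value_counts()
counter = counter.rename(index={i: f'n_{i}' for i in range(2)})

# Label-based lookup works directly on the pandas object.
print(int(counter['n_0']))

# Conversion to a plain dictionary with Python integers as values.
as_dict = {key: int(value) for key, value in counter.items()}
print(as_dict)
```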