Identify binary state of data set (frequency on/off)
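I have a large data set with values in the range [-3,3], and I'm using a hard limit at 0 as the boundary.
The data has a binary value of 1 when it oscillates between -3 and 3 at a frequency of 56 kHz. This means the data goes from -3 to 3 and back every N data values, where N is typically < 20.
The data has a binary value of 0 when it stays at 3 constantly (this can typically last for 400+ samples).
I can't seem to group the data into its binary categories and also find out how many samples wide each group is.
Example data: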
1.84 |
2.96 |
2.8 |
3.12 |
. | I want this to be grouped as a 0
. |
3.11 |_____
-3.42 |
-2.45 |
-1.49 |
3.12 |
2.99 | I want this to be grouped as a 1
1.97 |
-1.11 |
-2.33 |
. |
. | Keeps going for N cycles
The period between transitions in the logic-high state is typically small (<20 samples).
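For reference, the hard limit at 0 could look something like the sketch below. The code in this post calls `inputBinaryState`, but its body is not shown, so this version is only an assumption: it returns the strings "HIGH" and "LOW" because those are the values the later comparisons use, and the `threshold` parameter is hypothetical.

```python
def inputBinaryState(sample, threshold=0.0):
    """Hard-limit one sample: "HIGH" above the threshold, otherwise "LOW".

    Assumed behaviour only; the real helper is not shown in the post.
    """
    return "HIGH" if sample > threshold else "LOW"


# The values from the example data above (the elided "." rows are left out).
samples = [1.84, 2.96, 2.8, 3.12, 3.11, -3.42, -2.45, -1.49,
           3.12, 2.99, 1.97, -1.11, -2.33]
states = [inputBinaryState(s) for s in samples]
print(states)
```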
My current code:
state = "X"
for i in range(0, len(data['input'])):
    currentBinaryState = inputBinaryState(data['input'][i]); # Returns -3 or +3 appropriately
    if(currentBinaryState != previousBinaryState):
        # A cycle is very unlikely to last more than 250 samples
        if y > 250 and currentBinaryState == "LOW": # Been low for a long time
            if state == "_high":
                groupedData['input'].append( ("HIGH", x) )
                x = 0
            state = "_low"
        else:
            # Is on carrier wave (logic 1)
            if state == "_low":
                # Just finished low
                groupedData['input'].append( ("LOW", x) )
                x = 0
            state = "_high"
        y = 0
Clearly, the results are not what I expected, since the LOW groups are very small.
[('HIGH', 600), ('LOW', 8), ('HIGH', 1168), ('LOW', 9), ('HIGH', 1168), ('LOW', 8), ('HIGH', 1168), ('LOW', 8), ('HIGH', 1168), ('LOW', 9), ('HIGH', 1168), ('LOW', 8), ('HIGH', 1168), ('LOW', 8), ('HIGH', 1168), ('LOW', 9)]
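I know I could ask this on the Signal Processing Stack Exchange site, but I think the problem is more programming-oriented. I hope I have explained it well enough; if anything is unclear, please ask. Thanks.
Here is a link to the actual sample data: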
https://drive.google.com/folderview?id=0ByJDNIfaTeEfemVjSU9hNkNpQ3c&usp=sharing
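Visually, it is very clear where the boundaries in the data are.
Update 1
I have updated my code to make it more legible, as single-letter variables were not helping my sanity.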
previousBinaryState = "X"
x = 0
sinceLastChange = 0
previousGroup = inputBinaryState(data['input'][0])
lengthAssert = 0
for i in range(0, len(data['input'])):
    currentBinaryState = inputBinaryState(data['input'][i]);
    if(currentBinaryState != previousBinaryState): # Changed from -3 -> +3 or +3 -> -3
        #print sinceLastChange
        if sinceLastChange > 250 and previousGroup == "HIGH" and currentBinaryState == "LOW": # Finished LOW group
            groupedData['input'].append( ("LOW", x) )
            lengthAssert += x
            x = 0
            previousGroup = "LOW"
        elif sinceLastChange > 20 and previousGroup == "LOW": # Finished HIGH group
            groupedData['input'].append( ("HIGH", x) )
            lengthAssert += x
            x = 0
            previousGroup = "HIGH"
        sinceLastChange = 0
    else:
        sinceLastChange += 1
    previousBinaryState = currentBinaryState
    x += 1
For the sample data, printing sinceLastChange outputs:
8
7
8
7
7
596 <- Clearly a LOW group
7
8
7
8
7
7
8
7
8
7
7
8
7
8
7
7
8
7
8
.
.
.
The problem is that the HIGH groups last longer than they should:
[('HIGH', 600), ('LOW', 1176), ('HIGH', 1177), ('LOW', 1176), ('HIGH', 1176), ('LOW', 1177), ('HIGH', 1176), ('LOW', 1176)]
- Only 8 groups were produced, but the plot clearly shows more. The groups appear to be twice the size they should be.
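The gaps printed above are what separate the two states: short gaps belong to the carrier, and a single long gap marks a LOW group. Below is a minimal sketch that computes only those gaps, assuming a hard limit at 0. `transition_gaps` is a hypothetical helper, not part of the code in this post, and it counts the full length of each run, so its values are one larger than the `sinceLastChange` values printed above.

```python
def transition_gaps(samples, threshold=0.0):
    """Return the length of every completed run between sign changes.

    The final run is not reported because no transition closes it.
    """
    gaps = []
    previous = None
    since_last_change = 0
    for sample in samples:
        current = sample > threshold
        if previous is not None and current != previous:
            # A transition closes the run that came before it.
            gaps.append(since_last_change)
            since_last_change = 0
        since_last_change += 1
        previous = current
    return gaps


# Synthetic data: two short carrier half cycles, one longer run, then a change.
demo = [3.0] * 4 + [-3.0] * 4 + [3.0] * 10 + [-3.0] * 4
print(transition_gaps(demo))  # → [4, 4, 10]
```

Looking at these gaps before grouping makes it easier to pick thresholds such as the 20 and 250 used in the code.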
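I finally found a solution. I spent far too long on what seems to be a fairly simple problem, but it works now.
It does not pick up the last group in the data set, but that's okay.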
previousBinaryState = "X"
x = 0
sinceLastChange = 0
previousGroup = inputBinaryState(data['input'][0])
lengthAssert = 0
for i in range(0, len(data['input'])):
    currentBinaryState = inputBinaryState(data['input'][i]);
    if(currentBinaryState != previousBinaryState): # Changed from -3 -> +3 or +3 -> -3
        #print sinceLastChange
        if sinceLastChange > 250 and previousGroup == "HIGH" and currentBinaryState == "LOW": # Finished LOW group
            groupedData['input'].append( ("LOW", x) )
            lengthAssert += x
            x = 0
            previousGroup = "LOW"
        sinceLastChange = 0
    else:
        if sinceLastChange > 20 and previousGroup == "LOW":
            groupedData['input'].append( ("HIGH", x) )
            lengthAssert += x
            x = 0
            previousGroup = "HIGH"
            sinceLastChange = 0
        sinceLastChange += 1
    previousBinaryState = currentBinaryState
    x += 1
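Here 20 is the maximum number of samples between transitions while in the HIGH state, and 250 is the number of samples without a transition beyond which a run is treated as a LOW group.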
[('HIGH', 25), ('LOW', 575), ('HIGH', 602), ('LOW', 574), ('HIGH', 602), ('LOW', 575), ('HIGH', 601), ('LOW', 575), ('HIGH', 602), ('LOW', 574), ('HIGH', 602), ('LOW', 575), ('HIGH', 601), ('LOW', 575), ('HIGH', 602), ('LOW', 574)]
When compared against the plot and the actual data, it appears to be correct.
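As an alternative to the loop above, the same grouping can be sketched with run-length encoding via `itertools.groupby`. This is a different technique from the code in this post, not a rewrite of it: every run on one side of the hard limit is measured first, long runs are labelled LOW, short runs are labelled HIGH (carrier), and neighbouring runs with the same label are merged. The names `group_states` and `min_low_run` are hypothetical, and the signal is synthetic rather than the linked sample data.

```python
from itertools import groupby


def group_states(samples, min_low_run=250, threshold=0.0):
    """Group samples into ("HIGH", n) carrier bursts and ("LOW", n) idle runs.

    A run that stays on one side of the threshold for at least min_low_run
    samples is treated as idle (logic 0, "LOW" in the post). Shorter runs
    are treated as part of the carrier (logic 1, "HIGH").
    """
    groups = []
    for _, run in groupby(samples, key=lambda s: s > threshold):
        length = sum(1 for _ in run)
        label = "LOW" if length >= min_low_run else "HIGH"
        if groups and groups[-1][0] == label:
            # Merge consecutive runs that carry the same label.
            groups[-1] = (label, groups[-1][1] + length)
        else:
            groups.append((label, length))
    return groups


# Synthetic signal: carrier with 4-sample half cycles, then a constant +3 run.
carrier = ([-3.0] * 4 + [3.0] * 4) * 10   # 80 samples of carrier
idle = [3.0] * 300                        # 300 samples held at +3
signal = carrier + idle + carrier + idle
print(group_states(signal))
# → [('HIGH', 76), ('LOW', 304), ('HIGH', 76), ('LOW', 304)]
```

Because the runs are measured before they are labelled, the last group in the data set is included. The trade-off is that the final positive half cycle of each carrier burst is absorbed into the following LOW group (76 and 304 instead of 80 and 300 here), so group boundaries can be off by up to one half cycle.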