识别数据集的二进制状态(频率on/off)

Identify binary state of data set (frequency on/off)

我有一个大型数据集,其值范围为 [-3,3],我使用 0 作为边界的硬限制。

当数据以 56kHz 的频率从 -3,3 振荡时,数据的二进制值为 1。这意味着数据将从 -3 变为 3 并返回每 N 个数据值,其中 N 通常 < 20。

当数据始终为 3 时,数据的二进制值为 0(这通常可以持续 400+ 个样本)

我似乎无法将数据分组到它们的二进制类别中,也不知道该组有多少个样本。

示例数据:

1.84    |
2.96    |
2.8     |
3.12    |
.       |  I want this to be grouped as a 0
.       |
3.11    |_____
-3.42   |
-2.45   |
-1.49   |
3.12    |
2.99    |  I want this to be grouped as a 1
1.97    |
-1.11   |
-2.33   |
.       |  
.       |  Keeps going until for N cycles

逻辑高电平状态之间的周期通常很小(<20 个样本)。

我目前的代码:

state = "X"
for i in range(0, len(data['input'])):    
    currentBinaryState = inputBinaryState(data['input'][i]); # Returns -3 or +3 appropriately

    if(currentBinaryState != previousBinaryState):

        # A cycle is very unlikely to last more than 250 samples
        if y > 250 and currentBinaryState == "LOW": # Been low for a long time
            if state == "_high":
                groupedData['input'].append( ("HIGH", x) )
                x = 0

            state = "_low"

        else:
            # Is on carrier wave (logic 1)
            if state == "_low":
                # Just finished low
                groupedData['input'].append( ("LOW", x) )
                x = 0

            state = "_high"


        y = 0

显然,结果并不像我预期的那样,因为 LOW 组非常小。

[('HIGH', 600), ('LOW', 8), ('HIGH', 1168), ('LOW', 9), ('HIGH', 1168), ('LOW', 8), ('HIGH', 1168), ('LOW', 8), ('HIGH', 1168), ('LOW', 9), ('HIGH', 1168), ('LOW', 8), ('HIGH', 1168), ('LOW', 8), ('HIGH', 1168), ('LOW', 9)]

我知道我可以在信号处理 SA 上问这个问题,但我认为这个问题更面向编程。我希望我充分解释了这个问题,如果有任何问题,请提问。谢谢。


下面是一个link到实际样本数据:

https://drive.google.com/folderview?id=0ByJDNIfaTeEfemVjSU9hNkNpQ3c&usp=sharing

从视觉上看,数据的边界在哪里,一目了然。


更新 1

我更新了我的代码以使其更清晰,因为单个字母变量对我的理智没有帮助。

previousBinaryState = "X"
x = 0
sinceLastChange = 0
previousGroup = inputBinaryState(data['input'][0])
lengthAssert = 0
for i in range(0, len(data['input'])):    
    currentBinaryState = inputBinaryState(data['input'][i]);

    if(currentBinaryState != previousBinaryState): # Changed from -3 -> +3 or +3 -> -3 

        #print sinceLastChange

        if sinceLastChange > 250 and previousGroup == "HIGH" and currentBinaryState == "LOW": # Finished LOW group
            groupedData['input'].append( ("LOW", x) )
            lengthAssert += x
            x = 0
            previousGroup = "LOW"

        elif sinceLastChange > 20 and previousGroup == "LOW": # Finished HIGH group
            groupedData['input'].append( ("HIGH", x) )
            lengthAssert += x
            x = 0
            previousGroup = "HIGH"

        sinceLastChange = 0

    else:
        sinceLastChange += 1

    previousBinaryState = currentBinaryState
    x += 1  

其中,对于示例数据,输出:

8
7
8
7
7
596   <- Clearly a LOW group
7
8
7
8
7
7
8
7
8
7
7
8
7
8
7
7
8
7
8
.
.
.

问题是 HIGH 组的持续时间比应有的长:

[('HIGH', 600), ('LOW', 1176), ('HIGH', 1177), ('LOW', 1176), ('HIGH', 1176), ('LOW', 1177), ('HIGH', 1176), ('LOW', 1176)]

我终于找到了解决办法。我花了太长时间来解决问题,这似乎是一个相当简单的问题,但现在可以解决了。

它不会选取数据集中的最后一组,但没关系。

previousBinaryState = "X"
x = 0
sinceLastChange = 0
previousGroup = inputBinaryState(data['input'][0])
lengthAssert = 0
for i in range(0, len(data['input'])):    
    currentBinaryState = inputBinaryState(data['input'][i]);

    if(currentBinaryState != previousBinaryState): # Changed from -3 -> +3 or +3 -> -3 

        #print sinceLastChange

        if sinceLastChange > 250 and previousGroup == "HIGH" and currentBinaryState == "LOW": # Finished LOW group
            groupedData['input'].append( ("LOW", x) )
            lengthAssert += x
            x = 0
            previousGroup = "LOW"

        sinceLastChange = 0

    else:
        if sinceLastChange > 20 and previousGroup == "LOW":
            groupedData['input'].append( ("HIGH", x) )
            lengthAssert += x
            x = 0
            previousGroup = "HIGH"
            sinceLastChange = 0

        sinceLastChange += 1

    previousBinaryState = currentBinaryState
    x += 1           

20 是 HIGH 状态下的最大循环数,250 是该组处于 LOW 状态的最大样本数。

[('HIGH', 25), ('LOW', 575), ('HIGH', 602), ('LOW', 574), ('HIGH', 602), ('LOW', 575), ('HIGH', 601), ('LOW', 575), ('HIGH', 602), ('LOW', 574), ('HIGH', 602), ('LOW', 575), ('HIGH', 601), ('LOW', 575), ('HIGH', 602), ('LOW', 574)]

将其与图表和实际数据进行比较时,它似乎是正确的。