Python 计算数组中形成随机形状线的单元格数量

Python count amount of cells forming a line with random shape in an array

  1. 上下文:我使用卫星图像进行过滤,根据是否存在雪(0 表示雪,1 表示无雪)转换为 1 和 0 的数组。我的代码创建了一个 NaNs 的数组,如果至少有一个邻居是非雪像素(在交叉模式中,下图中涂成红色的单元格),则搜索每个雪像素,并在nan数组。一旦我对我的整个矩阵执行此操作,我最终会得到一行单元格 = 1 的行,其余为 nans。
  2. 问题:我最终得到一个矩阵,里面有几行。我算作一条线的是在直接邻域中至少有两个单元格等于 1。这意味着对于每个线单元格,如果周围 8 个单元格中的任何一个单元格内部有 1,则它们正在形成一条线(下图显示雪(紫色)和非雪单元格(黄色)之间的边界。
  3. 我有什么:我写了一个算法来计算一行中的单元格数量并记录它的starting/ending个单元格(见下图,红线通过的单元格数量)所以我可以过滤我的线条在最后。
  4. 我想要的是:我的代码可以运行,但速度非常慢。我已尽我所能对其进行编码,但 O 想知道是否有更高效的方法?

Ps:不好意思解释的啰啰嗦嗦,不好意思解释清楚。该代码将向您展示它是如何工作的,生成的数字应该会更清楚。

生成“线”矩阵的一些代码:

import numpy as np
import rasterio as rio
import numpy as np
import glob
import matplotlib.pyplot as plt
import pandas as pd

# Create an array with a line of 1s (not a straight line)
array = np.array([[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,  1., np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
       [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,  1., np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
       [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,  1., np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
       [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,  1., np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
       [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,  1., np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
       [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,  1., np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
       [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,  1.,  1., np.nan, np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
       [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,  1.,  1., np.nan, np.nan, np.nan, np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
       [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,  1., np.nan, np.nan,  1., np.nan, np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
       [np.nan, np.nan, np.nan, np.nan, np.nan,  1., np.nan, np.nan,  1., np.nan,  1., np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
       [np.nan, np.nan, np.nan, np.nan, np.nan,  1., np.nan, np.nan,  1., np.nan,  1., np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
       [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,  1., np.nan, np.nan,  1., np.nan, np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
       [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,  1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
       [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,  1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
       [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,  1., np.nan, np.nan, np.nan,  1., np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
       [np.nan, np.nan, np.nan, np.nan,  1.,  1., np.nan, np.nan, np.nan,  1., np.nan,  1., np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
       [np.nan, np.nan, np.nan, np.nan, np.nan,  1.,  1.,  1.,  1., np.nan,  1., np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
       [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,  1., np.nan, np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
       [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,  1., np.nan, np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
       [np.nan, np.nan, np.nan, np.nan,  1.,  1.,  1.,  1.,  1., np.nan, np.nan, np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
       [np.nan, np.nan, np.nan, np.nan,  np.nan,  np.nan,  np.nan,  np.nan,  np.nan, np.nan, np.nan, np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]])
array = array.reshape((1,array.shape[0],array.shape[1]))

# Create a dataframe that will store the start and endpoint of the lines, as well as their length
templines = pd.DataFrame(columns=['i','Rstart','Cstart','Rend','Cend','Length'])
# Create a 2nd dataframe that will be used to snip together the lines that end and start at the same point
lines = templines.copy()

# Create the straight and tilted cross search patterns
crossr = [-1, 0, 0, 1]
crossc = [0, -1, 1, 0]
xr = [-1, -1, 1, 1]
xc = [-1, 1, -1, 1]

# Constrain our search to an Area of Interest (AOI), useful when dealing with bigger arrays.
r1 = 0
r2 = array.shape[1]
c1 = 0
c2 = array.shape[2]

# Initialize our search indexes
i = 0
j = r1
k = c1

# Create an empty array which will be filled with 1s everytime a pixel is on the line (ensures our programe is replicating the lines' paths)
test = np.zeros((r2-r1, c2-c1))

# Start the search
while i < array.shape[0]:
  j = r1
  while j < r2:
    k = c1
    while k < c2: 
      
      # If we find a boundary pixel, start searching its neighbors' values
      if array[i, j, k] == 1:
            
            # We initialize a counter. If exactly 2 cells in the square centered on our cell are 1s, we are at the start of a line
            counter = 0
            for c in range(j-1, j+2):
              for cc in range(k-1, k+2):
                if array[i, c, cc] == 1:
                  counter +=1
            
            # If the pixel is the start of a line (meaning that on a 3x3 cells matrix, 2 cells are equal to 1)
            if counter == 2:

              # Gather the indexes of the next line cell
              # Vertical Cross search
              for c in range(0, len(crossr)):
                if array[i,j+crossr[c], k+crossc[c]] == 1:
                  jj = j+crossr[c]
                  kk = k+crossc[c]
                  test[jj-r1, kk-c1] = 1
                  break
                
                # Tilted cross search
                elif array[i,j+xr[c], k+xc[c]] == 1:
                  jj = j+xr[c]
                  kk = k+xc[c]
                  test[jj-r1, kk-c1] = 1
                  break
              
              # Save the indexes of the previous cell in line
              source = [j,k]
              source = np.vstack((source,[jj,kk]))

              linelength = 1
              
              # When we find an extremity of a line, we "travel along" the line by finding the next neighboring cell
              while jj > r1 and jj < r2:
                while kk > c1 and kk < c2:

                  flag = 0
                  for c in range(0, len(crossr)):

                    # If we find a cell = 1, with different indexes than the previous searching cell
                    if array[i,jj+crossr[c], kk+crossc[c]] == 1 and [jj+crossr[c],kk+crossc[c]] not in source.tolist():
                      source = np.vstack((source,[jj,kk]))
                      jj = jj+crossr[c]
                      kk = kk+crossc[c]
                      test[jj-r1, kk-c1] = 1
                      linelength += 1
                      flag = 1
                      break
                    
                    # Tilted cross search
                    elif array[i,jj+xr[c], kk+xc[c]] == 1 and [jj+xr[c],kk+xc[c]] not in source.tolist():
                      source = np.vstack((source,[jj,kk]))
                      jj = jj+xr[c]
                      kk = kk+xc[c]  
                      test[jj-r1, kk-c1] = 1
                      linelength += 1
                      flag = 1
                      break

                    # If there is no other cell on the line, flag is 0 so we stop the search
                    
                  # We gather the information of that line in our dataframe
                  if flag == 0:
                      print('End of Line')
                      templines.loc[len(templines.index)] = [i, j, k, jj, kk, linelength]
                      break
                                            
                  if flag == 0:
                    break
                    
                if flag == 0:
                  break

      k += 1
    j += 1
  i += 1   
  

    
# We snip together lines that are ending and starting at the same point
for m in range(len(templines)):
    for n in range(m+1, len(templines)):
        
        if templines.Rend[m] == templines.Rstart[n] and templines.Cend[m] == templines.Cstart[n]:
            lines.loc[len(lines.index)] = [templines.i[m], templines.Rstart[m], templines.Cstart[m], templines.Rend[n],
                                           templines.Cend[n], templines.Length[m]+templines.Length[n]]
    
  
# We check that the program "traveled" along the line
plt.figure()
plt.imshow(array[0,:,:])
plt.figure()
plt.imshow(test[:,:])
  

图像处理中有个思路,就是找一组连续的像素点,或者connected component。将图像分解成连接的组件后,您可以找到每个组件的大小,并过滤掉较小的组件。

在 scipy 包中有一种快速的方法,称为 scipy.ndimage.label,您可以像这样应用它:

import numpy as np
import matplotlib.pyplot as plt
import scipy.ndimage

# array = ...

# array is assumed to be 2D here

array[np.isnan(array)] = 0

# diagonally adjacent pixels count as the same feature
structure = [
    [1,1,1],
    [1,1,1],
    [1,1,1]
]
array_labelled, num_features = scipy.ndimage.label(array, structure)

size_for_feature = {}
for feature_num in range(1, num_features + 1):
    size_for_feature[feature_num] = (array_labelled == feature_num).sum()

# Filter out features which are too small
threshold = 10
size_for_feature = {
    feature_num: size
    for feature_num, size in size_for_feature.items()
    if size > threshold
}

array_out = np.zeros_like(array)
for feature_num in size_for_feature:
    array_out[array_labelled == feature_num] = 1
plt.imshow(array_out)

输出:

此解决方案并不完全符合您的要求 - 但它足够相似,可能适用于您的应用程序。