Python 计算数组中形成随机形状线的单元格数量
Python count amount of cells forming a line with random shape in an array
- 上下文:我使用卫星图像进行过滤,根据是否存在雪(0 表示雪,1 表示无雪)转换为 1 和 0 的数组。我的代码创建了一个
NaNs
的数组,如果至少有一个邻居是非雪像素(在交叉模式中,下图中涂成红色的单元格),则搜索每个雪像素,并在nan
数组。一旦我对我的整个矩阵执行此操作,我最终会得到一行单元格 = 1 的行,其余为 nans。
- 问题:我最终得到一个矩阵,里面有几行。我算作一条线的是在直接邻域中至少有两个单元格等于 1。这意味着对于每个线单元格,如果周围 8 个单元格中的任何一个单元格内部有
1
,则它们正在形成一条线(下图显示雪(紫色)和非雪单元格(黄色)之间的边界。
- 我有什么:我写了一个算法来计算一行中的单元格数量并记录它的starting/ending个单元格(见下图,红线通过的单元格数量)所以我可以过滤我的线条在最后。
- 我想要的是:我的代码可以运行,但速度非常慢。我已尽我所能对其进行编码,但 O 想知道是否有更高效的方法?
Ps:不好意思解释的啰啰嗦嗦,不好意思解释清楚。该代码将向您展示它是如何工作的,生成的数字应该会更清楚。
生成“线”矩阵的一些代码:
import numpy as np
import rasterio as rio
import numpy as np
import glob
import matplotlib.pyplot as plt
import pandas as pd
# Create an array with a line of 1s (not a straight line)
array = np.array([[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., 1., np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., 1., np.nan, np.nan, np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, 1., np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, 1., np.nan, 1., np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, 1., np.nan, 1., np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, 1., np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, 1., np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, 1., 1., np.nan, np.nan, np.nan, 1., np.nan, 1., np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, 1., 1., 1., 1., np.nan, 1., np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, 1., 1., 1., 1., 1., np.nan, np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]])
array = array.reshape((1,array.shape[0],array.shape[1]))
# Create a dataframe that will store the start and endpoint of the lines, as well as their length
templines = pd.DataFrame(columns=['i','Rstart','Cstart','Rend','Cend','Length'])
# Create a 2nd dataframe that will be used to snip together the lines that end and start at the same point
lines = templines.copy()
# Create the straight and tilted cross search patterns
crossr = [-1, 0, 0, 1]
crossc = [0, -1, 1, 0]
xr = [-1, -1, 1, 1]
xc = [-1, 1, -1, 1]
# Constrain our search to an Area of Interest (AOI), useful when dealing with bigger arrays.
r1 = 0
r2 = array.shape[1]
c1 = 0
c2 = array.shape[2]
# Initialize our search indexes
i = 0
j = r1
k = c1
# Create an empty array which will be filled with 1s everytime a pixel is on the line (ensures our programe is replicating the lines' paths)
test = np.zeros((r2-r1, c2-c1))
# Start the search
while i < array.shape[0]:
j = r1
while j < r2:
k = c1
while k < c2:
# If we find a boundary pixel, start searching its neighbors' values
if array[i, j, k] == 1:
# We initialize a counter. If exactly 2 cells in the square centered on our cell are 1s, we are at the start of a line
counter = 0
for c in range(j-1, j+2):
for cc in range(k-1, k+2):
if array[i, c, cc] == 1:
counter +=1
# If the pixel is the start of a line (meaning that on a 3x3 cells matrix, 2 cells are equal to 1)
if counter == 2:
# Gather the indexes of the next line cell
# Vertical Cross search
for c in range(0, len(crossr)):
if array[i,j+crossr[c], k+crossc[c]] == 1:
jj = j+crossr[c]
kk = k+crossc[c]
test[jj-r1, kk-c1] = 1
break
# Tilted cross search
elif array[i,j+xr[c], k+xc[c]] == 1:
jj = j+xr[c]
kk = k+xc[c]
test[jj-r1, kk-c1] = 1
break
# Save the indexes of the previous cell in line
source = [j,k]
source = np.vstack((source,[jj,kk]))
linelength = 1
# When we find an extremity of a line, we "travel along" the line by finding the next neighboring cell
while jj > r1 and jj < r2:
while kk > c1 and kk < c2:
flag = 0
for c in range(0, len(crossr)):
# If we find a cell = 1, with different indexes than the previous searching cell
if array[i,jj+crossr[c], kk+crossc[c]] == 1 and [jj+crossr[c],kk+crossc[c]] not in source.tolist():
source = np.vstack((source,[jj,kk]))
jj = jj+crossr[c]
kk = kk+crossc[c]
test[jj-r1, kk-c1] = 1
linelength += 1
flag = 1
break
# Tilted cross search
elif array[i,jj+xr[c], kk+xc[c]] == 1 and [jj+xr[c],kk+xc[c]] not in source.tolist():
source = np.vstack((source,[jj,kk]))
jj = jj+xr[c]
kk = kk+xc[c]
test[jj-r1, kk-c1] = 1
linelength += 1
flag = 1
break
# If there is no other cell on the line, flag is 0 so we stop the search
# We gather the information of that line in our dataframe
if flag == 0:
print('End of Line')
templines.loc[len(templines.index)] = [i, j, k, jj, kk, linelength]
break
if flag == 0:
break
if flag == 0:
break
k += 1
j += 1
i += 1
# We snip together lines that are ending and starting at the same point
for m in range(len(templines)):
for n in range(m+1, len(templines)):
if templines.Rend[m] == templines.Rstart[n] and templines.Cend[m] == templines.Cstart[n]:
lines.loc[len(lines.index)] = [templines.i[m], templines.Rstart[m], templines.Cstart[m], templines.Rend[n],
templines.Cend[n], templines.Length[m]+templines.Length[n]]
# We check that the program "traveled" along the line
plt.figure()
plt.imshow(array[0,:,:])
plt.figure()
plt.imshow(test[:,:])
图像处理中有个思路,就是找一组连续的像素点,或者connected component。将图像分解成连接的组件后,您可以找到每个组件的大小,并过滤掉较小的组件。
在 scipy 包中有一种快速的方法,称为 scipy.ndimage.label
,您可以像这样应用它:
import numpy as np
import matplotlib.pyplot as plt
import scipy.ndimage
# array = ...
# array is assumed to be 2D here
array[np.isnan(array)] = 0
# diagonally adjacent pixels count as the same feature
structure = [
[1,1,1],
[1,1,1],
[1,1,1]
]
array_labelled, num_features = scipy.ndimage.label(array, structure)
size_for_feature = {}
for feature_num in range(1, num_features + 1):
size_for_feature[feature_num] = (array_labelled == feature_num).sum()
# Filter out features which are too small
threshold = 10
size_for_feature = {
feature_num: size
for feature_num, size in size_for_feature.items()
if size > threshold
}
array_out = np.zeros_like(array)
for feature_num in size_for_feature:
array_out[array_labelled == feature_num] = 1
plt.imshow(array_out)
输出:
此解决方案并不完全符合您的要求 - 但它足够相似,可能适用于您的应用程序。
- 上下文:我使用卫星图像进行过滤,根据是否存在雪(0 表示雪,1 表示无雪)转换为 1 和 0 的数组。我的代码创建了一个
NaNs
的数组,如果至少有一个邻居是非雪像素(在交叉模式中,下图中涂成红色的单元格),则搜索每个雪像素,并在nan
数组。一旦我对我的整个矩阵执行此操作,我最终会得到一行单元格 = 1 的行,其余为 nans。 - 问题:我最终得到一个矩阵,里面有几行。我算作一条线的是在直接邻域中至少有两个单元格等于 1。这意味着对于每个线单元格,如果周围 8 个单元格中的任何一个单元格内部有
1
,则它们正在形成一条线(下图显示雪(紫色)和非雪单元格(黄色)之间的边界。 - 我有什么:我写了一个算法来计算一行中的单元格数量并记录它的starting/ending个单元格(见下图,红线通过的单元格数量)所以我可以过滤我的线条在最后。
- 我想要的是:我的代码可以运行,但速度非常慢。我已尽我所能对其进行编码,但 O 想知道是否有更高效的方法?
Ps:不好意思解释的啰啰嗦嗦,不好意思解释清楚。该代码将向您展示它是如何工作的,生成的数字应该会更清楚。
生成“线”矩阵的一些代码:
import numpy as np
import rasterio as rio
import numpy as np
import glob
import matplotlib.pyplot as plt
import pandas as pd
# Create an array with a line of 1s (not a straight line)
array = np.array([[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., 1., np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., 1., np.nan, np.nan, np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, 1., np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, 1., np.nan, 1., np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, 1., np.nan, 1., np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, 1., np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, 1., np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, 1., 1., np.nan, np.nan, np.nan, 1., np.nan, 1., np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, 1., 1., 1., 1., np.nan, 1., np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, 1., 1., 1., 1., 1., np.nan, np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]])
array = array.reshape((1,array.shape[0],array.shape[1]))
# Create a dataframe that will store the start and endpoint of the lines, as well as their length
templines = pd.DataFrame(columns=['i','Rstart','Cstart','Rend','Cend','Length'])
# Create a 2nd dataframe that will be used to snip together the lines that end and start at the same point
lines = templines.copy()
# Create the straight and tilted cross search patterns
crossr = [-1, 0, 0, 1]
crossc = [0, -1, 1, 0]
xr = [-1, -1, 1, 1]
xc = [-1, 1, -1, 1]
# Constrain our search to an Area of Interest (AOI), useful when dealing with bigger arrays.
r1 = 0
r2 = array.shape[1]
c1 = 0
c2 = array.shape[2]
# Initialize our search indexes
i = 0
j = r1
k = c1
# Create an empty array which will be filled with 1s everytime a pixel is on the line (ensures our programe is replicating the lines' paths)
test = np.zeros((r2-r1, c2-c1))
# Start the search
while i < array.shape[0]:
j = r1
while j < r2:
k = c1
while k < c2:
# If we find a boundary pixel, start searching its neighbors' values
if array[i, j, k] == 1:
# We initialize a counter. If exactly 2 cells in the square centered on our cell are 1s, we are at the start of a line
counter = 0
for c in range(j-1, j+2):
for cc in range(k-1, k+2):
if array[i, c, cc] == 1:
counter +=1
# If the pixel is the start of a line (meaning that on a 3x3 cells matrix, 2 cells are equal to 1)
if counter == 2:
# Gather the indexes of the next line cell
# Vertical Cross search
for c in range(0, len(crossr)):
if array[i,j+crossr[c], k+crossc[c]] == 1:
jj = j+crossr[c]
kk = k+crossc[c]
test[jj-r1, kk-c1] = 1
break
# Tilted cross search
elif array[i,j+xr[c], k+xc[c]] == 1:
jj = j+xr[c]
kk = k+xc[c]
test[jj-r1, kk-c1] = 1
break
# Save the indexes of the previous cell in line
source = [j,k]
source = np.vstack((source,[jj,kk]))
linelength = 1
# When we find an extremity of a line, we "travel along" the line by finding the next neighboring cell
while jj > r1 and jj < r2:
while kk > c1 and kk < c2:
flag = 0
for c in range(0, len(crossr)):
# If we find a cell = 1, with different indexes than the previous searching cell
if array[i,jj+crossr[c], kk+crossc[c]] == 1 and [jj+crossr[c],kk+crossc[c]] not in source.tolist():
source = np.vstack((source,[jj,kk]))
jj = jj+crossr[c]
kk = kk+crossc[c]
test[jj-r1, kk-c1] = 1
linelength += 1
flag = 1
break
# Tilted cross search
elif array[i,jj+xr[c], kk+xc[c]] == 1 and [jj+xr[c],kk+xc[c]] not in source.tolist():
source = np.vstack((source,[jj,kk]))
jj = jj+xr[c]
kk = kk+xc[c]
test[jj-r1, kk-c1] = 1
linelength += 1
flag = 1
break
# If there is no other cell on the line, flag is 0 so we stop the search
# We gather the information of that line in our dataframe
if flag == 0:
print('End of Line')
templines.loc[len(templines.index)] = [i, j, k, jj, kk, linelength]
break
if flag == 0:
break
if flag == 0:
break
k += 1
j += 1
i += 1
# We snip together lines that are ending and starting at the same point
for m in range(len(templines)):
for n in range(m+1, len(templines)):
if templines.Rend[m] == templines.Rstart[n] and templines.Cend[m] == templines.Cstart[n]:
lines.loc[len(lines.index)] = [templines.i[m], templines.Rstart[m], templines.Cstart[m], templines.Rend[n],
templines.Cend[n], templines.Length[m]+templines.Length[n]]
# We check that the program "traveled" along the line
plt.figure()
plt.imshow(array[0,:,:])
plt.figure()
plt.imshow(test[:,:])
图像处理中有个思路,就是找一组连续的像素点,或者connected component。将图像分解成连接的组件后,您可以找到每个组件的大小,并过滤掉较小的组件。
在 scipy 包中有一种快速的方法,称为 scipy.ndimage.label
,您可以像这样应用它:
import numpy as np
import matplotlib.pyplot as plt
import scipy.ndimage
# array = ...
# array is assumed to be 2D here
array[np.isnan(array)] = 0
# diagonally adjacent pixels count as the same feature
structure = [
[1,1,1],
[1,1,1],
[1,1,1]
]
array_labelled, num_features = scipy.ndimage.label(array, structure)
size_for_feature = {}
for feature_num in range(1, num_features + 1):
size_for_feature[feature_num] = (array_labelled == feature_num).sum()
# Filter out features which are too small
threshold = 10
size_for_feature = {
feature_num: size
for feature_num, size in size_for_feature.items()
if size > threshold
}
array_out = np.zeros_like(array)
for feature_num in size_for_feature:
array_out[array_labelled == feature_num] = 1
plt.imshow(array_out)
输出:
此解决方案并不完全符合您的要求 - 但它足够相似,可能适用于您的应用程序。