提高图像处理的准确性以计算真菌孢子
Improve accuracy of image processing to count fungus spores
我正在尝试使用 Pythony 计算显微镜样本中疾病孢子的数量,但到目前为止没有取得太大成功。
因为孢子的颜色和背景很像,很多都很接近
跟随样品的显微照相。
图像处理代码:
import numpy as np
import argparse
import imutils
import cv2
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
help="path to the input image")
ap.add_argument("-o", "--output", required=True,
help="path to the output image")
args = vars(ap.parse_args())
counter = {}
image_orig = cv2.imread(args["image"])
height_orig, width_orig = image_orig.shape[:2]
image_contours = image_orig.copy()
colors = ['Yellow']
for color in colors:
image_to_process = image_orig.copy()
counter[color] = 0
if color == 'Yellow':
lower = np.array([70, 150, 140]) #rgb(151, 143, 80)
upper = np.array([110, 240, 210]) #rgb(212, 216, 106)
image_mask = cv2.inRange(image_to_process, lower, upper)
image_res = cv2.bitwise_and(
image_to_process, image_to_process, mask=image_mask)
image_gray = cv2.cvtColor(image_res, cv2.COLOR_BGR2GRAY)
image_gray = cv2.GaussianBlur(image_gray, (5, 5), 50)
image_edged = cv2.Canny(image_gray, 100, 200)
image_edged = cv2.dilate(image_edged, None, iterations=1)
image_edged = cv2.erode(image_edged, None, iterations=1)
cnts = cv2.findContours(
image_edged.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if imutils.is_cv2() else cnts[1]
for c in cnts:
if cv2.contourArea(c) < 1100:
continue
hull = cv2.convexHull(c)
if color == 'Yellow':
cv2.drawContours(image_contours, [hull], 0, (0, 0, 255), 1)
counter[color] += 1
print("{} esporos {}".format(counter[color], color))
cv2.imwrite(args["output"], image_contours)
算法统计11 spores
但图片中包含27个孢子
图像处理结果显示孢子被分组
如何使它更准确?
这些真菌孢子的大小大致相等,如果你不关心精确度,而不是跳下扩大边界和分水岭的兔子洞,你可以 要做的是对您当前的算法进行非常简单的更改并获得更高的准确性。
这个场景中的孢子看起来大小相似,形状大致相同。鉴于此,您可以使用等高线的面积来使用孢子的平均面积找到占据该区域的孢子的近似数量。孢子不能完全填满这些任意形状,因此您必须考虑到这一点。您可以通过找到背景颜色并从轮廓区域中移除背景颜色占据的区域来实现。在这样的场景中,您应该非常接近 单元格区域的真实答案。
所以回顾一下:
Find average area of spore,
Find background color
Find contour area,
subtract background color pixels/area from contour
approximate_spore_count = ceil(contour_area / (average_area_of_spore))
你在这里使用 ceil 来处理这样一个事实,即你的孢子可能小于单独发现的平均值,尽管你也可以设置一个特定的条件来处理这个问题,但是你必须做一个决定是否要计算孢子的分数或四舍五入为等高线面积 > 平均孢子面积的整数。
但是,您可能会注意到,如果您能弄清楚背景颜色,并且孢子的形状大致相同且颜色均匀,那么简单地减去背景颜色的面积会在性能上做得更好 从整个图像中取出,然后用平均孢子大小除以剩余的面积。这比使用膨胀要快得多。
您应该考虑的另一件事是使用 OpenCV 的 built in Blob detection, which if you go for the area approach, might be able to help you with the edge cases that the gradient in your background might present. Using blob detection you can just detect blobs, and divide the total blob area by average spore area. You can follow this tutorial to understand how to use it in python. You may also find the success of a 使用 opencv 的轮廓对您的用例有帮助,尽管我认为它不一定能解决您的结块问题。
TLDR:你的孢子大小和颜色深浅大致相同,你的背景大致均匀,使用平均孢子面积并除以孢子颜色占据的面积以获得更准确的数据计数
补遗:
如果您无法找到平均孢子面积,那么如果您知道孢子的平均“孤独度”(明显分离度),您可以使用它来对 contours/blobs按面积,然后根据“孤独”概率(n)取底部n%的孢子,取平均值。只要“孤独”在很大程度上不依赖于孢子大小,这应该是对平均孢子大小的相当准确的测量。这是可行的,因为如果你假设孢子的均匀分布是“孤独的”,那么你可以将其视为一个随机样本,如果你知道孤独的平均百分比,那么你可能会得到非常高的孤独百分比孢子,如果你按大小取 %n 个分类的孢子(或稍微缩小 n 以降低意外抓住大孢子的机会)。如果您知道缩放系数,理论上您只需要执行一次。
首先,我们将在下面使用一些初步代码:
import numpy as np
import cv2
from matplotlib import pyplot as plt
from skimage.morphology import extrema
from skimage.morphology import watershed as skwater
def ShowImage(title,img,ctype):
if ctype=='bgr':
b,g,r = cv2.split(img) # get b,g,r
rgb_img = cv2.merge([r,g,b]) # switch it to rgb
plt.imshow(rgb_img)
elif ctype=='hsv':
rgb = cv2.cvtColor(img,cv2.COLOR_HSV2RGB)
plt.imshow(rgb)
elif ctype=='gray':
plt.imshow(img,cmap='gray')
elif ctype=='rgb':
plt.imshow(img)
else:
raise Exception("Unknown colour type")
plt.title(title)
plt.show()
作为参考,这是您的原始图片:
#Read in image
img = cv2.imread('cells.jpg')
ShowImage('Original',img,'bgr')
Otsu's method 是一种分割颜色的方法。该方法假设图像像素的强度可以绘制成双峰直方图,并找到该直方图的最佳分隔符。我使用下面的方法。
#Convert to a single, grayscale channel
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
#Threshold the image to binary using Otsu's method
ret, thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
ShowImage('Grayscale',gray,'gray')
ShowImage('Applying Otsu',thresh,'gray')
所有这些小斑点都很烦人,我们可以通过放大来去除它们:
#Adjust iterations until desired result is achieved
kernel = np.ones((3,3),np.uint8)
dilated = cv2.dilate(thresh, kernel, iterations=5)
ShowImage('Dilated',dilated,'gray')
我们现在需要识别分水岭的峰值并给它们单独的标签。这样做的目的是生成一组像素,使得每个单元格中都有一个像素,并且没有两个单元格的标识符像素接触。
为此,我们执行距离变换,然后过滤掉距离单元格中心太远的距离。
#Calculate distance transformation
dist = cv2.distanceTransform(dilated,cv2.DIST_L2,5)
ShowImage('Distance',dist,'gray')
#Adjust this parameter until desired separation occurs
fraction_foreground = 0.6
ret, sure_fg = cv2.threshold(dist,fraction_foreground*dist.max(),255,0)
ShowImage('Surely Foreground',sure_fg,'gray')
就算法而言,上图中的每个白色区域都是一个单独的单元格。
现在我们通过减去最大值来识别未知区域,这些区域将被分水岭算法标记:
# Finding unknown region
unknown = cv2.subtract(dilated,sure_fg.astype(np.uint8))
ShowImage('Unknown',unknown,'gray')
未知区域应在每个单元格周围形成完整的甜甜圈。
接下来,我们为距离变换产生的每个不同区域赋予唯一标签,然后在最终执行分水岭变换之前标记未知区域:
# Marker labelling
ret, markers = cv2.connectedComponents(sure_fg.astype(np.uint8))
ShowImage('Connected Components',markers,'rgb')
# Add one to all labels so that sure background is not 0, but 1
markers = markers+1
# Now, mark the region of unknown with zero
markers[unknown==np.max(unknown)] = 0
ShowImage('markers',markers,'rgb')
dist = cv2.distanceTransform(dilated,cv2.DIST_L2,5)
markers = skwater(-dist,markers,watershed_line=True)
ShowImage('Watershed',markers,'rgb')
现在细胞总数是独特标记的数量减去1(忽略背景):
len(set(markers.flatten()))-1
在这种情况下,我们得到 23。
您可以通过调整距离阈值、扩张程度,或者使用 h-maxima(局部阈值最大值)来提高或降低准确度。但要小心过度拟合;也就是说,不要假设对单个图像进行调整会在任何地方为您提供最佳结果。
估计不确定性
您还可以通过算法稍微改变参数以了解计数的不确定性。可能看起来像这样
import numpy as np
import cv2
import itertools
from matplotlib import pyplot as plt
from skimage.morphology import extrema
from skimage.morphology import watershed as skwater
def CountCells(dilation=5, fg_frac=0.6):
#Read in image
img = cv2.imread('cells.jpg')
#Convert to a single, grayscale channel
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
#Threshold the image to binary using Otsu's method
ret, thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
#Adjust iterations until desired result is achieved
kernel = np.ones((3,3),np.uint8)
dilated = cv2.dilate(thresh, kernel, iterations=dilation)
#Calculate distance transformation
dist = cv2.distanceTransform(dilated,cv2.DIST_L2,5)
#Adjust this parameter until desired separation occurs
fraction_foreground = fg_frac
ret, sure_fg = cv2.threshold(dist,fraction_foreground*dist.max(),255,0)
# Finding unknown region
unknown = cv2.subtract(dilated,sure_fg.astype(np.uint8))
# Marker labelling
ret, markers = cv2.connectedComponents(sure_fg.astype(np.uint8))
# Add one to all labels so that sure background is not 0, but 1
markers = markers+1
# Now, mark the region of unknown with zero
markers[unknown==np.max(unknown)] = 0
markers = skwater(-dist,markers,watershed_line=True)
return len(set(markers.flatten()))-1
#Smaller numbers are noisier, which leads to many small blobs that get
#thresholded out (undercounting); larger numbers result in possibly fewer blobs,
#which can also cause undercounting.
dilations = [4,5,6]
#Small numbers equal less separation, so undercounting; larger numbers equal
#more separation or drop-outs. This can lead to over-counting initially, but
#rapidly to under-counting.
fracs = [0.5, 0.6, 0.7, 0.8]
for params in itertools.product(dilations,fracs):
print("Dilation={0}, FG frac={1}, Count={2}".format(*params,CountCells(*params)))
给出结果:
Dilation=4, FG frac=0.5, Count=22
Dilation=4, FG frac=0.6, Count=23
Dilation=4, FG frac=0.7, Count=17
Dilation=4, FG frac=0.8, Count=12
Dilation=5, FG frac=0.5, Count=21
Dilation=5, FG frac=0.6, Count=23
Dilation=5, FG frac=0.7, Count=20
Dilation=5, FG frac=0.8, Count=13
Dilation=6, FG frac=0.5, Count=20
Dilation=6, FG frac=0.6, Count=23
Dilation=6, FG frac=0.7, Count=24
Dilation=6, FG frac=0.8, Count=14
取计数值的中位数是将不确定性纳入单个数字的一种方法。
请记住,Whosebug 的许可要求您提供 appropriate attribution。在学术工作中,这可以通过引用来完成。
我正在尝试使用 Pythony 计算显微镜样本中疾病孢子的数量,但到目前为止没有取得太大成功。
因为孢子的颜色和背景很像,很多都很接近
跟随样品的显微照相。
图像处理代码:
import numpy as np
import argparse
import imutils
import cv2
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
help="path to the input image")
ap.add_argument("-o", "--output", required=True,
help="path to the output image")
args = vars(ap.parse_args())
counter = {}
image_orig = cv2.imread(args["image"])
height_orig, width_orig = image_orig.shape[:2]
image_contours = image_orig.copy()
colors = ['Yellow']
for color in colors:
image_to_process = image_orig.copy()
counter[color] = 0
if color == 'Yellow':
lower = np.array([70, 150, 140]) #rgb(151, 143, 80)
upper = np.array([110, 240, 210]) #rgb(212, 216, 106)
image_mask = cv2.inRange(image_to_process, lower, upper)
image_res = cv2.bitwise_and(
image_to_process, image_to_process, mask=image_mask)
image_gray = cv2.cvtColor(image_res, cv2.COLOR_BGR2GRAY)
image_gray = cv2.GaussianBlur(image_gray, (5, 5), 50)
image_edged = cv2.Canny(image_gray, 100, 200)
image_edged = cv2.dilate(image_edged, None, iterations=1)
image_edged = cv2.erode(image_edged, None, iterations=1)
cnts = cv2.findContours(
image_edged.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if imutils.is_cv2() else cnts[1]
for c in cnts:
if cv2.contourArea(c) < 1100:
continue
hull = cv2.convexHull(c)
if color == 'Yellow':
cv2.drawContours(image_contours, [hull], 0, (0, 0, 255), 1)
counter[color] += 1
print("{} esporos {}".format(counter[color], color))
cv2.imwrite(args["output"], image_contours)
算法统计11 spores
但图片中包含27个孢子
图像处理结果显示孢子被分组
如何使它更准确?
这些真菌孢子的大小大致相等,如果你不关心精确度,而不是跳下扩大边界和分水岭的兔子洞,你可以 要做的是对您当前的算法进行非常简单的更改并获得更高的准确性。
这个场景中的孢子看起来大小相似,形状大致相同。鉴于此,您可以使用等高线的面积来使用孢子的平均面积找到占据该区域的孢子的近似数量。孢子不能完全填满这些任意形状,因此您必须考虑到这一点。您可以通过找到背景颜色并从轮廓区域中移除背景颜色占据的区域来实现。在这样的场景中,您应该非常接近 单元格区域的真实答案。
所以回顾一下:
Find average area of spore,
Find background color
Find contour area,
subtract background color pixels/area from contour
approximate_spore_count = ceil(contour_area / (average_area_of_spore))
你在这里使用 ceil 来处理这样一个事实,即你的孢子可能小于单独发现的平均值,尽管你也可以设置一个特定的条件来处理这个问题,但是你必须做一个决定是否要计算孢子的分数或四舍五入为等高线面积 > 平均孢子面积的整数。
但是,您可能会注意到,如果您能弄清楚背景颜色,并且孢子的形状大致相同且颜色均匀,那么简单地减去背景颜色的面积会在性能上做得更好 从整个图像中取出,然后用平均孢子大小除以剩余的面积。这比使用膨胀要快得多。
您应该考虑的另一件事是使用 OpenCV 的 built in Blob detection, which if you go for the area approach, might be able to help you with the edge cases that the gradient in your background might present. Using blob detection you can just detect blobs, and divide the total blob area by average spore area. You can follow this tutorial to understand how to use it in python. You may also find the success of a
TLDR:你的孢子大小和颜色深浅大致相同,你的背景大致均匀,使用平均孢子面积并除以孢子颜色占据的面积以获得更准确的数据计数
补遗:
如果您无法找到平均孢子面积,那么如果您知道孢子的平均“孤独度”(明显分离度),您可以使用它来对 contours/blobs按面积,然后根据“孤独”概率(n)取底部n%的孢子,取平均值。只要“孤独”在很大程度上不依赖于孢子大小,这应该是对平均孢子大小的相当准确的测量。这是可行的,因为如果你假设孢子的均匀分布是“孤独的”,那么你可以将其视为一个随机样本,如果你知道孤独的平均百分比,那么你可能会得到非常高的孤独百分比孢子,如果你按大小取 %n 个分类的孢子(或稍微缩小 n 以降低意外抓住大孢子的机会)。如果您知道缩放系数,理论上您只需要执行一次。
首先,我们将在下面使用一些初步代码:
import numpy as np
import cv2
from matplotlib import pyplot as plt
from skimage.morphology import extrema
from skimage.morphology import watershed as skwater
def ShowImage(title,img,ctype):
if ctype=='bgr':
b,g,r = cv2.split(img) # get b,g,r
rgb_img = cv2.merge([r,g,b]) # switch it to rgb
plt.imshow(rgb_img)
elif ctype=='hsv':
rgb = cv2.cvtColor(img,cv2.COLOR_HSV2RGB)
plt.imshow(rgb)
elif ctype=='gray':
plt.imshow(img,cmap='gray')
elif ctype=='rgb':
plt.imshow(img)
else:
raise Exception("Unknown colour type")
plt.title(title)
plt.show()
作为参考,这是您的原始图片:
#Read in image
img = cv2.imread('cells.jpg')
ShowImage('Original',img,'bgr')
Otsu's method 是一种分割颜色的方法。该方法假设图像像素的强度可以绘制成双峰直方图,并找到该直方图的最佳分隔符。我使用下面的方法。
#Convert to a single, grayscale channel
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
#Threshold the image to binary using Otsu's method
ret, thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
ShowImage('Grayscale',gray,'gray')
ShowImage('Applying Otsu',thresh,'gray')
所有这些小斑点都很烦人,我们可以通过放大来去除它们:
#Adjust iterations until desired result is achieved
kernel = np.ones((3,3),np.uint8)
dilated = cv2.dilate(thresh, kernel, iterations=5)
ShowImage('Dilated',dilated,'gray')
我们现在需要识别分水岭的峰值并给它们单独的标签。这样做的目的是生成一组像素,使得每个单元格中都有一个像素,并且没有两个单元格的标识符像素接触。
为此,我们执行距离变换,然后过滤掉距离单元格中心太远的距离。
#Calculate distance transformation
dist = cv2.distanceTransform(dilated,cv2.DIST_L2,5)
ShowImage('Distance',dist,'gray')
#Adjust this parameter until desired separation occurs
fraction_foreground = 0.6
ret, sure_fg = cv2.threshold(dist,fraction_foreground*dist.max(),255,0)
ShowImage('Surely Foreground',sure_fg,'gray')
就算法而言,上图中的每个白色区域都是一个单独的单元格。
现在我们通过减去最大值来识别未知区域,这些区域将被分水岭算法标记:
# Finding unknown region
unknown = cv2.subtract(dilated,sure_fg.astype(np.uint8))
ShowImage('Unknown',unknown,'gray')
未知区域应在每个单元格周围形成完整的甜甜圈。
接下来,我们为距离变换产生的每个不同区域赋予唯一标签,然后在最终执行分水岭变换之前标记未知区域:
# Marker labelling
ret, markers = cv2.connectedComponents(sure_fg.astype(np.uint8))
ShowImage('Connected Components',markers,'rgb')
# Add one to all labels so that sure background is not 0, but 1
markers = markers+1
# Now, mark the region of unknown with zero
markers[unknown==np.max(unknown)] = 0
ShowImage('markers',markers,'rgb')
dist = cv2.distanceTransform(dilated,cv2.DIST_L2,5)
markers = skwater(-dist,markers,watershed_line=True)
ShowImage('Watershed',markers,'rgb')
现在细胞总数是独特标记的数量减去1(忽略背景):
len(set(markers.flatten()))-1
在这种情况下,我们得到 23。
您可以通过调整距离阈值、扩张程度,或者使用 h-maxima(局部阈值最大值)来提高或降低准确度。但要小心过度拟合;也就是说,不要假设对单个图像进行调整会在任何地方为您提供最佳结果。
估计不确定性
您还可以通过算法稍微改变参数以了解计数的不确定性。可能看起来像这样
import numpy as np
import cv2
import itertools
from matplotlib import pyplot as plt
from skimage.morphology import extrema
from skimage.morphology import watershed as skwater
def CountCells(dilation=5, fg_frac=0.6):
#Read in image
img = cv2.imread('cells.jpg')
#Convert to a single, grayscale channel
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
#Threshold the image to binary using Otsu's method
ret, thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
#Adjust iterations until desired result is achieved
kernel = np.ones((3,3),np.uint8)
dilated = cv2.dilate(thresh, kernel, iterations=dilation)
#Calculate distance transformation
dist = cv2.distanceTransform(dilated,cv2.DIST_L2,5)
#Adjust this parameter until desired separation occurs
fraction_foreground = fg_frac
ret, sure_fg = cv2.threshold(dist,fraction_foreground*dist.max(),255,0)
# Finding unknown region
unknown = cv2.subtract(dilated,sure_fg.astype(np.uint8))
# Marker labelling
ret, markers = cv2.connectedComponents(sure_fg.astype(np.uint8))
# Add one to all labels so that sure background is not 0, but 1
markers = markers+1
# Now, mark the region of unknown with zero
markers[unknown==np.max(unknown)] = 0
markers = skwater(-dist,markers,watershed_line=True)
return len(set(markers.flatten()))-1
#Smaller numbers are noisier, which leads to many small blobs that get
#thresholded out (undercounting); larger numbers result in possibly fewer blobs,
#which can also cause undercounting.
dilations = [4,5,6]
#Small numbers equal less separation, so undercounting; larger numbers equal
#more separation or drop-outs. This can lead to over-counting initially, but
#rapidly to under-counting.
fracs = [0.5, 0.6, 0.7, 0.8]
for params in itertools.product(dilations,fracs):
print("Dilation={0}, FG frac={1}, Count={2}".format(*params,CountCells(*params)))
给出结果:
Dilation=4, FG frac=0.5, Count=22
Dilation=4, FG frac=0.6, Count=23
Dilation=4, FG frac=0.7, Count=17
Dilation=4, FG frac=0.8, Count=12
Dilation=5, FG frac=0.5, Count=21
Dilation=5, FG frac=0.6, Count=23
Dilation=5, FG frac=0.7, Count=20
Dilation=5, FG frac=0.8, Count=13
Dilation=6, FG frac=0.5, Count=20
Dilation=6, FG frac=0.6, Count=23
Dilation=6, FG frac=0.7, Count=24
Dilation=6, FG frac=0.8, Count=14
取计数值的中位数是将不确定性纳入单个数字的一种方法。
请记住,Whosebug 的许可要求您提供 appropriate attribution。在学术工作中,这可以通过引用来完成。