Split image tensor into small patches
I have an image of shape (466, 394, 1) that I want to split into 7x7 patches.
image = tf.placeholder(dtype=tf.float32, shape=[1, 466, 394, 1])
Using
image_patches = tf.extract_image_patches(image, [1, 7, 7, 1], [1, 7, 7, 1], [1, 1, 1, 1], 'VALID')
# shape (1, 66, 56, 49)
image_patches_reshaped = tf.reshape(image_patches, [-1, 7, 7, 1])
# shape (3696, 7, 7, 1)
Unfortunately, this does not work in practice, because image_patches_reshaped mixes up the pixel order (if you view image_patches_reshaped as images, you only see noise).
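For reference, the shapes in the comments above follow from plain integer arithmetic. This is only a sketch of that arithmetic in pure Python (the variable names are mine, and it assumes 'VALID' padding with a stride equal to the patch size, as in the call above):

```python
# Patch-grid arithmetic for 'VALID' padding with stride equal to the patch size.
image_height, image_width = 466, 394
patch_height, patch_width = 7, 7

rows = image_height // patch_height   # full patches that fit vertically
cols = image_width // patch_width     # full patches that fit horizontally
depth = patch_height * patch_width    # values in one flattened patch

print((1, rows, cols, depth))         # (1, 66, 56, 49)
print(rows * cols)                    # 3696

# Rows and columns that do not fill a whole patch are dropped by 'VALID'.
dropped_rows = image_height - rows * patch_height
dropped_cols = image_width - cols * patch_width
print(dropped_rows, dropped_cols)     # 4 2
```

So with 'VALID' padding the last 4 pixel rows and 2 pixel columns never end up in any patch.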
So my new approach uses tf.split:
# Note: tf.split(split_dim, num_split, value) is the argument order used before TensorFlow 1.0.
image_hsplits = tf.split(1, 4, image_resized)
# [<tf.Tensor 'split_255:0' shape=(462, 7, 1) dtype=float32>,...]
image_patches = []
for split in image_hsplits:
    image_patches.extend(tf.split(0, 66, split))
image_patches
# [<tf.Tensor 'split_317:0' shape=(7, 7, 1) dtype=float32>, ...]
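This does preserve the pixel order, but unfortunately it creates a large number of ops, which is not ideal.
How can I split an image into smaller patches with fewer ops?
Update 1:
I ported the answer of this question for numpy to TensorFlow: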
import math

def image_to_patches(image, image_height, image_width, patch_height, patch_width):
    # Round the image size up to the next multiple of the patch size.
    # Assumes Python 3, where / is true division and math.ceil returns an int.
    height = math.ceil(image_height/patch_height)*patch_height
    width = math.ceil(image_width/patch_width)*patch_width
    image_resized = tf.squeeze(tf.image.resize_image_with_crop_or_pad(image, height, width))
    # (rows, patch_height, cols, patch_width) -> (rows, cols, patch_height, patch_width)
    image_reshaped = tf.reshape(image_resized, [height // patch_height, patch_height, -1, patch_width])
    image_transposed = tf.transpose(image_reshaped, [0, 2, 1, 3])
    return tf.reshape(image_transposed, [-1, patch_height, patch_width, 1])
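The reshape/transpose/reshape trick is easier to check by hand in NumPy. The following is a minimal sketch, not the code above: image_to_patches_np is a name I made up, it takes a 2-D array, and it zero-pads at the bottom and right with np.pad, whereas tf.image.resize_image_with_crop_or_pad pads centrally.

```python
import math
import numpy as np

def image_to_patches_np(image, patch_height, patch_width):
    """Split a 2-D array into non-overlapping patches (hypothetical NumPy helper)."""
    image_height, image_width = image.shape
    # Round each side up to the next multiple of the patch size.
    height = math.ceil(image_height / patch_height) * patch_height
    width = math.ceil(image_width / patch_width) * patch_width
    # Assumption: zero-pad at the bottom and right only.
    padded = np.pad(image,
                    ((0, height - image_height), (0, width - image_width)),
                    mode="constant")
    # (rows, patch_height, cols, patch_width)
    blocks = padded.reshape(height // patch_height, patch_height, -1, patch_width)
    # (rows, cols, patch_height, patch_width)
    blocks = blocks.transpose(0, 2, 1, 3)
    return blocks.reshape(-1, patch_height, patch_width, 1)

image = np.arange(14 * 14).reshape(14, 14)
patches = image_to_patches_np(image, 7, 7)
print(patches.shape)                      # (4, 7, 7, 1)
print(patches[0, 0, :, 0].tolist())       # [0, 1, 2, 3, 4, 5, 6]
print(patches[1, 0, :, 0].tolist())       # [7, 8, 9, 10, 11, 12, 13]

# A size that is not a multiple of the patch size gets padded up first.
padded_patches = image_to_patches_np(np.ones((10, 15)), 7, 7)
print(padded_patches.shape)               # (6, 7, 7, 1)
```

The transpose is the important step: without it, the second reshape would cut each image row into 7-pixel strips instead of 7x7 blocks.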
But I think there is still room for improvement.
Update 2:
This converts the patches back into the original image.
def patches_to_image(patches, image_height, image_width, patch_height, patch_width):
    # Padded size that was used when the patches were created.
    height = math.ceil(image_height/patch_height)*patch_height
    width = math.ceil(image_width/patch_width)*patch_width
    # (rows, cols, patch_height, patch_width) -> (rows, patch_height, cols, patch_width)
    image_reshaped = tf.reshape(tf.squeeze(patches), [height // patch_height, width // patch_width, patch_height, patch_width])
    image_transposed = tf.transpose(image_reshaped, [0, 2, 1, 3])
    image_resized = tf.reshape(image_transposed, [height, width, 1])
    # Crop the padding away again to recover the original size.
    return tf.image.resize_image_with_crop_or_pad(image_resized, image_height, image_width)
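To convince myself that the inverse really undoes the split, here is a small NumPy round trip. It is only a sketch under my own assumptions: patches_to_image_np is a hypothetical helper, and the test image is already a multiple of the patch size, so no padding or cropping is involved.

```python
import numpy as np

def patches_to_image_np(patches, rows, cols):
    """Reassemble (rows*cols, ph, pw, 1) patches into a (rows*ph, cols*pw) array."""
    _, patch_height, patch_width, _ = patches.shape
    # (rows, cols, patch_height, patch_width)
    grid = patches.reshape(rows, cols, patch_height, patch_width)
    # (rows, patch_height, cols, patch_width)
    grid = grid.transpose(0, 2, 1, 3)
    return grid.reshape(rows * patch_height, cols * patch_width)

# A non-square 14x21 image, i.e. a 2x3 grid of 7x7 patches.
image = np.arange(14 * 21).reshape(14, 21)
# Forward split with the same reshape/transpose/reshape idea.
patches = image.reshape(2, 7, 3, 7).transpose(0, 2, 1, 3).reshape(-1, 7, 7, 1)
restored = patches_to_image_np(patches, rows=2, cols=3)

print(patches.shape)                    # (6, 7, 7, 1)
print(np.array_equal(restored, image))  # True
```

The same permutation [0, 2, 1, 3] is used in both directions because swapping the two middle axes is its own inverse.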
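I think your problem lies elsewhere. I wrote the following snippet (using a smaller 14x14 image so that I could check all values by hand) and confirmed that your initial code does the right thing: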
import tensorflow as tf
import numpy as np

IMAGE_SIZE = [1, 14, 14, 1]
PATCH_SIZE = [1, 7, 7, 1]

# Fill the image with 0..195 so that every pixel value identifies its position.
input_image = np.reshape(np.array(range(14*14)), IMAGE_SIZE)
image = tf.placeholder(dtype=tf.int32, shape=IMAGE_SIZE)
image_patches = tf.extract_image_patches(
    image, PATCH_SIZE, PATCH_SIZE, [1, 1, 1, 1], 'VALID')
image_patches_reshaped = tf.reshape(image_patches, [-1, 7, 7, 1])

sess = tf.Session()
(output, output_reshaped) = sess.run(
    (image_patches, image_patches_reshaped),
    feed_dict={image: input_image})
print("Output (shape: %s):" % (output.shape,))
print(output)
print("Reshaped (shape: %s):" % (output_reshaped.shape,))
print(output_reshaped)
The output is:
python resize.py
Output (shape: (1, 2, 2, 49)):
[[[[ 0 1 2 3 4 5 6 14 15 16 17 18 19 20 28 29 30 31
32 33 34 42 43 44 45 46 47 48 56 57 58 59 60 61 62 70
71 72 73 74 75 76 84 85 86 87 88 89 90]
[ 7 8 9 10 11 12 13 21 22 23 24 25 26 27 35 36 37 38
39 40 41 49 50 51 52 53 54 55 63 64 65 66 67 68 69 77
78 79 80 81 82 83 91 92 93 94 95 96 97]]
[[ 98 99 100 101 102 103 104 112 113 114 115 116 117 118 126 127 128 129
130 131 132 140 141 142 143 144 145 146 154 155 156 157 158 159 160 168
169 170 171 172 173 174 182 183 184 185 186 187 188]
[105 106 107 108 109 110 111 119 120 121 122 123 124 125 133 134 135 136
137 138 139 147 148 149 150 151 152 153 161 162 163 164 165 166 167 175
176 177 178 179 180 181 189 190 191 192 193 194 195]]]]
Reshaped (shape: (4, 7, 7, 1)):
[[[[ 0]
[ 1]
[ 2]
[ 3]
[ 4]
[ 5]
[ 6]]
[[ 14]
[ 15]
[ 16]
[ 17]
[ 18]
[ 19]
[ 20]]
[[ 28]
[ 29]
[ 30]
[ 31]
[ 32]
[ 33]
[ 34]]
[[ 42]
[ 43]
[ 44]
[ 45]
[ 46]
[ 47]
[ 48]]
[[ 56]
[ 57]
[ 58]
[ 59]
[ 60]
[ 61]
[ 62]]
[[ 70]
[ 71]
[ 72]
[ 73]
[ 74]
[ 75]
[ 76]]
[[ 84]
[ 85]
[ 86]
[ 87]
[ 88]
[ 89]
[ 90]]]
[[[ 7]
[ 8]
[ 9]
[ 10]
[ 11]
[ 12]
[ 13]]
[[ 21]
[ 22]
[ 23]
[ 24]
[ 25]
[ 26]
[ 27]]
[[ 35]
[ 36]
[ 37]
[ 38]
[ 39]
[ 40]
[ 41]]
[[ 49]
[ 50]
[ 51]
[ 52]
[ 53]
[ 54]
[ 55]]
[[ 63]
[ 64]
[ 65]
[ 66]
[ 67]
[ 68]
[ 69]]
[[ 77]
[ 78]
[ 79]
[ 80]
[ 81]
[ 82]
[ 83]]
[[ 91]
[ 92]
[ 93]
[ 94]
[ 95]
[ 96]
[ 97]]]
[[[ 98]
[ 99]
[100]
[101]
[102]
[103]
[104]]
[[112]
[113]
[114]
[115]
[116]
[117]
[118]]
[[126]
[127]
[128]
[129]
[130]
[131]
[132]]
[[140]
[141]
[142]
[143]
[144]
[145]
[146]]
[[154]
[155]
[156]
[157]
[158]
[159]
[160]]
[[168]
[169]
[170]
[171]
[172]
[173]
[174]]
[[182]
[183]
[184]
[185]
[186]
[187]
[188]]]
[[[105]
[106]
[107]
[108]
[109]
[110]
[111]]
[[119]
[120]
[121]
[122]
[123]
[124]
[125]]
[[133]
[134]
[135]
[136]
[137]
[138]
[139]]
[[147]
[148]
[149]
[150]
[151]
[152]
[153]]
[[161]
[162]
[163]
[164]
[165]
[166]
[167]]
[[175]
[176]
[177]
[178]
[179]
[180]
[181]]
[[189]
[190]
[191]
[192]
[193]
[194]
[195]]]]
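From the reshaped output you can see that it is a 4x7x7x1 tensor, and the first patch contains the values [0, 7), [14, 21), [28, 35), [42, 49), [56, 63), [70, 77), [84, 91), which correspond to the top-left 7x7 block of the image.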
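If you do not want to compare the numbers by eye, the same check can be done without TensorFlow. This sketch emulates the (1, 2, 2, 49) layout printed above in NumPy (each patch flattened row by row into the last axis); it does not call tf.extract_image_patches, so it only confirms that reshaping such a layout to (-1, 7, 7, 1) keeps the pixel order.

```python
import numpy as np

image = np.arange(14 * 14).reshape(14, 14)

# Emulate the printed layout: a 2x2 grid of patches, each flattened to 49 values.
flat = image.reshape(2, 7, 2, 7).transpose(0, 2, 1, 3).reshape(1, 2, 2, 49)
print(flat[0, 0, 0, :14].tolist())
# [0, 1, 2, 3, 4, 5, 6, 14, 15, 16, 17, 18, 19, 20]

# Same reshape as in the question.
reshaped = flat.reshape(-1, 7, 7, 1)

# Every patch should equal the corresponding 7x7 block of the image.
expected_blocks = [image[:7, :7], image[:7, 7:], image[7:, :7], image[7:, 7:]]
matches = [np.array_equal(reshaped[i, :, :, 0], block)
           for i, block in enumerate(expected_blocks)]
print(matches)  # [True, True, True, True]
```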
Perhaps you can explain further what happens when it does not work correctly?