Tradeoffs between indexing numpy array and opening file in rasterio

Question

When using rasterio, I can get a single band of a raster in either of the following ways:

import rasterio
import numpy as np

dataset = rasterio.open('filepath')

# If the full dataset is read into memory, a band can be sliced out of the array:
image = dataset.read()
print(image.shape)
red_band = image[2, :, :]  # numpy index 2 (0-based) is the third band
print(red_band.shape)

# which is equivalent to reading that band directly (rasterio band indexes are 1-based):
red_band_read = dataset.read(3)
print(red_band_read.shape)

if np.array_equal(red_band_read, red_band):
    print('They are the same.')

This prints:

(8, 250, 250)
(250, 250)
(250, 250)
They are the same.

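But I'm curious which one is "better". I assume indexing into a numpy array is much faster than reading from the file, but opening some of these large satellite images takes up a lot of memory. Are there good reasons to prefer one over the other?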

Answer 1

You could try timing each method and see if there is a difference!

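If you only need the data from the red band, I would definitely use the latter method rather than reading all of the bands into memory and then slicing the red band out of the larger array.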

Similarly, if you already know which subset of the data you want to look at, you can use rasterio's windowed reading and writing to reduce memory consumption further.

