How do I convert mapped data to a dictionary, where each XY coordinate contains a spectrum?
I have a 2D map in which each pixel contains a spectrum. I want to convert the data from this format:
X Y Wave Intensity
-34727.180000 -4204.820000 1.484622 139.193512
-34727.180000 -4204.820000 1.484043 120.991280
-34727.180000 -4204.820000 1.483465 125.905304
-34726.180000 -4204.820000 1.483465 131.262970
-34726.180000 -4204.820000 1.482887 122.784081
-34726.180000 -4204.820000 1.482309 129.853088
-34725.180000 -4204.820000 1.483465 129.655670
-34725.180000 -4204.820000 1.482887 119.567032
-34725.180000 -4204.820000 1.482309 126.097000
-34727.180000 -4203.820000 1.463490 124.331985
-34727.180000 -4203.820000 1.462927 138.189377
-34727.180000 -4203.820000 1.462364 127.824867
to a dictionary where the keys are tuples of the X, Y coordinates and the values are 3×2 numpy arrays. For example:
DICT = {
(-34727.180000, -4204.820000): [[1.484622, 139.193512], [1.484043, 120.991280], [1.483465, 125.905304]],
(-34726.180000, -4204.820000): [[1.483465, 131.262970], [1.482887, 122.784081], [1.482309, 129.853088]],
(-34725.180000, -4204.820000): [[1.483465, 129.655670], [1.482887, 119.567032], [1.482309, 126.097000]],
(-34727.180000, -4203.820000): [[1.463490, 124.331985], [1.462927, 138.189377], [1.462364, 127.824867]]}
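This example is simplified; my actual map contains more than four pixels (X, Y coordinates), and each coordinate has 512 Wave-Intensity pairs. I'd like the solution to generalize from a four-pixel map to a 400-pixel map, and from a 3×2 numpy array per pixel to a 512×2 numpy array.
The end goal is to take the Wave-Intensity pairs at each coordinate, fit them to a Gaussian, find the (maximum) amplitude of that Gaussian, and plot the maximum for each X, Y coordinate. This part doesn't need to be covered in the solution, but it would be great if someone added a solution for it too!
I'm open to approaches that don't involve a dictionary (e.g. a 4D numpy array), but I can't see another way at the moment. Feel free to recommend an alternative. Currently I'm using pandas to import the data in its original format: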
import pandas as pd
IN_PATH = r'PATH_TO_FILE'
FNAME = r'\FILENAME.txt'
data = pd.read_csv(IN_PATH+FNAME, sep='\t', skiprows=1)
data.columns = ["X", "Y", "Wave", "Intensity"]
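Thanks in advance!
You can simply iterate over the DataFrame. Note, however, that in your sample data the first few entries share the same X and Y, so the dictionary entry will be overwritten.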
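# df is the DataFrame loaded in the question (named `data` there)
d = {}
for ix, row in df.iterrows():
    # keep every column except the coordinates; later rows overwrite earlier ones
    d[(row['X'], row['Y'])] = [row[a] for a in row.keys() if a not in ('X', 'Y')]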
Edit: to store all the data for one pixel under the same key:
d = {}
for ix, row in df.iterrows():
    entry = [row[a] for a in row.keys() if a not in ('X', 'Y')]
    x, y = row['X'], row['Y']
    # setdefault creates the list the first time a pixel is seen, then we append
    d.setdefault((x, y), []).append(entry)
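Row-by-row iteration with iterrows can get slow on a 400-pixel × 512-point map. As a hedged alternative (not part of the answer above), the same per-pixel grouping can be sketched with groupby and a dict comprehension. The sketch assumes the frame is called df and has the columns X, Y, Wave and Intensity; the small frame below is built inline from a few rows of the question's sample only so that the snippet runs on its own.

```python
import pandas as pd

# A few rows from the question's sample data, inlined for a self-contained sketch
df = pd.DataFrame({
    "X": [-34727.18, -34727.18, -34727.18, -34726.18, -34726.18, -34726.18],
    "Y": [-4204.82, -4204.82, -4204.82, -4204.82, -4204.82, -4204.82],
    "Wave": [1.484622, 1.484043, 1.483465, 1.483465, 1.482887, 1.482309],
    "Intensity": [139.193512, 120.99128, 125.905304,
                  131.26297, 122.784081, 129.853088],
})

# Grouping on two columns yields (x, y) tuples as keys; each group is
# converted to an (n_points, 2) numpy array of [Wave, Intensity] rows.
spectra = {
    xy: group[["Wave", "Intensity"]].to_numpy()
    for xy, group in df.groupby(["X", "Y"])
}

for xy, arr in spectra.items():
    print(xy, arr.shape)
```

With the real data each value would be a 512×2 array, provided every pixel really has 512 rows; the comprehension itself does not check that.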
First use pandas.DataFrame.set_index on the coordinates, then pandas.DataFrame.agg with list along axis=1 to turn each row into a [Wave, Intensity] pair, then groupby on the index levels followed by another agg to list to collect the pairs per pixel, and finally convert with pandas.Series.to_dict:
>>> df.set_index(['X', 'Y']).agg(list, 1).groupby(level=(0,1)).agg(list).to_dict()
{(-34727.18, -4204.82): [[1.484622, 139.193512],
[1.484043, 120.99128],
[1.483465, 125.905304]],
(-34727.18, -4203.82): [[1.46349, 124.331985],
[1.462927, 138.189377],
[1.462364, 127.824867]],
(-34726.18, -4204.82): [[1.483465, 131.26297],
[1.482887, 122.784081],
[1.482309, 129.853088]],
(-34725.18, -4204.82): [[1.483465, 129.65567],
[1.482887, 119.567032],
[1.482309, 126.097]]}
This gives the result as lists; if you want arrays, you can use pandas.Series.transform with numpy.array:
>>> df.set_index(['X', 'Y']).agg(list, 1).groupby(level=(0,1)).agg(list).transform(np.array).to_dict()
{(-34727.18, -4204.82): array([[ 1.484622, 139.193512],
[ 1.484043, 120.99128 ],
[ 1.483465, 125.905304]]),
(-34727.18, -4203.82): array([[ 1.46349 , 124.331985],
[ 1.462927, 138.189377],
[ 1.462364, 127.824867]]),
(-34726.18, -4204.82): array([[ 1.483465, 131.26297 ],
[ 1.482887, 122.784081],
[ 1.482309, 129.853088]]),
(-34725.18, -4204.82): array([[ 1.483465, 129.65567 ],
[ 1.482887, 119.567032],
[ 1.482309, 126.097 ]])}
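For the end goal mentioned in the question (fitting each spectrum to a Gaussian and reading off its amplitude), here is a minimal sketch using scipy.optimize.curve_fit. Several things are assumptions rather than anything stated in the thread: the model is a single Gaussian on a constant baseline, the starting guesses are crude estimates taken from the data, and the spectra dictionary is synthetic stand-in data with the same {(x, y): (n, 2) array} shape as the result above.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, amplitude, center, sigma, offset):
    """Single Gaussian peak on a constant baseline (assumed model)."""
    return offset + amplitude * np.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def fit_amplitude(spectrum):
    """Fit an (n, 2) array of [Wave, Intensity] rows; return the peak amplitude."""
    wave, intensity = spectrum[:, 0], spectrum[:, 1]
    offset0 = intensity.min()
    # Crude starting guesses derived from the data themselves
    p0 = [
        intensity.max() - offset0,           # amplitude above the baseline
        wave[np.argmax(intensity)],          # peak position
        (wave.max() - wave.min()) / 6.0,     # width
        offset0,                             # baseline
    ]
    popt, _ = curve_fit(gaussian, wave, intensity, p0=p0)
    return float(popt[0])


# Synthetic stand-in data (hypothetical): two pixels, 512 points each
rng = np.random.default_rng(0)
wave = np.linspace(1.40, 1.50, 512)
spectra = {}
for x, y, true_amp in [(0.0, 0.0, 50.0), (1.0, 0.0, 80.0)]:
    intensity = gaussian(wave, true_amp, 1.45, 0.01, 120.0)
    intensity = intensity + rng.normal(0.0, 1.0, wave.size)
    spectra[(x, y)] = np.column_stack([wave, intensity])

# One fitted amplitude per (x, y) pixel
amplitudes = {xy: fit_amplitude(s) for xy, s in spectra.items()}
print(amplitudes)
```

The resulting amplitudes dictionary maps each (x, y) to one number, so its keys and values can be unpacked into x, y and colour arrays for a scatter plot or reshaped into an image. Real spectra with several peaks or a sloping background would likely need a different model and better starting guesses.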