Analysing height difference from columns and selecting max difference in Python
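I have a .csv file containing the x y data of transects (.csv file here).
The file can contain a few dozen transects (the example has only 4).
I want to calculate the elevation change of each transect and then select the transect with the highest elevation change.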
x y lines
0 3.444 1
0.009 3.445 1
0.180 3.449 1
0.027 3.449 1
...
0 2.115 2
0.008 2.115 2
0.017 2.115 2
0.027 2.116 2
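I tried to calculate the change with pandas.DataFrame.diff, but I was unable to select the highest elevation change from the result.
Update: I found a way to calculate the height difference for 1 transect. The goal now is to loop this script over the other transects and have it select the transect with the highest difference. I am not sure how to create a loop from this...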
import matplotlib.pyplot as plt
import pandas as pd
from scipy.signal import savgol_filter, find_peaks
df = pd.read_csv('transect4.csv', delimiter=',', header=None, names=['x', 'y', 'lines'])
df_1 = df ['lines'] == 1
df1 = df[df_1]
plt.plot(df1['x'], df1['y'], label='Original Topography')
#apply a Savitzky-Golay filter
smooth = savgol_filter(df1.y.values, window_length = 351, polyorder = 5)
#find the maximums
peaks_idx_max, _ = find_peaks(smooth, prominence = 0.01)
#negate the signal, so mins become max (keeps prominence in the same units as y)
smooth_neg = -smooth
#find the mins now
peaks_idx_mins, _ = find_peaks(smooth_neg, prominence = 0.01)
plt.xlabel('Distance')
plt.ylabel('Height')
plt.plot(df1['x'], smooth, label='Smoothed Topography')
#plot them
plt.scatter(df1.x.values[peaks_idx_max], smooth[peaks_idx_max], s = 55,
c = 'green', label = 'Local Max Cusp')
plt.scatter(df1.x.values[peaks_idx_mins], smooth[peaks_idx_mins], s = 55,
c = 'black', label = 'Local Min Cusp')
plt.legend(loc='upper left')
plt.show()
#Export to csv
df['Cusp_max']=False
df['Cusp_min']=False
#peaks_idx_* are positions within df1, so map them to df's index labels
df.loc[df1.index[peaks_idx_max], 'Cusp_max']=True
df.loc[df1.index[peaks_idx_mins], 'Cusp_min']=True
data=df[df['Cusp_max'] | df['Cusp_min']]
data.to_csv(r'Cusp_total.csv')
#Calculate height difference between consecutive cusps
#index_col=0 restores the row index that to_csv wrote as the first column
my_data = pd.read_csv('Cusp_total.csv', index_col=0)
df1_diff = my_data[my_data['lines'] == 1].copy()
df1_diff['Diff_Cusps'] = df1_diff['y'].diff(-1)
#Only use positive numbers for average
df1_pos = df1_diff[df1_diff['Diff_Cusps'] > 0]
print("Average Height Difference: ", df1_pos['Diff_Cusps'].mean(), "m")
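Ideally, the script would select the transect with the highest elevation change from an unknown number of transects in the .csv file and then export it to a new .csv file.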
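One way the loop over transects could be sketched is to wrap the single-transect calculation in a function and call it once per group of `lines`. In the sketch below, `mean_positive_drop` and `rank_transects` are hypothetical names, the metric is only a simplified stand-in for the smoothing and cusp-detection steps above, and the small DataFrame is made-up data rather than the real file.

```python
import pandas as pd

def mean_positive_drop(transect):
    """Hypothetical stand-in for the single-transect analysis:
    mean of the positive differences between consecutive y values."""
    diffs = transect['y'].diff(-1)  # y[i] - y[i+1]
    return diffs[diffs > 0].mean()

def rank_transects(df, metric):
    """Apply `metric` to every transect; return a Series indexed by line id."""
    results = {
        line_id: metric(transect)
        for line_id, transect in df.groupby('lines')
    }
    return pd.Series(results, name='height_change')

# Made-up data with two transects (assumption, not the real file).
df = pd.DataFrame({
    'x': [0, 1, 2, 0, 1, 2],
    'y': [3.0, 2.0, 2.5, 1.0, 0.9, 1.1],
    'lines': [1, 1, 1, 2, 2, 2],
})

changes = rank_transects(df, mean_positive_drop)
best_line = changes.idxmax()  # line id with the largest value
print(changes)
print("Transect with the highest difference:", best_line)
```

Replacing `mean_positive_drop` with a function that runs the Savitzky-Golay filter and `find_peaks` on one transect should leave the rest of the loop unchanged.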
You need to groupby on the column lines.
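Not sure if this is what you mean by elevation change, but this gives the elevation difference (max(y) - min(y)) for each group, where a group consists of all rows that share the same value of 'lines', with one group per such value. This should help you with what is missing in your logic (sorry, can't put in more time).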
frame = pd.read_csv('transect4.csv', header=None, names=['x', 'y', 'lines'])
groups = frame.groupby('lines')
# Elevation range (max - min) of each group, indexed by the 'lines' value.
elevation_range = groups['y'].max() - groups['y'].min()
print(elevation_range)
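To go from the per-group differences to the single transect with the largest one, and to export it, something like the following might work. It is a minimal sketch that takes "elevation change" to mean max(y) - min(y). The inline sample data and the output filename `max_change_transect.csv` are illustrative assumptions, not taken from the original file.

```python
import io
import pandas as pd

# Small inline sample standing in for 'transect4.csv' (illustrative values).
sample = io.StringIO(
    "0,3.444,1\n"
    "0.009,3.445,1\n"
    "0.018,3.449,1\n"
    "0,2.115,2\n"
    "0.008,2.115,2\n"
    "0.017,2.116,2\n"
)
frame = pd.read_csv(sample, header=None, names=['x', 'y', 'lines'])

# Elevation range (max - min) per transect.
grouped_y = frame.groupby('lines')['y']
elevation_range = grouped_y.max() - grouped_y.min()

# Label of the transect with the largest range.
best_line = elevation_range.idxmax()

# Keep only that transect's rows and export them to a new file.
best_transect = frame[frame['lines'] == best_line]
best_transect.to_csv('max_change_transect.csv', index=False)
```

`idxmax` returns the index label, which here is the `lines` value, so this works for any number of transects in the file.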