Python: interpolation of some data points in a dataset / merging lists
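Machine data is recorded in an .xlsx file in a way that is not suitable for further calculation. Specifically, I have a file containing depth data for a cutting tool. Each depth increment comes with additional information such as pressure, rotational speed, force, and so on.
As you can see in some data points, the resolution of the depth parameter (0.01) is not sufficient, because the other parameters are updated more frequently. So I want to interpolate between two consecutive depth data points.
It is important to know that this effect doesn't occur at every depth. When the cutting tool moves fast, everything is fine.
So I only need to interpolate the depth values when the difference between two consecutive depth data points is 0.01.
I tried the following:
- Open as a dataframe, rename, drop NaN, convert to a list
- Count identical depths in the list and transfer them to a dataframe
- Calculate the delta between depth i and depth i-1 (i.e. to the predecessor), replacing NaN with "0"
- If 0.009 < delta depth < 0.011, divide delta depth by the number of time steps --> interpolated depth
- Create an empty list of lists, where the number of elements in each sublist corresponds to the duration
- Pass the values from the interpolated depth to the respective sublists --> list 1
- Transfer the elements from delta_depth to sublists --> list 2
- Merge list 1 and list 2
- Flatten the list
- Replace the original depth values in the dataframe with the interpolated values
It looks like this, but at point 8 (merging) I don't get what I need: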
import pandas as pd
from itertools import groupby
from itertools import zip_longest
import matplotlib.pyplot as plt
import numpy as np

# open and rename some columns
df_raw = pd.read_excel(open('---.xlsx', 'rb'), sheet_name='---')
df_raw = df_raw.rename(columns={"---"})

# drop NaN
df_1 = df_raw.dropna(subset=['depth'])

# convert to list
li = df_1['depth'].tolist()

# count identical depths in list and transfer them to dataframe
df_count = pd.DataFrame.from_records([[i, len([*group])] for i, group in groupby(li)])
df_count = df_count.rename(columns={0: "depth", 1: "duration"})

# calculate delta between depth i and depth i-1 (i.e. to the predecessor), replace NaN with "0"
df_count["delta_depth"] = df_count["depth"].diff()
df_count = df_count.fillna(0)

# divide delta depth by number of time steps if 0.009 < delta depth < 0.011
df_count["inter_depth"] = np.where(
    np.logical_and(df_count['delta_depth'] > 0.009, df_count['delta_depth'] < 0.011),
    df_count["delta_depth"] / df_count["duration"],
    0,
)

li2 = df_count.values.tolist()
li_depth = df_count['depth'].tolist()
li_delta = df_count['delta_depth'].tolist()
li_duration = df_count['duration'].tolist()
li_inter = df_count['inter_depth'].tolist()

# empty list of lists with the number of elements of the sublist corresponding to the duration
out = []
for number in li_duration:
    out.append(li_inter[:number])

# pass values from interpolated depth to the respective sublists --> list 1
out = [[i]*j for i, j in zip(li_inter, [len(j) for j in out])]

# transfer elements from delta_depth to sublists --> list 2
def extractDigits(lst):
    return list(map(lambda el: [el], lst))

lst = extractDigits(li_delta)

# merge list 1 and list 2
list1 = out
list2 = lst
new_list = []
for l1, l2 in zip_longest(list1, list2, fillvalue=[]):
    new_list.append([y if y else x for x, y in zip_longest(l1, l2)])
new_list
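After merging, the first element of each sublist is the original depth value, followed by the interpolated values. But the sublists should contain only interpolated values.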
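A minimal reproduction of the merge step, using made-up numbers rather than values from the real file, shows the effect. Because `y if y else x` takes the element from list 2 whenever it is truthy, the first slot of every sublist ends up holding the value from list 2 instead of an interpolated one:

```python
from itertools import zip_longest

# Illustrative values only (assumed, not from the real data):
# list1 holds the interpolated step repeated once per time step,
# list2 holds one single-element sublist per depth value.
list1 = [[0.0025] * 4, [0.0] * 2]
list2 = [[0.01], [0.05]]

merged = []
for l1, l2 in zip_longest(list1, list2, fillvalue=[]):
    # y comes from list2; it is None once list2's sublist is exhausted
    merged.append([y if y else x for x, y in zip_longest(l1, l2)])

print(merged)
```

The first element of each printed sublist comes from `list2`; only the remaining elements come from `list1`.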
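Now I have the following questions:
- Is there a better way to solve this problem in general?
- How can I fix the problem with the merge, or...
- ...find a way to overwrite the wrong first element in the sublists?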
The desired result would look something like this.
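Any help would be greatly appreciated, as I am very inexperienced with Python and completely stuck.
I'm sure someone could write something prettier, but I think this will work fine:
Edited the script, which was a bit messy. I think this will do what you need.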
_list_helper1 = df["Depth [m]"].to_list()
_list_helper1.insert(0, 0)
_list_helper1.insert(0, 0)
_list_helper1 = _list_helper1[:-2]
df["helper1"] = _list_helper1  # depth offset by two rows

_list = df["Depth [m]"].to_list()  # grab all depth values
_list.insert(0, 0)  # insert a value at the beginning to offset from original col
_list = _list[0:-1]  # delete the very last item
df["helper"] = _list  # add the list to a helper col which is now offset
df["delta depth"] = df["Depth [m]"] - df["helper"]  # subtract helper col from original

_id = 0
for i in range(len(df)):
    if df.loc[i, "Depth [m]"] == df.loc[i, "helper"]:
        break_val = df.loc[i, "Depth [m]"]
        break_val_2 = df.loc[i+1, "Depth [m]"]
        if break_val_2 == break_val:
            df.loc[i, "IDcol"] = _id
            df.loc[i+1, "IDcol"] = _id
        else:
            _id += 1

depth = df["IDcol"].to_list()
depth = list(dict.fromkeys(depth))
depth = [x for x in depth if str(x) != 'nan']

increments = []
for i in depth:
    _df = df.copy()
    _df = _df[_df["IDcol"] == i]
    _df.reset_index(inplace=True, drop=True)
    div_by = len(_df)
    increment = _df.loc[0, "helper"] - _df.loc[0, "helper1"]
    _df["delta depth"] = increment / div_by
    _increment = increment / div_by
    base_value = _df.loc[0, "Depth [m]"]
    for y in range(div_by):
        _df.loc[y, "Depth [m]"] = base_value + ((y + 1) * _increment)
    increments.append(_df)

df["IDcol"] = df["IDcol"].fillna("KEEP")
df = df[df["IDcol"] == "KEEP"]
increments.append(df)
df = pd.concat(increments)
df = df.fillna(0)
df = df[["index", "Depth [m]", "delta depth", "IDcol"]]  # and whatever other cols u want
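For comparison, here is a rough sketch of a vectorized route that skips the list merging altogether. It is not the script above, and it rests on several assumptions. The function name `interpolate_depth` and the sample values are made up. The thresholds 0.009 and 0.011 are taken from the question. Each run of repeated depths is assumed to ramp linearly from the previously recorded depth up to the current one, so the last row of a run keeps its recorded value. If the ramp should instead start at the recorded value, the formula would need adjusting.

```python
import numpy as np
import pandas as pd


def interpolate_depth(depth, lo=0.009, hi=0.011):
    """Spread depth steps of roughly 0.01 linearly over each run of repeats.

    Assumes `depth` contains no NaN (drop them beforehand).
    """
    s = pd.Series(depth, dtype=float).reset_index(drop=True)

    # Label runs of consecutive identical depths: 0, 0, 1, 1, 1, 2, ...
    run_id = s.ne(s.shift()).cumsum() - 1

    # Position of each row inside its run (0-based) and the run length
    pos = s.groupby(run_id).cumcount()
    length = run_id.map(run_id.value_counts())

    # Depth of the previous run, broadcast to every row of the current run
    prev = run_id.map(s.groupby(run_id).first().shift())

    # Step from the previous run; NaN for the first run, so never selected
    delta = s - prev
    mask = (delta > lo) & (delta < hi)

    # Linear ramp that ends exactly on the recorded depth of the run
    ramp = prev + (pos + 1) * delta / length
    return pd.Series(np.where(mask, ramp, s), name="depth_interp")


# Made-up sample: a slow step 0.10 -> 0.11 held for four rows,
# then a fast move to 0.16 and a single-row step to 0.17.
sample = [0.10, 0.10, 0.11, 0.11, 0.11, 0.11, 0.16, 0.17]
result = interpolate_depth(sample)
print(result.round(4).tolist())
```

Because the result has one value per original row, it can be assigned straight back to the depth column, and the other columns (pressure, speed, force) stay aligned without any flattening step.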