python interpolation of some datapoints in dataset / merging lists

Question

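In an .xlsx file, machine data is logged in a way that is not suitable for further calculations. Specifically, I have a file containing depth data of a cutting tool. Each depth increment comes with further information such as pressure, rotational speed, forces, and so on.

As you can see in some datapoints, the resolution of the depth parameter (0.01) is not sufficient, because the other parameters are updated more often. So I want to interpolate between two consecutive depth datapoints.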

What is important to know: this effect doesn't occur at every depth. When the cutting tool moves fast, everything is fine.

Here is also an example file.

So I only need to interpolate the depth values when the difference between two consecutive depth datapoints is 0.01.

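I tried the following:

1. Open as a dataframe, rename, drop NaN, convert to a list
2. Count identical depths in the list and transfer them to a dataframe
3. Calculate the delta between depth i and depth i-1 (i.e. to the predecessor), replace NaN with "0"
4. If 0.009 < delta depth < 0.011, divide delta depth by the number of time steps --> interpolated depth
5. Create an empty list of lists, where the number of elements in each sublist corresponds to the duration
6. Pass the values from interpolated depth to the respective sublists --> list 1
7. Transfer the elements from delta_depth to sublists --> list 2
8. Merge list 1 and list 2
9. Flatten the list
10. Replace the original depth values in the dataframe with the interpolated values

It looks like this, but at step 8 (merging) I don't get what I need: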

import pandas as pd
from itertools import groupby
from itertools import zip_longest
import matplotlib.pyplot as plt
import numpy as np

#open and rename of some columns 
df_raw=pd.read_excel(open('---.xlsx', 'rb'), sheet_name='---')  
df_raw=df_raw.rename(columns={"---"})

#drop NaN 
df_1=df_raw.dropna(subset=['depth'])

#convert to list
li = df_1['depth'].tolist()

#count identical depths in list and transfer them to dataframe
df_count = pd.DataFrame.from_records([[i, len([*group])] for i, group in groupby(li)])
df_count = df_count.rename(columns={0: "depth", 1: "duration"})

#calculate Delta between depth i and depth i-1 (i.e. to the predecessor), replace NaN with "0".
df_count["delta_depth"] = df_count["depth"].diff()
df_count=df_count.fillna(0)

#Divide delta depth by number of time steps if 0.009 < delta depth < 0.011
df_count["inter_depth"] = np.where(np.logical_and(df_count['delta_depth'] > 0.009, df_count['delta_depth'] < 0.011),df_count["delta_depth"] / df_count["duration"],0)

li2=df_count.values.tolist()
li_depth = df_count['depth'].tolist()
li_delta = df_count['delta_depth'].tolist()
li_duration = df_count['duration'].tolist()
li_inter = df_count['inter_depth'].tolist()


#empty List of Lists with the number of elements of the sublist corresponding to the duration
out=[]
for number in li_duration:
  out.append(li_inter[:number])  

#Pass values from interpolated depth to the respective sublists --> Liste 1
out = [[i]*j for i, j in zip(li_inter, [len(j) for j in out])] 

#Transfer elements from delta_depth to sublists --> Liste 2
def extractDigits(lst):
    return list(map(lambda el:[el], lst))            
lst=extractDigits(li_delta)

#Merge list 1 and list 2
list1 = out
list2 = lst
new_list = []
for l1, l2 in zip_longest(list1, list2, fillvalue=[]):
    new_list.append([y if y else x for x, y in zip_longest(l1, l2)])
new_list

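After merging, the first elements of the sublists are the original depth values, followed by the interpolated values. But the sublists should contain only interpolated values.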
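A small reproduction with made-up numbers (not taken from the real file) may help to see what the merge step does. In this sketch, `list1` is built directly from the durations rather than via the intermediate `out` list, and the three runs and their values are purely illustrative. Because of `y if y else x`, the first element of each sublist is taken from list 2 whenever that value is non-zero:

```python
from itertools import zip_longest

# Made-up values for three depth runs (hypothetical, not from the real file).
li_duration = [3, 4, 2]           # number of rows sharing the same depth
li_delta = [0.0, 0.01, 0.01]      # depth step relative to the previous run
li_inter = [0.0, 0.0025, 0.005]   # delta / duration

# List 1: one sublist per run, filled with the per-row increment.
list1 = [[inter] * n for inter, n in zip(li_inter, li_duration)]
# List 2: one single-element sublist per run, holding the delta.
list2 = [[delta] for delta in li_delta]

# Same merge as in step 8.
new_list = []
for l1, l2 in zip_longest(list1, list2, fillvalue=[]):
    new_list.append([y if y else x for x, y in zip_longest(l1, l2)])

print(new_list)
# [[0.0, 0.0, 0.0], [0.01, 0.0025, 0.0025, 0.0025], [0.01, 0.005]]
```

Only the first position of each sublist has a partner in list 2, so only that position is overwritten. The remaining positions keep the value from list 1.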

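Now I have the following questions:

Is there generally a better way to solve this problem?
How can I fix the problem with the merging, or ...
... find a way to overwrite the wrong first elements in the sublists?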

The desired result would look something like this.

Any help would be highly appreciated, as I am very inexperienced with Python and completely stuck.

Answer 1

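I'm sure someone could write something prettier, but I think this will work just fine:

I edited the script, which was a bit messy. I think this will give you what you need.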

# Assumes pandas is imported as pd and df holds the data with a "Depth [m]" column.

# Depth shifted down by two rows (the first two rows are filled with 0)
df["helper1"] = df["Depth [m]"].shift(2, fill_value=0)

# Depth shifted down by one row, so each row can be compared with its predecessor
df["helper"] = df["Depth [m]"].shift(1, fill_value=0)
df["delta depth"] = df["Depth [m]"] - df["helper"]  # subtract helper col from original


_id = 0

for i in range(len(df)):
    if df.loc[i, "Depth [m]"] == df.loc[i, "helper"]:
        break_val = df.loc[i, "Depth [m]"]
        break_val_2 = df.loc[i+1, "Depth [m]"]
        if break_val_2 == break_val:
            df.loc[i, "IDcol"] = _id
            df.loc[i+1, "IDcol"] = _id
        else:
            _id += 1


depth = df["IDcol"].to_list()
depth = list(dict.fromkeys(depth))
depth = [x for x in depth if str(x) != 'nan']

increments = []
for i in depth:
    _df = df.copy()
    _df = _df[_df["IDcol"] == i]
    _df.reset_index(inplace=True, drop=True)
    div_by = len(_df)
    increment = _df.loc[0, "helper"] - _df.loc[0, "helper1"]
    _df["delta depth"] = increment / div_by
    _increment = increment / div_by
    base_value = _df.loc[0, "Depth [m]"]

    for y in range(div_by):
        _df.loc[y, "Depth [m]"] = base_value + ((y + 1) * _increment)

    increments.append(_df)

df["IDcol"] = df["IDcol"].fillna("KEEP")
df = df[df["IDcol"] == "KEEP"]
increments.append(df)
df = pd.concat(increments)
df = df.fillna(0)
df = df[["index", "Depth [m]", "delta depth", "IDcol"]]  # and whatever other cols u want
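A more compact, vectorized alternative is possible without building nested lists. The following is only a sketch under stated assumptions, not the method from the question or from the script above: the function name `interpolate_depth`, its parameters `step` and `tol`, and the example values are hypothetical, and it assumes that each 0.01 step should be spread evenly over the rows that share the new depth, so that the last row of a run reaches the logged depth. The tolerance mirrors the `0.009 < delta < 0.011` check from the question.

```python
import numpy as np
import pandas as pd


def interpolate_depth(depth: pd.Series, step: float = 0.01, tol: float = 0.001) -> pd.Series:
    """Spread each depth step of about `step` evenly over the rows of the new run.

    Assumption: the index of `depth` is unique and rows are in time order.
    """
    # Label runs of consecutive identical depth values (1, 2, 3, ...).
    run_id = depth.ne(depth.shift()).cumsum()
    runs = depth.groupby(run_id)

    # Length of the run each row belongs to, and the row's 1-based position in it.
    duration = runs.transform("size")
    position = runs.cumcount() + 1

    # Depth step from the previous run to this one, broadcast to every row of the run.
    delta_per_run = runs.first().diff().fillna(0.0)
    delta = run_id.map(delta_per_run)

    # Interpolate only where the step is close to `step`.
    needs_interp = (delta - step).abs() < tol

    # Ramp from the previous depth up to the logged depth within the run.
    ramp = (depth - delta) + delta * position / duration
    return depth.where(~needs_interp, ramp)


# Hypothetical example data: two slow 0.01 steps and one fast 0.08 step.
example = pd.Series([0.10, 0.10, 0.11, 0.11, 0.11, 0.11, 0.12, 0.12, 0.20, 0.20])
result = interpolate_depth(example)
print(result.round(4).tolist())
```

With this example, the four rows logged at 0.11 become approximately 0.1025, 0.105, 0.1075 and 0.11, the two rows at 0.12 become 0.115 and 0.12, and the rows before and after the fast 0.08 step stay unchanged. On the real data it could be applied as `df["Depth [m]"] = interpolate_depth(df["Depth [m]"])`, assuming the column name from the script above.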

python

nested-lists

pandas