How should we write code in OOP that has to read a file and use several libraries?

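I need to write my code in OOP style. I watched a few videos about what OOP is, but from all the examples I could find I could not work out how to convert my own code to OOP. All I could figure out is that I have to create a class, say class Interpolation, and then somehow define actual, forward fill, backward fill, linear and cubic. But I don't know what exactly I should write in, say,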

def forwardfill (?,?):
    ?? (should I simply copy the stuff here?)

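When someone asks you for OOP, they probably want your code organized around a class. Why is that good? You get a reusable class for visualization: just pass in a different CSV path and column name.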

First, I made a small DataFrame for further testing and saved it:

import pandas as pd

days = pd.date_range('1/1/2000', periods=8, freq='D')
df = {'price': [10, 11, 9, 13, 14, 18, 17, 19],
      'volume': [50, 60, 40, 100, 50, 100, 40, 50]}
df = pd.DataFrame(df)
df['date'] = days
print(df)
df.to_csv('some_df.csv', index=False)

It looks like this:

   price  volume       date
0     10      50 2000-01-01
1     11      60 2000-01-02
2      9      40 2000-01-03
3     13     100 2000-01-04
4     14      50 2000-01-05
5     18     100 2000-01-06
6     17      40 2000-01-07
7     19      50 2000-01-08

Then I can use my class:

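import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d


class TimeSeriesOOP: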
    def __init__(self, path_to_csv, date_column=None, index_column=None):
        self.df = self.csv_reader(path_to_csv, date_column=date_column, index_column=index_column)
        self.process_dataframe()
        self.df_ffill = self.df.ffill()  # DataFrame.ffill: forward-fill missing values
        self.df_bfill = self.df.bfill()  # DataFrame.bfill: backward-fill missing values

    @staticmethod
    def csv_reader(path_to_csv, date_column=None, index_column=None):
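        # Only ask pandas to parse dates when a date column was actually given,
        # so the default date_column=None does not end up as parse_dates=[None].
        parse_dates = [date_column] if date_column is not None else False
        dataframe = pd.read_csv(path_to_csv, parse_dates=parse_dates,
                                index_col=index_column)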
        return dataframe

    def process_dataframe(self):  # add more methods like this if you need more processing
        # resample() needs a DatetimeIndex, so index_column should name the date column
        self.df = self.df.resample('15min').mean()

    def make_interpolations(self, column_of_interest):
        # 4. Linear Interpolation ------------------
        self.df['rownum'] = np.arange(self.df.shape[0])  # df.shape[0] is the number of rows
        df_nona = self.df.dropna(subset=[column_of_interest])  # keep only rows with a known value
        f = interp1d(df_nona['rownum'], df_nona[column_of_interest])
        self.df['linear_fill'] = f(self.df['rownum'])

        # 5. Cubic Interpolation --------------------
        f2 = interp1d(df_nona['rownum'], df_nona[column_of_interest], kind='cubic')
        self.df['cubic_fill'] = f2(self.df['rownum'])

    def draw_all(self, column_of_interest):
        self.make_interpolations(column_of_interest=column_of_interest)

        fig, axes = plt.subplots(5, 1, sharex=True, figsize=(20, 20))
        plt.rcParams.update({'xtick.bottom': False})
        error = 0  # placeholder: the MSE is not actually computed here

        # 1. Actual -------------------------------
        self.df[column_of_interest].plot(title='Actual', ax=axes[0], label='Actual', color='green', style=".-")

        # 2. Forward Fill --------------------------
        self.df_ffill[column_of_interest].plot(title='Forward Fill (MSE: ' + str(error) + ")", ax=axes[1],
                                               label='Forward Fill', style=".-")

        # 3. Backward Fill -------------------------
        self.df_bfill[column_of_interest].plot(title="Backward Fill (MSE: " + str(error) + ")", ax=axes[2],
                                               label='Back Fill',
                                               color='purple', style=".-")

        # 4. Linear Interpolation ------------------
        self.df['linear_fill'].plot(title="Linear Fill (MSE: " + str(error) + ")", ax=axes[3], label='Linear Fill',
                                    color='red',
                                    style=".-")

        # 5. Cubic Interpolation --------------------
        self.df['cubic_fill'].plot(title="Cubic Fill (MSE: " + str(error) + ")", ax=axes[4], label='Cubic Fill',
                                   color='deeppink',
                                   style=".-")
        plt.show()

We logically split the whole process into separate methods:

  • Read the CSV
  • Process it
  • Make the interpolations
  • Draw everything
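
To tie this split back to the `def forwardfill (?,?)` question: in a class, the first parameter of each method is `self`, and the body is essentially the code you already have, reading its data from `self` instead of a global variable. Below is a minimal sketch of that idea. The class name `Interpolation` comes from the question, but the method names are hypothetical, and `linear` uses pandas' own `Series.interpolate` rather than the `interp1d` approach shown in `TimeSeriesOOP`.

```python
import pandas as pd


class Interpolation:
    """Hypothetical minimal class: one method per fill strategy."""

    def __init__(self, df):
        # Store the data once; every method reads it through self.
        self.df = df

    def actual(self, column):
        # The raw column, gaps included.
        return self.df[column]

    def forward_fill(self, column):
        # Propagate the last valid value forward.
        return self.df[column].ffill()

    def backward_fill(self, column):
        # Propagate the next valid value backward.
        return self.df[column].bfill()

    def linear(self, column):
        # Linear interpolation between valid neighbours (equally spaced rows).
        return self.df[column].interpolate(method="linear")


df = pd.DataFrame({"price": [10.0, None, None, 13.0]})
interp = Interpolation(df)
print(interp.forward_fill("price").tolist())   # [10.0, 10.0, 10.0, 13.0]
print(interp.backward_fill("price").tolist())  # [10.0, 13.0, 13.0, 13.0]
print(interp.linear("price").tolist())         # [10.0, 11.0, 12.0, 13.0]
```

So the two question marks in the signature become `self` and whatever the method needs to know, here the column name.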

How do you use it?

time_series_visualiser = TimeSeriesOOP('some_df.csv', date_column='date', index_column='date')
col_of_interest = 'price'
time_series_visualiser.draw_all(column_of_interest=col_of_interest)

You can see the output plot here.

So I just used the class we created with the sample data. But you can pass in other names and reuse the whole class as needed!

Try this:

time_series_visualiser = TimeSeriesOOP(r'C:\Users\Admin\OOP\TimeSeries\dataset.csv',
                                       date_column='LastUpdated', index_column='LastUpdated')
col_of_interest = 'Occupancy'
time_series_visualiser.draw_all(column_of_interest=col_of_interest)
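
One thing to keep in mind when reusing the class: `draw_all` sets `error = 0`, so every plot title reports an MSE of 0 regardless of the data. If you have a reference series without gaps to compare against, the value could be computed along these lines. This is only a sketch under that assumption; the helper `mse` and the series `truth` and `gappy` are hypothetical and not part of the class above.

```python
import numpy as np
import pandas as pd


def mse(truth, filled):
    """Mean squared error over positions where both series have a value."""
    diff = (truth - filled).dropna()
    return float(np.mean(diff ** 2))


# Hypothetical data: the complete series and a copy with two values missing.
truth = pd.Series([10.0, 11.0, 9.0, 13.0])
gappy = pd.Series([10.0, np.nan, np.nan, 13.0])

error_ffill = mse(truth, gappy.ffill())         # filled: [10, 10, 10, 13]
error_linear = mse(truth, gappy.interpolate())  # filled: [10, 11, 12, 13]
print(error_ffill)   # 0.5
print(error_linear)  # 2.25
```

The results could then replace `error` in the corresponding plot titles, one value per fill method.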