Omnet++ / Data in a pandas cell (list) vs. a pandas Series (column)

Question

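So I'm using Omnet++, a discrete-event network simulator, to simulate different networking scenarios. At some point, the Omnet++ output statistics can be further processed and stored in .csv files.

The interesting part is that for each time (vectime) there is a value (vecvalue). These vectime/vecvalue arrays are stored in a single cell of such a .csv file. When imported into a pandas DataFrame, I get something like this.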

In [45]: df1[['module','vectime','vecvalue']]
Out[45]: 
              module                                            vectime                                           vecvalue
237  Tictoc13.tic[1]  [2.542245319062, 3.066965320033, 4.78723506093...  [0.334535581612, 0.390459633837, 0.50391696492...
249  Tictoc13.tic[4]  [2.649303071938, 6.02527384362, 21.42434044990...  [2.649303071938, 1.654927100273, 3.11051622577...
261  Tictoc13.tic[3]  [4.28876656608, 16.104821448604, 19.5989313700...  [2.245250432259, 3.201153958979, 2.39023520069...
277  Tictoc13.tic[2]  [13.884917126016, 21.467263378748, 29.59962616...  [0.411703261805, 0.764708518232, 0.83288346614...
289  Tictoc13.tic[5]  [14.146524815409, 14.349744576545, 24.95022463...  [1.732060647139, 8.66456377103, 2.275388282721...

For example, if I need to plot each vectime/vecvalue pair for every module, this is what I do today...

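%pylab

def runningAvg(x):
    # Cumulative mean: sum of the first k samples divided by k.
    x = np.asarray(x, dtype=float)  # also accepts plain Python lists
    sigma_x = np.cumsum(x)
    sigma_n = np.arange(1, x.size + 1)
    return sigma_x / sigma_n

# One curve per module: running average of vecvalue against vectime.
for row in df1.itertuples():
    t = row.vectime
    x = row.vecvalue
    x = runningAvg(x)
    plot(t, x)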

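...to get this...

My question is: which of the following performs best?

Using the data as is, meaning keeping these arrays inside each cell and iterating over the DataFrame to plot each array;
Converting these arrays to pd.Series. In that case, would it still be better to keep the module as the index?
Would unnesting these arrays into pd.Series benefit me at all?

Thanks!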

Answer 1

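Well, I thought about it, and it seems that converting the Omnet data to pd.Series may not be as efficient as I expected.

Here are my two approaches:

1) Use the Omnet data as is, as lists inside the pandas DataFrame.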

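import datetime

figure(1)

start = datetime.datetime.now()

# Loop over the rows; each cell already holds one module's full array.
for row in df1.itertuples():
    t = row.vectime
    x = row.vecvalue
    x = runningAvg(x)
    plot(t, x)

# Elapsed wall-clock time in seconds (includes plotting).
total = (datetime.datetime.now() - start).total_seconds()
print(total)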

Running the above, total is 0.026571 seconds.
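A single datetime-based measurement that also covers the plot calls is fairly noisy. Below is a minimal sketch of timing only the running-average computation with `time.perf_counter`, taking the best of several runs. The synthetic DataFrame built by `make_fake_df` is a hypothetical stand-in for df1 (its sizes and value ranges are assumptions, not Omnet++ output), and `running_avg` and `time_rowwise` are illustrative names.

```python
import time

import numpy as np
import pandas as pd


def running_avg(x):
    """Cumulative mean: sum of the first k samples divided by k."""
    x = np.asarray(x, dtype=float)
    return np.cumsum(x) / np.arange(1, x.size + 1)


def make_fake_df(n_modules=5, seed=0):
    """Hypothetical stand-in for df1: one ragged array per cell."""
    rng = np.random.default_rng(seed)
    rows = []
    for i in range(n_modules):
        n = int(rng.integers(50, 200))  # vectors differ in length
        rows.append({
            "module": f"Tictoc13.tic[{i + 1}]",
            "vectime": np.sort(rng.uniform(0, 100, n)),
            "vecvalue": rng.uniform(0, 10, n),
        })
    return pd.DataFrame(rows)


def time_rowwise(df, repeats=20):
    """Best-of-N wall-clock time for the row-by-row computation (no plotting)."""
    best = float("inf")
    results = []
    for _ in range(repeats):
        start = time.perf_counter()
        results = [running_avg(row.vecvalue) for row in df.itertuples()]
        best = min(best, time.perf_counter() - start)
    return best, results


df_fake = make_fake_df()
best, results = time_rowwise(df_fake)
print(f"best of 20 runs: {best:.6f} s")
```

The absolute numbers depend on the machine and the data size, so they are only meaningful when both approaches are timed the same way.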

2) Convert the Omnet data to pd.Series.

To get the same result, I had to transpose the series several times.

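import datetime
import pandas as pd

figure(2)

start = datetime.datetime.now()

t = df1.vectime
v = df1.vecvalue
# Expand each list into its own row of a wide DataFrame
# (padded with NaN where vectors differ in length).
t = t.apply(pd.Series)
v = v.apply(pd.Series)
# Transpose so that each column holds one module's samples.
t = t.T
v = v.T

# Running average down each column: cumulative sum over sample count.
sigma_v = np.cumsum(v)
sigma_n = np.arange(1, v.shape[0] + 1)
sigma   = sigma_v.T / sigma_n

plot(t, sigma.T)

total = (datetime.datetime.now() - start).total_seconds()
print(total)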

For the latter, total is 0.57266 seconds.
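Part of that cost plausibly comes from building one pd.Series object per row during the expansion, though I have not profiled it. As a rough sketch of a commonly used alternative, the wide frame can be built directly from the list of lists, and the running average taken along each row so that no transpose is needed. The toy data and the names `ragged`, `wide`, and `running` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df1.vecvalue: one list per module, of differing lengths.
ragged = pd.Series(
    [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]],
    index=["tic[1]", "tic[2]", "tic[3]"],
)

# One row per module, NaN-padded to the length of the longest vector.
wide = pd.DataFrame(ragged.tolist(), index=ragged.index)

# Running average along each row: cumulative sum divided by sample count.
counts = np.arange(1, wide.shape[1] + 1)
running = wide.cumsum(axis=1) / counts
print(running)
```

The padding cells stay NaN after `cumsum`, so shorter vectors simply end early instead of being averaged with filler values.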

It looks like I'll stick with approach 1, iterating over the rows.
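For completeness, the "unnest" option from the question can also be sketched as a long-format table with one row per sample. This is not one of the two approaches I timed, so treat it as an untested idea performance-wise. It assumes pandas 1.3 or newer (for `DataFrame.explode` with several columns), and the toy DataFrame below is made up for illustration.

```python
import pandas as pd

# Toy stand-in for df1 with nested lists in each cell.
df = pd.DataFrame({
    "module": ["tic[1]", "tic[2]"],
    "vectime": [[1.0, 2.0, 3.0], [1.5, 2.5]],
    "vecvalue": [[2.0, 4.0, 6.0], [10.0, 20.0]],
})

# One row per (module, sample); both list columns are unnested together.
long = df.explode(["vectime", "vecvalue"], ignore_index=True)
long = long.astype({"vectime": float, "vecvalue": float})

# Running average per module: cumulative sum over the 1-based sample count.
grouped = long.groupby("module")["vecvalue"]
long["running_avg"] = grouped.cumsum() / (grouped.cumcount() + 1)
print(long)
```

Plotting would then be a loop over `long.groupby("module")`, so the iteration does not go away; it just moves from rows to groups.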


python

pandas

performance

omnet++