Appending Rows to a Pandas DataFrame

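I am trying to read an analog signal from a Measurement Computing Personal Measurement Device (PMD-1208FS) and write it to a file, with a timestamp for each observation. I would like to append a new observation to this file once per second.

PyUniversalLibrary lets me read data from the device, but I am stuck on how to save the readings in a DataFrame. This example is helpful for reading data from the PMD, but it does not show any data logging.

The code below comes close, but the df.append(pd.DataFrame(...)) call does not give me the result I want. On every iteration it appends all of the data collected so far to the bottom of the previously saved DataFrame, instead of appending only the new observation. The result is a DataFrame made up of many repeated blocks of rows.

Here is my code: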

## Source libraries:
from __future__ import print_function
import UniversalLibrary as UL
import time, os, io, csv, datetime
import pandas as pd

## Specify PMD settings:
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0

## Create empty lists and a dataframe to fill:
co = [] ## carbon monoxide concentration in ppm
data = [] ## raw analog output between 0-5V
times = [] ## timestamp
df = pd.DataFrame()


## Set filepath:
filename = "~/pmd_data.csv"

while True:
    ts = time.time()
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    ppm = EngUnits * 10 ## 1 Volt = 10ppm of carbon monoxide
    data.append(EngUnits)
    times.append(datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S'))
    co.append(ppm)
    ## This line of code is not providing the desired result:
    df = df.append(pd.DataFrame({'co':ppm, 'volts':data, 'datetime':times})) 
    print(df)
    df.to_csv(filename, sep = ',', index = False, encoding = 'utf-8')
    time.sleep(1)

Current output:

    co    datetime    volts
0    13.8    2017-05-03 15:57:19   1.38
1    13.8    2017-05-03 15:57:19   1.38    
2    13.9    2017-05-03 15:57:20   1.39
3    13.8    2017-05-03 15:57:19   1.38
4    13.9    2017-05-03 15:57:20   1.39
5    14.2    2017-05-03 15:57:21   1.42

Desired output:

    co    datetime    volts
0    13.8    2017-05-03 15:57:19   1.38
1    13.9    2017-05-03 15:57:20   1.39
2    14.2    2017-05-03 15:57:21   1.42

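Each time through the while loop, you append a DataFrame built from lists that keep growing, so every earlier observation is appended again. Instead, you should append a DataFrame whose lists hold only one element per field. See the example below.

You are basically doing this: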

co = [] ## carbon monoxide concentration in ppm
data = [] ## raw analog output between 0-5V
times = [] ## timestamp

df = pd.DataFrame()
for i in range(0,5):
    data.append(i)
    times.append(i)
    co.append(i)
    df = df.append(pd.DataFrame({'co':co, 'volts':data, 'datetime':times}))
print(df)

This results in

   co  datetime  volts
0   0         0      0
0   0         0      0
1   1         1      1
0   0         0      0
1   1         1      1
2   2         2      2
0   0         0      0
1   1         1      1
2   2         2      2
3   3         3      3
0   0         0      0
1   1         1      1
2   2         2      2
3   3         3      3
4   4         4      4

But you should do this

df = pd.DataFrame()
for i in range(0,5):
    df = df.append(pd.DataFrame({'co':[i], 'volts':[i], 'datetime':[i]}))
print(df)

This results in

   co  datetime  volts
0   0         0      0
0   1         1      1
0   2         2      2
0   3         3      3
0   4         4      4

So your code should look like

## Source libraries:
from __future__ import print_function
import UniversalLibrary as UL
import time, os, io, csv, datetime
import pandas as pd

## Specify PMD settings:
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0

## Create empty dataframe to fill:
df = pd.DataFrame()

## Set filepath:
filename = "~/pmd_data.csv"

while True:
    ts = time.time()
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    ppm = EngUnits * 10 ## 1 Volt = 10ppm of carbon monoxide
    times = (datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S'))
    df = df.append(pd.DataFrame({'co':[ppm], 'volts':[EngUnits], 'datetime':[times]})) 
    print(df)
    df.to_csv(filename, sep = ',', index = False, encoding = 'utf-8')
    time.sleep(1)
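
A note for anyone on a recent pandas: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the loop above will fail there. A minimal sketch of the same one-row-per-iteration idea using pd.concat follows. The loop counter i is only a stand-in for the ppm, voltage and timestamp values, and the frames list is a name made up for this illustration.

```python
import pandas as pd

frames = []  # one single-row DataFrame per observation (hypothetical helper list)
for i in range(5):
    # i stands in for the ppm, voltage and timestamp read on each pass
    frames.append(pd.DataFrame({'co': [i], 'volts': [i], 'datetime': [i]}))
    # Rebuild the full table from every row collected so far
    df = pd.concat(frames, ignore_index=True)

print(df)
```

With ignore_index=True the rows get a fresh 0..n-1 index instead of the repeated 0 shown in the output above.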

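Since you are not using the index for anything in particular, I would keep a counter and use it to add new rows to the existing DataFrame.

I would rewrite the while loop like this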

## Source libraries:
from __future__ import print_function
import UniversalLibrary as UL
import time, os, io, csv, datetime
import pandas as pd

## Specify PMD settings:
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0

## Create an empty dataframe to fill:
df = pd.DataFrame(columns=['co', 'volts', 'datetime'])

## Set filepath:
filename = "~/pmd_data.csv"

counter = 0
while True:
    ts = time.time()
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    ppm = EngUnits * 10 ## 1 Volt = 10ppm of carbon monoxide
    df.loc[counter] = pd.Series(dict(
            co=ppm, volts=EngUnits, datetime=ts
        ))
    counter += 1
    df.to_csv(filename, sep = ',', index = False, encoding = 'utf-8')
    time.sleep(1)

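If you only want to append, you do not need a counter with .loc. You can change it to df.loc[len(df)] = row. This always writes the new row at the end of the DataFrame, provided the index is the default 0, 1, 2, ... sequence.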
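
To try the df.loc[len(df)] pattern without the hardware, here is a small sketch. The readings list is made up for illustration and reuses the voltages and timestamps from the sample output in the question; the row is passed as a plain list in column order.

```python
import pandas as pd

df = pd.DataFrame(columns=['co', 'volts', 'datetime'])

# Hypothetical readings standing in for the values returned by the PMD
readings = [
    (1.38, '2017-05-03 15:57:19'),
    (1.39, '2017-05-03 15:57:20'),
    (1.42, '2017-05-03 15:57:21'),
]

for volts, stamp in readings:
    # len(df) is the next free integer label, so this adds one row at the end
    df.loc[len(df)] = [volts * 10, volts, stamp]  # 1 Volt = 10 ppm

print(df)
```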

Updated code, based on piRSquared's code here:

## Source libraries:
from __future__ import print_function
import UniversalLibrary as UL
import time, os, io, csv, datetime
import pandas as pd

## Specify PMD settings:
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0

## Create an empty dataframe to fill:
df = pd.DataFrame(columns=['co', 'volts', 'datetime'])

## Set filepath:
filename = "~/pmd_data.csv"

while True:
    ts = time.time()
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    ppm = EngUnits * 10 ## 1 Volt = 10ppm of carbon monoxide
    df.loc[len(df)] = pd.Series(dict(
            co=ppm, volts=EngUnits, datetime=ts
        ))
    df.to_csv(filename, sep = ',', index = False, encoding = 'utf-8')
    time.sleep(1)
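
The loops above rewrite the whole CSV file every second. Since the goal in the question is to append one observation per second to a file, another option is to write only the new row with to_csv(mode='a'). This is just a sketch under some assumptions: read_volts and log_reading are hypothetical helpers (read_volts returns a fixed value in place of UL.cbAIn / UL.cbToEngUnits), and a temporary directory is used instead of ~/pmd_data.csv.

```python
import datetime
import os
import tempfile

import pandas as pd


def read_volts():
    """Hypothetical stand-in for the PMD read; returns a fixed voltage."""
    return 1.38


def log_reading(filename, volts):
    """Append one timestamped reading to the CSV file at `filename`."""
    row = pd.DataFrame({
        'co': [volts * 10],  # 1 Volt = 10 ppm of carbon monoxide
        'volts': [volts],
        'datetime': [datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')],
    })
    # Write the header only when the file does not exist yet
    write_header = not os.path.exists(filename)
    row.to_csv(filename, mode='a', header=write_header, index=False)


filename = os.path.join(tempfile.mkdtemp(), 'pmd_data.csv')
for _ in range(3):
    log_reading(filename, read_volts())

result = pd.read_csv(filename)
print(result)
```

In the real loop you would call log_reading once per pass and keep the time.sleep(1). Note that os.path.exists does not expand "~", so a path like "~/pmd_data.csv" would need os.path.expanduser first.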