Appending Rows to a Pandas DataFrame
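I am trying to read an analog signal from a Measurement Computing Personal Measurement Device (PMD-1208FS) and write it to a file, with a timestamp for each observation. I would like to append one new observation to this file every second.
PyUniversalLibrary lets me read data from the device, but I have been struggling to figure out how to save the information into a dataframe. This example was helpful for reading data from the PMD, but it does not provide any data-logging examples.
The example below gets close to a solution, but the df.append(pd.DataFrame()) call does not give me the result I want. It ends up appending the latest dataframe to the bottom of the previously saved one, rather than appending only the new data. The result is a dataframe containing many duplicated dataframes.
Here is my code: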
## Source libraries:
from __future__ import print_function
import UniversalLibrary as UL
import time, os, io, csv, datetime
import pandas as pd

## Specify PMD settings:
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0

## Create empty lists and a dataframe to fill:
co = []     ## carbon monoxide concentration in ppm
data = []   ## raw analog output between 0-5V
times = []  ## timestamp
df = pd.DataFrame()

## Set filepath:
filename = "~/pmd_data.csv"

while True:
    ts = time.time()
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    ppm = EngUnits * 10  ## 1 Volt = 10ppm of carbon monoxide
    data.append(EngUnits)
    times.append(datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S'))
    co.append(ppm)
    ## This line of code is not providing the desired result:
    df = df.append(pd.DataFrame({'co':ppm, 'volts':data, 'datetime':times}))
    print(df)
    df.to_csv(filename, sep = ',', index = False, encoding = 'utf-8')
    time.sleep(1)
Current output:
     co             datetime  volts
0  13.8  2017-05-03 15:57:19   1.38
1  13.8  2017-05-03 15:57:19   1.38
2  13.9  2017-05-03 15:57:20   1.39
3  13.8  2017-05-03 15:57:19   1.38
4  13.9  2017-05-03 15:57:20   1.39
5  14.2  2017-05-03 15:57:21   1.42
Desired output:
     co             datetime  volts
0  13.8  2017-05-03 15:57:19   1.38
1  13.9  2017-05-03 15:57:20   1.39
2  14.2  2017-05-03 15:57:21   1.42
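Each time through the while loop, you append a dataframe built from the full list for every field, and those lists keep growing. Instead, you should append a dataframe in which each field is a list holding just one element: the newest reading. See the example below.
You are basically doing this:
co = []     ## carbon monoxide concentration in ppm
data = []   ## raw analog output between 0-5V
times = []  ## timestamp
df = pd.DataFrame()
for i in range(0,5):
    data.append(i)
    times.append(i)
    co.append(i)
    ## Every pass appends the whole history again, not just the new row
    df = df.append(pd.DataFrame({'co':co, 'volts':data, 'datetime':times}))
print(df)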
This results in:
   co  datetime  volts
0   0         0      0
0   0         0      0
1   1         1      1
0   0         0      0
1   1         1      1
2   2         2      2
0   0         0      0
1   1         1      1
2   2         2      2
3   3         3      3
0   0         0      0
1   1         1      1
2   2         2      2
3   3         3      3
4   4         4      4
But you should do this instead:
df = pd.DataFrame()
for i in range(0,5):
    ## Each appended frame holds exactly one row
    df = df.append(pd.DataFrame({'co':[i], 'volts':[i], 'datetime':[i]}))
print(df)
This results in:
   co  datetime  volts
0   0         0      0
0   1         1      1
0   2         2      2
0   3         3      3
0   4         4      4
So your code should look like this:
## Source libraries:
from __future__ import print_function
import UniversalLibrary as UL
import time, os, io, csv, datetime
import pandas as pd

## Specify PMD settings:
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0

## Create empty dataframe to fill:
df = pd.DataFrame()

## Set filepath:
filename = "~/pmd_data.csv"

while True:
    ts = time.time()
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    ppm = EngUnits * 10  ## 1 Volt = 10ppm of carbon monoxide
    times = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')
    ## Append a one-row dataframe holding only the newest reading
    df = df.append(pd.DataFrame({'co':[ppm], 'volts':[EngUnits], 'datetime':[times]}))
    print(df)
    df.to_csv(filename, sep = ',', index = False, encoding = 'utf-8')
    time.sleep(1)
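For readers on recent pandas releases: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the loop above raises an AttributeError there. Below is a minimal sketch of the same one-row-per-iteration idea using a list of dicts and pd.concat. The read_sample() helper is hypothetical; it stands in for the UL.cbAIn / UL.cbToEngUnits calls and returns made-up values so the snippet runs without the device.

```python
import pandas as pd


def read_sample(i):
    """Hypothetical stand-in for one PMD reading (values are made up)."""
    volts = 1.0 + 0.5 * i
    return {'co': volts * 10, 'volts': volts, 'datetime': 'sample-%d' % i}


# Option 1: collect one dict per observation, build the frame once.
rows = []
for i in range(3):
    rows.append(read_sample(i))
df = pd.DataFrame(rows)

# Option 2: concatenate one-row frames; ignore_index renumbers rows 0..n-1.
frames = [pd.DataFrame([read_sample(i)]) for i in range(3)]
df_concat = pd.concat(frames, ignore_index=True)

print(df)
```

Both options add exactly one row per reading. Collecting dicts and building the frame once is usually the cheaper of the two, since every concatenation copies the existing data.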
Since you are not using the index for anything in particular, I would keep a counter and use it to add new rows to the existing dataframe.
I would rewrite the while loop like this:
## Source libraries:
from __future__ import print_function
import UniversalLibrary as UL
import time, os, io, csv, datetime
import pandas as pd

## Specify PMD settings:
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0

## Create an empty dataframe with named columns to fill:
df = pd.DataFrame(columns=['co', 'volts', 'datetime'])

## Set filepath:
filename = "~/pmd_data.csv"

counter = 0
while True:
    ts = time.time()
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    ppm = EngUnits * 10  ## 1 Volt = 10ppm of carbon monoxide
    ## Assigning to a new index label adds one row in place
    df.loc[counter] = pd.Series(dict(
        co=ppm, volts=EngUnits, datetime=ts
    ))
    counter += 1
    df.to_csv(filename, sep = ',', index = False, encoding = 'utf-8')
    time.sleep(1)
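If you only want to append, you do not need a counter with .loc. You can change the assignment to df.loc[len(df)] = row. With the default 0, 1, 2, ... index, this always writes a new row at the end of the DataFrame.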
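A minimal sketch of the df.loc[len(df)] pattern on its own, with made-up readings in place of the device calls, so it can be tried without the PMD attached:

```python
import pandas as pd

df = pd.DataFrame(columns=['co', 'volts', 'datetime'])

for i in range(3):
    volts = 1.0 + i  # made-up reading, not real device output
    row = pd.Series(dict(co=volts * 10, volts=volts, datetime='t%d' % i))
    # len(df) is the next unused label while the index is 0..n-1
    df.loc[len(df)] = row

print(df)
```

This relies on the index being the default 0..n-1 range. If rows have been dropped or the index has been changed, len(df) may match an existing label and overwrite that row instead of adding one.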
Updated code, based on piRSquared's code here:
## Source libraries:
from __future__ import print_function
import UniversalLibrary as UL
import time, os, io, csv, datetime
import pandas as pd

## Specify PMD settings:
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0

## Create an empty dataframe with named columns to fill:
df = pd.DataFrame(columns=['co', 'volts', 'datetime'])

## Set filepath:
filename = "~/pmd_data.csv"

while True:
    ts = time.time()
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    ppm = EngUnits * 10  ## 1 Volt = 10ppm of carbon monoxide
    ## len(df) is the next free label, so this adds a row at the end
    df.loc[len(df)] = pd.Series(dict(
        co=ppm, volts=EngUnits, datetime=ts
    ))
    df.to_csv(filename, sep = ',', index = False, encoding = 'utf-8')
    time.sleep(1)
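The loops above rewrite the whole CSV file on every pass. Since the stated goal is to append one observation per second to the file, another option is to write only the new row with to_csv in append mode. This is a rough sketch, not something from the answers above: append_row is a hypothetical helper, the readings are made up, and a temporary directory stands in for the real path.

```python
import os
import tempfile

import pandas as pd


def append_row(path, row):
    """Append one observation (a dict) to a CSV, writing the header only once."""
    write_header = not os.path.exists(path)
    pd.DataFrame([row]).to_csv(path, mode='a', header=write_header, index=False)


# Temporary location used for illustration only.
path = os.path.join(tempfile.mkdtemp(), 'pmd_data.csv')

for i in range(3):
    # Made-up readings standing in for the device calls.
    append_row(path, {'co': 10.0 + i, 'volts': 1.0 + i, 'datetime': i})

result = pd.read_csv(path)
print(result)
```

With this approach the dataframe does not need to be kept in memory at all, and the cost of each write stays constant as the log grows.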