Export dask to csv without csv quoting but with quotes within columns

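I am trying to prepare billions of records with Dask. To import them into InfluxDB later, the files need to be in line protocol format, written as csv (or txt, dat, ...), with the following structure: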

What I need: ['output-dask-to-csv-0.dat']

weather temp=-6.73,air=963.7,prec=0.0,datetime="2011-01-01 00:00:13" 1293840013000000000  
weather temp=-6.74,air=963.7,prec=0.0,datetime="2011-01-01 00:00:13" 1293840013000000000 
weather air=963.7,datetime="2011-01-01 00:00:22" 1293840022000000000 
weather prec=0.0,datetime="2011-01-01 00:00:32" 1293840032000000000

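When I export from Dask to csv, I need to get rid of the quotes that automatically appear at the beginning and end of each line. At the same time I need to keep the double quotes around the datetime, with one space between the date and the time; the comma-separated entries inside the fields column; and a single space between the measurement and fields columns, and between the fields and the timestamp.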
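To make the target layout concrete, here is a minimal sketch of how one such line could be assembled from its parts. The helper name `to_line_protocol` and its parameters are hypothetical, introduced only for illustration; it uses plain Python rather than Dask.

```python
def to_line_protocol(measurement, fields, dt, timestamp_ns):
    """Build one line: measurement, space, field set, space, timestamp."""
    # The datetime is appended as a string field, so it keeps its double quotes.
    field_set = f'{fields},datetime="{dt}"'
    return f"{measurement} {field_set} {timestamp_ns}"


line = to_line_protocol(
    "weather", "air=963.7", "2011-01-01 00:00:22", 1293840022000000000
)
print(line)
```

The same string concatenation can be expressed column-wise on a DataFrame, which is what the attempts below try to do.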

Sample code:

import dask.dataframe as dd
import pandas as pd
import csv

measurement = ["weather", "weather", "weather", "weather"]
fields = ["temp=8.73,air=962.71,prec=4.0", "temp=4.12,air=963.2,prec=30.0", "air=964.21", "prec=0.0"]
datetime = ["2012-01-01 00:00:13", "2012-01-01 00:00:13", "2012-01-01 00:00:22", "2012-01-01 00:00:32"]
timestamp = [1293840013000000000,1293840013000000000,1293840022000000000 ,1293840032000000000]

d = pd.DataFrame(data={"measurement": measurement, "fields":fields, "datetime":datetime,"timestamp":timestamp})
df = dd.from_pandas(d, npartitions=1)

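For example, this attempt does not solve it so far: the output has quotes at the beginning and end of each line, and doubled quotes around the datetime: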

df['influx_format'] = df['measurement'] + ' ' + df.fields + df.timestamp.astype(str)
df.influx_format.to_csv(filename='output-dask-to-csv-*.dat', sep=" ", escapechar='"', header=False, index=0, decimal='.')
"weather temp=-6.73,air=963.7,prec=0.0,datetime=""2011-01-01 00:00:13"" 1293840013000000000"
"weather temp=-6.74,air=963.7,prec=0.0,datetime=""2011-01-01 00:00:13"" 1293840013000000000"
"weather air=963.7,datetime=""2011-01-01 00:00:22"" 1293840022000000000"
"weather prec=0.0,datetime=""2011-01-01 00:00:32"" 1293840032000000000"

I also could not get it to work with quoting=csv.QUOTE_NONE:

df['influx_format'] = df['measurement'] + ' ' + df.fields + df.timestamp.astype(str)
df.influx_format.to_csv(filename='output-dask-to-csv-*.dat', quoting=csv.QUOTE_NONE, quotechar="", sep=" ", escapechar='"', header=False, index=0, decimal='.')
weather" temp=-6.73,air=963.7,prec=0.0,datetime=""2011-01-01" 00:00:13""" 1293840013000000000
weather" temp=-6.74,air=963.7,prec=0.0,datetime=""2011-01-01" 00:00:13""" 1293840013000000000
weather" air=963.7,datetime=""2011-01-01" 00:00:22""" 1293840022000000000
weather" prec=0.0,datetime=""2011-01-01" 00:00:32""" 1293840032000000000
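Both outputs above look consistent with how Python's own csv module treats a single field that contains the delimiter and quote characters (as far as I know, pandas delegates to that module when writing csv). The sketch below reproduces the two behaviours with the standard library only; the helper `write_one` is a hypothetical name used for illustration, and no Dask or pandas is involved.

```python
import csv
import io

line = 'weather air=963.7,datetime="2011-01-01 00:00:22" 1293840022000000000'


def write_one(value, **options):
    """Write a single-field row with csv.writer and return the raw text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=" ", lineterminator="\n", **options)
    writer.writerow([value])
    return buf.getvalue()


# Default QUOTE_MINIMAL: the field contains the delimiter (a space) and quote
# characters, so it is wrapped in quotes and the inner quotes are doubled.
minimal_output = write_one(line)

# QUOTE_NONE: nothing is wrapped, but every space and every '"' in the field
# is prefixed with the escape character, which here is '"' itself.
escaped_output = write_one(
    line, quoting=csv.QUOTE_NONE, quotechar=None, escapechar='"'
)

print(minimal_output, end="")
print(escaped_output, end="")
```

In other words, as long as the separator is a space and the whole line is one field, the csv writer has to either quote or escape it, so the settings alone may not be enough to get a clean line.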

Does anyone have an idea and can help me?

The code below gives me this:

weather temp=8.73,air=962.71,prec=4.0,datetime="2012-01-01 00:00:13" 1293840013000000000
weather temp=4.12,air=963.2,prec=30.0,datetime="2012-01-01 00:00:13" 1293840013000000000
weather air=964.21,datetime="2012-01-01 00:00:22" 1293840022000000000
weather prec=0.0,datetime="2012-01-01 00:00:32" 1293840032000000000

df['influx_format'] = df['measurement'] + ' ' + df.fields + ',datetime=\"'+df['datetime'] + '\" ' + df.timestamp.astype(str)
df.influx_format.to_csv(filename='output-dask-to-csv-*.dat', quoting=csv.QUOTE_NONE, quotechar="", sep=" ", escapechar=' ',header=False, index=0, decimal='.')
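Since line protocol is not really csv, another option might be to skip the csv writer and write the prebuilt strings as plain text. The sketch below shows that idea with the standard library only; `write_lines` and the output file name are hypothetical, and it writes one file from a list rather than one file per Dask partition. In Dask, something similar may be possible per partition (for example through the bag API's `to_textfiles`), but that is an assumption I have not tested here.

```python
import os
import tempfile


def write_lines(lines, path):
    """Write each prebuilt line verbatim, one per row, with no csv processing."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for line in lines:
            handle.write(line + "\n")


lines = [
    'weather air=964.21,datetime="2012-01-01 00:00:22" 1293840022000000000',
    'weather prec=0.0,datetime="2012-01-01 00:00:32" 1293840032000000000',
]

# Write into a temporary directory so the sketch leaves the working dir alone.
path = os.path.join(tempfile.mkdtemp(), "output-plain-0.dat")
write_lines(lines, path)

with open(path, encoding="utf-8") as handle:
    written = handle.read()
print(written, end="")
```

Because nothing is quoted or escaped on the way out, the quotes around the datetime and the single spaces come through exactly as they were built.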

Hope this helps.