df.to_csv() as tab-delimited, but commas conflict
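I want to save a DataFrame as a tab-delimited .csv:
df.to_csv('df.csv', index=False, sep='\t')
However, the third column holds list objects, which happen to contain commas (,).
As a result, my output df.csv has many columns. The first contains the 3 values, correctly separated by tabs. The second and subsequent columns are the comma-split values.
df (correct: 3 columns):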
0 1 \
0 Emissions 305-1~GHG emissions in metric tons of CO2e~Gro...
1 Emissions 305-1~GHG emissions in metric tons of CO2e~Bio...
2 Emissions 305-1~Direct (Scope 1) GHG emissions by gas~CO2
3 Emissions 305-1~Direct (Scope 1) GHG emissions by gas~N20
4 Emissions 305-1~Direct (Scope 1) GHG emissions by gas~HFCs
5 Emissions 305-1~Direct (Scope 1) GHG emissions by gas~PFCs
6 Emissions 305-1~Direct (Scope 1) GHG emissions by gas~SF6
7 Emissions 305-2~GHG Emissions in metric tons of CO2e~Gro...
8 Emissions 305-2~GHG Emissions in metric tons of CO2e~Gro...
9 Emissions 305-2~GHG Emissions in metric tons of CO2e~Tot...
10 Emissions 305-2~GHG Emissions in metric tons of CO2e~Tot...
11 Emissions 103-1~Explanation of the material topic and it...
12 Emissions 103-2~The management approach and its components
13 Emissions 103-3~Evaluation of the management approach
2
0 [2014_2760, 2015_278585, 2016_409886, 2017_972...
1 [2014_299605, 2015_477610, 2016_822657, 2017_8...
2 [2014_444055, 2015_730929, 2016_766490, 2017_8...
3 [2014_510811, 2015_583265, 2016_694522, 2017_7...
4 [2014_162816, 2015_199622, 2016_228775, 2017_3...
5 [2014_61824, 2015_569032, 2016_607814, 2017_77...
6 [2014_60442, 2015_64418, 2016_329338, 2017_784...
7 [2014_53078, 2015_500448, 2016_527776, 2017_61...
8 [2014_165580, 2015_557426, 2016_894641, 2017_9...
9 [2014_60142, 2015_84502, 2016_532996, 2017_893...
10 [2014_71762, 2015_72349, 2016_195351, 2017_624...
11 consumption rate fossil fuels coal oil emissio...
12 how evaluate companys environmental management...
13 evaluation effectiveness companys environmenta...
df.csv (incorrect; technically I want one column, but with the original 3 column values tab-delimited):
Simplified template example
df:
text | text | ['list', 'object', 'here', 'of', 'any', 'length']
text | text | ['foo', 'bar']
Desired .csv [one literal column, but with values separated by tabs (->)]:
| text -> text -> ['list', 'object', 'here', 'of', 'any', 'length'] |
| text -> text -> ['foo', 'bar'] |
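Single-column output, with values separated by tabs. No headers or index.
How can I make sure pandas ignores the , inside the list objects?
Let me know if I need to provide more details.
FYI, you can click "Copy value" on df in your variable viewer (the exact behavior and the name of the option vary by IDE) to copy its data in a form I can reproduce. In the meantime, I created a sample from what you provided.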
import pandas as pd
import csv
Sample df:
df = pd.DataFrame({
    'col1': ['Emissions', 'Emissions'],
    'col2': ['305-1~GHG emissions in metric tons of CO2e~Gro...',
             '305-1~GHG emissions in metric tons of CO2e~Bio...'],
    'col3': [['2014_2760, 2015_278585, 2016_409886'],
             ['2014_299605, 2015_477610, 2016_822657']],
})
The trick here is to use the quoting parameter, which according to the docs is:
> quoting : optional constant from csv module
Defaults to csv.QUOTE_MINIMAL. If you have set a float_format then floats are
converted to strings and thus csv.QUOTE_NONNUMERIC will treat them as
non-numeric.
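
To make the effect of quoting concrete, here is a minimal sketch that writes a small frame as tab-delimited text with the default csv.QUOTE_MINIMAL and with csv.QUOTE_ALL. The frame named sample and its values are hypothetical, modeled on the simplified template in the question. It relies on to_csv returning the CSV text as a string when no path is given.

```python
import csv

import pandas as pd

# Hypothetical two-row frame; the third column holds real Python lists.
sample = pd.DataFrame({
    "col1": ["text", "text"],
    "col2": ["text", "text"],
    "col3": [["list", "object", "here"], ["foo", "bar"]],
})

# With no path argument, to_csv returns the CSV text as a string.
# Default quoting (csv.QUOTE_MINIMAL): commas are not special when sep is a tab.
minimal = sample.to_csv(index=False, header=False, sep="\t")

# csv.QUOTE_ALL wraps every field in double quotes, so a reader that honors
# quotes keeps each list in one field even if it also splits on commas.
quoted = sample.to_csv(index=False, header=False, sep="\t", quoting=csv.QUOTE_ALL)

print(minimal)
print(quoted)
```

In this sketch each written line contains exactly two tabs, which suggests the extra columns come from the program that opens the file splitting on commas, not from pandas itself.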
Edit:
Now that you have clarified your objective, apply should achieve it:
df = df[df.columns].apply(
lambda x: ' -> '.join(x.astype(str)),
axis=1)
Save the file:
df.to_csv('sample.csv', index=False, header=False)
Output:
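
As a variation on the apply approach, the sketch below joins each row with a real tab character instead of ' -> ' and writes the resulting lines with plain file I/O, so that no header, index, or CSV quoting is added. The frame named sample, the temporary directory, and the file name are assumptions for illustration, not part of the original answer.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical frame mirroring the simplified template in the question.
sample = pd.DataFrame({
    "col1": ["text", "text"],
    "col2": ["text", "text"],
    "col3": [["list", "object", "here", "of", "any", "length"], ["foo", "bar"]],
})

# Join each row's values with a real tab; lists become their str() form.
rows = sample.apply(lambda row: "\t".join(row.astype(str)), axis=1)

# Write the lines directly so the csv writer never sees (or quotes) them.
text = "\n".join(rows) + "\n"
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "sample.csv"  # hypothetical file name
    out_path.write_text(text, encoding="utf-8")
    written = out_path.read_text(encoding="utf-8")

print(written)
```

Bypassing to_csv here is a deliberate choice: a field that contains the delimiter would otherwise be wrapped in quotes by the csv writer, which is not what the desired single-column output shows.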