df.to_csv() as tab-delimited, but commas conflict

I want to save a DataFrame as a tab-delimited .csv:

df.to_csv('df.csv', index=False, sep='\t')

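However, the 3rd column holds a list object, and its text happens to contain commas (,).

As a result, my output df.csv shows many columns. The first one holds the 3 values, correctly separated by tabs. The second and later columns are the comma-split list values.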


df (correct: 3 columns):

            0                                                  1  \
0   Emissions  305-1~GHG emissions in metric tons of CO2e~Gro...   
1   Emissions  305-1~GHG emissions in metric tons of CO2e~Bio...   
2   Emissions    305-1~Direct (Scope 1) GHG emissions by gas~CO2   
3   Emissions    305-1~Direct (Scope 1) GHG emissions by gas~N20   
4   Emissions   305-1~Direct (Scope 1) GHG emissions by gas~HFCs   
5   Emissions   305-1~Direct (Scope 1) GHG emissions by gas~PFCs   
6   Emissions    305-1~Direct (Scope 1) GHG emissions by gas~SF6   
7   Emissions  305-2~GHG Emissions in metric tons of CO2e~Gro...   
8   Emissions  305-2~GHG Emissions in metric tons of CO2e~Gro...   
9   Emissions  305-2~GHG Emissions in metric tons of CO2e~Tot...   
10  Emissions  305-2~GHG Emissions in metric tons of CO2e~Tot...   
11  Emissions  103-1~Explanation of the material topic and it...   
12  Emissions   103-2~The management approach and its components   
13  Emissions        103-3~Evaluation of the management approach   

                                                    2  
0   [2014_2760, 2015_278585, 2016_409886, 2017_972...  
1   [2014_299605, 2015_477610, 2016_822657, 2017_8...  
2   [2014_444055, 2015_730929, 2016_766490, 2017_8...  
3   [2014_510811, 2015_583265, 2016_694522, 2017_7...  
4   [2014_162816, 2015_199622, 2016_228775, 2017_3...  
5   [2014_61824, 2015_569032, 2016_607814, 2017_77...  
6   [2014_60442, 2015_64418, 2016_329338, 2017_784...  
7   [2014_53078, 2015_500448, 2016_527776, 2017_61...  
8   [2014_165580, 2015_557426, 2016_894641, 2017_9...  
9   [2014_60142, 2015_84502, 2016_532996, 2017_893...  
10  [2014_71762, 2015_72349, 2016_195351, 2017_624...  
11  consumption rate fossil fuels coal oil emissio...  
12  how evaluate companys environmental management...  
13  evaluation effectiveness companys environmenta...  

df.csv (incorrect; technically I want one column, with the original 3 column values tab-delimited inside it):

Simplified template example

df:

text | text | ['list', 'object', 'here', 'of', 'any', 'length']
text | text | ['foo', 'bar']

Desired .CSV [one literal column, but with the values separated by tabs (->)]:

| text -> text -> ['list', 'object', 'here', 'of', 'any', 'length'] |
| text -> text -> ['foo', 'bar'] |

Single-column output, with the values separated by tabs. No headers or index.


How can I make sure Pandas ignores the , inside the list object?

Let me know if I need to provide more details.

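FYI, you can click "Copy Value" on the df in the variable viewer (the exact name and behavior differ between IDEs) to copy its data in a form I could paste, but I built a sample from what you posted.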

import pandas as pd
import csv

Sample df:

df = pd.DataFrame({'col1': ['Emissions', 'Emissions'], 'col2': ['305-1~GHG emissions in metric tons of CO2e~Gro...', '305-1~GHG emissions in metric tons of CO2e~Bio...'], 'col3': [['2014_2760, 2015_278585, 2016_409886'], ['2014_299605, 2015_477610, 2016_822657']]})

Now the trick here is to use the quoting parameter, which according to the docs is:

> quoting : optional constant from csv module
>
> Defaults to csv.QUOTE_MINIMAL. If you have set a float_format then floats are converted to strings and thus csv.QUOTE_NONNUMERIC will treat them as non-numeric.
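The docs excerpt does not show a call, so here is a minimal sketch of how quoting might be passed to to_csv. The choice of csv.QUOTE_ALL, the shortened sample values, and the file name sample_quoted.csv are assumptions made for illustration, not something stated above. The idea is that every field is wrapped in double quotes, so a reader that honors quotes keeps the commas of the list inside a single field.

```python
import csv
import os
import tempfile

import pandas as pd

# Shortened, made-up stand-in for the sample df (three columns, one holds lists).
df = pd.DataFrame({
    'col1': ['Emissions', 'Emissions'],
    'col2': ['305-1~a', '305-1~b'],
    'col3': [['2014_2760', '2015_278585'], ['2014_299605', '2015_477610']],
})

# Hypothetical output location; any writable path would do.
out_path = os.path.join(tempfile.mkdtemp(), 'sample_quoted.csv')

# Tab as the separator, and every field quoted.
df.to_csv(out_path, index=False, sep='\t', quoting=csv.QUOTE_ALL)

# Reading back with the same separator should restore the three columns.
back = pd.read_csv(out_path, sep='\t')
print(back.shape)  # (2, 3)
print(back['col3'][0])
```

If a spreadsheet program still splits the list across columns, it is probably treating the comma as a delimiter on import, which is a setting of that program rather than of the file.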


Edit:

Now that you have clarified your objective, apply should do it:

# Join every value of a row into one string; the result is a Series with one entry per row.
df = df.apply(
    lambda row: ' -> '.join(row.astype(str)),
    axis=1)
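To make the result of that apply call concrete, here is a small sketch on the two template rows from the question. The frame and the name joined are made up for illustration. With axis=1, apply hands each row to the lambda and returns a Series of strings, which is why the to_csv call further down writes a single column.

```python
import pandas as pd

# The two template rows from the question.
template = pd.DataFrame({
    'col1': ['text', 'text'],
    'col2': ['text', 'text'],
    'col3': [['list', 'object', 'here', 'of', 'any', 'length'], ['foo', 'bar']],
})

# Each row becomes one string; the list is rendered through str().
joined = template.apply(lambda row: ' -> '.join(row.astype(str)), axis=1)

print(type(joined).__name__)  # Series
print(joined.iloc[1])         # text -> text -> ['foo', 'bar']
```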

Save the file:

df.to_csv('sample.csv', index=False)

Output:
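The question uses -> to stand for a tab and asks for no header or index, so a variant along those lines might look like the sketch below. It is only one possible approach: it assumes the three-column frame as it was before the apply step, uses a made-up file name (sample_tabs.txt), and writes the lines with plain file I/O instead of to_csv so that no quoting is applied to the joined text.

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in for the three-column frame (before any apply step).
df = pd.DataFrame({
    'col1': ['text', 'text'],
    'col2': ['text', 'text'],
    'col3': [['list', 'object', 'here'], ['foo', 'bar']],
})

# Join each row's values with a real tab character instead of ' -> '.
lines = df.apply(lambda row: '\t'.join(row.astype(str)), axis=1)

# Hypothetical output location; any writable path would do.
out_path = os.path.join(tempfile.mkdtemp(), 'sample_tabs.txt')

# Write one line per row: no header, no index, no quoting.
with open(out_path, 'w', encoding='utf-8', newline='') as fh:
    for line in lines:
        fh.write(line + '\n')

# Read the file back to inspect what was written.
with open(out_path, encoding='utf-8') as fh:
    written = fh.read().splitlines()
print(written)
```

Writing the lines by hand avoids the double quotes that a csv writer would typically add around a field containing the delimiter.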