Extract a column from a CSV Arabic file in Python

Question

I'm trying to extract a specific column from an Arabic file into another file. Here is my code:

# coding=utf-8
import csv
from os import open

file = open('jamid.csv', 'r', encoding='utf-8')
test = csv.reader(file)
f = open('col.txt','w+', 'wb' ,encoding='utf-8')
for row in test:

    if len(row[0].split("\t"))>3 :
         f.write((row[0].split("\t"))[3].encode("utf-8"))

f.close()

The file looks like this:

4   جَوَارِيفُ  جواريف  جرف     اسم 
18  حَرْقى  حرقى    حرق     اسم
24  غَزَواتٌ    غزوات   غزو     اِسْمٌ

I keep getting the same error:

File "col.py", line 5, in <module>
    file = open('jamid.csv', 'r', encoding='utf-8')
TypeError: an integer is required (got type str)

Answer 1

You can try using pandas. Here is some sample code:

import pandas as pd
df = pd.read_csv("Book1.csv")
# print(df.head(10))
my_col = df['اسم'] #Insert the column name you want to select.
print(my_col)

Note: I want the output to be in Arabic (UTF-8) encoding.

import pandas as pd

df = pd.read_csv("filename.csv", encoding='utf-8')
saved_column = df['اسم']
with open("col3.txt", "w", encoding='utf-8') as f:
    # f.write() needs a str, not a Series, so join the values into one string
    f.write("\n".join(saved_column.astype(str)) + "\n")

Answer 2

You can try using unicodecsv; see also "How to write UTF-8 in a CSV file".

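# coding=utf-8
import unicodecsv as csv

with open('jamid.csv', 'rb') as file, open('col.txt', 'wb') as f:
    # unicodecsv decodes each field from UTF-8 (its default encoding)
    test = csv.reader(file, delimiter='\t')
    for row in test:
        if len(row) > 3:
            # the output file is binary, so encode and add a line break
            f.write(row[3].encode('utf8') + b'\n')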
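unicodecsv mainly matters on Python 2, whose csv module works on byte strings. Since the question is tagged python-3.x, here is a minimal sketch of the same tab-delimited extraction using only the standard library. The helper name `extract_column` is hypothetical, and the input is assumed to be tab-separated, as the asker's `split("\t")` suggests.

```python
import csv


def extract_column(in_path, out_path, index, delimiter="\t"):
    """Write field `index` of every row that has it to out_path, one per line."""
    # In Python 3, csv works on str, so both files are opened in text mode
    # with an explicit UTF-8 encoding; no manual .encode() is needed.
    with open(in_path, "r", newline="", encoding="utf-8") as in_fp, \
         open(out_path, "w", encoding="utf-8") as out_fp:
        for row in csv.reader(in_fp, delimiter=delimiter):
            if len(row) > index:  # skip blank or short rows
                out_fp.write(row[index] + "\n")
```

For the question's file this would be called as `extract_column('jamid.csv', 'col.txt', 3)`.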

Answer 3

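I see a few problems with your code. First, you call open with the arguments of the built-in open, but because of `from os import open` you are actually calling os.open, which takes different arguments. Just use the built-in open. More importantly, you seem to be trying to repair the rows coming from csv.reader by splitting them again on tabs.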
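To see why the traceback asks for an integer: `os.open` is a low-level call whose second argument is an integer of flags such as `os.O_RDONLY`, and it returns a raw file descriptor rather than a file object. The sketch below contrasts the two calls on a throwaway file; the file name `demo.txt` is a hypothetical placeholder.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Built-in open: the mode is a string, encoding is supported,
# and the result is a file object.
with open(path, "w", encoding="utf-8") as fp:
    fp.write("جرف\n")

# os.open: the flags must be an int, so passing the mode string 'r'
# raises TypeError, which is what `from os import open` led to.
try:
    os.open(path, "r")
    os_open_error = None
except TypeError as exc:
    os_open_error = exc

# Correct low-level usage: integer flags in, integer file descriptor out,
# and reads return bytes that you must decode yourself.
fd = os.open(path, os.O_RDONLY)
raw = os.read(fd, 100)
os.close(fd)
text = raw.decode("utf-8")
```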

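My guess is that you saw the whole line in row[0] and tried to work around it. The real problem is that reader splits on commas by default, so you need to pass a different delimiter. That is a little ambiguous here, because your code splits on tabs while your sample shows spaces. I used spaces in my solution, but you can switch as needed.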
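The effect of the delimiter can be checked on an in-memory copy of the sample rows. This sketch uses `io.StringIO` in place of the real file and assumes the fields are separated by runs of spaces, as in the sample from the question.

```python
import csv
import io

sample = (
    "4   جَوَارِيفُ  جواريف  جرف     اسم\n"
    "18  حَرْقى  حرقى    حرق     اسم\n"
)

# Default dialect: the delimiter is ',', so each whole line lands in row[0].
default_rows = list(csv.reader(io.StringIO(sample)))

# delimiter=' ' splits on single spaces; skipinitialspace=True ignores the
# extra spaces right after each delimiter, so a run of spaces acts as one.
space_rows = list(
    csv.reader(io.StringIO(sample), delimiter=" ", skipinitialspace=True)
)

# Index 3 is the fourth field (the root column in the sample).
roots = [row[3] for row in space_rows if len(row) > 3]
```

If the real file uses tabs, as the asker's `split("\t")` suggests, pass `delimiter="\t"` instead.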

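Finally, you encode the string before passing it to the output file object. Instead, open that file with the correct encoding and simply give it the string.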
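A minimal sketch of that point: a text-mode file accepts `str` and does the UTF-8 encoding itself, while already-encoded `bytes` are rejected. To make the written bytes inspectable, this uses `io.TextIOWrapper` around an in-memory `io.BytesIO` buffer as a stand-in for a real file.

```python
import io

buffer = io.BytesIO()
# open(..., 'w', encoding='utf-8') returns an io.TextIOWrapper; here one is
# wrapped around an in-memory buffer instead of a file on disk.
out_fp = io.TextIOWrapper(buffer, encoding="utf-8", newline="")

out_fp.write("جرف\n")  # pass the str; the wrapper encodes it to UTF-8

try:
    out_fp.write("جرف".encode("utf-8"))  # pre-encoded bytes are rejected
    bytes_error = None
except TypeError as exc:
    bytes_error = exc

out_fp.flush()  # push buffered text down into the BytesIO
written = buffer.getvalue()
```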

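# coding=utf-8
import csv

with open('jamid.csv', 'r', newline='', encoding='utf-8') as in_fp:
    with open('col.txt', 'w', newline='', encoding='utf-8') as out_fp:
        # Fields are separated by runs of spaces; use delimiter='\t' for tabs.
        rows = csv.reader(in_fp, delimiter=' ', skipinitialspace=True)
        # writerows expects rows, so wrap each field in a list; a bare string
        # would be split into one field per character. row[3] needs len > 3.
        csv.writer(out_fp).writerows([row[3]] for row in rows if len(row) > 3)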


Tags: python, csv, arabic, python-3.x