Printing contents of a tsv file (with UTF-8) in Python
The code below works fine in a file I named tsv_test.py:
import csv

class ReadUTF8():
    def unicode_csv_reader(self, utf8_data, dialect=csv.excel_tab, **kwargs):
        csv_reader = csv.reader(utf8_data, dialect=dialect, **kwargs)
        for row in csv_reader:
            yield [unicode(cell, 'utf-8') for cell in row]

    def load_deck_data(self):
        filename = 'lexicon.tsv'
        reader = self.unicode_csv_reader(open(filename))
        for field1, field2, field3, field4 in reader:
            print field1, field2, field3, field4

ReadUTF8().load_deck_data()
But when I copy/paste it into my project (which is a kivy project), it breaks. The code and the error are below:
class StudyScreenManagement(ScreenManager):
    def unicode_csv_reader(self, utf8_data, dialect=csv.excel_tab, **kwargs):
        csv_reader = csv.reader(utf8_data, dialect=dialect, **kwargs)
        for row in csv_reader:
            yield [unicode(cell, 'utf-8') for cell in row]

    def load_deck_data(self):
        filename = 'lexicon.tsv'
        reader = self.unicode_csv_reader(open(filename))
        for field1, field2, field3, field4 in reader:
            print field1, field2, field3, field4
I doubt this is relevant, but just in case, here is the relevant part of the .kv file:
Button:
    text: 'Lexicon'
    on_press: app.root.load_deck_data()
Output:
  File "/Users/bearnun/code/mingyu/mingyuKivy/mingyu_controllers.py", line 14, in load_deck_data
    for field1, field2, field3, field4 in reader:
ValueError: need more than 1 value to unpack
::Side note::
I tried just printing 'field1' in both cases. With that change, the output for both is:
[u'\u4b03', u'\u98d2', u'[sa4]', u'/variant of \u98af|\u98d2[sa4]/']
[u'\u4b20', u'\u4b20', u'[fei1]', u'/old variant of \u970f[fei1]/']
My desired output:
䬃 飒 [sa4] /variant of 颯|飒[sa4]/
䬠 䬠 [fei1] /old variant of 霏[fei1]/
[Edit below]
Contents of lexicon.tsv:
䬃 飒 [sa4] /variant of 颯|飒[sa4]/
䬠 䬠 [fei1] /old variant of 霏[fei1]/
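Apparently, I am receiving a list instead of a generator, so if in load_deck_data() I change...:
for field1, field2, field3, field4 in reader:
    print field1, field2, field3, field4
...to...:
for line in reader:
    print ''.join(line)
...my project works fine. Of course, this doesn't work in the small snippet that originally worked.
I would love to know why I'm getting a generator in one place, but a list in another. :)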
Apparently, I am receiving a list instead of a generator, so if in load_deck_data() I change:
for field1, field2, field3, field4 in reader:
    print field1, field2, field3, field4
to:
for line in reader:
    print ''.join(line)
my project works fine.
Take a look at this example:
data = [
    ['a', 'b', 'c', 'd'],
    ['e'],
]

def mygen(x):
    for item in x:
        yield item

for line in mygen(data):
    print ''.join(line)

--output:--
abcd
e

for col1, col2, col3, col4 in mygen(data):
    print col1, col2, col3, col4

--output:--
a b c d
Traceback (most recent call last):
  File "1.py", line 13, in <module>
    for col1, col2, col3, col4 in mygen(data):
ValueError: need more than 1 value to unpack
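In the first for-in loop, you are asking, "Please retrieve all the elements in the list and join them together." In the second for-in loop, you are demanding, "Retrieve four elements from the list!" See the difference? In the first case, the list can contain 0 to n elements, and there will be no error. In the second case, the list must have exactly 4 elements, otherwise you get an error: too few elements raises "need more than N values to unpack", and too many raises "too many values to unpack".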
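To make the unpacking rule concrete, here is a minimal sketch written in Python 3 syntax (the code in this thread is Python 2, where the wording of the error messages differs). The helper name unpack_four is hypothetical and exists only for this illustration.

```python
def unpack_four(row):
    """Try to unpack row into exactly four names.

    Returns the four fields as a tuple, or the error text if unpacking fails.
    """
    try:
        col1, col2, col3, col4 = row
    except ValueError as exc:
        return "ValueError: %s" % exc
    return (col1, col2, col3, col4)


rows = [
    ['a', 'b', 'c', 'd'],       # exactly four items: unpacks cleanly
    ['e'],                      # too few items
    ['f', 'g', 'h', 'i', 'j'],  # too many items
]

results = [unpack_four(row) for row in rows]
for row, result in zip(rows, results):
    print(row, '->', result)
```

Only the first row unpacks; both the short row and the long row end up in the except branch.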
I would love to know why I'm getting a generator in one place, but a list in another.
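Simple. You aren't. csv.reader() returns a list of strings for each row, which means your generator function yields a list of strings on each iteration.
I think you changed the data in your file. In one file, you have tab delimited data, and csv.reader() returns a list of four items for each line in the file, which can be unpacked into four variables. Your other file, however, has non-tab delimited data, which causes csv.reader() to read the whole line as one item, so the list of strings that csv.reader() returns contains only one item, and a one-item list cannot be unpacked into four variables.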
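The difference between the two files can be reproduced without touching the disk. The sketch below uses Python 3 and io.StringIO as a stand-in for the open file; the sample text is the first lexicon.tsv line from the question, once with tabs and once with spaces, and read_rows is a hypothetical helper name.

```python
import csv
import io

# The same line, separated by tabs and then by single spaces.
tab_text = u"䬃\t飒\t[sa4]\t/variant of 颯|飒[sa4]/\n"
space_text = u"䬃 飒 [sa4] /variant of 颯|飒[sa4]/\n"


def read_rows(text):
    """Parse text with the excel_tab dialect and return all rows as lists."""
    return list(csv.reader(io.StringIO(text), dialect=csv.excel_tab))


tab_rows = read_rows(tab_text)
space_rows = read_rows(space_text)

print(len(tab_rows[0]), tab_rows[0])      # four separate fields
print(len(space_rows[0]), space_rows[0])  # one field holding the whole line
```

With tabs the row has four fields and unpacks into four variables; with spaces the row has a single field, which is exactly the situation that produces "need more than 1 value to unpack".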
I tried just printing 'field1' in both cases. With that change the
output for both is:
[u'\u4b03', u'\u98d2', u'[sa4]', u'/variant of \u98af|\u98d2[sa4]/']
[u'\u4b20', u'\u4b20', u'[fei1]', u'/old variant of \u970f[fei1]/']
Instead of print field1, if you do print repr(field1), I think you will get:
"[u'\u4b03', u'\u98d2', u'[sa4]', u'/variant of \u98af|\u98d2[sa4]/']"
Note the outer quotes, which mean that your tsv file really does contain the following on one line:
[䬃, 飒, [sa4], /variant of 颯|飒[sa4]/]
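There are no tabs separating anything, so the whole line, which looks like a list, is read in as a single item, and csv.reader() therefore returns a list containing that one item. You mistook the single item for a python list because when you print a string, python doesn't display the quotes. For example, there is no difference between the output of the following two print statements: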
>>> print "[1, 2, 3]"
[1, 2, 3]
>>> print [1, 2, 3]
[1, 2, 3]
print can fool you in other cases as well, because strings can contain non-printable characters, and the output of print will not show them:
>>> print "hello\bworld"
hellworld
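The bottom line is: you can never tell what the original thing was by looking at the output of print. Whenever you want to know exactly what the original thing is, always use: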
print repr(some_string)
Now, look at the results:
>>> print repr([1, 2, 3])
[1, 2, 3]
>>> print repr('[1, 2, 3]')
'[1, 2, 3]'
>>> print repr('hello\bworld')
'hello\x08world'
The output tells you exactly what the original thing was.
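The same check carries over to Python 3, where print is a function. The sketch below puts str() (what print shows) next to repr() for the three values used above; the variable names are only for illustration.

```python
values = [
    [1, 2, 3],        # a real list
    '[1, 2, 3]',      # a string that merely looks like a list
    'hello\bworld',   # a string containing a backspace character
]

# str() is what print displays; repr() shows what the object really is.
shown = [(str(value), repr(value)) for value in values]

for plain, exact in shown:
    print(exact)
```

The first two values have the same str() form, but only the string's repr() carries quotes, and the backspace shows up as \x08 in the repr() of the third.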
With the following tab-delimited lexicon.tsv file:
1 2 3 €
䬃 飒 [sa4] /variant of 颯|飒[sa4]/
the code below does not cause an error after clicking the Lexicon button:
from kivy.app import App
from kivy.uix.screenmanager import ScreenManager, Screen
import csv

class StudyScreenManager(ScreenManager):
    def unicode_csv_reader(self, utf8_data, dialect=csv.excel_tab, **kwargs):
        csv_reader = csv.reader(utf8_data, dialect=dialect, **kwargs)
        for row in csv_reader:
            yield [unicode(cell, 'utf-8') for cell in row]

    def load_deck_data(self):
        filename = 'lexicon.tsv'
        reader = self.unicode_csv_reader(open(filename))
        for field1, field2, field3, field4 in reader:
            print field1, field2, field3, field4

class HistoryScreen(Screen):
    pass

class MathScreen(Screen):
    pass

class MyApp(App):
    def build(self):
        sm = StudyScreenManager()
        sm.add_widget(HistoryScreen(name='history'))
        sm.add_widget(MathScreen(name='math'))
        return sm

MyApp().run()
my.kv:
<HistoryScreen>: #the 'root' of the following widget hierarchy:
    BoxLayout:
        Button:
            text: 'Lexicon'
            on_press: app.root.load_deck_data() #self=Button, root=HistoryScreen, app.root=the Widget returned by build()
        Button:
            text: "Next"
            on_press: root.manager.current = "math"

<MathScreen>: #the 'root' of the following widget hierarchy:
    BoxLayout:
        Button:
            text: 'Lexicon'
            on_press: app.root.load_deck_data()
        Button:
            text: 'Previous'
            on_press: root.manager.current = "history"
After clicking the Lexicon button, this is the output I see in a utf-8 aware terminal window:
1 2 3 €
䬃 飒 [sa4] /variant of 颯|飒[sa4]/
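If a file with the wrong delimiter can slip in again, one option is to check the field count before unpacking, so the error names the offending line. This is only a sketch in Python 3 (the code above is Python 2), and read_tsv_rows, write_sample, and expected_fields are hypothetical names, not part of the csv module or of the original project.

```python
import csv
import io
import os
import tempfile


def read_tsv_rows(path, expected_fields=4):
    """Yield rows from a UTF-8 TSV file; raise a descriptive error on a bad row."""
    with io.open(path, encoding='utf-8', newline='') as handle:
        reader = csv.reader(handle, dialect=csv.excel_tab)
        for line_number, row in enumerate(reader, start=1):
            if len(row) != expected_fields:
                raise ValueError(
                    '%s line %d: expected %d fields, got %d: %r'
                    % (path, line_number, expected_fields, len(row), row))
            yield row


def write_sample(text):
    """Write text to a temporary .tsv file and return its path (demo helper)."""
    fd, path = tempfile.mkstemp(suffix='.tsv')
    with io.open(fd, 'w', encoding='utf-8') as handle:
        handle.write(text)
    return path


# A tab-delimited file parses into four fields per row.
good_path = write_sample(u"1\t2\t3\t4\n")
good_rows = list(read_tsv_rows(good_path))

# A space-delimited file is reported with its line number instead of
# failing later inside a tuple-unpacking for loop.
bad_path = write_sample(u"1 2 3 4\n")
try:
    list(read_tsv_rows(bad_path))
    bad_error = None
except ValueError as exc:
    bad_error = str(exc)

print(good_rows)
print(bad_error)

os.remove(good_path)
os.remove(bad_path)
```

The check does not change how well-formed rows are parsed; it only replaces the terse unpacking error with one that shows the file, the line number, and the row that was actually read.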