Using a dictionary comprehension with an included string split operation
Consider a tiny properties-parser snippet:
testx="""var1 = foo
var2 = bar"""
dd = { l.split('=')[0].strip():l.split('=')[1].strip() for l in testx.split('\n')}
print(dd)
# {'var1': 'foo', 'var2': 'bar'}
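It works, but it is ugly because split is called twice in l.split('=')[0].strip():l.split('=')[1].strip(). How can the dictionary comprehension be changed so that the split happens only once, with the dictionary entry then built as:
l[0].strip():l[1].strip()
Does this refactoring require a nested comprehension, or a different way of building a single-level comprehension?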
Use re.findall:
import re
testx="""var1 = foo
var2 = bar"""
dct = dict(re.findall(r'(\S+)\s*=\s*(\S+)', testx))
print(dct)
# {'var1': 'foo', 'var2': 'bar'}
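One caveat with the pattern above: (\S+) only captures a run of non-whitespace characters, so a value containing a space would be cut short. Below is a minimal sketch of a per-line variant, assuming values may contain spaces. The names sample, loose, strict and LINE_RE, and the second sample value, are made up for illustration and are not part of the original answer.

```python
import re

# Hypothetical input: the second value contains a space.
sample = """var1 = foo
var2 = bar baz"""

# Original pattern: the value group stops at the first whitespace character.
loose = dict(re.findall(r'(\S+)\s*=\s*(\S+)', sample))
print(loose)
# {'var1': 'foo', 'var2': 'bar'}

# Per-line pattern: key, '=', then the rest of the line without surrounding blanks.
LINE_RE = re.compile(r'^[ \t]*([^=\s]+)[ \t]*=[ \t]*(.*?)[ \t]*$', re.MULTILINE)
strict = dict(LINE_RE.findall(sample))
print(strict)
# {'var1': 'foo', 'var2': 'bar baz'}
```

Whether the stricter pattern is worth the extra noise depends on the input; for values that never contain whitespace the shorter pattern gives the same result.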
If you are on Python >= 3.8, this is exactly the kind of case assignment expressions were added for:
>>> {(parts:=l.split('='))[0].strip(): parts[1].strip() for l in testx.split("\n")}
{'var1': 'foo', 'var2': 'bar'}
Before that, you could do the following:
>>> {key.strip():value.strip() for l in testx.split('\n') for key, value in [l.split("=")]}
{'var1': 'foo', 'var2': 'bar'}
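Honestly, I find this one more readable.
But frankly, all of these are still hard for me to read. At the end of the day, I don't think you can beat: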
>>> result = {}
>>> for l in testx.split("\n"):
... key, value = l.split("=")
... result[key.strip()] = value.strip()
...
>>> result
{'var1': 'foo', 'var2': 'bar'}
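If the plain loop is the preferred form, it can be wrapped in a small function. The sketch below is only one possible shape: the name parse_properties is hypothetical, and it swaps split("=") for str.partition("="), which splits once at the first "=" and makes it easy to skip lines that have no "=" at all (blank lines, for instance).

```python
def parse_properties(text):
    """Parse 'key = value' lines into a dict (illustrative helper, not from the answer)."""
    result = {}
    for line in text.splitlines():
        # partition splits once, at the first '='; sep is '' when no '=' is present
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines without '=' such as blank lines
        result[key.strip()] = value.strip()
    return result


testx = """var1 = foo
var2 = bar"""
print(parse_properties(testx))
# {'var1': 'foo', 'var2': 'bar'}
```

Unlike key, value = l.split("="), this version does not raise ValueError when a value itself contains an "=".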
Edit
Note that the for <target list> in [<expression>] idiom was actually optimized in Python 3.9:
https://docs.python.org/3/whatsnew/3.9.html#optimizations
Optimized the idiom for assignment a temporary variable in comprehensions. Now for y in [expr] in comprehensions is as fast as a simple assignment y = expr. For example:
sums = [s for s in [0] for x in data for s in [s + x]]
Unlike the := operator this idiom does not leak a variable to the outer scope.
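The scoping remark in the quote can be checked directly. The following is a small sketch, assuming Python 3.8 or later; the function name demo and the variable names walrus, idiom and names are illustrative only. It runs both comprehensions inside a function and then inspects which names ended up in that function's local scope.

```python
testx = """var1 = foo
var2 = bar"""


def demo():
    # Assignment expression: `parts` is bound in the enclosing function scope.
    walrus = {(parts := l.split("="))[0].strip(): parts[1].strip()
              for l in testx.split("\n")}
    # Single-element-list idiom: `key` and `value` stay inside the comprehension.
    idiom = {key.strip(): value.strip()
             for l in testx.split("\n") for key, value in [l.split("=")]}
    names = set(locals())  # names visible in demo() after both comprehensions
    return walrus, idiom, names


walrus, idiom, names = demo()
print(walrus == idiom)   # True
print("parts" in names)  # True: the := target leaked into demo()
print("key" in names)    # False: the loop targets did not
```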
Comparing the bytecode produced by Python 3.8 and Python 3.9, you can see that the Python 3.9 version has no nested iteration:
Python 3.8:
Python 3.8.1 (default, Jan 8 2020, 16:15:59)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> dis.dis('{k:v for l in "a b|c d".split("|") for k,v in [l.split()]}')
1 0 LOAD_CONST 0 (<code object <dictcomp> at 0x7fdbd6249d40, file "<dis>", line 1>)
2 LOAD_CONST 1 ('<dictcomp>')
4 MAKE_FUNCTION 0
6 LOAD_CONST 2 ('a b|c d')
8 LOAD_METHOD 0 (split)
10 LOAD_CONST 3 ('|')
12 CALL_METHOD 1
14 GET_ITER
16 CALL_FUNCTION 1
18 RETURN_VALUE
Disassembly of <code object <dictcomp> at 0x7fdbd6249d40, file "<dis>", line 1>:
1 0 BUILD_MAP 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 30 (to 36)
6 STORE_FAST 1 (l)
8 LOAD_FAST 1 (l)
10 LOAD_METHOD 0 (split)
12 CALL_METHOD 0
14 BUILD_TUPLE 1
16 GET_ITER
>> 18 FOR_ITER 14 (to 34)
20 UNPACK_SEQUENCE 2
22 STORE_FAST 2 (k)
24 STORE_FAST 3 (v)
26 LOAD_FAST 2 (k)
28 LOAD_FAST 3 (v)
30 MAP_ADD 3
32 JUMP_ABSOLUTE 18
>> 34 JUMP_ABSOLUTE 4
>> 36 RETURN_VALUE
Compared with Python 3.9:
Python 3.9.0 | packaged by conda-forge | (default, Oct 14 2020, 22:56:29)
[Clang 10.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> dis.dis('{k:v for l in "a b|c d".split("|") for k,v in [l.split()]}')
1 0 LOAD_CONST 0 (<code object <dictcomp> at 0x7fb3587d1870, file "<dis>", line 1>)
2 LOAD_CONST 1 ('<dictcomp>')
4 MAKE_FUNCTION 0
6 LOAD_CONST 2 ('a b|c d')
8 LOAD_METHOD 0 (split)
10 LOAD_CONST 3 ('|')
12 CALL_METHOD 1
14 GET_ITER
16 CALL_FUNCTION 1
18 RETURN_VALUE
Disassembly of <code object <dictcomp> at 0x7fb3587d1870, file "<dis>", line 1>:
1 0 BUILD_MAP 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 22 (to 28)
6 STORE_FAST 1 (l)
8 LOAD_FAST 1 (l)
10 LOAD_METHOD 0 (split)
12 CALL_METHOD 0
14 UNPACK_SEQUENCE 2
16 STORE_FAST 2 (k)
18 STORE_FAST 3 (v)
20 LOAD_FAST 2 (k)
22 LOAD_FAST 3 (v)
24 MAP_ADD 2
26 JUMP_ABSOLUTE 4
>> 28 RETURN_VALUE
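Rather than reading the two listings side by side, the same comparison can be made by counting FOR_ITER instructions. This is a rough sketch: count_for_iter is a hypothetical helper, and it walks nested code objects because, in the interpreters shown above, the comprehension body is compiled into its own code object. Based on the listings, the count should be 2 on Python 3.8 and 1 on Python 3.9; other versions may lay out the bytecode differently, so no expected output is given.

```python
import dis
import types


def count_for_iter(code):
    """Count FOR_ITER instructions in a code object and in any nested code objects."""
    total = sum(1 for ins in dis.get_instructions(code) if ins.opname == "FOR_ITER")
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            total += count_for_iter(const)
    return total


src = '{k: v for l in "a b|c d".split("|") for k, v in [l.split()]}'
code = compile(src, "<demo>", "eval")
print(count_for_iter(code))  # number of loops the interpreter actually runs
print(eval(code))
```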