Compare two vCards
I have two vCards:
vcard1 = """BEGIN:VCARD
VERSION:3.0
N;CHARSET=UTF-8:Name;;;;
TEL:0005555000
END:VCARD"""
vcard2 = """BEGIN:VCARD
VERSION:3.0
N;CHARSET=UTF-8:Name;;;;
TEL:0005555000
EMAIL;CHARSET=UTF-8:my_email@email.com
END:VCARD"""
As you can see, the only difference is that the second vCard has one additional property, EMAIL. Is there a way, in code, to treat these two vCards as equal?
import vobject
print(vobject.readOne(vcard1).serialize()==vobject.readOne(vcard2).serialize())
Solution
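The first rule of any comparison is to define the basis of the comparison. You can even compare apples and oranges, provided you are looking at a quantity that can be compared: for example, "how many apples vs. how many oranges" or "the weight of 5 apples vs. the weight of 5 oranges". The point is that the basis of comparison must be defined unambiguously.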
Note: I will use the data from the Dummy Data section below.
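Extending this idea to your use case, you can compare the vCards field by field, and then also across all fields taken together. As an example, here are three ways to compare them: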
Example A1: compare only the fields common to vcard1 and vcard2.
Example A2: compare all fields of vcard1 and vcard2.
Example A3: compare only the user-specified fields of vcard1 and vcard2.
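The difference between the "common" and "all" modes comes down to taking the intersection or the union of the property names on the two cards. A minimal sketch with plain Python sets standing in for the parsed vCards (the names `fields1` and `fields2` are illustrative and not part of vobject):

```python
# Property names as they would appear on each card (illustrative values).
fields1 = {"version", "n", "tel"}
fields2 = {"version", "n", "tel", "email"}

common_fields = fields1 & fields2   # present on both cards
all_fields = fields1 | fields2      # present on at least one card
only_in_one = fields1 ^ fields2     # present on exactly one card

print(sorted(common_fields))  # ['n', 'tel', 'version']
print(sorted(all_fields))     # ['email', 'n', 'tel', 'version']
print(sorted(only_in_one))    # ['email']
```

The symmetric difference is not used in the answer itself, but it is a quick way to see which properties make two cards differ.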
Obviously, in this case, comparing the serialized versions of vcard1 and vcard2 returns False, because the contents of the two vCards are different.
vc1.serialize()==vc2.serialize() # False
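To see why a whole-text comparison fails, it can help to look at which lines actually differ. The following is a rough sketch using the standard library's `difflib` on the raw vCard text rather than on vobject's serialized output (which may differ in line endings or property order); `changed_lines`, `card1`, and `card2` are hypothetical names introduced here for illustration:

```python
import difflib

card1 = """BEGIN:VCARD
VERSION:3.0
N;CHARSET=UTF-8:Name;;;;
TEL:0005555000
END:VCARD"""

card2 = """BEGIN:VCARD
VERSION:3.0
N;CHARSET=UTF-8:Name;;;;
TEL:0005555000
EMAIL;CHARSET=UTF-8:my_email@email.com
END:VCARD"""

def changed_lines(a, b):
    """Return only the lines that were added ('+ ') or removed ('- ')."""
    diff = difflib.ndiff(a.splitlines(), b.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]

print(card1 == card2)               # False
print(changed_lines(card1, card2))  # ['+ EMAIL;CHARSET=UTF-8:my_email@email.com']
```

A single extra property is enough to make the two texts unequal, which is why a field-level comparison is more useful here.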
Examples
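In each case (A1, A2, A3), the custom function compare_vcards() returns two things:
match: a dict giving the match result for each field
summary: a dict giving the aggregate absolute match (True only if every field matches) and the relative match (a ratio in [0, 1], useful for partial matches).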
But you will have to define your own business logic to determine what you consider as a match and what is not. What I have shown here should help you get started though.
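As one illustration of what such business logic might look like, the sketch below compares plain `{field: value}` dicts instead of vobject components. Everything in it is an assumption made for the example: the `NORMALIZERS` table, the `fields_match` helper, the `ignore_missing` flag, and the policy of comparing phone numbers by digits only and e-mail addresses case-insensitively.

```python
import re

def normalize_tel(value):
    """Keep digits only, so '000-555 5000' and '0005555000' compare equal."""
    return re.sub(r"\D", "", value)

def normalize_email(value):
    """Compare e-mail addresses case-insensitively (a policy choice)."""
    return value.strip().lower()

# Hypothetical per-field rules; other fields are compared after strip().
NORMALIZERS = {"tel": normalize_tel, "email": normalize_email}

def fields_match(card1, card2, ignore_missing=True):
    """Compare two {field: value} dicts field by field.

    With ignore_missing=True, a field present on only one card is skipped
    instead of being counted as a mismatch.
    """
    match = {}
    for field in sorted(set(card1) | set(card2)):
        if field not in card1 or field not in card2:
            if not ignore_missing:
                match[field] = False
            continue
        normalize = NORMALIZERS.get(field, str.strip)
        match[field] = normalize(card1[field]) == normalize(card2[field])
    return match

card1 = {"n": "Name;;;;", "tel": "000-555 5000"}
card2 = {"n": "Name;;;;", "tel": "0005555000", "email": "my_email@email.com"}

print(fields_match(card1, card2))
# {'n': True, 'tel': True}
print(fields_match(card1, card2, ignore_missing=False))
# {'email': False, 'n': True, 'tel': True}
```

With `ignore_missing=True`, the extra EMAIL property no longer prevents the two cards from being treated as equal, which is the behavior the question asks about.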
## Example - A1
# Compare ONLY COMMON fields b/w vc1 and vc2
match, summary = compare_vcards(vc1, vc2, mode='common')
print(f'match: \t{match}')
print(f'summary: \t{summary}')
## Output
# match: {'n': True, 'tel': True, 'version': True}
# summary: {'abs_match': True, 'rel_match': 1.0}
## Example - A2
# Compare ALL fields b/w vc1 and vc2
match, summary = compare_vcards(vc1, vc2, mode='all')
print(f'match: \t{match}')
print(f'summary: \t{summary}')
## Output
# match: {'tel': True, 'email': False, 'n': True, 'version': True}
# summary: {'abs_match': False, 'rel_match': 0.75}
## Example - A3
# Compare ONLY COMMON USER-SPECIFIED fields b/w vc1 and vc2
match, summary = compare_vcards(vc1, vc2, fields=['email', 'n', 'tel'])
print(f'match: \t{match}')
print(f'summary: \t{summary}')
## Output
# match: {'email': False, 'n': True, 'tel': True}
# summary: {'abs_match': False, 'rel_match': 0.6666666666666666}
Code
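def get_fields(vc1, vc2, mode='common'):
    """Return the set of field names to compare for the given mode."""
    if mode == 'common':
        # Only fields present in both vCards
        fields = set(vc1.sortChildKeys()).intersection(set(vc2.sortChildKeys()))
    else:
        # mode = 'all': fields present in either vCard
        fields = set(vc1.sortChildKeys()).union(set(vc2.sortChildKeys()))
    return fields

def compare_vcards(vc1, vc2, fields=None, mode='common'):
    """Compare two vCards field by field; return (match, summary)."""
    if fields is None:
        fields = get_fields(vc1, vc2, mode=mode)
    # getChildValue() returns None for a missing field, so a field that
    # exists in only one vCard compares as False.
    match = dict(
        (field, str(vc1.getChildValue(field)).strip() == str(vc2.getChildValue(field)).strip())
        for field in fields
    )
    summary = {
        'abs_match': all(match.values()),              # True only if every field matches
        'rel_match': sum(match.values()) / len(match)  # fraction of matching fields
    }
    return match, summary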
Dummy Data
vcard1 = """
BEGIN:VCARD
VERSION:3.0
N;CHARSET=UTF-8:Name;;;;
TEL:0005555000
END:VCARD
"""
vcard2 = """
BEGIN:VCARD
VERSION:3.0
N;CHARSET=UTF-8:Name;;;;
TEL:0005555000
EMAIL;CHARSET=UTF-8:my_email@email.com
END:VCARD
"""
# pip install vobject
import vobject
vc1 = vobject.readOne(vcard1)
vc2 = vobject.readOne(vcard2)