RegEx for capturing vCard groups in Perl

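I've been studying syntax and semantics at university this semester, and regular expressions come up a lot. As an exercise, I've been looking for different scenarios where regular expressions can be applied. vCards are one of them, but I haven't been able to write a pattern that groups everything between BEGIN:VCARD and END:VCARD.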

Please note that .vcf files use line breaks.

My best pattern so far looks like this (although I've tried many variations):

BEGIN:VCARD\n([^(END:VCARD)\n]*END:VCARD

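So the idea is: "From BEGIN:VCARD, read everything that is not END:VCARD and that ends with a line break, until END:VCARD is encountered."

I'm using the Perl flavor of regular expressions, but from the Vala programming language.

I know the problem is in my pattern, but after a lot of reading and trial and error I'm still not sure why the tester shows that it doesn't work.

Test data: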

BEGIN:VCARD
VERSION:3.0
N:Doe;John;;;
FN:John Doe
ORG:Example.com Inc.;
TITLE:Imaginary test person
EMAIL;type=INTERNET;type=WORK;type=pref:johnDoe@example.org
TEL;type=WORK;type=pref:+1 617 555 1212
TEL;type=WORK:+1 (617) 555-1234
TEL;type=CELL:+1 781 555 1212
TEL;type=HOME:+1 202 555 1212
NOTE:John Doe has a long and varied history\, being documented on more police files that anyone else. Reports of his death are alas numerous.
CATEGORIES:Work,Test group
X-ABUID:5AD380FD-B2DE-4261-BA99-DE1D1DB52FBE\:ABPerson
END:VCARD
BEGIN:VCARD
VERSION:3.0
N:Doe;Jane;;;
FN:Jane Doe
ORG:Example.com Inc.;
TITLE:Another Imaginary test person
EMAIL;type=INTERNET;type=WORK;type=pref:johnDoe@example.org
TEL;type=WORK;type=pref:+1 617 555 1213
TEL;type=WORK:+1 (617) 555-1233
TEL;type=CELL:+1 781 555 1213
TEL;type=HOME:+1 202 555 1213
NOTE:Jane Doe has a long and varied history\, being documented on more police files that anyone else. Reports of her death are alas numerous.
CATEGORIES:Work,Test group
X-ABUID:5AD380FD-B2DE-4261-BA99-DE1D1DB52FBE\:ABPerson
END:VCARD

In my most successful test, it marked everything from the first BEGIN:VCARD up to the last END:VCARD.

This expression might help you do that:

(BEGIN:VCARD([\s\S]*?)END:VCARD)
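Two details explain the difference from the pattern in the question. A bracket expression such as [^(END:VCARD)\n] is a set of single characters, not the word END:VCARD, so it rejects any line containing one of those letters. And the lazy quantifier *? stops at the nearest END:VCARD, whereas a greedy * runs on to the last one. The following is a minimal sketch of both points; the shortened two-card string $sample and all variable names are made up for illustration and are not part of the question's data.

```perl
use strict;
use warnings;

# Hypothetical, shortened sample with two cards (not the question's data).
my $sample = "BEGIN:VCARD\nFN:John Doe\nEND:VCARD\nBEGIN:VCARD\nFN:Jane Doe\nEND:VCARD\n";

# A bracket expression is a set of single characters, not a word:
# [^(END:VCARD)\n] rejects any one of ( ) E N D : V C A R and newline,
# so a line such as 'VERSION:3.0' already fails on its first letter.
my $class_accepts_line = ( 'VERSION:3.0' =~ /^[^(END:VCARD)\n]*$/ ) ? 1 : 0;

# Greedy: [\s\S]* runs to the end of the input and backtracks
# only as far as the LAST END:VCARD, giving one big match.
my @greedy = $sample =~ /BEGIN:VCARD[\s\S]*END:VCARD/g;

# Lazy: [\s\S]*? stops at the FIRST END:VCARD after each BEGIN:VCARD.
my @lazy = $sample =~ /BEGIN:VCARD[\s\S]*?END:VCARD/g;

printf "character class accepts 'VERSION:3.0': %s\n", $class_accepts_line ? 'yes' : 'no';
printf "greedy matches: %d\n", scalar @greedy;
printf "lazy matches:   %d\n", scalar @lazy;
```

On this sample the greedy pattern yields a single match spanning both cards, while the lazy pattern yields one match per card.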

Perl test:

use strict;
use warnings;

my $str = 'BEGIN:VCARD
VERSION:3.0
N:Doe;John;;;
FN:John Doe
ORG:Example.com Inc.;
TITLE:Imaginary test person
EMAIL;type=INTERNET;type=WORK;type=pref:johnDoe@example.org
TEL;type=WORK;type=pref:+1 617 555 1212
TEL;type=WORK:+1 (617) 555-1234
TEL;type=CELL:+1 781 555 1212
TEL;type=HOME:+1 202 555 1212
NOTE:John Doe has a long and varied history\, being documented on more police files that anyone else. Reports of his death are alas numerous.
CATEGORIES:Work,Test group
X-ABUID:5AD380FD-B2DE-4261-BA99-DE1D1DB52FBE\:ABPerson
END:VCARD
BEGIN:VCARD
VERSION:3.0
N:Doe;Jane;;;
FN:Jane Doe
ORG:Example.com Inc.;
TITLE:Another Imaginary test person
EMAIL;type=INTERNET;type=WORK;type=pref:johnDoe@example.org
TEL;type=WORK;type=pref:+1 617 555 1213
TEL;type=WORK:+1 (617) 555-1233
TEL;type=CELL:+1 781 555 1213
TEL;type=HOME:+1 202 555 1213
NOTE:Jane Doe has a long and varied history\, being documented on more police files that anyone else. Reports of her death are alas numerous.
CATEGORIES:Work,Test group
X-ABUID:5AD380FD-B2DE-4261-BA99-DE1D1DB52FBE\:ABPerson
END:VCARD';
my $regex = qr/(BEGIN:VCARD([\s\S]*?)END:VCARD)/p;

# Iterate with /g so that every vCard is matched, not just the first one.
while ( $str =~ /$regex/g ) {
  print "Whole match is ${^MATCH}, spanning offsets ", $-[0], " to ", $+[0], "\n";
  # Capture group 1 is the whole card; capture group 2 is the text between the markers.
  print "Capture Group 2 is $2\n";
}

# ${^POSTMATCH} and ${^PREMATCH} are also available with the use of '/p'
# Named capture groups can be called via $+{name}
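
If the goal is to collect every card at once rather than inspect one match at a time, the same lazy pattern can be used in list context. This is only a sketch under assumptions: $vcf is a hypothetical, shortened stand-in for the file contents, and extracting the FN line is just one example of what could be done with each captured body.

```perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for the contents of a .vcf file.
my $vcf = join "\n",
  'BEGIN:VCARD', 'VERSION:3.0', 'FN:John Doe', 'END:VCARD',
  'BEGIN:VCARD', 'VERSION:3.0', 'FN:Jane Doe', 'END:VCARD';

# With a single capture group and /g in list context, the match
# returns the text of that group for every card found.
my @bodies = $vcf =~ /BEGIN:VCARD\n([\s\S]*?)END:VCARD/g;

# Pull the FN (formatted name) line out of each card body;
# cards without an FN line contribute nothing to the list.
my @names = map { /^FN:(.*)$/m ? $1 : () } @bodies;

print scalar(@bodies), " cards found\n";
print "$_\n" for @names;
```

Each element of @bodies holds the lines between one BEGIN:VCARD and its matching END:VCARD, so further per-card processing can work on one small string at a time.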

RegEx

If this isn't the expression you wanted, you can modify/change your expressions on regex101.com.

RegEx Circuit

You can also visualize your expressions on jex.im.