Removing everything except a "part" of the string
This is the string, a complete example:
('1416851040', '1416851040', '50.62.177.118', '84.161.97.189', 'humpy_electro', 393883, '385962628'),
('1416851046', '1416851046', '2607:5300:60:6097::', '80.187.100.105', 'lagbugdc', 393884, '737537953'),
('1416851067', '1416851067', '174.66.174.101', '98.148.244.151', 'maihym', 393885, '1473193487'),
('1416851094', '1416851094', '2607:5300:60:6097::', '92.157.2.230', 'xeosse26', 393886, '737537953'),
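I want to delete EVERYTHING from it except: facebook:jens.pettersson.7568
(the username slot)
facebook:jens.pettersson.7568 is actually 'facebook:jens.pettersson.7568', and I want it to appear as:
facebook:jens.pettersson.7568
(see that white space?)
Then I want to sort my list, so that all 361k lines are laid out like this:
x x xx xcx xzx xyx xtz
All separated by spaces, technically on 1 line, if possible.
Or, if it is enough to delete everything else and collect only the 1 line I need, I guess I can do the sorting manually.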
I'm going to read between the lines and guess that this is what you want:
BEFORE:
('1416851040', '1416851040', '50.62.177.118', '84.161.97.189', 'humpy_electro', 393883, '385962628'),
^ this is username
AFTER:
facebook:humpy_electro
You can handle this with the following regular expression substitution:
s/(?:[^,]*,){4}[\s'"]*([^'",]*).*/facebook:$1, /
That is:
(?: # begin non-capturing group
[^,]*, # zero or more non-comma characters, followed by a comma
){4} # end non-capturing group, and repeat 4 times
# this skips the first 4 columns of data
[\s'"]* # matches any whitespace and the first quote
( # begin capturing group 1
[^'",]* # capture all non-comma characters until the end quote
) # end capturing group 1
.* # match rest of line
# REPLACE WITH
facebook: # literal text
$1 # capturing group 1
, # comma and a trailing space (not shown here)
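As a rough sketch, the same substitution could be written in Python, assuming one record per line. Python's `re` module writes the group reference in the replacement as `\1`, and the names `USERNAME_RE` and `to_facebook_entry` are made up for this illustration.

```python
import re

# Verbose form of the pattern: whitespace outside character classes is
# ignored and "#" starts a comment, so the layout can follow the breakdown.
USERNAME_RE = re.compile(
    r"""
    (?:          # begin non-capturing group
        [^,]*,   # zero or more non-comma characters, followed by a comma
    ){4}         # repeat 4 times: this skips the first 4 columns
    [\s'"]*      # any whitespace and the opening quote
    (            # begin capturing group 1
        [^'",]*  # the username, up to the closing quote
    )            # end capturing group 1
    .*           # rest of the line
    """,
    re.VERBOSE,
)


def to_facebook_entry(line: str) -> str:
    """Replace one record with 'facebook:<username>, ' (hypothetical helper)."""
    return USERNAME_RE.sub(r"facebook:\1, ", line, count=1)


sample = "('1416851040', '1416851040', '50.62.177.118', '84.161.97.189', 'humpy_electro', 393883, '385962628'),"
print(to_facebook_entry(sample))
```

Note that the pattern counts commas, so it assumes none of the first four fields contains a comma of its own.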
Voilà.
This turns this:
('1416851040', '1416851040', '50.62.177.118', '84.161.97.189', 'humpy_electro', 393883, '385962628'),
('1416851046', '1416851046', '2607:5300:60:6097::', '80.187.100.105', 'lagbugdc', 393884, '737537953'),
('1416851067', '1416851067', '174.66.174.101', '98.148.244.151', 'maihym', 393885, '1473193487'),
('1416851094', '1416851094', '2607:5300:60:6097::', '92.157.2.230', 'xeosse26', 393886, '737537953'),
into this:
facebook:humpy_electro, facebook:lagbugdc, facebook:maihym, facebook:xeosse26,
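The question also asks for the entries to be sorted and separated by spaces on a single line. A minimal Python sketch of that whole pipeline might look like the following. It assumes every record sits on its own line and that the username is always the fifth field; `extract_usernames` and `rows` are hypothetical names used only here.

```python
import re

# Skip four comma-terminated columns, then capture the fifth field
# without its surrounding quotes.
PATTERN = re.compile(r"""(?:[^,]*,){4}[\s'"]*([^'",]*)""")


def extract_usernames(text: str) -> list:
    """Return 'facebook:<username>' for every line that matches."""
    names = []
    for line in text.splitlines():
        m = PATTERN.match(line)
        if m and m.group(1):  # ignore blank or malformed lines
            names.append("facebook:" + m.group(1))
    return names


rows = """\
('1416851040', '1416851040', '50.62.177.118', '84.161.97.189', 'humpy_electro', 393883, '385962628'),
('1416851046', '1416851046', '2607:5300:60:6097::', '80.187.100.105', 'lagbugdc', 393884, '737537953'),
('1416851067', '1416851067', '174.66.174.101', '98.148.244.151', 'maihym', 393885, '1473193487'),
('1416851094', '1416851094', '2607:5300:60:6097::', '92.157.2.230', 'xeosse26', 393886, '737537953'),
"""

entries = extract_usernames(rows)
# Sort alphabetically and join with single spaces on one line.
print(" ".join(sorted(entries)))
```

For a file with 361k lines, the same function could be fed the file contents read in one go, or the loop could iterate over the open file object instead of `splitlines()`.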
I got this from a friend; it works in two steps. Step one: replace ^((.?'){4}) with nothing. Step two: replace '( (.?$){1}) with nothing.