Extract strings starting with specific digits
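I want to extract all the numbers that start with zeros, for example "00000019649216698073892337728035449839",
from the character vector below into a new variable. I am using this:
extract <- str_extract_all(char, "^[0000]+\\d+")
What is the appropriate regular expression? Can anyone suggest a simple, easy-to-understand resource with plenty of examples to help with understanding regular expressions?
Thanks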
> head (char)
[1] "00000019649216698073892337728035449839 39071 13678 13664 12867 13497 15397 15473 15474"
[2] "00000080923073746593943837301491443853 151080 42474 81803"
[3] "00000083877135182933559945053496630804 12867 15473 15474 13678 13497 15397 13664 13471"
[4] "00000093886891188081884033470837454487 12867 14830 14941 15397 14245 16674 15766 15360 14761 14766 14292 15473 15474 15475 15732 16215 82269 15864 15691 15705"
[5] "00000115676684814560790906286871109938 12867 15473 15474 15397 13664 90037 13471"
[6] "00000116775812010820497846774603592685 116731 17623 116727 116722 39071 15125 15131 8847 8846 90037 8848 12867 12871 12868 15068 39340 15063 15061 15076 52253 52251 12826 12824 12839 12834 39383 14939 14828 104636 14765 14759 14782 14728 14735 14745 14750 57038 14631 14647 50975 14610 56996 50924 50923 50933 50931 14535 14514 14519 22120 14391 14392 65009 14382 20415 65012 14339 109086 7618 81306 81308 81309 14199 14260 14271 14268 109886 16191 14245 14250 14290 14285 14321 14326 14307 14304 16503 14315 14319 81118 81117 73299 73298 13630 13631 13617 13608 13614 13612 13606 13595 13597 13599 13589 13591 13579 13581 15764 13583 81838 50063 13573 13575 13686 13679 13678 13677 15865 13664 13661 13662 15815 82166 82165 13644 13641 13640 13643 13642 13636 13639 13633 13632 13635 13634 13703 13709 63671 63677 50038 63672 63674 15731 15681 15702 134029 13344 13327 13342 13420 13418 13409 13408 13436 13439 13432 13433 13434 13428 13430 13424 13475 13476 13481 13482 13486 15401 13489 13492 13497 15395 15394 15397 15396 13442 13440 13444 13451 13449 13454 13459 15370 13458 13457 13456 15374 13462 13460 15361 13471 15366 13470 15364 13540 13542 15480 13548 15478 13551 15473 15474 13546 15475 13556 13558 101400 13564 13567 13562 13510 13508 13519 13516 13514 13526 13524 13535 13532 13530"
If you only want to extract the first element from each string that starts with zeros, try
regmatches(char, regexpr("^0+\\d+", char))
# [1] "00000019649216698073892337728035449839"
# [2] "00000080923073746593943837301491443853"
# [3] "00000083877135182933559945053496630804"
# [4] "00000093886891188081884033470837454487"
# [5] "00000115676684814560790906286871109938"
# [6] "00000116775812010820497846774603592685"
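One thing to keep in mind: regmatches() with regexpr() returns only the matches, so elements that do not start with a zero are dropped and the result can be shorter than the input. If you need the result to stay aligned with the input, a minimal sketch could look like this (the vector x and the name out are made up for illustration):

```r
# Hypothetical input: the second element does not start with a zero.
x <- c("000123 45 67", "98765 000 11")

# regexpr() returns -1 for elements where the pattern does not match.
m <- regexpr("^0+\\d+", x)

# Start with NA everywhere, then fill in only the positions that matched.
out <- rep(NA_character_, length(x))
out[m != -1] <- regmatches(x, m)

print(out)
```

Here out has the same length as x, with NA where a string does not begin with a zero.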
To go the other way (remove the matched number and keep the rest), you can use sub()
sub("^0+\\d+ ", "", char)
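To see both directions side by side, here is a small self-contained sketch. The vector is a shortened, made-up copy of the first two elements from the question, and the names ids, ids_class and rest are only for illustration. It also shows that "[0000]" is a character class containing just "0", so on this data it matches the same thing as a plain "0".

```r
# Shortened copies of the first two elements, for illustration only.
char <- c(
  "00000019649216698073892337728035449839 39071 13678",
  "00000080923073746593943837301491443853 151080 42474 81803"
)

# ^    anchors the match at the start of the string
# 0+   requires at least one leading zero
# \\d+ takes the following digits and stops at the first space
ids <- regmatches(char, regexpr("^0+\\d+", char))

# "[0000]" lists the same character four times, so it behaves like "0" here.
ids_class <- regmatches(char, regexpr("^[0000]+\\d+", char))
same <- identical(ids, ids_class)

# The other direction: drop the leading number and the space after it.
rest <- sub("^0+\\d+ ", "", char)

print(ids)
print(same)
print(rest)
```

Note that in an R string literal the backslash has to be doubled ("\\d"); a single "\d" is rejected by the parser as an unrecognized escape.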