AWK keys for pattern matching and comparison of ranges?
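I am trying to do a comparison using awk keys.
I used the following line elsewhere.
The idea is to reuse the same approach with regular expressions and write the result to a new file: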
awk 'NR==FNR{a[,,]; next} !((,,) in a)' file2 file1
This time, the columns are not compared directly:
file_A.txt
chr1 1 10000
chr2 2 500
chr3 1 20000
chr1 10 15
file_B.narrowPeak
abs42322 chr1 25 15000
rvy42134 chr2 1 400
ttx24124 chr3 1 20000
sadas664 chr1 3 14
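Columns 1, 3 and 4 of file B must be ignored in the comparison.
I want to keep all lines of file B whose column 2 matches the string in column 1 of file A.
Columns 2 and 3 of file A and columns 3 and 4 of file B are ranges.
In this example, the first range starts at 1 and ends at 10000, the second range runs from 1 to 400, and so on.
The next step should be to filter out the lines of file B whose range is not contained in one of the ranges of file A, comparing only against the lines that matched in the first step.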
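Example:
Line 1 of file B is compared with lines 1 and 4 of file A because of chr1. The range 25-15000 extends beyond both 1-10000 and 10-15, so this line is filtered out.
Line 3 of file B is compared with line 3 of file A because of chr3. The range 1-20000 is contained in (here, equal to) 1-20000, so this line is stored in the output file.
Output file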
ttx24124 chr3 1 20000
sadas664 chr1 3 14
EDIT:
The real data looks like this. In reality the files are much longer, so column 2 is more diverse than shown below.
File A
chr2 16148738 89330679
chr2 10845 16143362
chr2 94570062 106475164
chr2 99510860 113404812
chr2 86925269 87697988
chr2 91415844 91839817
chr9 64343270 64801485
chr9 65740027 66179306
chr1 144610018 144888777
chr2 95802871 108756829
chr16_KI270728v1_random 173055 1246276
chr9 63252862 63477334
chr2_KI270774v1_alt 0 188910
chr1_KI270712v1_random 7198 176043
chr9 63008373 63202857
chr2_GL383521v1_alt 0 143390
chr2 89530679 89663939
chr2 90236570 90402011
chr2_KI270894v1_alt 42931 213658
chr1 143320490 144356500
chr2 108732003 109758895
chr2_KI270770v1_alt 8875 136240
chr9 65130082 65281495
chr2 89767603 89960747
chr2_KI270769v1_alt 0 116362
chr2 94187600 94293015
chr9 40238354 40677933
chr2_KI270772v1_alt 1330 133041
chr16 33932082 34096118
chr13 18259709 18357163
chr14_KI270725v1_random 22583 138472
chr16 34779380 34943880
chr7 60892044 60992155
chr2_KI270773v1_alt 0 70886
chr2 110445435 110530068
chr9 43236167 43304276
chr22 10628203 10690626
chr2 87340235 87402756
chr21 8651170 8706715
chrUn_KI270744v1 38861 105138
chr2 110395939 110441265
chr2 109930242 109975557
chr1 143315267 144153087
chr17 26716619 26775606
File B
HumanGM18558_peak_1 chr1 9997 10330 150 . 10.78887 18.86368 15.08777 100
HumanGM18558_peak_2 chr1 628885 635117 2509 . 83.77238 255.95094 250.99944 5270
HumanGM18558_peak_3 chr1 1250086 1250413 94 . 8.25031 13.14358 9.49110 143
HumanGM18558_peak_4 chr1 1724342 1724642 56 . 6.34639 9.18460 5.65124 88
HumanGM18558_peak_5 chr1 8629404 8629679 56 . 6.34639 9.18460 5.65124 180
HumanGM18558_peak_6 chr1 9181157 9181438 56 . 6.34639 9.18460 5.65124 65
HumanGM18558_peak_7 chr1 9626296 9626600 56 . 6.34639 9.18460 5.65124 247
HumanGM18558_peak_8 chr1 11908028 11908531 341 . 18.40454 38.14250 34.12190 246
HumanGM18558_peak_9 chr1 11909636 11910042 81 . 7.61567 11.78841 8.17169 150
HumanGM18558_peak_10 chr1 15966215 15966638 81 . 7.61567 11.78841 8.17169 200
HumanGM18558_peak_11 chr1 16513837 16514451 591 . 27.28949 63.33707 59.13566 271
HumanGM18558_peak_12 chr1 16613629 16613934 81 . 7.61567 11.78841 8.17169 103
HumanGM18558_peak_13 chr1 16644496 16644800 68 . 6.98103 10.46777 6.88890 191
HumanGM18558_peak_14 chr1 16666545 16667135 291 . 16.50062 33.08122 29.10692 306
HumanGM18558_peak_15 chr1 16740126 16740977 307 . 17.13526 34.75273 30.76194 453
HumanGM18558_peak_16 chr1 16895871 16896489 517 . 24.75093 55.90571 51.76084 254
HumanGM18558_peak_17 chr1 16905126 16905616 242 . 14.59670 28.16750 24.24907 224
HumanGM18558_peak_18 chr1 21294320 21294624 81 . 7.61567 11.78841 8.17169 161
HumanGM18558_peak_19 chr1 24744867 24745154 68 . 6.98103 10.46777 6.88890 136
HumanGM18558_peak_20 chr1 24900187 24900971 94 . 8.25031 13.14358 9.49110 526
HumanGM18558_peak_21 chr1 24930434 24930704 56 . 6.34639 9.18460 5.65124 209
HumanGM18558_peak_22 chr1 25022463 25022733 81 . 7.61567 11.78841 8.17169 177
HumanGM18558_peak_23 chr1 25998134 25998419 68 . 6.98103 10.46777 6.88890 96
HumanGM18558_peak_24 chr1 26541891 26542188 68 . 6.98103 10.46777 6.88890 86
HumanGM18558_peak_25 chr1 26744090 26744360 81 . 7.61567 11.78841 8.17169 163
HumanGM18558_peak_26 chr1 26890007 26890277 44 . 5.71175 7.94242 4.46638 52
HumanGM18558_peak_27 chr1 27322070 27322340 56 . 6.34639 9.18460 5.65124 136
HumanGM18558_peak_28 chr1 27631584 27631967 108 . 8.88495 14.53075 10.84614 241
HumanGM18558_peak_29 chr1 27884095 27884365 56 . 6.34639 9.18460 5.65124 170
HumanGM18558_peak_30 chr1 28510350 28510620 68 . 6.98103 10.46777 6.88890 238
HumanGM18558_peak_31 chr1 28510787 28511122 56 . 6.34639 9.18460 5.65124 109
HumanGM18558_peak_32 chr1 28648490 28649063 307 . 17.13526 34.75273 30.76194 238
HumanGM18558_peak_33 chr1 28736505 28736783 68 . 6.98103 10.46777 6.88890 135
HumanGM18558_peak_34 chr1 31431897 31432219 56 . 6.34639 9.18460 5.65124 84
HumanGM18558_peak_35 chr1 31944389 31944659 56 . 6.34639 9.18460 5.65124 42
HumanGM18558_peak_36 chr1 32250032 32250320 56 . 6.34639 9.18460 5.65124 42
HumanGM18558_peak_37 chr1 37477246 37477607 94 . 8.25031 13.14358 9.49110 211
HumanGM18558_peak_38 chr1 37989885 37990303 122 . 9.51959 15.94772 12.23132 244
HumanGM18558_peak_39 chr1 39026095 39026365 68 . 6.98103 10.46777 6.88890 108
HumanGM18558_peak_40 chr1 40668966 40669236 56 . 6.34639 9.18460 5.65124 77
HumanGM18558_peak_41 chr1 44721466 44721913 258 . 15.23134 29.78794 25.84961 210
HumanGM18558_peak_42 chr1 44730832 44731120 94 . 8.25031 13.14358 9.49110 172
HumanGM18558_peak_43 chr1 44819632 44819969 122 . 9.51959 15.94772 12.23132 169
HumanGM18558_peak_44 chr1 46132753 46133023 56 . 6.34639 9.18460 5.65124 233
HumanGM18558_peak_45 chr1 46331051 46331321 68 . 6.98103 10.46777 6.88890 141
HumanGM18558_peak_46 chr1 66282467 66282777 108 . 8.88495 14.53075 10.84614 140
HumanGM18558_peak_47 chr1 78004335 78004605 81 . 7.61567 11.78841 8.17169 128
HumanGM18558_peak_48 chr1 88684186 88684456 56 . 6.34639 9.18460 5.65124 62
HumanGM18558_peak_49 chr1 91387139 91387504 94 . 8.25031 13.14358 9.49110 129
HumanGM18558_peak_50 chr1 93079024 93079327 94 . 8.25031 13.14358 9.49110 182
HumanGM18558_peak_51 chr1 101235617 101235902 68 . 6.98103 10.46777 6.88890 121
HumanGM18558_peak_52 chr1 101407748 101408136 81 . 7.61567 11.78841 8.17169 246
HumanGM18558_peak_53 chr1 109099999 109100368 122 . 9.51959 15.94772 12.23132 222
HumanGM18558_peak_54 chr1 109984498 109984792 81 . 7.61567 11.78841 8.17169 107
HumanGM18558_peak_55 chr1 110902916 110903186 56 . 6.34639 9.18460 5.65124 92
HumanGM18558_peak_56 chr1 111215999 111216474 108 . 8.88495 14.53075 10.84614 257
HumanGM18558_peak_57 chr1 111221711 111222087 68 . 6.98103 10.46777 6.88890 152
HumanGM18558_peak_58 chr1 113904864 113905420 81 . 7.61567 11.78841 8.17169 258
HumanGM18558_peak_59 chr1 116504467 116504737 68 . 6.98103 10.46777 6.88890 165
HumanGM18558_peak_60 chr1 116558228 116558508 94 . 8.25031 13.14358 9.49110 175
HumanGM18558_peak_61 chr1 120850520 120851089 481 . 23.48165 52.25492 48.13765 265
HumanGM18558_peak_62 chr1 125069249 125069729 122 . 9.51959 15.94772 12.23132 240
HumanGM18558_peak_63 chr1 125080252 125080535 44 . 5.71175 7.94242 4.46638 150
HumanGM18558_peak_64 chr1 125080944 125081214 44 . 5.71175 7.94242 4.46638 181
HumanGM18558_peak_65 chr1 125166080 125168950 762 . 33.00124 80.62179 76.28172 1813
HumanGM18558_peak_66 chr1 125168955 125169667 68 . 6.98103 10.46777 6.88890 462
HumanGM18558_peak_67 chr1 125169674 125170842 392 . 20.30845 43.33632 39.27747 271
HumanGM18558_peak_68 chr1 125170903 125171408 195 . 12.69278 23.42019 19.56689 240
HumanGM18558_peak_69 chr1 125173576 125174604 195 . 12.69278 23.42019 19.56689 561
HumanGM18558_peak_70 chr1 125175148 125176443 427 . 21.57773 46.86636 42.78468 916
HumanGM18558_peak_71 chr1 125176541 125184739 4637 . 138.35135 469.20218 463.71423 3666
HumanGM18558_peak_72 chr1 143184419 143188606 1999 . 69.81032 204.82724 199.97639 690
HumanGM18558_peak_73 chr1 143188729 143198082 3304 . 104.71547 335.55066 330.42758 4947
HumanGM18558_peak_74 chr1 143198227 143204460 2867 . 93.29197 291.73703 286.70563 4484
HumanGM18558_peak_75 chr1 143204483 143204990 150 . 10.78887 18.86368 15.08777 256
HumanGM18558_peak_76 chr1 143205353 143208069 2675 . 88.21485 272.56412 267.57269 950
HumanGM18558_peak_77 chr1 143208226 143210053 358 . 19.03918 39.85970 35.82584 1250
HumanGM18558_peak_78 chr1 143210072 143225450 4051 . 123.75465 410.42435 405.11169 4606
HumanGM18558_peak_79 chr1 143225537 143226480 226 . 13.96206 26.56550 22.66770 496
HumanGM18558_peak_80 chr1 143226822 143242516 2771 . 90.75341 282.12637 277.11282 6269
Convert your inputs to BED format. The 3 required fields are chromosome, start position and end position; the remaining fields are optional. Then use bedtools intersect from the bedtools package. For example:
# Create input files:
cat > file_A.txt <<EOF
chr1 1 10000
chr2 2 500
chr3 1 20000
chr1 10 15
EOF
cat > file_B.narrowPeak <<EOF
abs42322 chr1 25 15000
rvy42134 chr2 1 400
ttx24124 chr3 1 20000
sadas664 chr1 3 14
EOF
# Convert to bed format:
perl -lane 'print join "\t", @F;' file_A.txt > file_A.bed
perl -lane 'print join "\t", @F[1, 2, 3];' file_B.narrowPeak > file_B.bed
# Find features in file_B.bed that are contained entirely in a file_A.bed feature:
bedtools intersect -a file_B.bed -b file_A.bed -wa -f 1.0 > file_A_in_B.bed
Output:
chr3 1 20000
chr1 3 14
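The bedtools intersect command is used with these options:
-wa: Write the original entry from the file specified with the -a option (file_B.bed) for each overlap.
-f: Minimum overlap required, as a fraction of the feature from the file specified with the -a option. Using a fraction of 1.0 ensures that 100% of the file_B.bed feature is contained in a file_A.bed feature.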
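Since the question asks about awk specifically, here is a minimal awk-only sketch of the same containment test, for cases where bedtools is not available. It is an illustration under assumptions rather than a drop-in replacement: it assumes whitespace-separated columns laid out as in the small example, treats both range ends as inclusive (so an equal range counts as contained), and keeps all of file A in memory. The array names n, lo and hi and the output name file_B_in_A.txt are made up for this example.

```shell
# Recreate the small example inputs so the snippet runs on its own.
cat > file_A.txt <<EOF
chr1 1 10000
chr2 2 500
chr3 1 20000
chr1 10 15
EOF
cat > file_B.narrowPeak <<EOF
abs42322 chr1 25 15000
rvy42134 chr2 1 400
ttx24124 chr3 1 20000
sadas664 chr1 3 14
EOF

# Pass 1 (file_A.txt, where NR == FNR): store every range per chromosome.
# Pass 2 (file_B.narrowPeak): print the line if its range (columns 3-4)
# lies inside at least one stored range for the chromosome in column 2.
awk '
NR == FNR {
    n[$1]++                      # number of ranges seen for this chromosome
    lo[$1, n[$1]] = $2 + 0       # "+ 0" forces a numeric value
    hi[$1, n[$1]] = $3 + 0
    next
}
{
    for (i = 1; i <= n[$2]; i++) {
        if ($3 + 0 >= lo[$2, i] && $4 + 0 <= hi[$2, i]) {
            print                # keep the whole original file B line
            break                # report each file B line at most once
        }
    }
}
' file_A.txt file_B.narrowPeak > file_B_in_A.txt
```

The break matters for data like the real file A, where several ranges on the same chromosome overlap: without it, a file B line contained in more than one file A range would be printed once per containing range. For very long files the inner loop over every range of a chromosome can get slow, which is one reason a dedicated tool such as bedtools is usually the better choice.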