How can I plot p-values for SNPs that are spread across thousands of scaffolds on a single continuous axis?
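I have p-values for SNPs derived from association mapping, and these SNPs are scattered across thousands of scaffolds in a non-model organism. I would like to plot the p-value of each SNP on a Manhattan-style plot. I don't care about the order of the scaffolds, but I do want to preserve the relative order and spacing of the SNP positions within each scaffold. I just want to get a rough picture of how many genomic regions are significantly associated with the phenotype.
My data looks like this: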
SCAFFOLD POSITION
1 8967
1 8986
1 9002
1 9025
1 9064
2 60995
2 61091
2 61642
2 61898
2 61921
2 62034
2 62133
2 62202
2 62219
2 62220
3 731894
3 731907
3 731962
3 731999
3 732000
3 732050
3 732076
3 732097
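I would like to write a Perl script that creates a third column. That column should preserve the distance between SNPs on the same scaffold, while separating consecutive scaffolds by an arbitrary fixed amount (100 in the example below):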
SCAFFOLD POSITION CONTINUOUS_AXIS
1 8967 8967
1 8986 8986
1 9002 9002
1 9025 9025
1 9064 9064
2 60995 9164
2 61091 9260
2 61642 9811
2 61898 10067
2 61921 10090
2 62034 10203
2 62133 10302
2 62202 10371
2 62219 10388
2 62220 10389
3 731894 10489
3 731907 10502
3 731962 10557
3 731999 10594
3 732000 10595
3 732050 10645
3 732076 10671
3 732097 10692
Thanks to anyone who might have a good strategy.
Something like the following should work:
#!/usr/bin/env perl
use strict;
use warnings;

use constant SCAFFOLD_SPACING => 100;    # Arbitrary gap inserted between scaffolds

my ($last_scaffold, $last_position, $continuous_axis, $found_data);

my $input = './input';
open my $fh, '<', $input
    or die "Unable to open '$input' for reading : $!";

print join( "\t", qw( SCAFFOLD POSITION CONTINUOUS_AXIS ) ) . "\n";    # Output header

while (<$fh>) {
    next unless m|\d|;                        # Skip non-data lines (e.g. the header)
    my ($scaffold, $position) = split ' ';    # Split on whitespace, ignoring leading blanks

    unless ($found_data++) {
        # Initialize
        $last_scaffold   = $scaffold;    # Set to first data value
        $last_position   = $position;    # Set to first data value
        $continuous_axis = $position;    # Start continuous axis at first position
    }

    if ($scaffold eq $last_scaffold) {
        # Same scaffold: advance by the real distance between SNPs
        $continuous_axis += $position - $last_position;
    } else {
        # New scaffold: advance by the fixed spacing only
        $continuous_axis += SCAFFOLD_SPACING;
    }

    print join( "\t", $scaffold, $position, $continuous_axis ) . "\n";

    # Update
    $last_scaffold = $scaffold;
    $last_position = $position;
}

close $fh;
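
Since the question mentions p-values but the sample data only shows two columns, here is a minimal sketch of the same idea wrapped in a subroutine that carries any extra columns through unchanged. The subroutine name `add_continuous_axis`, the scaffold names and the p-values in the demo rows are all made up for illustration. It assumes the rows are already grouped by scaffold and sorted by position within each scaffold, as in the sample data. Scaffold identifiers are compared as strings, so names such as `scaffold_A` would work as well as plain numbers.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use constant SCAFFOLD_SPACING => 100;    # same arbitrary gap as in the example

# Takes array refs of the form [scaffold, position, extra columns...].
# Assumes rows are grouped by scaffold and sorted by position within it.
# Returns copies of the rows with the continuous-axis value appended.
sub add_continuous_axis {
    my @rows = @_;
    my ($last_scaffold, $last_position, $axis);
    my @out;

    for my $row (@rows) {
        my ($scaffold, $position) = @{$row}[0, 1];

        if (!defined $last_scaffold) {
            $axis = $position;                      # first SNP starts the axis
        } elsif ($scaffold eq $last_scaffold) {
            $axis += $position - $last_position;    # keep the real spacing
        } else {
            $axis += SCAFFOLD_SPACING;              # fixed gap between scaffolds
        }

        push @out, [ @$row, $axis ];
        ($last_scaffold, $last_position) = ($scaffold, $position);
    }
    return @out;
}

# Hypothetical rows: scaffold name, position, p-value (placeholder values)
my @demo = (
    [ 'scaffold_A', 8967,  0.51 ],
    [ 'scaffold_A', 9064,  0.03 ],
    [ 'scaffold_B', 60995, 0.2 ],
    [ 'scaffold_B', 61091, 0.004 ],
);

print join( "\t", @$_ ), "\n" for add_continuous_axis(@demo);
```

For these four rows the appended axis values are 8967, 9064, 9164 and 9260. The last column can then be used as the x coordinate in whatever plotting tool you prefer, with the p-value column (commonly transformed to -log10) on the y axis.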