How can I plot p-values for SNPs that are spread across thousands of scaffolds on a single continuous axis?

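I have p-values for SNPs from an association mapping study. The SNPs are scattered across thousands of scaffolds in a non-model organism. I would like to plot the p-value of each SNP on a Manhattan-style plot. I don't care about the order of the scaffolds, but I do want to preserve the relative order and spacing of the SNP positions within each scaffold. I only want a rough visual impression of how many genomic regions are significantly associated with the phenotype.

For example, my data look like this: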

SCAFFOLD    POSITION
1           8967    
1           8986    
1           9002    
1           9025    
1           9064    
2           60995   
2           61091   
2           61642   
2           61898   
2           61921   
2           62034   
2           62133   
2           62202   
2           62219   
2           62220   
3           731894  
3           731907  
3           731962  
3           731999  
3           732000  
3           732050  
3           732076  
3           732097

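I would like to write Perl code that creates a third column which preserves the distances between SNPs on the same scaffold, while separating consecutive scaffolds by an arbitrary fixed amount (100 in the example below):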

SCAFFOLD    POSITION    CONTINUOUS_AXIS
1           8967        8967
1           8986        8986
1           9002        9002
1           9025        9025
1           9064        9064
2           60995       9164
2           61091       9260
2           61642       9811
2           61898       10067
2           61921       10090
2           62034       10203
2           62133       10302
2           62202       10371
2           62219       10388
2           62220       10389
3           731894      10489
3           731907      10502
3           731962      10557
3           731999      10594
3           732000      10595
3           732050      10645
3           732076      10671
3           732097      10692

Thanks to anyone who might have a good strategy.

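Something like the following should work. It assumes the input is grouped by scaffold and sorted by position within each scaffold, as in your example: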

#!/usr/bin/env perl

use strict;
use warnings;

use constant SCAFFOLD_SPACING => 100;

my ($last_scaffold, $last_position, $continuous_axis, $found_data);

my $input = './input';
open my $fh, '<', $input # Three-argument open keeps the mode separate from the file name
    or die "Unable to open '$input' for reading: $!";

print join( "\t", qw( SCAFFOLD POSITION CONTINUOUS_AXIS ) ) . "\n"; # Output Header
while (<$fh>) {
    next unless m|\d|; # Skip non-data lines

    my ($scaffold, $position) = split ' '; # Split on whitespace, ignoring any leading blanks

    unless ($found_data++) {
        # Initialize
        $last_scaffold   = $scaffold; # Set to first data value
        $last_position   = $position; # Set to first data value
        $continuous_axis = $position; # Start continuous axis at first position
    }

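    # Compare scaffold IDs as strings so that non-numeric names also work
    if ($scaffold eq $last_scaffold) {
        # Same scaffold: advance by the real distance between the two SNPs
        $continuous_axis += $position - $last_position;
    } else {
        # New scaffold: advance by the fixed spacing
        $continuous_axis += SCAFFOLD_SPACING;
    }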
    print join( "\t", $scaffold, $position, $continuous_axis ) . "\n";

    # Update
    $last_scaffold = $scaffold;
    $last_position = $position;
}
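
If the input is not guaranteed to be grouped by scaffold and sorted by position, the same idea can be applied after grouping the rows first. The following is only a rough sketch under that assumption, not a tested replacement for the script above. The subroutine name continuous_axis and the scaffold IDs in the usage example are hypothetical, and the scaffolds are laid out in the order in which they are first seen, since their order does not matter here.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: takes a reference to a list of [scaffold, position]
# pairs plus a spacing, and returns [scaffold, position, continuous_axis] rows.
sub continuous_axis {
    my ($rows, $spacing) = @_;

    # Group positions by scaffold, remembering first-seen scaffold order.
    my (%positions_for, @scaffold_order);
    for my $row (@$rows) {
        my ($scaffold, $position) = @$row;
        push @scaffold_order, $scaffold unless exists $positions_for{$scaffold};
        push @{ $positions_for{$scaffold} }, $position;
    }

    my @out;
    my $axis;    # Undefined until the first SNP has been placed
    for my $scaffold (@scaffold_order) {
        my @sorted = sort { $a <=> $b } @{ $positions_for{$scaffold} };
        my $previous;    # Previous position on the current scaffold
        for my $position (@sorted) {
            if (!defined $axis) {
                $axis = $position;                 # First SNP overall
            } elsif (!defined $previous) {
                $axis += $spacing;                 # First SNP of a new scaffold
            } else {
                $axis += $position - $previous;    # Same scaffold: keep the real gap
            }
            push @out, [ $scaffold, $position, $axis ];
            $previous = $position;
        }
    }
    return @out;
}

# Usage: a few positions from the question, deliberately out of order and
# with made-up string scaffold IDs.
my @rows = (
    [ 'scaffold_1', 8986 ],  [ 'scaffold_2', 61091 ],
    [ 'scaffold_1', 8967 ],  [ 'scaffold_2', 60995 ],
);
print join("\t", @$_), "\n" for continuous_axis(\@rows, 100);
```

Grouping everything in a hash keeps all rows in memory, which should be fine for a SNP table of this kind. For very large files, sorting the input beforehand and streaming it through the script above would use less memory.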