Deleting all lines matching a specific pattern except the first match in bash
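I want to edit a GTF file by deleting every line that matches the pattern 'FAT1' except the first one, and then modify the coordinates of that remaining line (columns 4 and 5).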
#!genome-build GRCh38.p7
#!genome-version GRCh38
#!genome-date 2013-12
#!genome-build-accession NCBI:GCA_000001405.22
#!genebuild-last-updated 2016-06
1 havana exon 137682 137965 gene_id "ENSG00000239906"; gene_version "1"; gene_name "RP11-34P13.16"; gene_source "havana";
1 havana gene 139790 140339 gene_id "ENSG00000239906"; gene_version "1"; gene_name "RP11-34P13.14"; gene_source "havana";
1 havana exon 140001 140101 gene_id "ENSG00000269981"; gene_version "1"; gene_name "FAT1"; gene_source "havana";
1 havana gene 143401 145401 gene_id "ENSG00000269981"; gene_version "1"; gene_name "FAT1"; gene_source "havana";
Expected output
#!genome-build GRCh38.p7
#!genome-version GRCh38
#!genome-date 2013-12
#!genome-build-accession NCBI:GCA_000001405.22
#!genebuild-last-updated 2016-06
1 havana exon 137682 137965 gene_id "ENSG00000239906"; gene_version "1"; gene_name "RP11-34P13.16"; gene_source "havana";
1 havana gene 139790 140339 gene_id "ENSG00000239906"; gene_version "1"; gene_name "RP11-34P13.14"; gene_source "havana";
1 havana exon 147653 148000 gene_id "ENSG00000269981"; gene_version "1"; gene_name "FAT1"; gene_source "havana";
I tried something like this.
# Keep only the unique entry for FAT1 gene.
awk '/"ENSG00000269981"/&&c++ {next} 1' ref.gtf > ref_edit.gtf
#then manually edit the coordinates in vim editor
But I believe there must be a more sensible solution.
Could you please try the following?
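awk -v new_fourth_col="147653" -v new_fifth_col="148000" '
BEGIN{
  OFS="\t"                 # write tab-separated output
}
/gene_name "FAT1"/{
  if(++count==1){          # only the first FAT1 line is kept
    $4=new_fourth_col      # new start coordinate
    $5=new_fifth_col       # new end coordinate
    print
  }
  next                     # skip every later FAT1 line
}
{
  $1=$1                    # rebuild the record so OFS is applied
  print
}
' Input_file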
Also, I have made your output tab-separated.
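One caveat with the command above: awk splits on any whitespace by default, so rebuilding the record also turns the spaces inside the attributes text (and in the `#!` header lines) into tabs. If that is not wanted, a variant along the following lines may help. It is only a sketch under two assumptions: the file is really tab-delimited, as the GTF format specifies, and the gene name is matched as a literal string anywhere on the line. The function name `keep_first_match` and its argument order are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: keep only the first record for a gene and overwrite its start/end.
# Assumes tab-delimited input; keep_first_match is a hypothetical helper name.
keep_first_match() {
    local gene="$1" start="$2" end="$3" file="$4"
    awk -F'\t' -v OFS='\t' -v gene="$gene" -v s="$start" -v e="$end" '
        /^#/ { print; next }                        # header lines pass through untouched
        index($0, "gene_name \"" gene "\";") {      # literal (non-regex) match on the line
            if (!seen++) { $4 = s; $5 = e; print }  # first match: rewrite the coordinates
            next                                    # later matches are dropped
        }
        { print }                                   # every other record is printed as-is
    ' "$file"
}
```

It could be called as `keep_first_match FAT1 147653 148000 ref.gtf > ref_edit.gtf`. Because both the input and output separators are a tab, assigning `$4` and `$5` leaves the attributes text byte-for-byte unchanged, and `index()` avoids treating characters such as `.` in a gene name as regex metacharacters.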