How to submit a job (sequence file) to a web server and retrieve the result using curl
I have a FASTA sequence:
>seq1
UUUAAAAUCUGUGUAGCUGUCGCUCGGCUGCAUGCCUAGUGCACCUACGCAGUAUAAA
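I want to submit it to the web server
http://bioinformatics.hitsz.edu.cn/iMiRNA-SSF/index.jsp
and then retrieve the result (only the result table) from http://bioinformatics.hitsz.edu.cn/iMiRNA-SSF/showresult.jsp
into an output text file.
I tried the following code, which does not work.
Post the job: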
curl -X POST -d 'seq1\nUUUAAAAUCUGUGUAGCUGUCGCUCGGCUGCAUGCCUAGUGCACCUACGCAGUAUAAA' http://bioinformatics.hitsz.edu.cn/iMiRNA-SSF/ -H "Content-Type: application/json"
Get the result:
curl -X POST http://bioinformatics.hitsz.edu.cn/iMiRNA-SSF/showresult.jsp/response -H "Content-Type: text/plain";echo
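Can you help? I have 1000 such sequences and need to automate this from the Linux terminal.
I have attached a Perl script that does not fully work. Any suggestions or edits?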
#!/usr/bin/perl
use LWP::Simple;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
# Script parameters
# Script hidden parameters
$idCon="12345";
# Sequences source file
# IMPORTANT! use standard fasta file format
$inputFile="file.fa";
# Maximum number of sequences per request
$maxNumOfSequences=1;
# If you want to skip the N first requests
$skipRequests=0;
# Output files prefix
$outputFile="result_ssf";
# Promoter script URL
$URL = "http://bioinformatics.hitsz.edu.cn/iMiRNA-SSF/";
# Header and bottom line
$header = "sequenceName; primaryStru; secondStru; Pvalue; Classification\n";
#$URL2 = "http://bioinformatics.hitsz.edu.cn/iMiRNA-SSF/showresult.jsp";
##################################################################################
# The browser
printf "Creating the browser...\n";
$browser = LWP::UserAgent->new();
$browser->timeout(30);
printf "Opening input file...\n";
open(SEQUENCES, "<".$inputFile) or die $!;
printf "Opening output file...\n";
open OUTPUTFILE, ">".$outputFile or die $!;
printf OUTPUTFILE $header;
$sequences = "";
$sequenceName="";
$currentSec=0;
$currentRequest=0;
printf "Sending request...\n";
while(<SEQUENCES>) {
    if ($sequenceName eq "") {
        $sequenceName = $_;
    } else {
        $sequences = $sequences.$sequenceName.$_;
        $currentSec = $currentSec+1;
        $sequenceName = "";
    }
    if ($currentSec == $maxNumOfSequences) {
        $currentRequest=$currentRequest+1;
        if ($currentRequest > $skipRequests ) {
            printf " # Request num. ".$currentRequest."\n";
            my $response = $browser->post($URL,
                [ "Predict" => $sequences,
                  "uploadFile" => ""
                ],
                "Content_Type" => "form-data" );
            if ($response->is_error()) {
                printf "%s\n", $response->status_line;
                exit 1;
            }
            $response = $browser->post($URL, ["showresult.jsp"]);
            if ($response->is_error()) {
                printf "%s\n", $response->status_line;
                exit 1;
            }
            $contents = $response->content();
            #$contents =~ s/(<BR>\n|<BODY>|<\/BODY>|<HEAD>|<\/HEAD>|<HTML>|<\/HTML>|<META(.*)>|<TITLE>(.*)<\/TITLE>)//ig;
            $contents =~ s/(<BR>\n|<BODY>|<\/BODY>|<HEAD>|<\/HEAD>|<HTML>|<\/HTML>|<META(.*)>|<table>(.*)<\/table>)//ig;
            if ($contents =~ m/$header(.*)\n\n-/s) {
                print OUTPUTFILE ;
                print OUTPUTFILE "\n";
            }
        }
        $currentSec = 0;
        $sequences = "";
    }
}
close OUTPUTFILE;
close SEQUENCES;
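You need to use -F to send multipart/form-data. Because the testdata parameter contains a multiline string, you need to store the data in a file before running the curl command.
You also need to store the cookie between the two calls, because that is how the server keeps track of the job (that is, which result to show):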
echo -ne ">seq1\nUUUAAAAUCUGUGUAGCUGUCGCUCGGCUGCAUGCCUAGUGCACCUACGCAGUAUAAA" > test.txt
curl -v -c cookie.txt 'http://bioinformatics.hitsz.edu.cn/iMiRNA-SSF/Receive.jsp' \
-F "testdata=<test.txt" -F "Predict=Predict" -F "uploadFile="
curl -b cookie.txt 'http://bioinformatics.hitsz.edu.cn/iMiRNA-SSF/showresult.jsp'
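The question asks for only the result table in a text file, while the second curl call prints the whole HTML page. A minimal sketch of one way to cut the table out of a saved page is given here. It assumes the page contains a single, non-nested `<table>` element; the actual markup of showresult.jsp has not been checked, and `extract_table` and `result.html` are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Print the lines from the first <table ...> tag through its closing </table>.
# Assumption: the result page holds one table and tables are not nested.
extract_table() {
  awk '
    tolower($0) ~ /<table/    { inside = 1 }          # start of the table
    inside                    { print }               # copy lines while inside it
    tolower($0) ~ /<\/table>/ { if (inside) exit }    # stop after the closing tag
  ' "$1"
}
```

To use it, save the page first with `curl -s -b cookie.txt 'http://bioinformatics.hitsz.edu.cn/iMiRNA-SSF/showresult.jsp' -o result.html`, then run `extract_table result.html > result.txt`. The output still contains the HTML tags of the table rows; stripping them depends on how the server formats the cells.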
You can also drop the testdata parameter and use only uploadFile:
echo -ne ">seq1\nUUUAAAAUCUGUGUAGCUGUCGCUCGGCUGCAUGCCUAGUGCACCUACGCAGUAUAAAC" > test.txt
curl -v -c cookie.txt 'http://bioinformatics.hitsz.edu.cn/iMiRNA-SSF/Receive.jsp' \
-F "Predict=Predict" -F "uploadFile=@test.txt"
curl -b cookie.txt 'http://bioinformatics.hitsz.edu.cn/iMiRNA-SSF/showresult.jsp'
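For the 1000 sequences mentioned in the question, the two calls can be wrapped in a loop. The following is a rough sketch, not a tested client for this server: it assumes the result is ready as soon as Receive.jsp returns, that one record per request is acceptable, and that a fresh cookie jar per job is enough to keep results apart. The function names, the `rec_NNNN.fa` file naming, and the one-second pause are all choices made for illustration.

```shell
#!/usr/bin/env bash
BASE='http://bioinformatics.hitsz.edu.cn/iMiRNA-SSF'

# Split a multi-record FASTA file into OUTDIR/rec_0001.fa, rec_0002.fa, ...
split_fasta() {
  local fasta=$1 outdir=$2
  mkdir -p "$outdir"
  awk -v dir="$outdir" '
    /^>/ { if (out) close(out); out = sprintf("%s/rec_%04d.fa", dir, ++n) }
    out  { print > out }   # header and all sequence lines go to the current file
  ' "$fasta"
}

# Submit one record, then fetch the result page with the same cookie jar.
submit_one() {
  local rec=$1 jar
  jar=$(mktemp)
  curl -s -c "$jar" "$BASE/Receive.jsp" \
       -F "Predict=Predict" -F "uploadFile=@$rec" > /dev/null &&
  curl -s -b "$jar" "$BASE/showresult.jsp" -o "${rec%.fa}.html"
  local status=$?
  rm -f "$jar"
  return "$status"
}

# Split the input and process the records one after another.
run_batch() {
  local fasta=$1 outdir=$2 rec
  split_fasta "$fasta" "$outdir"
  for rec in "$outdir"/rec_*.fa; do
    submit_one "$rec" || echo "failed: $rec" >&2
    sleep 1   # arbitrary pause to avoid hammering the server
  done
}
```

After sourcing the script, `run_batch file.fa results` would leave one `rec_NNNN.html` per sequence in `results/`. Running the jobs sequentially keeps each cookie tied to exactly one submission, which matters because the server appears to use the cookie to decide which result to show.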