Java Bioinformatics - get all indexes of multiple specific words in a string
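I have a project in my university bioinformatics course, and one part of it is gene prediction.
My question today is how to get all the indexes of multiple specific words in a string. In my case, I want to find every occurrence of the start codon ("AUG") and the stop codons ("UAA", "UAG", "UGA") and use them to predict genes by looking for open reading frames (ORFs).
Here is my initial code: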
private void jButton3ActionPerformed(java.awt.event.ActionEvent evt) {
    // TODO add your handling code here:
    // textArea1.setText(null);
    String str = jTextField1.getText(), y = "", gene = "", dnax = "", text = "";
    SymbolList dna = null;
    int start_codon_index = -1, stop_codon_index = -1;
    if ("".equals(str)) {
        jTextArea1.setText("No DNA strand entered.. ");
    } else {
        if (checksum(str) == 100) {
            try {
                dna = DNATools.createDNA(str);
            } catch (IllegalSymbolException ex) {
                Logger.getLogger(m.class.getName()).log(Level.SEVERE, null, ex);
            }
            try {
                dna = DNATools.toRNA(dna);
            } catch (IllegalAlphabetException ex) {
                Logger.getLogger(m.class.getName()).log(Level.SEVERE, null, ex);
            }
            dnax = dna.seqString().toUpperCase();
            if (dnax.length() % 3 != 0) {
                if (dnax.length() % 3 == 1) {
                    dnax += "-";
                }
                if (dnax.length() % 3 == 2) {
                    dnax += "-";
                }
            }
            // System.out.println(dnax);
            for (int g = 0; g < dnax.length(); g += 3) {
                y = dnax.substring(g, g + 3);
                if ("AUG".equals(y)) {
                    start_codon_index = g;
                } else if (start_codon_index != -1 && ("UGA".equals(y) || "UAG".equals(y) || "UAA".equals(y))) {
                    stop_codon_index = g + 3;
                }
            }
            if (stop_codon_index != -1 && start_codon_index != -1) {
                String k = "";
                int a = 0;
                for (a = start_codon_index; a < stop_codon_index; a++) {
                    gene += dnax.charAt(a);
                }
                text += "\nGene start position: " + start_codon_index + "\nGene end position: " + a + "\n Gene: " + gene;
                jTextArea1.setText(text);
            } else {
                jTextArea1.setText("No genes found in Seq: " + dnax);
            }
        } else {
            jTextArea1.setText("Text entered is not a DNA strand..");
        }
    }
}
Here is the checksum() method:
private static int checksum(String x) {
    int i = 0, checks = 0, count = 0;
    char c;
    x = x.toUpperCase();
    while (i < x.length()) {
        c = x.charAt(i);
        if (c == 'A' || c == 'T' || c == 'G' || c == 'C' || c == '-') {
            count++;
        }
        i++;
    }
    try {
        checks = (count / x.length()) * 100;
    } catch (Exception e) {
        e.printStackTrace();
    }
    return checks;
}
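A side note on checksum(): count / x.length() is integer division, so the method can only return 100 (every character valid) or 0 (anything else, including empty input). That is enough for the == 100 test above, but the name suggests a percentage. The sketch below is one possible alternative, not part of the original project; the class and method names (DnaValidator, isDnaStrand, percentValid) are made up for illustration.

```java
public class DnaValidator {

    // True when the input is non-empty and consists only of A, C, G, T or the gap character '-'.
    static boolean isDnaStrand(String input) {
        return input != null && input.toUpperCase().matches("[ACGT-]+");
    }

    // Percentage of valid characters; multiplying before dividing avoids truncation to 0.
    static int percentValid(String input) {
        if (input == null || input.isEmpty()) {
            return 0; // nothing to validate
        }
        int count = 0;
        for (char c : input.toUpperCase().toCharArray()) {
            if ("ACGT-".indexOf(c) != -1) {
                count++;
            }
        }
        return count * 100 / input.length();
    }

    public static void main(String[] args) {
        System.out.println(isDnaStrand("ATGACCTGA"));
        System.out.println(percentValid("ATGX"));
    }
}
```

With a boolean helper like this, the button handler could test isDnaStrand(str) directly instead of comparing against 100.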
I have tried other solutions, but nothing worked. Any help or suggestion is welcome.
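I think you are asking how to find the indexes of those specific codons? Is dnax the string you want to check?
You can use indexOf(String str, int fromIndex). It returns -1 if the substring is not found.
So maybe something like this might help: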
List<Integer> startCodonIndices = new ArrayList<Integer>();
int index = dnax.indexOf("AUG");
while (index != -1) {
    startCodonIndices.add(index);            // record this occurrence
    index = dnax.indexOf("AUG", index + 1);  // search again just past it
}
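The indexOf(String str, int fromIndex) idea extends to several codons at once. The following is a minimal sketch under my own assumptions: the class CodonIndexFinder and the helpers findAll and findEach are hypothetical names, and the sample sequence is made up. It collects each position once and stops as soon as indexOf reports -1.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CodonIndexFinder {

    // Returns every position at which 'word' occurs in 'text', including overlapping hits.
    static List<Integer> findAll(String text, String word) {
        List<Integer> hits = new ArrayList<>();
        int index = text.indexOf(word);
        while (index != -1) {
            hits.add(index);
            index = text.indexOf(word, index + 1); // resume one character past the last hit
        }
        return hits;
    }

    // Runs findAll for each word and keeps the results keyed by word, in the order given.
    static Map<String, List<Integer>> findEach(String text, String... words) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String word : words) {
            result.put(word, findAll(text, word));
        }
        return result;
    }

    public static void main(String[] args) {
        String rna = "AUGACCUGAAUGUAA"; // made-up sample sequence
        System.out.println(findEach(rna, "AUG", "UAA", "UAG", "UGA"));
    }
}
```

Note that a plain text search also reports hits that are not aligned to the reading frame (in the sample, "UGA" is found at index 1 as well as index 6), so the positions still need a frame check before they are treated as real stop codons.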
Here is what I changed and added in my code based on your suggestion:
for (int i = 0; i + 3 < dnax.length(); i++) {
    index = indexOf("AUG", i);
    startCodonIndices.add(index);
}
List stopCodonIndices = new ArrayList();
int i2, i3, i4;
for (int j = 0; j + 3 < dnax.length(); j++) {
    i2 = indexOf("UGA", j);
    i3 = indexOf("UAA", j);
    i4 = indexOf("UAG", j);
    stopCodonIndices.add(i2);
    stopCodonIndices.add(i3);
    stopCodonIndices.add(i4);
}
for (int n = 0; n < dnax.length(); n++) {
    for (int k = 0; k < startCodonIndices.size() - 1; k++) {
        for (int h = 0; h < stopCodonIndices.size() - 1; h++) {
            gene = dnax.substring(k, h);
            jTextArea1.append("\n" + gene);
        }
    }
}
Result for Seq=ATGACCTGA:
A
AU
AUG
AUGA
AUGAC
AUGACC
AUGACCU
AUGACCUG
AUGACCUGA
Error:
Exception in thread "AWT-EventQueue-0" java.lang.StringIndexOutOfBoundsException: String index out of range: 10
at java.lang.String.substring(String.java:1951)
at bio.m.jButton3ActionPerformed(m.java:365)
at bio.m.access0(m.java:36)
at bio.m.actionPerformed(m.java:142)
at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2346)
at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:252)
at com.jtattoo.plaf.BaseButtonListener.mouseReleased(BaseButtonListener.java:60)
at java.awt.Component.processMouseEvent(Component.java:6525)
at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
at java.awt.Component.processEvent(Component.java:6290)
at java.awt.Container.processEvent(Container.java:2234)
at java.awt.Component.dispatchEventImpl(Component.java:4881)
at java.awt.Container.dispatchEventImpl(Container.java:2292)
at java.awt.Component.dispatchEvent(Component.java:4703)
at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4898)
at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4533)
at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4462)
at java.awt.Container.dispatchEventImpl(Container.java:2278)
at java.awt.Window.dispatchEventImpl(Window.java:2750)
at java.awt.Component.dispatchEvent(Component.java:4703)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:751)
at java.awt.EventQueue.access0(EventQueue.java:97)
at java.awt.EventQueue.run(EventQueue.java:702)
at java.awt.EventQueue.run(EventQueue.java:696)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain.doIntersectionPrivilege(ProtectionDomain.java:75)
at java.security.ProtectionDomain.doIntersectionPrivilege(ProtectionDomain.java:86)
at java.awt.EventQueue.run(EventQueue.java:724)
at java.awt.EventQueue.run(EventQueue.java:722)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain.doIntersectionPrivilege(ProtectionDomain.java:75)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:721)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
As you can see, it returns a gene, but the edited code has a problem.
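A likely reading of the trace: substring(k, h) is called with the loop counters k and h, not with the positions stored in the two lists. The stop list receives three entries per loop step, so h keeps growing past the length of the 9-character sequence, and substring(0, 10) fails with "String index out of range: 10". That also matches the output above, which is simply the prefixes of the sequence. The sketch below shows the difference; SubstringBounds and candidates are hypothetical names, and the index lists are written by hand for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubstringBounds {

    // Builds candidate genes from stored codon positions rather than loop counters.
    static List<String> candidates(String rna, List<Integer> starts, List<Integer> stops) {
        List<String> genes = new ArrayList<>();
        for (int start : starts) {
            for (int stop : stops) {
                if (start < 0 || stop < 0 || stop <= start) {
                    continue; // skip "not found" markers and stops that precede the start
                }
                genes.add(rna.substring(start, stop + 3)); // include the stop codon itself
            }
        }
        return genes;
    }

    public static void main(String[] args) {
        String rna = "AUGACCUGA"; // ATGACCTGA after transcription
        try {
            rna.substring(0, 10); // what the loop counters eventually ask for
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("counter-based call failed: " + e.getMessage());
        }
        // Hand-written lists: AUG at 0, UGA at 6, plus some -1 "not found" entries.
        System.out.println(candidates(rna, Arrays.asList(0, -1), Arrays.asList(6, -1, -1)));
    }
}
```

Filtering out -1 before calling substring matters for the same reason: a negative begin index throws the same exception.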
Edit: the code finally does what it is supposed to do: get the start codons and stop codons from my RNA sequence.
Here is my final edit:
List<Integer> startCodonIndices = new ArrayList<Integer>();
int index = 0;
for (int i = 0; i + 3 < dnax.length(); i++) {
    index = dnax.indexOf("AUG", i);
    if (index != -1) {
        startCodonIndices.add(index);
    }
}
List<Integer> stopCodonIndices = new ArrayList<Integer>();
int i2, i3, i4;
for (int j = 0; j + 3 < dnax.length(); j++) {
    i2 = dnax.indexOf("UGA", j);
    System.out.println("i2: " + i2);
    if (i2 != -1) {
        stopCodonIndices.add(i2);
    }
    i3 = dnax.indexOf("UAA", j);
    System.out.println("i3: " + i3);
    if (i3 != -1) {
        stopCodonIndices.add(i3);
    }
    i4 = dnax.indexOf("UAG", j);
    System.out.println("i4: " + i4);
    if (i4 != -1) {
        stopCodonIndices.add(i4);
    }
}
for (int v = 0; v < startCodonIndices.size(); v++) {
    for (int h = 0; h < stopCodonIndices.size(); h++) {
        int start = startCodonIndices.get(v), stop = stopCodonIndices.get(h);
        if (stop > start) { // a stop located before the start could make substring throw
            gene = dnax.substring(start, stop + 3);
            jTextArea1.setText(gene + "\n"); // setText replaces the text, so only the last gene stays visible
        }
    }
}
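Pairing every start index with every stop index ignores the reading frame. As a rough sketch of the ORF idea mentioned in the question (my own assumption of how it could look, not the code used in the project), each AUG can be followed codon by codon until the first in-frame stop codon. OrfScanner and findOrfs are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OrfScanner {

    private static final List<String> STOP_CODONS = Arrays.asList("UAA", "UAG", "UGA");

    // For each AUG, reads codon by codon in the same frame and ends at the first stop codon.
    static List<String> findOrfs(String rna) {
        List<String> orfs = new ArrayList<>();
        int start = rna.indexOf("AUG");
        while (start != -1) {
            for (int pos = start + 3; pos + 3 <= rna.length(); pos += 3) {
                if (STOP_CODONS.contains(rna.substring(pos, pos + 3))) {
                    orfs.add(rna.substring(start, pos + 3)); // start codon through stop codon
                    break;
                }
            }
            start = rna.indexOf("AUG", start + 1); // move on to the next start codon, if any
        }
        return orfs;
    }

    public static void main(String[] args) {
        System.out.println(findOrfs("AUGACCUGA"));
        System.out.println(findOrfs("CCAUGAAAUAGG"));
    }
}
```

An AUG with no in-frame stop codon after it contributes nothing to the result. This sketch scans only the given strand in the forward direction.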
Thanks for the help, everyone!