How can I remove all characters before the first instance of a tab?
I have a large text file with about 200,000 lines of word translations. I want to keep only the translated text that appears after the tab.
abaxial van osovine
abbacy opatstvo
abbaino kora
abbatial opatski
abbe opat
abbé opat
abbé sveæenik
hematological parameters hematološki pokazatelji
How do I strip all characters before the first instance of a tab?
You can handle this quite efficiently with a regular expression.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    /**
     * Splits a translation line into two named groups (key, value) on the
     * first run of two or more whitespace characters.<br>
     * Group 1 (key) is the text before the separator.<br>
     * Group 2 (value) is the text after the separator.<br>
     */
    private static final Pattern TRANSLATION_PATTERN = Pattern.compile("(?<key>.*?)\\s\\s+(?<value>.*)");

    public static String grabTextAfterTwoSpaces(String input) {
        Matcher matcher = TRANSLATION_PATTERN.matcher(input);
        /*
         * You have to call .matches() for the regex to actually be applied.
         */
        if (!matcher.matches()) {
            throw new IllegalArgumentException(String.format("Provided input:[%s] did not contain two spaces", input));
        }
        return matcher.group("value");
    }

    public static void main(String[] args) {
        // Each sample separates the source term from its translation with two spaces.
        System.out.println(grabTextAfterTwoSpaces("abaxial van  osovine"));
        System.out.println(grabTextAfterTwoSpaces("abbacy  opatstvo"));
        System.out.println(grabTextAfterTwoSpaces("abbaino  kora"));
        System.out.println(grabTextAfterTwoSpaces("abbatial  opatski"));
        System.out.println(grabTextAfterTwoSpaces("abbe  opat"));
        System.out.println(grabTextAfterTwoSpaces("abbé  opat"));
        System.out.println(grabTextAfterTwoSpaces("abbé  sveæenik"));
        System.out.println(grabTextAfterTwoSpaces("abbacy  opatstvo"));
        System.out.println(grabTextAfterTwoSpaces("hematological parameters  hematološki pokazatelji"));
    }
}
So if you ask for the group named "value", you get everything after the run of two or more spaces:
osovine
opatstvo
kora
opatski
opat
opat
sveæenik
opatstvo
hematološki pokazatelji
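For a 200,000-line file you probably do not want one bad line to abort the whole run. Here is a minimal sketch of the same named-group idea applied to a list of lines, skipping lines that have no separator instead of throwing. The class name TranslationExtractor and the method extractTranslations are made up for illustration, and it assumes the lines are already in memory (for example from Files.readAllLines).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the answer above.
final class TranslationExtractor {
    // Compiled once and reused for every line.
    private static final Pattern TRANSLATION_PATTERN =
            Pattern.compile("(?<key>.*?)\\s\\s+(?<value>.*)");

    // Returns the text after the first run of 2+ whitespace characters
    // for each line that has one; lines without a separator are skipped.
    static List<String> extractTranslations(List<String> lines) {
        List<String> translations = new ArrayList<>();
        for (String line : lines) {
            Matcher matcher = TRANSLATION_PATTERN.matcher(line);
            if (matcher.matches()) {
                translations.add(matcher.group("value"));
            }
        }
        return translations;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("abbacy  opatstvo", "no separator here", "abbe  opat");
        System.out.println(extractTranslations(lines));
    }
}
```

Whether to skip or to report malformed lines depends on your data; skipping is just the assumption made here.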
You can use this regex to match everything before the translation:
.+? {2,}
Try this regex online: https://regex101.com/r/P0TY1k/1
Call replaceAll on your string with this regex and keep the returned string (Java strings are immutable):
yourString = yourString.replaceAll(".+? {2,}", "");
Edit: if the separator is a tab rather than 2 spaces, you can try this regex:
.+?(?: {2,}|\t)
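As a rough sketch of how the tab-aware regex could be applied per line: the version below anchors the pattern with ^ and uses replaceFirst, so only the text up to the first separator is removed even if the translation itself contains another tab or double space. The class name SourceStripper and the method stripSource are illustrative names, not anything from the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class SourceStripper {
    // Source term (reluctant), followed by either 2+ spaces or a single tab.
    private static final Pattern SOURCE_AND_SEPARATOR =
            Pattern.compile("^.+?(?: {2,}|\\t)");

    // Removes the source term and the first separator; a line with no
    // separator is returned unchanged.
    static String stripSource(String line) {
        return SOURCE_AND_SEPARATOR.matcher(line).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(stripSource("abbacy\topatstvo"));
        System.out.println(stripSource("abbe  opat"));
    }
}
```

Precompiling the Pattern avoids recompiling the regex on every call, which String.replaceAll would do for each of the 200,000 lines.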