Separate text in k-shingles without Scanner.class in Java
I am trying to separate a text into k-shingles; sadly, I cannot use Scanner. If the last shingle is too short, I want to fill it up with "_". I have come this far:
import java.util.ArrayList;

public class Projektarbeit {
    public static void main(String[] args) {
        testKShingling(7, "ddssggeezzfff");
    }

    public static void testKShingling(int k, String source) {
        // first eliminate whitespace, then fill up with "_" to match target.length % shingle.length() == 0
        String txt = source.replaceAll("\\s", "");
        // get shingles
        ArrayList<String> shingles = new ArrayList<String>();
        int i;
        int l = txt.length();
        String shingle = "";
        if (k == 1) {
            for (i = 0; i < l; i++) {
                shingle = txt.substring(i, i + k);
                shingles.add(shingle);
            }
        }
        else {
            // consecutive shingles overlap by one character (step k - 1)
            for (i = 0; i < l; i += k - 1) {
                try {
                    shingle = txt.substring(i, i + k);
                    shingles.add(shingle);
                }
                catch (Exception e) {
                    // substring ran past the end: pad by one "_" and retry the same index
                    txt = txt.concat("_");
                    i -= k - 1;
                }
            }
        }
        System.out.println(shingles);
    }
}
Output: [ddssgge, eezzfff, f______]
It almost works, but with the parameters given in the example the last shingle is not needed (it should be [ddssgge, eezzfff]).
Any idea how to do this more nicely?
To make the posted code work, you only need to add a break at the end of the catch block:
    catch (Exception e) {
        txt = txt.concat("_");
        i -= k - 1;
        break;
    }
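As a quick check of the break fix, the sketch below runs the same loop on two inputs. The names BreakVariantCheck and shinglesWithBreak are made up for this illustration, the method returns the list instead of printing it, and it assumes k > 1. The two statements before the break are left out because they have no effect once the loop exits.

```java
import java.util.ArrayList;
import java.util.List;

class BreakVariantCheck {

    // Hypothetical helper: the posted loop with the added break (assumes k > 1).
    static List<String> shinglesWithBreak(int k, String source) {
        String txt = source.replaceAll("\\s", "");
        List<String> shingles = new ArrayList<>();
        int l = txt.length();
        for (int i = 0; i < l; i += k - 1) {
            try {
                shingles.add(txt.substring(i, i + k));
            } catch (StringIndexOutOfBoundsException e) {
                // stop at the first shingle that would run past the end of the text
                break;
            }
        }
        return shingles;
    }

    public static void main(String[] args) {
        // text length fits exactly: prints [ddssgge, eezzfff]
        System.out.println(shinglesWithBreak(7, "ddssggeezzfff"));
        // two extra characters: still prints [ddssgge, eezzfff], so "ab" is not covered
        System.out.println(shinglesWithBreak(7, "ddssggeezzfffab"));
    }
}
```

So the break gives the expected result for the example in the question, but when the text does not end exactly on a shingle boundary, the trailing characters are dropped rather than padded.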
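Having said that, I would not use exceptions to control the program flow. Exceptions are just that: they should be used for run-time errors.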
Avoid the StringIndexOutOfBoundsException by controlling the loop parameters:
    // requires: import java.util.ArrayList;
    public static void main(String[] args) {
        testKShingling(3, "ddssggeezzfff");
    }

    public static void testKShingling(int substringLength, String source) {
        // todo validate input
        String txt = source.replaceAll("\\s", "");
        // get shingles
        ArrayList<String> shingles = new ArrayList<>();
        int stringLength = txt.length();
        if (substringLength == 1) {
            for (int index = 0; index < stringLength; index++) {
                String shingle = txt.substring(index, index + substringLength);
                shingles.add(shingle);
            }
        }
        else {
            // stop once only the overlapping last character is left
            for (int index = 0; index < stringLength - 1; index += substringLength - 1) {
                // clamp the end index so substring never runs past the text
                int endIndex = Math.min(index + substringLength, stringLength);
                String shingle = txt.substring(index, endIndex);
                if (shingle.length() < substringLength) {
                    shingle = extend(shingle, substringLength);
                }
                shingles.add(shingle);
            }
        }
        System.out.println(shingles);
    }

    // pads shingle with "_" up to toLength
    private static String extend(String shingle, int toLength) {
        String s = shingle;
        for (int index = 0; index < toLength - shingle.length(); index++) {
            s = s.concat("_");
        }
        return s;
    }
An alternative implementation of testKShingling:
    public static void testKShingling(int substringLength, String source) {
        // todo validate input
        String txt = source.replaceAll("\\s", "");
        ArrayList<String> shingles = new ArrayList<>();
        if (substringLength == 1) {
            for (char c : txt.toCharArray()) {
                shingles.add(Character.toString(c));
            }
        }
        else {
            while (txt.length() > substringLength) {
                String shingle = txt.substring(0, substringLength);
                shingles.add(shingle);
                txt = txt.substring(substringLength - 1); // remove first substringLength - 1 chars
            }
            if (txt.length() < substringLength) { // check the length of what's left
                txt = extend(txt, substringLength);
            }
            shingles.add(txt); // add what's left
        }
        System.out.println(shingles);
    }
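If you want something you can reuse and test, here is a minimal sketch that folds both branches into one loop and returns the list instead of printing it. ShingleSketch, kShingles and pad are hypothetical names, not part of the posted code. It assumes the same one-character overlap for k > 1 (step k - 1) and mirrors the loop bound index < stringLength - 1 from the first version.

```java
import java.util.ArrayList;
import java.util.List;

class ShingleSketch {

    // Hypothetical helper: shingles of length k, padded with "_" at the end if needed.
    static List<String> kShingles(int k, String source) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        String txt = source.replaceAll("\\s", "");
        int step = Math.max(1, k - 1); // k == 1 advances one character at a time
        int overlap = k - step;        // 1 for k > 1, 0 for k == 1
        List<String> shingles = new ArrayList<>();
        // continue while there is at least one character beyond the overlap
        for (int start = 0; start + overlap < txt.length(); start += step) {
            int end = Math.min(start + k, txt.length());
            shingles.add(pad(txt.substring(start, end), k));
        }
        return shingles;
    }

    // Appends "_" until s has the requested length.
    private static String pad(String s, int toLength) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < toLength) {
            sb.append('_');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(kShingles(7, "ddssggeezzfff")); // [ddssgge, eezzfff]
        System.out.println(kShingles(3, "ddssggeezzff"));  // [dds, ssg, gge, eez, zzf, ff_]
    }
}
```

Returning the list keeps the printing out of the method, so the result can be compared against an expected list in a test.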