How to handle a 1 GB data file to read words and find the longest word?
I wrote this code, but it fails on a 1 GB file.
    public class TestFiles {
        public static void main(String[] args) {
            int minLength = Integer.MAX_VALUE;
            int maxLength = Integer.MIN_VALUE;
            String minWord = "";
            String maxWord = "";
            List<String> words = new ArrayList<>();
            try {
                File myObj = new File("C:\\Users\\Downloads\\java.txt");
                Scanner myReader = new Scanner(myObj);
                while (myReader.hasNextLine()) {
                    String data = myReader.nextLine();
                    String[] dataArray = data.split(" ");
                    List<String> list = Arrays.asList(dataArray);
                    for (String s : list) {
                        if (s.length() < minLength) {
                            minLength = s.length();
                            minWord = s;
                        } else if (s.length() > maxLength) {
                            maxLength = s.length();
                            maxWord = s;
                        }
                    }
                }
                myReader.close();
            } catch (Exception e) {
                // TODO: handle exception
            }
            System.out.println("min length " + minLength + " - max lenth " + maxLength);
            System.out.println("min length word " + minWord + " - max lenth word " + maxLength);
        }
    }
Can anyone help? How can I solve this?
    int len = s.length();
    if (len < minLength) {
        minLength = len;
        minWord = s;
    }
    if (len > maxLength) {
        maxLength = len;
        maxWord = s;
    }
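Your test case will fail if the longest string is at the first index of the first line: the first word always satisfies len < minLength, so the else if branch is skipped and that word is never considered for the maximum.
By the way, I think you should break the big test down into smaller ones: first find the shortest and longest strings in a single line, then in multiple lines, and only then in data read from a file.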
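The edge case described above can be reproduced without any file. The following is a minimal sketch, not part of the original answer: the class name ElseIfDemo, the two helper methods, and the sample words are all made up for illustration. It runs the chained else if from the question and the two independent if statements on the same input, where the longest word comes first.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; names and sample data are assumptions for illustration.
public class ElseIfDemo {

    // Returns {shortest, longest} using the chained `else if` from the question.
    public static String[] withElseIf(List<String> words) {
        int minLength = Integer.MAX_VALUE;
        int maxLength = Integer.MIN_VALUE;
        String minWord = "";
        String maxWord = "";
        for (String s : words) {
            int len = s.length();
            if (len < minLength) {
                minLength = len;
                minWord = s;
            } else if (len > maxLength) { // skipped whenever a new minimum was found
                maxLength = len;
                maxWord = s;
            }
        }
        return new String[] {minWord, maxWord};
    }

    // Returns {shortest, longest} using two independent `if` statements.
    public static String[] withTwoIfs(List<String> words) {
        int minLength = Integer.MAX_VALUE;
        int maxLength = Integer.MIN_VALUE;
        String minWord = "";
        String maxWord = "";
        for (String s : words) {
            int len = s.length();
            if (len < minLength) {
                minLength = len;
                minWord = s;
            }
            if (len > maxLength) {
                maxLength = len;
                maxWord = s;
            }
        }
        return new String[] {minWord, maxWord};
    }

    public static void main(String[] args) {
        // The longest word is first, and every later word is shorter.
        List<String> words = Arrays.asList("elephant", "cat", "a");
        System.out.println(Arrays.toString(withElseIf(words))); // [a, ]
        System.out.println(Arrays.toString(withTwoIfs(words))); // [a, elephant]
    }
}
```

With the chained version every word in this input sets a new minimum, so the maximum branch never runs and the longest word stays empty. A case this small also works as the first of the "smaller tests" suggested above.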
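The problem becomes obvious when 1 GB of words is squeezed into a single line!*
Solution: process the input "word by word" instead of "line by line", which is efficient enough! ;)
Voilà: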
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.Scanner;

    public class TestFiles {
        public static void main(String[] args) {
            int minLength = Integer.MAX_VALUE;
            int maxLength = Integer.MIN_VALUE;
            String minWord = "";
            String maxWord = "";
            File myObj = new File("C:\\Users\\Downloads\\java.txt");
            // try-with-resources closes the Scanner even if reading fails
            try (Scanner myReader = new Scanner(myObj)) {
                // next() returns one whitespace-delimited token at a time
                while (myReader.hasNext()) {
                    String word = myReader.next();
                    if (word.length() < minLength) {
                        minLength = word.length();
                        minWord = word;
                    }
                    if (word.length() > maxLength) {
                        maxLength = word.length();
                        maxWord = word;
                    }
                }
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            }
            System.out.println("min length " + minLength + " - max length " + maxLength);
            System.out.println("min length word " + minWord + " - max length word " + maxWord);
        }
    }
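*When "a lot of" words are on a single line, we can run into problems here: myReader.hasNextLine(), String data = myReader.nextLine() and String[] dataArray = data.split(" "); because each of them has to hold the entire line in memory at once.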
@huy's answer is also correct: the else "works in most cases" but is "incorrect in edge cases".
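As an alternative illustration of the "word by word" idea, here is a rough sketch that reads one character at a time instead of using Scanner, so only the current word is ever buffered. This swaps in a different technique from the answer's code: the class StreamingWordLengths, the Result holder, and the scan and update helpers are hypothetical names, and words are assumed to be separated by whitespace as defined by Character.isWhitespace.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch; class, method, and field names are assumptions.
public class StreamingWordLengths {

    /** Shortest and longest word seen so far (empty if no word was read). */
    public static final class Result {
        public String minWord = "";
        public String maxWord = "";
    }

    /** Scans the source character by character, buffering only the current word. */
    public static Result scan(Reader source) {
        Result result = new Result();
        StringBuilder current = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            int c;
            while ((c = in.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    update(result, current); // a word just ended (or whitespace repeated)
                } else {
                    current.append((char) c);
                }
            }
            update(result, current); // the last word may not be followed by whitespace
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    // Compares the buffered word against the current extremes, then clears the buffer.
    private static void update(Result result, StringBuilder current) {
        if (current.length() == 0) {
            return; // nothing buffered, e.g. consecutive whitespace
        }
        String word = current.toString();
        if (result.minWord.isEmpty() || word.length() < result.minWord.length()) {
            result.minWord = word;
        }
        if (word.length() > result.maxWord.length()) {
            result.maxWord = word;
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        Result r = scan(new StringReader("one  three\nfive a"));
        System.out.println(r.minWord + " " + r.maxWord); // a three
    }
}
```

For a real file, the Reader could come from java.nio.file.Files.newBufferedReader. Memory use then depends on the longest single word rather than on the longest line, which is the same property the Scanner.next() version relies on.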