Java: Detect the delimiter of a CSV or TXT file
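I have seen this question asked several times, but the answers use other languages and I can't follow them.
I am receiving a .csv or .txt file over a socket.
Is there a way to detect the delimiter (or "splitter") used in a line of a CSV or TXT file?
This is the server code that handles writing the file: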
try {
    final ServerSocket server = new ServerSocket(8998);
    socket = server.accept();
    File sdcard = Environment.getExternalStorageDirectory();
    File myFile = new File(sdcard, "TestReceived" + curDate + ".csv");
    final BufferedReader br = new BufferedReader(new InputStreamReader(socket.getInputStream()));
    final PrintWriter pw = new PrintWriter(new FileWriter(myFile));
    String line;
    String[] wordsarray;
    int bc = 0;
    int dc = 0;
    int pq = 0;
    int rq = 0;
    int id = 0;
    // Read the header line and record the index of each wanted column.
    line = br.readLine();
    wordsarray = line.split(",");
    for (int x = 0; x < wordsarray.length; x++) {
        switch (wordsarray[x]) {
            case "COLUMN NAME A": id = x;
                break;
            case "COLUMN NAME B": bc = x;
                break;
            case "COLUMN NAME C": dc = x;
                break;
            case "COLUMN NAME D": pq = x;
                break;
            case "COLUMN NAME E": rq = x;
                break;
        }
    }
    // Write the header, then every data row, as tab-separated columns in the new order.
    pw.println(wordsarray[dc] + "\t" + wordsarray[rq] + "\t" + wordsarray[pq] + "\t" + wordsarray[bc] + "\t" + wordsarray[id]);
    for (line = br.readLine(); line != null; line = br.readLine()) {
        wordsarray = line.split(",");
        pw.println(wordsarray[dc] + "\t" + wordsarray[rq] + "\t" + wordsarray[pq] + "\t" + wordsarray[bc] + "\t" + wordsarray[id]);
    }
    pw.flush();
    pw.close();
    br.close();
    socket.close();
    server.close();
}
catch (Exception e) {
    e.printStackTrace();
}
If I pass a comma to line.split() and the file uses a different delimiter, it produces repeated lines, and I don't even know why:
COLUMN NAME A COLUMN NAME B COLUMN NAME C COLUMN NAME D COLUMN NAME E COLUMN NAME A COLUMN NAME B COLUMN NAME C COLUMN NAME D COLUMN NAME E COLUMN NAME A COLUMN NAME B COLUMN NAME C COLUMN NAME D COLUMN NAME E COLUMN NAME A COLUMN NAME B COLUMN NAME C COLUMN NAME D COLUMN NAME E COLUMN NAME A COLUMN NAME B COLUMN NAME C COLUMN NAME D COLUMN NAME E COLUMN NAME A COLUMN NAME B COLUMN NAME C COLUMN NAME D COLUMN NAME E
But if the file really is comma-delimited, it produces the correct output:
COLUMN NAME A COLUMN NAME B COLUMN NAME C COLUMN NAME D COLUMN NAME E
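Is there a way to detect the file's delimiter automatically, so that I don't have to worry about which one the file uses? Or is there a better solution?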
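A likely explanation for the repeated output, as a minimal sketch: when the line contains no comma, line.split(",") returns a one-element array holding the whole line, none of the switch cases match, and all five indices stay at 0, so the same element is printed five times. The class and method names below (SplitMismatchDemo, reorder) are hypothetical and only mirror the logic of the question's code.

```java
public class SplitMismatchDemo {

    // Mirrors the question's logic: split on a comma, locate the columns,
    // then emit them tab-separated in a fixed order.
    static String reorder(String line) {
        String[] words = line.split(",");
        int id = 0, bc = 0, dc = 0, pq = 0, rq = 0;
        for (int x = 0; x < words.length; x++) {
            switch (words[x]) {
                case "COLUMN NAME A": id = x; break;
                case "COLUMN NAME B": bc = x; break;
                case "COLUMN NAME C": dc = x; break;
                case "COLUMN NAME D": pq = x; break;
                case "COLUMN NAME E": rq = x; break;
                default: break;
            }
        }
        return words[dc] + "\t" + words[rq] + "\t" + words[pq] + "\t"
                + words[bc] + "\t" + words[id];
    }

    public static void main(String[] args) {
        String commaHeader =
                "COLUMN NAME A,COLUMN NAME B,COLUMN NAME C,COLUMN NAME D,COLUMN NAME E";
        String tabHeader = commaHeader.replace(',', '\t');

        // Comma-delimited input: five distinct columns are selected.
        System.out.println(reorder(commaHeader));

        // Tab-delimited input: split(",") yields a single element, every index
        // is still 0, so the whole line is emitted five times.
        System.out.println(reorder(tabHeader));
    }
}
```

The same thing happens on every data row, which is why the output looks duplicated whenever the real delimiter is not a comma.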
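Use a BufferedReader, place a mark(...), and read the first line. If the line contains a \t tab character, your file is tab-delimited; otherwise assume it is comma-delimited.
Then parse the file with a CSV/TSV parser, such as Apache Commons CSV.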
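import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;

try (BufferedReader in = Files.newBufferedReader(Paths.get(filename))) {
    // Remember the start of the stream so the header can be re-read by the parser.
    // reset() may fail if the first line is longer than this 1024-character limit.
    in.mark(1024);
    String line = in.readLine();
    if (line == null)
        throw new IOException("File is empty: " + filename);
    // A tab in the first line means tab-delimited; otherwise assume comma-delimited.
    CSVFormat fileFormat = (line.indexOf('\t') != -1 ? CSVFormat.TDF
                                                     : CSVFormat.RFC4180)
                           .withHeader();
    in.reset();
    for (CSVRecord record : fileFormat.parse(in)) {
        String lastName = record.get("Last Name");
        String firstName = record.get("First Name");
        ...
    }
}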
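If the file might also use a semicolon or a pipe, the same first-line check can be generalized. The following is only a sketch under stated assumptions, not part of Apache Commons CSV: the class DelimiterSniffer, its candidate list, and the rule "the candidate that occurs most often outside double quotes wins, with a comma as the fallback" are all hypothetical choices made for illustration.

```java
import java.util.regex.Pattern;

public class DelimiterSniffer {

    // Assumed candidate set; extend it if your files use something else.
    private static final char[] CANDIDATES = {',', '\t', ';', '|'};

    // Returns the candidate that occurs most often outside double quotes.
    // Falls back to a comma when no candidate occurs at all; on a tie the
    // candidate listed first wins.
    static char detect(String headerLine) {
        char best = ',';
        int bestCount = 0;
        for (char candidate : CANDIDATES) {
            int count = countOutsideQuotes(headerLine, candidate);
            if (count > bestCount) {
                bestCount = count;
                best = candidate;
            }
        }
        return best;
    }

    // Counts occurrences of target, skipping text inside "quoted fields".
    static int countOutsideQuotes(String line, char target) {
        boolean inQuotes = false;
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (ch == '"') {
                inQuotes = !inQuotes;
            } else if (ch == target && !inQuotes) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String header = "COLUMN NAME A;COLUMN NAME B;COLUMN NAME C";
        char delimiter = detect(header);
        // Pattern.quote keeps regex metacharacters such as '|' literal in split().
        String[] columns = header.split(Pattern.quote(String.valueOf(delimiter)), -1);
        System.out.println("columns found: " + columns.length);
    }
}
```

Looking only at the first line is a heuristic, so a file whose header contains none of the candidates will be treated as comma-delimited. Once the delimiter is known, passing it to a real CSV parser is still safer than line.split(), which does not understand quoted fields.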