Parse Whatsapp Log-File in Java
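I'm currently developing a small tool to analyze how group chats in Whatsapp are used.
I'm trying to do this with the Whatsapp log file. I managed to convert the raw .txt
into the following format so that I can work with the formatted text: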
29. Jan. 12:01 - Random Name: message text
29. Jan. 12:22 - Random Name: message text
29. Jan. 12:24 - Random Name: message text
29. Jan. 12:38 - Random Name: message text
29. Jan. 12:52 - Random Name: message text
So far, so good. The problem is that some messages wrap onto additional lines, for example:
29. Jan. 08:42 - Random Name2: message text 1
additional text of the message 1
29. Jan. 08:43 - Random Name2: message text 2
Or, even worse:
15. Jan. 14:00 - Random Name: First part of the message
second part
third part
forth part
fifth part
29. Jan. 08:43 - Random Name2: message text 2
I think I need an algorithm to solve this, but I'm new to programming and can't come up with something that complex.
The same question for Python: parse a whatsApp conversation log
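The algorithm asked for here can be a single pass over the lines: a line that starts with a date header begins a new message, and any other line belongs to the message before it. Below is a minimal sketch of that idea, assuming every header looks like "29. Jan. 12:01 - " (a one- or two-digit day, an abbreviated month with a dot, a 24-hour time) and Java 11 or later. The class name MessageGrouper and its group method are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class MessageGrouper {
    // Assumed header layout, e.g. "29. Jan. 12:01 - " (day, abbreviated month, time, dash)
    private static final Pattern HEADER =
            Pattern.compile("\\d{1,2}\\. \\p{L}+\\. \\d{2}:\\d{2} - ");

    /** Joins every continuation line onto the message line that precedes it. */
    public static List<String> group(List<String> lines) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null; // the message currently being assembled
        for (String line : lines) {
            if (line.isBlank()) {
                continue; // ignore empty lines
            }
            // lookingAt() only matches at the start of the line, where the header must be.
            // A non-header line with no message before it also starts a new entry.
            if (HEADER.matcher(line).lookingAt() || current == null) {
                if (current != null) {
                    messages.add(current.toString()); // flush the previous message
                }
                current = new StringBuilder(line.trim());
            } else {
                // A continuation line: append it to the current message
                current.append(' ').append(line.trim());
            }
        }
        if (current != null) {
            messages.add(current.toString()); // flush the last message
        }
        return messages;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "29. Jan. 08:42 - Random Name2: message text 1",
                "additional text of the message 1",
                "29. Jan. 08:43 - Random Name2: message text 2");
        group(lines).forEach(System.out::println);
    }
}
```

Because the result is a list with one complete message per element, it can be analyzed directly, without writing an intermediate file first.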
[Edit]
Here is my code. It doesn't work (I know it's bad):
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class FormatList {
    public static void main(String[] args) throws IOException {
        FileReader fr = new FileReader("Whatsapp_formated.txt");
        BufferedReader br = new BufferedReader(fr);
        FileWriter fw = new FileWriter("Whatsapp_formated2.txt");
        BufferedWriter ausgabe = new BufferedWriter(fw);
        String line="";
        String buffer="";
        while((line = br.readLine())!=null)
        {
            System.out.println("\n"+line);
            if(line.isEmpty())
            {
            }
            else{
                if(line.charAt(0)=='0'||line.charAt(0)=='1'||line.charAt(0)=='2'||line.charAt(0)=='3'||line.charAt(0)=='4'||line.charAt(0)=='5'||line.charAt(0)=='6'||line.charAt(0)=='7'||line.charAt(0)=='8'||line.charAt(0)=='9')
                {
                    buffer = line;
                }
                else
                {
                    buffer += line;
                }
                ausgabe.write(buffer);
                ausgabe.newLine();
                System.out.println(buffer);
            }
        }
        ausgabe.close();
    }
}
[Edit 2]
In the end, I want to read the file and analyze each line. From a line like
29. Jan. 12:01 - Random Name: message text
I can tell when it was sent, who sent it, and what and how much they wrote.
But if I get the following line:
additional text of the message 1
I don't know when it was written or who sent it.
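Once every message sits on one line, the three pieces of information mentioned above (when, who, what) can be extracted with a single regular expression, and a line that does not match is exactly the kind of continuation line described here. A sketch, assuming the layout "<day>. <month>. <HH:mm> - <sender>: <text>" and sender names that contain no colon. LineParser and ChatMessage are hypothetical names for this illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineParser {
    // Assumed layout: "<day>. <month>. <HH:mm> - <sender>: <text>"
    // Groups: 1 = date, 2 = time, 3 = sender (up to the first ':'), 4 = message text
    private static final Pattern LINE =
            Pattern.compile("(\\d{1,2}\\. \\p{L}+\\.) (\\d{2}:\\d{2}) - ([^:]+): (.*)");

    /** One parsed chat line (hypothetical helper type for this sketch). */
    public static final class ChatMessage {
        public final String date;
        public final String time;
        public final String sender;
        public final String text;

        ChatMessage(String date, String time, String sender, String text) {
            this.date = date;
            this.time = time;
            this.sender = sender;
            this.text = text;
        }
    }

    /** Returns the parsed line, or empty for a continuation line that has no header. */
    public static Optional<ChatMessage> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new ChatMessage(m.group(1), m.group(2), m.group(3), m.group(4)));
    }

    public static void main(String[] args) {
        parse("29. Jan. 12:01 - Random Name: message text").ifPresent(msg ->
                System.out.println(msg.sender + " wrote " + msg.text.length() + " characters at " + msg.time));
        // prints "Random Name wrote 12 characters at 12:01"
        System.out.println(parse("additional text of the message 1").isPresent()); // prints "false"
    }
}
```

Because the sender group stops at the first colon, a colon inside the message text (for example "Random Name: see: this") stays in the text group.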
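Try this code and see whether it does what you expect. Each continuation line is written out on its own line, prefixed with the date and sender of the message it belongs to: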
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class WhatsappFormatted {
    public static void main(String[] args) throws IOException {
        char preString = '-';    // separates the timestamp from the sender
        char searchString = ':'; // the first ':' after the '-' ends the sender name
        // try-with-resources closes both files, even if an exception is thrown
        try (BufferedReader br = new BufferedReader(new FileReader("Whatsapp_formated.txt"));
             BufferedWriter ausgabe = new BufferedWriter(new FileWriter("Whatsapp_formated2.txt"))) {
            String line;
            String lastMember = null; // "date - sender:" prefix of the most recent message
            while ((line = br.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines instead of writing them out
                }
                String buffer;
                if (line.length() > 1 && Character.isDigit(line.charAt(0)) && Character.isDigit(line.charAt(1))) {
                    // A new message: remember its "29. Jan. 08:42 - Random Name2:" prefix
                    lastMember = line.substring(0, line.indexOf(searchString, line.indexOf(preString)) + 1);
                    buffer = line.trim();
                } else if (lastMember != null) {
                    // A continuation line: prefix it with the date and sender of its message
                    buffer = lastMember + " " + line.trim();
                } else {
                    // A continuation line before any message header: keep it unchanged
                    buffer = line.trim();
                }
                ausgabe.write(buffer);
                ausgabe.newLine();
                System.out.println(buffer);
            }
        }
    }
}
Well, based on how I understand your problem, I came up with a solution.
Given a file in the following format:
29. Jan. 12:01 - Random Name: message text
29. Jan. 12:22 - Random Name: message text
29. Jan. 12:24 - Random Name: message text
29. Jan. 12:38 - Random Name: message text
29. Jan. 12:52 - Random Name: message text
29. Jan. 08:42 - Random Name2: message text 1
additional text of the message 1
29. Jan. 08:43 - Random Name2: message text 2
15. Jan. 14:00 - Random Name: First part of the message
second part
third part
forth part
fifth part
29. Jan. 08:43 - Random Name2: message text 2
(This is a file named "wsp.log" in my "data" folder, so the path to it is "data/wsp.log".)
I expect something like this:
29. Jan. 12:01 - Random Name: message text
29. Jan. 12:22 - Random Name: message text
29. Jan. 12:24 - Random Name: message text
29. Jan. 12:38 - Random Name: message text
29. Jan. 12:52 - Random Name: message text
29. Jan. 08:42 - Random Name2: message text 1 additional text of the message 1
29. Jan. 08:43 - Random Name2: message text 2
15. Jan. 14:00 - Random Name: First part of the message second part third part forth part fifth part
29. Jan. 08:43 - Random Name2: message text 2
Based on that, I implemented the following class:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogReader {
    public void processWspLogFile() throws IOException {
        // a. Reference the log file
        File wspLogFile = new File("data/wsp.log");
        // This will hold the reformatted contents of the file
        StringBuilder stringFormatter = new StringBuilder();
        // The first line always starts a message, so it is appended as-is; the checks start from the second line
        boolean firstIterationDone = false;
        /* This regex matches lines that start with a date in the format "29. Jan. 12:22". Looking at your file,
           the "additional text of the message" lines contain no date, so the date marks where a new message starts.
           (I'm not really good at regex; I built it with a web page: http://txt2re.com/)
           Backslashes must be doubled inside a Java string literal. */
        String regex = "(\\d)(\\d)(\\.)(\\s+)([a-z])([a-z])([a-z])(\\.)(\\s+)(\\d)(\\d)(:)(\\d)(\\d)";
        Pattern wspLogDatePattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        // Use the line separator of the operating system
        String lineSeparator = System.lineSeparator();
        // b. Read the file line by line with a BufferedReader; try-with-resources closes it afterwards
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(wspLogFile))) {
            String currLine; // the current line (like a cursor)
            while ((currLine = bufferedReader.readLine()) != null) {
                if (!firstIterationDone) {
                    stringFormatter.append(currLine);
                    firstIterationDone = true;
                } else {
                    Matcher wspLogDateMatcher = wspLogDatePattern.matcher(currLine);
                    // lookingAt() only accepts the date at the start of the line
                    if (wspLogDateMatcher.lookingAt()) {
                        // A "normal" line that starts a new message: put it on its own line
                        stringFormatter.append(lineSeparator).append(currLine);
                    } else {
                        // A continuation line: append it to the current message
                        stringFormatter.append(" ").append(currLine.trim());
                    }
                }
            }
        }
        System.out.println(stringFormatter.toString());
    }
}
I would call it like this:
public static void main(String[] args) throws IOException {
    new LogReader().processWspLogFile();
}
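I hope this gives you some ideas or is useful for your purpose. I know it needs some improvement (code always needs refactoring :)), but for now it produces the expected format. Happy coding :).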
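With every message merged onto one line, as LogReader prints it, the usage analysis from the question reduces to parsing each line on its own. A minimal sketch that counts messages per sender, assuming the merged layout shown above and sender names without a colon; SenderStats and countBySender are hypothetical names, and the hard-coded list stands in for the formatted output lines (Java 9 or later for List.of).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SenderStats {
    // Assumed layout of one merged message line: "29. Jan. 12:01 - Random Name: message text"
    private static final Pattern LINE =
            Pattern.compile("\\d{1,2}\\. \\p{L}+\\. \\d{2}:\\d{2} - ([^:]+): .*");

    /** Counts messages per sender; lines that do not match the layout are skipped. */
    public static Map<String, Integer> countBySender(List<String> mergedLines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by sender name
        for (String line : mergedLines) {
            Matcher m = LINE.matcher(line);
            if (m.matches()) {
                counts.merge(m.group(1), 1, Integer::sum); // one more message for this sender
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> merged = List.of(
                "29. Jan. 12:01 - Random Name: message text",
                "29. Jan. 08:42 - Random Name2: message text 1 additional text of the message 1",
                "29. Jan. 08:43 - Random Name2: message text 2");
        // prints "Random Name: 1" then "Random Name2: 2"
        countBySender(merged).forEach((sender, n) -> System.out.println(sender + ": " + n));
    }
}
```

The same loop could sum m.group(0).length() or the text part per sender to measure how much each person wrote, rather than only how often.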