Unable to Parse header from github CSV URL using Apache Commons
I'm trying to use the Apache Commons CSV library to read the value under a given header for every record in a CSV file served from a GitHub URL.
Here is my code:
@Service
public class CoronaVirusDataService {
    private static String virus_data_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/Aysen_Chile_07032021/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv";

    @PostConstruct
    public void getVirusData()
    {
        try
        {
            URL url = new URL(virus_data_url);
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            BufferedReader in = new BufferedReader( new InputStreamReader(con.getInputStream()));
            while((in.readLine()) != null)
            {
                StringReader csvReader = new StringReader(in.readLine());
                Iterable<CSVRecord> records = CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(csvReader);
                for (CSVRecord record : records) {
                    String country = record.get("Country/Region");
                    System.out.println(country);
                }
            }
            in.close();
        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
    }
}
When I run the application, I get this error:
java.lang.IllegalArgumentException: A header name is missing in [, Afghanistan, 33.93911, 67.709953, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 4, 4, 4, 4, 5, 7, 8, 11, 12, 13, 15, 16, 18, 20, 24, 25, 29, 30, 34, 41, 43, 76, 80, 91, 107, 118, 146, 175, 197, 240, 275, 300, 338, 368, 424, 445, 485, 532, 556, 608, 666, 715, 785, 841, 907, 934, 997, 1027, 1093]
at org.apache.commons.csv.CSVParser.createHeaders(CSVParser.java:501)
at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:412)
at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:378)
at org.apache.commons.csv.CSVFormat.parse(CSVFormat.java:1157)
at com.p1.Services.CoronaVirusDataService.getVirusData(CoronaVirusDataService.java:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
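If you want the first line to be read as the header, don't feed the input to the parser line by line. Every string you pass to parse() is a separate input, so Apache Commons CSV treats the line in each one as a header line, and that is why the exception is thrown. There is a second problem in the loop: readLine() is called twice per iteration. The call in the while condition reads the real header line and discards it, so the first string handed to the parser is the Afghanistan data row. Its first field (Province/State) is empty, hence "A header name is missing in [, Afghanistan, ...]". Instead, pass the reader itself to the parser.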
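The line skipping can be seen without Commons CSV or a network connection. The following minimal sketch repeats the loop from the question and records which lines would have been passed to new StringReader(...). The class name ReadLineSkipDemo, the helper linesHandedToParser and the two-line sample (the real header trimmed to one date column, plus the Afghanistan row from the error message, also trimmed) are illustrative assumptions, not part of the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadLineSkipDemo {

    // Hypothetical helper: same loop shape as the question. The readLine() in the
    // while condition reads a line and throws it away; the readLine() in the body
    // reads the line that the question's code would hand to StringReader/parse().
    public static List<String> linesHandedToParser(String text) {
        List<String> handed = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            while (in.readLine() != null) {   // consumes lines 1, 3, 5, ...
                handed.add(in.readLine());    // gets lines 2, 4, 6, ... (or null at EOF)
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return handed;
    }

    public static void main(String[] args) {
        // Illustrative sample: trimmed header line plus the trimmed Afghanistan row.
        String csv = "Province/State,Country/Region,Lat,Long,1/22/20\n"
                   + ",Afghanistan,33.93911,67.709953,0\n";
        for (String line : linesHandedToParser(csv)) {
            System.out.println(line);
        }
    }
}
```

Running main prints only ",Afghanistan,33.93911,67.709953,0": the header line never reaches the parser, and the data row is used as the header instead. If the input has an odd number of lines, the last body call returns null, and in the question's code new StringReader(null) would then throw a NullPointerException.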
The following code works correctly:
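import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import javax.annotation.PostConstruct;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
import org.springframework.stereotype.Service;

@Service
public class CoronaVirusDataService {
    private static final String virus_data_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/Aysen_Chile_07032021/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv";

    @PostConstruct
    public void getVirusData()
    {
        try
        {
            URL url = new URL(virus_data_url);
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            // Pass the whole reader to Commons CSV: it reads the first line once as the
            // header and parses every following line as a data record.
            // try-with-resources closes the parser, which also closes the reader.
            try (BufferedReader in = new BufferedReader(
                         new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8));
                 CSVParser records = CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(in))
            {
                for (CSVRecord record : records) {
                    // Columns can now be looked up by the names from the header line.
                    String country = record.get("Country/Region");
                    System.out.println(country);
                }
            }
        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
    }
}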
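To check the fix offline, here is a minimal sketch that makes the same parsing call on a StringReader instead of the HTTP stream. It assumes Commons CSV is on the classpath in a version that still offers withFirstRecordAsHeader() (newer releases may flag it as deprecated in favor of the CSVFormat builder). The class name CountryColumnSketch, the helper countries(...) and the trimmed two-line sample are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class CountryColumnSketch {

    // Hypothetical helper: parses the whole Reader in one go, so the first line
    // is read exactly once as the header, then collects "Country/Region".
    public static List<String> countries(Reader source) {
        List<String> result = new ArrayList<>();
        try (CSVParser parser = CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(source)) {
            for (CSVRecord record : parser) {
                result.add(record.get("Country/Region"));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative sample: trimmed header line plus the trimmed Afghanistan row.
        String sample = "Province/State,Country/Region,Lat,Long,1/22/20\n"
                      + ",Afghanistan,33.93911,67.709953,0\n";
        System.out.println(countries(new StringReader(sample)));
    }
}
```

Because the helper accepts any Reader, the same method could be given the BufferedReader from the HTTP connection in the service above, while tests use an in-memory string.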
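You want to parse the headers of a standard-format CSV file fetched over HTTP. Doing this parsing yourself in Java takes a lot of code, but it is easy with SPL, an open-source Java package, where a single line of code is enough: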
A | |
---|---|
1 | =httpfile("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/Aysen_Chile_07032021/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv").import@ct(Country/Region) |
SPL provides a JDBC driver that Java can call. Save the SPL script above as httpcsv.splx and call it from Java the same way you would call a stored procedure:
…
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
st = con.prepareCall("call httpcsv()");
st.execute();
…
Or execute the SPL string directly from a Java program, the same way you would execute an SQL statement:
…
st = con.prepareStatement("==httpfile(\"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/Aysen_Chile_07032021/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv\").import@ct(Country/Region)");
st.execute();
…