Error reading a CSV file in a Java MapReduce program
The code below is the Mapper class for a MapReduce job. What I'm trying to write is code that reads a CSV file and, for each row, stores two columns (column 1, the userId, and column 6, the CheckOutDateTime of a book) in a HashMap. I think the mistake is in my getMapFromCSV function in the StubMapper class. Could someone enlighten me? I've pasted the error output at the bottom. Thanks for any help and suggestions.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Date;
import java.util.HashMap;
import java.text.ParseException;
import java.text.SimpleDateFormat;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class StubMapper extends Mapper<LongWritable, Text, Text, MinMaxCountTuple> {

    private Text outUserId = new Text();
    private MinMaxCountTuple outTuple = new MinMaxCountTuple();

    private final static SimpleDateFormat frmt =
        new SimpleDateFormat("yyyy-MM--dd'T'HH:mm:ss.SSS");

    public static HashMap<String, String> getMapFromCSV(String filePath) throws IOException {
        HashMap<String, String> words = new HashMap<String, String>();
        BufferedReader in = new BufferedReader(new FileReader(filePath));
        String line;
        //= in.readLine())
        while ((line = in.readLine()) != null) {
            String columns[] = line.split("\t");
            if (!words.containsKey(columns[1])) {
                words.put(columns[1], columns[6]);
            }
        }
        //in.close();
        return words;
    }

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        HashMap<String, String> parsed = getMapFromCSV(value.toString());
        //String columns[] = value.toString().split("\t");
        String strDate = parsed.get("CheckoutDateTime");
        //String userId = columns[1];
        //String strDate = columns[6];
        String userId = parsed.get("BibNumber");
        try {
            Date creationDate = frmt.parse(strDate);
            outTuple.setMin(creationDate);
            outTuple.setMax(creationDate);
            outTuple.setCount(1);
            outUserId.set(userId);
            context.write(outUserId, outTuple);
        } catch (ParseException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
It shows the following error, which I can't figure out. I think the problem occurs in the getMapFromCSV function in the StubMapper class. The function's argument is supposed to carry the CSV attribute information, and what I want to store in the HashMap are the resulting key/value pairs. However, I don't know what to change. Please let me know if you can see how to fix it.
java.io.FileNotFoundException: Code,Description,Code Type,Format Group,Format Subgroup,Category Group,Category Subgroup (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:120)
at java.io.FileInputStream.<init>(FileInputStream.java:79)
at java.io.FileReader.<init>(FileReader.java:41)
at StubMapper.getMapFromCSV(StubMapper.java:27)
at StubMapper.map(StubMapper.java:50)
at StubMapper.map(StubMapper.java:14)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:673)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:331)
at org.apache.hadoop.mapred.Child.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
The error occurs on this line:

BufferedReader in = new BufferedReader(new FileReader(filePath));

- Check the value of filePath
- Check that a file actually exists at filePath
- Check that the file's contents are valid
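For example, the first two checks could be done with a quick guard like this (a hypothetical snippet, not part of the posted code, assuming filePath is meant to be a local file path):

// Hypothetical guard before opening the reader
java.io.File csv = new java.io.File(filePath);
if (!csv.exists() || !csv.isFile()) {
    // Print the suspect value before FileReader throws FileNotFoundException
    System.err.println("No readable file at: " + csv.getAbsolutePath());
}
BufferedReader in = new BufferedReader(new FileReader(filePath));

In this case the printed value would be the CSV header line itself, which matches the FileNotFoundException message above.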
You are missing an important MapReduce concept. The problem is in the line below:
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // Below is the problematic line
    HashMap<String, String> parsed = getMapFromCSV(value.toString());
Perhaps you assumed that Text value is the CSV filename, and therefore tried to read the values from that file. That is not the case: the Text value passed to the mapper is a single line of the CSV file.
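(Which file the job reads is configured in the driver, not inside the mapper. A typical driver line, shown here with a placeholder path, is:

FileInputFormat.addInputPath(job, new Path("/path/to/input.csv"));

using org.apache.hadoop.mapreduce.lib.input.FileInputFormat and org.apache.hadoop.fs.Path.)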
Suppose your CSV is structured like this:
Code,Description,Code Type,Format Group,Format Subgroup,Category Group,Category Subgroup
111,sample description,codeType1,IN,....
Then your code should look something like this:
@Override
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    if (value.toString().startsWith("Code,Description")) {
        // Skip the header line (first line) of the CSV
        return;
    }
    String data[] = value.toString().split(",", -1);
    String code = data[0];
    String codeType = data[2];
    // ... and so on
}
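Applied to the original StubMapper, a minimal sketch of the map method might look like the following (assuming, as in the posted code, tab-separated columns with the userId in column 1 and the CheckoutDateTime in column 6; it reuses the frmt, outTuple, and outUserId fields from the posted class):

@Override
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // value is one line of the input file, already read for us by Hadoop
    String columns[] = value.toString().split("\t");
    if (columns.length < 7) {
        return; // guard against short or malformed lines
    }
    String userId = columns[1];   // column 1: userId
    String strDate = columns[6];  // column 6: CheckoutDateTime
    try {
        Date creationDate = frmt.parse(strDate);
        outTuple.setMin(creationDate);
        outTuple.setMax(creationDate);
        outTuple.setCount(1);
        outUserId.set(userId);
        context.write(outUserId, outTuple);
    } catch (ParseException e) {
        // The header row and badly formatted dates end up here; just skip them
    }
}

Note also that the posted format string "yyyy-MM--dd'T'HH:mm:ss.SSS" contains a doubled hyphen; unless the data really writes "--" between month and day, it should probably be "yyyy-MM-dd'T'HH:mm:ss.SSS".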