How to query an embedded database stored in HDFS from a MapReduce job?
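I am trying to query a GeoLite database from a Hadoop MapReduce mapper to resolve the country of an IP address. I tried two approaches:
1. Using File
This only works with the local file system; with an HDFS path I get a file-not-found exception.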
File database = new File("hdfs://localhost:9000/input/GeoLite2-City.mmdb"); // <<< HERE
DatabaseReader reader = new DatabaseReader.Builder(database).build();
2. Using a stream, but I get this error at run time:
Error: Java Heap Space
Path pt = new Path("hdfs://localhost:9000/input/GeoLite2-City.mmdb");
FileSystem fs = FileSystem.get(new Configuration());
FSDataInputStream stream = fs.open(pt);
DatabaseReader reader = new DatabaseReader.Builder(stream).build();
InetAddress ipAddress = InetAddress.getByName(address.getHostAddress());
CityResponse response = null;
try {
    response = reader.city(ipAddress);
} catch (GeoIp2Exception ex) {
    ex.printStackTrace();
    return;
}
My question: how can a mapper in Hadoop query the GeoLite database?
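Some background on the two attempts, offered as a hedged sketch rather than a diagnosis. `java.io.File` only understands local paths, so the `hdfs://` string is looked up on the local disk and not found. As far as the GeoIP2 Java API documents it, the `InputStream` builder loads the entire database into heap memory, which may exceed a small mapper heap. One possible workaround is to copy the stream to a local temporary file and open that file instead. The names `LocalSpool` and `spoolToTempFile` are hypothetical, and a `ByteArrayInputStream` stands in for the HDFS stream.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class LocalSpool {

    // Copies the whole stream into a temporary file on the local disk and returns it.
    static File spoolToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("geolite-", ".mmdb");
        tmp.toFile().deleteOnExit();
        // createTempFile already created the file, so it must be replaced.
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp.toFile();
    }

    public static void main(String[] args) throws IOException {
        // java.io.File treats the URI string as a local path, so nothing is found.
        File remote = new File("hdfs://localhost:9000/input/GeoLite2-City.mmdb");
        System.out.println("exists locally: " + remote.exists());

        // Stand-in for fs.open(pt); any InputStream is handled the same way.
        try (InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3})) {
            File local = spoolToTempFile(in);
            System.out.println("spooled bytes: " + local.length());
        }
    }
}
```

In a mapper, the returned `File` could be passed to `new DatabaseReader.Builder(file)` once in `setup()`, so the copy happens once per task and not once per record.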
I solved it with the distributed cache: the GeoLite database file is cached locally for every mapper in the MapReduce job.
// Fields of the mapper class, initialized once per task in setup().
private Path[] cachefiles;
private DatabaseReader reader;

@Override
public void setup(Context context)
{
    Configuration conf = context.getConfiguration();
    try {
        // Local paths of the files the job placed in the distributed cache.
        cachefiles = DistributedCache.getLocalCacheFiles(conf);
        // The cached copy lives on the local disk, so java.io.File can open it.
        File database = new File(cachefiles[0].toString());
        reader = new DatabaseReader.Builder(database).build();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

public void map(Object key, Text line, Context context) throws IOException,
        InterruptedException {
    .....
    InetAddress ipAddress = InetAddress.getByName(address.getHostAddress());
    CityResponse response = null;
    try {
        response = reader.city(ipAddress);
    } catch (GeoIp2Exception ex) {
        ex.printStackTrace();
        return;
    }
    ......
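The code above does not show how the file gets into the cache in the first place. A minimal driver-side sketch, assuming the Hadoop 2 `Job` API and the HDFS path from the question; the class name `GeoJobDriver`, the method `configure`, and the job name are hypothetical, and the mapper, input, and output setup is left out:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class GeoJobDriver {

    // Builds a job whose tasks receive a local copy of the GeoLite database.
    static Job configure(Configuration conf) throws Exception {
        Job job = Job.getInstance(conf, "geoip-lookup"); // hypothetical job name

        // Ask the framework to copy this HDFS file to the local disk of each task node.
        job.addCacheFile(new URI("hdfs://localhost:9000/input/GeoLite2-City.mmdb"));

        // Mapper class, input path, and output path would be set here (omitted).
        return job;
    }

    public static void main(String[] args) throws Exception {
        Job job = configure(new Configuration());
        System.out.println("cache files registered: " + job.getCacheFiles().length);
    }
}
```

On older Hadoop 1 style code, `DistributedCache.addCacheFile(uri, conf)` called before the job is created plays the same role.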