MapReduce Multi-Table Join
Published: 2019-06-14

Problem description:
There are two files: file 1 stores company names and city IDs, and file 2 stores city IDs and city names.
Table 1:
factoryname,addressed
Beijing Red Star,1
Shenzhen Thunder,3
Guangzhou Honda,2
Beijing Rising,1
Guangzhou Development Bank,2
Tencent,3
Bank of Beijing,1
 
Table 2:
1,Beijing
2,Guangzhou
3,Shenzhen
4,Xian
 
The task is to output each company name together with its city name, for example:
Beijing Red Star Beijing
 
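This resembles a multi-table join in a database, and the overall approach is much the same as for the single-table join: the reduce phase groups records by city ID. In the map phase, every record is emitted with key = city ID and value = flag + "+" + city name or company name, where flag 1 marks a city and flag 2 marks a company. The reducer parses the flag to tell whether the rest of the value is a city name or a company name, collects the two kinds into separate lists, and finally outputs their Cartesian product.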
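To see what the reducer actually receives, here is a minimal, Hadoop-free sketch in plain Java (it assumes Java 9+ for List.of) that imitates the map step's tagging and the shuffle's grouping by city ID on the sample tables. The class and method names (TagAndGroupDemo, tag, shuffle) are hypothetical, and a TreeMap merely stands in for the framework's sort-and-group step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TagAndGroupDemo {

    // Imitates the map step: turn one input line into {cityID, flag + "+" + name},
    // or return null for the header line and blank/malformed lines.
    static String[] tag(String line) {
        String[] f = line.split(",");
        if (f.length < 2 || f[0].equals("factoryname")) {
            return null;
        }
        if (f[0].length() == 1) {
            // City table "cityID,cityName" (same one-character-ID heuristic as the mapper).
            return new String[] {f[0], "1+" + f[1]};
        }
        // Company table "companyName,cityID".
        return new String[] {f[1], "2+" + f[0]};
    }

    // Imitates the shuffle: group all tagged values by key, with keys sorted.
    static Map<String, List<String>> shuffle(List<String> lines) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] kv = tag(line);
            if (kv != null) {
                groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
            "factoryname,addressed", "Beijing Red Star,1", "Shenzhen Thunder,3",
            "Guangzhou Honda,2", "Beijing Rising,1", "Guangzhou Development Bank,2",
            "Tencent,3", "Bank of Beijing,1",
            "1,Beijing", "2,Guangzhou", "3,Shenzhen", "4,Xian");
        for (Map.Entry<String, List<String>> e : shuffle(input).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Running main prints one line per city ID, for example `1 -> [2+Beijing Red Star, 2+Beijing Rising, 2+Bank of Beijing, 1+Beijing]` and `4 -> [1+Xian]`. In this sketch the order inside a group just follows the input order; in a real job Hadoop does not guarantee any order for the values of a key, which is why each value has to carry a flag.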
The implementation:
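// MyMapper.java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    public void map(LongWritable ikey, Text ivalue, Context context)
            throws IOException, InterruptedException {
        // A line is either "companyName,cityID" (table 1) or "cityID,cityName" (table 2).
        StringTokenizer st = new StringTokenizer(ivalue.toString(), ",");
        if (st.countTokens() < 2) {
            return; // skip blank or malformed lines instead of throwing
        }
        String value0 = st.nextToken();
        String value1 = st.nextToken();

        // Skip the header line of table 1.
        if (value0.equals("factoryname")) {
            return;
        }

        if (value0.length() == 1) {
            // Table 2 (city): a one-character first field is taken to be a city ID.
            // This heuristic only works while city IDs stay single-digit.
            // Emit key = city ID, value = "1+" + city name.
            context.write(new Text(value0), new Text("1+" + value1));
        } else {
            // Table 1 (company): emit key = city ID, value = "2+" + company name.
            context.write(new Text(value1), new Text("2+" + value0));
        }
    }
}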
 
 
// MyReducer.java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    public void reduce(Text _key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // All values in this call share the same city ID (_key).
        List<String> address = new ArrayList<String>(); // city names (flag 1)
        List<String> factory = new ArrayList<String>(); // company names (flag 2)

        for (Text val : values) {
            StringTokenizer st = new StringTokenizer(val.toString(), "+");
            int flag = Integer.parseInt(st.nextToken());
            String name = st.nextToken();
            if (flag == 1) {
                address.add(name);
            } else if (flag == 2) {
                factory.add(name);
            }
        }

        // Cartesian product: one output line per (company, city) pair, company first
        // to match the expected "Beijing Red Star Beijing" output.
        // If either list is empty (e.g. 4,Xian has no companies), nothing is written.
        for (String f : factory) {
            for (String a : address) {
                context.write(new Text(f), new Text(a));
            }
        }
    }
}
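The reducer's flag decoding and Cartesian product can be checked without a cluster. Below is a minimal plain-Java sketch (the class JoinReduceDemo is hypothetical) that applies the same logic to one key's values and returns the lines a job would write, as company, tab, city, to match the "Beijing Red Star Beijing" example; TextOutputFormat separates key and value with a tab by default. Unlike the reducer above, it splits on the first '+' with indexOf instead of StringTokenizer, so a name that itself contains '+' would be kept whole.

```java
import java.util.ArrayList;
import java.util.List;

public class JoinReduceDemo {

    // Same idea as MyReducer.reduce, minus Hadoop: decode each tagged value,
    // split names into cities (flag 1) and companies (flag 2), then emit
    // every (company, city) pair as "company<TAB>city".
    static List<String> reduce(Iterable<String> values) {
        List<String> address = new ArrayList<>();
        List<String> factory = new ArrayList<>();
        for (String v : values) {
            int sep = v.indexOf('+');
            String flag = v.substring(0, sep);
            String name = v.substring(sep + 1);
            if (flag.equals("1")) {
                address.add(name);
            } else if (flag.equals("2")) {
                factory.add(name);
            }
        }
        List<String> out = new ArrayList<>();
        for (String f : factory) {
            for (String a : address) {
                out.add(f + "\t" + a);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Values for city ID 1; the city record need not arrive first.
        System.out.println(reduce(List.of("2+Beijing Red Star", "1+Beijing", "2+Beijing Rising")));
        // City ID 4 (Xian) has no companies, so nothing is emitted.
        System.out.println(reduce(List.of("1+Xian")));
    }
}
```

Because an empty list on either side produces no pairs, the join behaves like an inner join: Xian never appears in the output, and a company whose city ID is missing from table 2 would be dropped the same way.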

Reposted from: https://www.cnblogs.com/sunrye/p/4543359.html