This blog post is original content; please credit the source when reposting: http://guoyunsky.iteye.com/blog/1235945
Please read these first:
1. Hadoop MapReduce Study Notes (1): Preface and Preparation
2. Hadoop MapReduce Study Notes (2): Preface and Preparation 2
Next post: Hadoop MapReduce Study Notes (9): Implementing SQL-like ORDER BY/sorting in MapReduce, the correct way
Sorting is an important operation, analogous to SELECT * FROM TABLE ORDER BY ID in SQL. How can it be implemented in MapReduce? The naive approach below collects every row in the reducer and sorts in memory; as the class comment warns, it runs out of memory on large inputs.
package com.guoyun.hadoop.mapreduce.study;

import java.io.IOException;
import java.util.PriorityQueue;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Sorts multi-column data by one column in ascending order and returns all rows,
 * similar to the SQL:
 *   SELECT * FROM TABLE ORDER BY ID ASC;
 * Note: this is the WRONG way to do it -- it buffers everything in memory and
 * will run out of memory on large inputs. For the fix, see @OrderByMapReduceFixTest.
 */
public class OrderBySingleMapReduceTest extends MyMapReduceMultiColumnTest {
  public static final Logger log = LoggerFactory.getLogger(OrderBySingleMapReduceTest.class);

  public OrderBySingleMapReduceTest(long dataLength) throws Exception {
    super(dataLength);
  }

  public OrderBySingleMapReduceTest(String outputPath) throws Exception {
    super(outputPath);
  }

  public OrderBySingleMapReduceTest(long dataLength, String inputPath, String outputPath)
      throws Exception {
    super(dataLength, inputPath, outputPath);
  }

  private static class MyReducer
      extends Reducer<Text, MultiColumnWritable, NullWritable, MultiColumnWritable> {
    // Buffers every row in memory -- this is what causes the OutOfMemoryError
    // on large inputs.
    PriorityQueue<MultiColumnWritable> queue = new PriorityQueue<MultiColumnWritable>();

    @Override
    protected void reduce(Text key, Iterable<MultiColumnWritable> values, Context context)
        throws IOException, InterruptedException {
      MultiColumnWritable copy = null;
      // Hadoop reuses the value object across iterations, so each row must be
      // copied before it is added to the queue.
      for (MultiColumnWritable value : values) {
        copy = MultiColumnWritable.copy(value);
        queue.add(copy);
      }
      // Poll emits elements in natural (ascending) order.
      while (!queue.isEmpty()) {
        copy = queue.poll();
        if (copy != null) {
          context.write(NullWritable.get(), copy);
        }
      }
    }
  }

  public static void main(String[] args) {
    MyMapReduceTest mapReduceTest = null;
    Configuration conf = null;
    Job job = null;
    FileSystem fs = null;
    Path inputPath = null;
    Path outputPath = null;
    long begin = 0;
    String input = "testDatas/mapreduce/MRInput_Single_OrderBy";
    String output = "testDatas/mapreduce/MROutput_Single_OrderBy";

    try {
      mapReduceTest = new OrderBySingleMapReduceTest(1000, input, output);
      inputPath = new Path(mapReduceTest.getInputPath());
      outputPath = new Path(mapReduceTest.getOutputPath());

      conf = new Configuration();
      job = new Job(conf, "OrderBy");

      // Remove output from a previous run; Hadoop refuses to overwrite it.
      fs = FileSystem.getLocal(conf);
      if (fs.exists(outputPath)) {
        if (!fs.delete(outputPath, true)) {
          System.err.println("Delete output file:" + mapReduceTest.getOutputPath() + " failed!");
          return;
        }
      }

      job.setJarByClass(OrderBySingleMapReduceTest.class);
      job.setMapOutputKeyClass(Text.class);
      job.setMapOutputValueClass(MultiColumnWritable.class);
      job.setOutputKeyClass(NullWritable.class);
      job.setOutputValueClass(MultiColumnWritable.class);
      job.setMapperClass(MultiSupMapper.class);
      job.setReducerClass(MyReducer.class);
      // With 2 reducers, each output file is only sorted within itself;
      // a total order needs a single reducer or a range partitioner.
      job.setNumReduceTasks(2);

      FileInputFormat.addInputPath(job, inputPath);
      FileOutputFormat.setOutputPath(job, outputPath);

      begin = System.currentTimeMillis();
      job.waitForCompletion(true);

      System.out.println("===================================================");
      if (mapReduceTest.isGenerateDatas()) {
        System.out.println("The maxValue is:" + mapReduceTest.getMaxValue());
        System.out.println("The minValue is:" + mapReduceTest.getMinValue());
      }
      System.out.println("Spend time:" + (System.currentTimeMillis() - begin)); // Spend time:13361
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}
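The reducer above relies on `java.util.PriorityQueue` polling elements in their natural ascending order; that is the whole sorting mechanism here, since the rows are buffered and then drained. A minimal standalone sketch of that behavior (plain Java, no Hadoop dependencies; the class name is made up for illustration):

```java
import java.util.PriorityQueue;

public class PriorityQueueOrderDemo {
    public static void main(String[] args) {
        // Like the reducer's queue of MultiColumnWritable rows, a PriorityQueue
        // polls elements in natural (ascending) order regardless of insertion order.
        PriorityQueue<Long> queue = new PriorityQueue<Long>();
        long[] ids = {42L, 7L, 99L, 7L, 13L};
        for (long id : ids) {
            queue.add(id);
        }
        StringBuilder sorted = new StringBuilder();
        while (!queue.isEmpty()) {
            sorted.append(queue.poll()).append(' ');
        }
        System.out.println(sorted.toString().trim()); // prints "7 7 13 42 99"
    }
}
```

Note that the queue itself (and its iterator) is not stored in sorted order; only repeated `poll()` yields the elements in order, which is why the reducer drains the queue in a loop. The cost is that every row must fit in the reducer's heap at once, which is exactly the memory-overflow problem this post warns about.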