How do you submit a MapReduce job with the YARN client?

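To submit a MapReduce job with the YARN client, first make sure Hadoop and YARN are installed and configured correctly. Write a MapReduce program and package it as a JAR file, then submit it with the yarn command-line tool, specifying the JAR file path, the main class, and any other required arguments:

$ yarn jar your_mapreduce_job.jar com.example.YourMainClass -input /path/to/input -output /path/to/output

This starts a MapReduce job on the YARN cluster that processes the specified input data and writes the results to the specified output directory.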

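Submitting a MapReduce job with YARN

In the Hadoop ecosystem, MapReduce is a programming model for processing and generating large data sets. YARN (Yet Another Resource Negotiator) is Hadoop's resource management platform; it lets multiple applications share the resources of one cluster. Submitting a MapReduce job through the YARN client therefore lets the job use cluster resources efficiently and run as a distributed computation.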
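Before involving a cluster, it can help to see the programming model in miniature. The following is a minimal sketch, not Hadoop code: it simulates the map, shuffle, and reduce phases of a word count inside a single JVM using only the JDK. The class name LocalWordCountSketch and its method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Illustrative only: simulates map, shuffle, and reduce in one JVM.
// No Hadoop classes are used; LocalWordCountSketch is a hypothetical name.
final class LocalWordCountSketch {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(Map.entry(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key, sorted by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Run all three phases over the given input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        shuffle(emitted).forEach((word, ones) -> counts.put(word, reduce(ones)));
        return counts;
    }

    public static void main(String[] args) {
        // prints {hello=2, mapreduce=1, yarn=1}
        System.out.println(run(List.of("hello yarn", "hello mapreduce")));
    }
}
```

On a real cluster the map and reduce calls run as separate tasks in containers that YARN allocates on different nodes, and the shuffle moves data between them over the network; the data flow is the same.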

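1. Preparing the environment

Before you start, make sure YARN and MapReduce are correctly installed and configured on your Hadoop cluster. You need the following components:

Hadoop Distributed File System (HDFS)

YARN ResourceManager

NodeManager

A MapReduce application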

2. Writing the MapReduce program

Next, write a MapReduce program. The following is a simple WordCount example:

WordCount.java

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
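The listing above registers IntSumReducer both as the reducer and, via job.setCombinerClass, as the combiner. That is safe here because integer addition is associative and commutative, so adding up partial sums gives the same total as adding up the raw values. The sketch below illustrates this in plain Java without any Hadoop classes; CombinerSketch and the sample values are made up for illustration.

```java
import java.util.List;

// Illustrative only: shows why a summing reducer can double as a combiner.
// CombinerSketch is a hypothetical name; no Hadoop classes are involved.
final class CombinerSketch {

    // Same logic as the reduce step in the listing above: add up all values.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Counts for one word as emitted by two separate map tasks.
        List<Integer> fromMapTask1 = List.of(1, 1, 1);
        List<Integer> fromMapTask2 = List.of(1, 1);

        // Without a combiner, the reducer receives all five values.
        int direct = sum(List.of(1, 1, 1, 1, 1));

        // With a combiner, each map task pre-aggregates locally and the
        // reducer receives only one partial sum per map task.
        int combined = sum(List.of(sum(fromMapTask1), sum(fromMapTask2)));

        System.out.println(direct + " == " + combined); // prints "5 == 5"
    }
}
```

Hadoop may run the combiner zero, one, or several times per map task, so the job must produce the same result either way. An operation such as a plain average does not have this property and could not reuse its reducer as a combiner unchanged.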

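3. Compiling and packaging

Compile the Java code above against the Hadoop classpath and package the resulting classes into a JAR file:


$ javac -cp "$(hadoop classpath)" -d . WordCount.java
$ jar -cvf wordcount.jar *.class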

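4. Submitting the job with the YARN client

To submit the MapReduce job through the YARN client, use the yarn command-line tool. Assume the input data is stored in the HDFS directory /user/hadoop/input and the results should be written to /user/hadoop/output. The output directory must not exist yet: FileOutputFormat refuses to overwrite an existing directory, and the job fails. Run the following command to submit the job: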

$ yarn jar wordcount.jar WordCount /user/hadoop/input /user/hadoop/output
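If the submission has to be triggered from Java rather than typed into a shell (for example from a scheduler), the same command line can be assembled and launched with ProcessBuilder. This is a minimal sketch under assumptions: the yarn launcher is on the PATH of the machine running it, and the names YarnSubmitSketch and buildCommand are made up for illustration.

```java
import java.io.IOException;
import java.util.List;

// Illustrative only: YarnSubmitSketch and buildCommand are hypothetical names.
// Assumes the `yarn` launcher is on the PATH of the machine running this code.
final class YarnSubmitSketch {

    // Assemble the same argument vector as the shell command above.
    static List<String> buildCommand(String jar, String mainClass, String input, String output) {
        return List.of("yarn", "jar", jar, mainClass, input, output);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 4) {
            System.err.println("usage: YarnSubmitSketch <jar> <mainClass> <input> <output>");
            return;
        }
        Process process = new ProcessBuilder(buildCommand(args[0], args[1], args[2], args[3]))
                .inheritIO() // stream the job's progress output to this console
                .start();
        // Propagate the exit status of `yarn jar` to the caller.
        System.exit(process.waitFor());
    }
}
```

Passing the arguments as a list avoids shell quoting issues with paths. Because the WordCount driver calls System.exit with 0 on success and 1 on failure, the exit status seen here reflects the outcome of the job.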

5. Checking job status and results

After submitting the job, you can check its status with the following command:

$ yarn application -list

Once the job has finished, view the output through HDFS:

$ hadoop fs -cat /user/hadoop/output/part-r-00000
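The WordCount job does not set an output format, so it uses the default text output, which writes one record per line with the key and value separated by a tab. If the counts need further processing, lines obtained from the command above can be parsed as in the following sketch. It uses only the JDK; OutputParserSketch and the sample lines are made up for illustration and are not real job output.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: parses "<word>TAB<count>" lines into a map.
// OutputParserSketch is a hypothetical name; the sample data is invented.
final class OutputParserSketch {

    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps file order
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // tolerate a trailing blank line
            }
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("expected <word>TAB<count>: " + line);
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {hello=2, yarn=1}
        System.out.println(parse(List.of("hello\t2", "yarn\t1")));
    }
}
```

When a job runs with more than one reducer, each reducer writes its own part-r-NNNNN file, so every file in the output directory has to be read to obtain the full result.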
