elasticsearch backup and restore 4: using ES-Hadoop to write ES index data into HDFS
Published: 2019-06-24


For background, see: elasticsearch backup and restore 3: using ES-Hadoop to write HDFS data into Elasticsearch

The project is modeled on the tweets2HdfsMapper example from the book《Elasticsearch集成Hadoop最佳实践》.

Project source: https://gitee.com/constfafa/ESToHDFS.git

Development walkthrough:

1. First, inspect the index data in Kibana:

"hits": [
  {
    "_index": "xxx-words",
    "_type": "history",
    "_id": "zankHWUBk5wX4tbY-gpZ",
    "_score": 1,
    "_source": {
      "word": "abc",
      "createTime": "2018-08-09 16:56:00",
      "userId": "263",
      "datetime": "2018-08-09T16:56:00Z"
    }
  },
  {
    "_index": "xxx-words",
    "_type": "history",
    "_id": "zqntHWUBk5wX4tbYFAqy",
    "_score": 1,
    "_source": {
      "word": "bcd",
      "createTime": "2018-08-09 16:59:00",
      "userId": "263",
      "datetime": "2018-08-09T16:59:00Z"
    }
  }
]

Then simply run: hadoop jar history2hdfs-job.jar
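For reference, here is a minimal sketch of the kind of job packed into history2hdfs-job.jar, following the tweets2HdfsMapper pattern: a map-only job that reads the index through ES-Hadoop's EsInputFormat and writes each document out as a line of text. The class names, ES address, and output path below are assumptions, not the actual ESToHDFS source.

// Minimal sketch of the ES -> HDFS job, in the style of tweets2HdfsMapper.
// Class names, the ES address, and the output path are assumptions.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.elasticsearch.hadoop.mr.EsInputFormat;

public class History2Hdfs {

    // EsInputFormat hands each hit to the mapper as (document id, field map).
    public static class HistoryMapper extends Mapper<Object, Object, NullWritable, Text> {
        @Override
        protected void map(Object key, Object value, Context context)
                throws IOException, InterruptedException {
            MapWritable doc = (MapWritable) value;          // the _source fields
            context.write(NullWritable.get(), new Text(doc.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("es.nodes", "192.168.211.104:9200");       // assumed ES address
        conf.set("es.resource", "xxx-words/history");       // index/type to read

        Job job = Job.getInstance(conf, "history2hdfs");
        job.setJarByClass(History2Hdfs.class);
        job.setInputFormatClass(EsInputFormat.class);
        job.setMapperClass(HistoryMapper.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        job.setNumReduceTasks(0);                           // map-only, as in the run below
        FileOutputFormat.setOutputPath(job, new Path("/history-words")); // assumed output dir
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

When reading, ES-Hadoop creates one input split per index shard, which is consistent with the "Created [2] splits" line in the log below.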

The run looks like this:

[root@docker02 jar]# hadoop jar history2hdfs-job.jar
18/06/07 04:04:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/06/07 04:04:42 INFO client.RMProxy: Connecting to ResourceManager at /192.168.211.104:8032
18/06/07 04:04:48 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/06/07 04:04:55 INFO util.Version: Elasticsearch Hadoop v6.2.3 [039a45c5a1]
18/06/07 04:04:58 INFO mr.EsInputFormat: Reading from [hzeg-history-words/history]
18/06/07 04:04:58 INFO mr.EsInputFormat: Created [2] splits
18/06/07 04:05:00 INFO mapreduce.JobSubmitter: number of splits:2
18/06/07 04:05:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1528305729734_0007
18/06/07 04:05:05 INFO impl.YarnClientImpl: Submitted application application_1528305729734_0007
18/06/07 04:05:06 INFO mapreduce.Job: The url to track the job: http://docker02:8088/proxy/application_1528305729734_0007/
18/06/07 04:05:06 INFO mapreduce.Job: Running job: job_1528305729734_0007
18/06/07 04:09:31 INFO mapreduce.Job: Job job_1528305729734_0007 running in uber mode : false
18/06/07 04:09:42 INFO mapreduce.Job:  map 0% reduce 0%
18/06/07 04:15:36 INFO mapreduce.Job:  map 100% reduce 0%
18/06/07 04:17:26 INFO mapreduce.Job: Job job_1528305729734_0007 completed successfully
18/06/07 04:17:56 INFO mapreduce.Job: Counters: 47
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=230906
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=74694
        HDFS: Number of bytes written=222
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Job Counters
        Launched map tasks=2
        Rack-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=791952
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=791952
        Total vcore-seconds taken by all map tasks=791952
        Total megabyte-seconds taken by all map tasks=810958848
    Map-Reduce Framework
        Map input records=2
        Map output records=2
        Input split bytes=74694
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=8106
        CPU time spent (ms)=20240
        Physical memory (bytes) snapshot=198356992
        Virtual memory (bytes) snapshot=4128448512
        Total committed heap usage (bytes)=32157696
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=222
    Elasticsearch Hadoop Counters
        Bulk Retries=0
        Bulk Retries Total Time(ms)=0
        Bulk Total=0
        Bulk Total Time(ms)=0
        Bytes Accepted=0
        Bytes Received=1102
        Bytes Retried=0
        Bytes Sent=296
        Documents Accepted=0
        Documents Received=0
        Documents Retried=0
        Documents Sent=0
        Network Retries=0
        Network Total Time(ms)=5973
        Node Retries=0
        Scroll Total=1
        Scroll Total Time(ms)=666

Afterwards it still complains that the job history server cannot be found, but this does not affect the result.

Then run a check like the following to confirm the output files and data are correct.
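A minimal sketch using the HDFS Java API; the output path /history-words and the NameNode address are assumptions, not values from the original run. The equivalent shell check is hadoop fs -ls plus hadoop fs -cat on the output directory.

// Minimal sketch: list the job output and print its contents.
// The output path and NameNode address below are assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CheckOutput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.211.104:9000"); // assumed NameNode address
        try (FileSystem fs = FileSystem.get(conf)) {
            for (FileStatus st : fs.listStatus(new Path("/history-words"))) {
                System.out.println(st.getPath() + "  " + st.getLen() + " bytes");
                if (st.isFile()) {
                    // Dump each part file (e.g. part-m-00000) to stdout.
                    IOUtils.copyBytes(fs.open(st.getPath()), System.out, 4096, false);
                }
            }
        }
    }
}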

As you can see, the index data has indeed been written into HDFS.

In actual use I ran into the following problems:

1. To use this in a Spring Boot Gradle project, I added the following dependencies:

compile group: 'org.apache.hadoop', name: 'hadoop-core', version: '1.2.1'
compile group: 'org.apache.hadoop', name: 'hadoop-hdfs', version: '2.7.2'
compile group: 'org.elasticsearch', name: 'elasticsearch-hadoop', version: '6.2.4'

after which Spring Boot failed at startup with the error "A child container failed during start".

Inspect the Gradle dependency tree:

For hadoop-core:1.2.1:

+--- org.apache.hadoop:hadoop-core:1.2.1
|    +--- commons-cli:commons-cli:1.2
|    +--- xmlenc:xmlenc:0.52
|    +--- com.sun.jersey:jersey-core:1.8 -> 1.9
|    +--- com.sun.jersey:jersey-json:1.8
|    |    +--- org.codehaus.jettison:jettison:1.1
|    |    |    \--- stax:stax-api:1.0.1
|    |    +--- com.sun.xml.bind:jaxb-impl:2.2.3-1
|    |    |    \--- javax.xml.bind:jaxb-api:2.2.2 -> 2.3.0
|    |    +--- org.codehaus.jackson:jackson-core-asl:1.7.1 -> 1.9.13
|    |    +--- org.codehaus.jackson:jackson-mapper-asl:1.7.1 -> 1.9.13
|    |    |    \--- org.codehaus.jackson:jackson-core-asl:1.9.13
|    |    +--- org.codehaus.jackson:jackson-jaxrs:1.7.1
|    |    |    +--- org.codehaus.jackson:jackson-core-asl:1.7.1 -> 1.9.13
|    |    |    \--- org.codehaus.jackson:jackson-mapper-asl:1.7.1 -> 1.9.13 (*)
|    |    +--- org.codehaus.jackson:jackson-xc:1.7.1
|    |    |    +--- org.codehaus.jackson:jackson-core-asl:1.7.1 -> 1.9.13
|    |    |    \--- org.codehaus.jackson:jackson-mapper-asl:1.7.1 -> 1.9.13 (*)
|    |    \--- com.sun.jersey:jersey-core:1.8 -> 1.9
|    +--- com.sun.jersey:jersey-server:1.8 -> 1.9
|    |    +--- asm:asm:3.1
|    |    \--- com.sun.jersey:jersey-core:1.9
|    +--- commons-io:commons-io:2.1 -> 2.4
|    +--- commons-httpclient:commons-httpclient:3.0.1 -> 3.1
|    |    +--- commons-logging:commons-logging:1.0.4 -> 1.2
|    |    \--- commons-codec:commons-codec:1.2 -> 1.11
|    +--- commons-codec:commons-codec:1.4 -> 1.11
|    +--- org.apache.commons:commons-math:2.1
|    +--- commons-configuration:commons-configuration:1.6
|    |    +--- commons-collections:commons-collections:3.2.1
|    |    +--- commons-lang:commons-lang:2.4 -> 2.6
|    |    +--- commons-logging:commons-logging:1.1.1 -> 1.2
|    |    +--- commons-digester:commons-digester:1.8
|    |    |    +--- commons-beanutils:commons-beanutils:1.7.0 (*)
|    |    |    \--- commons-logging:commons-logging:1.1 -> 1.2
|    |    \--- commons-beanutils:commons-beanutils-core:1.8.0
|    |         \--- commons-logging:commons-logging:1.1.1 -> 1.2
|    +--- commons-net:commons-net:1.4.1
|    |    \--- oro:oro:2.0.8
|    +--- org.mortbay.jetty:jetty:6.1.26
|    |    +--- org.mortbay.jetty:jetty-util:6.1.26
|    |    \--- org.mortbay.jetty:servlet-api:2.5-20081211
|    +--- org.mortbay.jetty:jetty-util:6.1.26
|    +--- tomcat:jasper-runtime:5.5.12
|    +--- tomcat:jasper-compiler:5.5.12
|    +--- org.mortbay.jetty:jsp-api-2.1:6.1.14
|    |    \--- org.mortbay.jetty:servlet-api-2.5:6.1.14
|    +--- org.mortbay.jetty:jsp-2.1:6.1.14
|    |    +--- org.eclipse.jdt:core:3.1.1
|    |    +--- org.mortbay.jetty:jsp-api-2.1:6.1.14 (*)
|    |    \--- ant:ant:1.6.5
|    +--- commons-el:commons-el:1.0
|    |    \--- commons-logging:commons-logging:1.0.3 -> 1.2
|    +--- net.java.dev.jets3t:jets3t:0.6.1
|    |    +--- commons-codec:commons-codec:1.3 -> 1.11
|    |    +--- commons-logging:commons-logging:1.1.1 -> 1.2
|    |    \--- commons-httpclient:commons-httpclient:3.1 (*)
|    +--- hsqldb:hsqldb:1.8.0.10
|    +--- oro:oro:2.0.8
|    +--- org.eclipse.jdt:core:3.1.1
|    \--- org.codehaus.jackson:jackson-mapper-asl:1.8.8 -> 1.9.13 (*)

And for hadoop-hdfs:2.7.1:

+--- org.apache.hadoop:hadoop-hdfs:2.7.1
|    +--- com.google.guava:guava:11.0.2 -> 18.0
|    +--- org.mortbay.jetty:jetty:6.1.26 (*)
|    +--- org.mortbay.jetty:jetty-util:6.1.26
|    +--- com.sun.jersey:jersey-core:1.9
|    +--- com.sun.jersey:jersey-server:1.9 (*)
|    +--- commons-cli:commons-cli:1.2
|    +--- commons-codec:commons-codec:1.4 -> 1.11
|    +--- commons-io:commons-io:2.4
|    +--- commons-lang:commons-lang:2.6
|    +--- commons-logging:commons-logging:1.1.3 -> 1.2
|    +--- commons-daemon:commons-daemon:1.0.13
|    +--- log4j:log4j:1.2.17
|    +--- com.google.protobuf:protobuf-java:2.5.0
|    +--- javax.servlet:servlet-api:2.5
|    +--- org.codehaus.jackson:jackson-core-asl:1.9.13
|    +--- org.codehaus.jackson:jackson-mapper-asl:1.9.13 (*)
|    +--- xmlenc:xmlenc:0.52
|    +--- io.netty:netty-all:4.0.23.Final -> 4.1.22.Final
|    +--- xerces:xercesImpl:2.9.1
|    |    \--- xml-apis:xml-apis:1.3.04 -> 1.4.01
|    +--- org.apache.htrace:htrace-core:3.1.0-incubating
|    \--- org.fusesource.leveldbjni:leveldbjni-all:1.8

Cause:

Both of them pull in servlet-api 2.5, while Spring Boot requires Servlet 3.0 or higher.

Solution:

Exclude servlet-api 2.5.

Configure this in build.gradle:

configurations {
    all*.exclude group: 'javax.servlet'
}

References:

java - SpringBoot Catalina LifeCycle Exception - Stack Overflow

Resolving the servlet-api.jar conflict in Gradle dependency management - CSDN博客

2. The HDFS version in use is 2.7.2; when writing data from ES into HDFS, the job failed with:

ERROR security.UserGroupInformation: PriviledgedActionException as:root cause:org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4

Cause:

hadoop-core 1.2.1 is far too old to talk to a 2.7.2 cluster: its RPC client speaks IPC version 4, while Hadoop 2.7.2 servers require version 9.

Solution:

Change the dependencies to the 2.7.2 client artifacts:

compile group: 'org.apache.hadoop', name: 'hadoop-common', version: '2.7.2'
compile group: 'org.apache.hadoop', name: 'hadoop-mapreduce-client-core', version: '2.7.2'
compile group: 'org.apache.hadoop', name: 'hadoop-client', version: '2.7.2'
compile group: 'org.apache.hadoop', name: 'hadoop-hdfs', version: '2.7.2'
compile group: 'org.elasticsearch', name: 'elasticsearch-hadoop', version: '6.2.4'

Reference:

Fixing the "Server IPC version 9 cannot communicate with client version" error in an IntelliJ Maven project -
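As a quick sanity check that the fix took effect, you can print which Hadoop client version actually ended up on the classpath; a minimal sketch (the class name is illustrative):

// Print the Hadoop client version on the classpath. After the dependency
// change it should report 2.7.2, matching the cluster's `hadoop version`.
import org.apache.hadoop.util.VersionInfo;

public class PrintHadoopVersion {
    public static void main(String[] args) {
        System.out.println("Hadoop client version: " + VersionInfo.getVersion());
    }
}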
