Apache Phoenix IndexTool fails with java.lang.ClassNotFoundException: org.apache.tephra.TransactionSystemClient

I have a Cloudera CDH 5.14.2 cluster with the Apache Phoenix parcel installed (APACHE_PHOENIX-4.14.0-cdh5.14.2.p0.3).

I have a table with a secondary index, and I want to populate that index using the IndexTool that Apache Phoenix provides, but it fails with the following error:

19/01/02 13:58:10 INFO mapreduce.Job: The url to track the job: http://mor-master-01.triviadata.local:8088/proxy/application_1546422102410_0020/
19/01/02 13:58:10 INFO mapreduce.Job: Running job: job_1546422102410_0020
19/01/02 13:58:18 INFO mapreduce.Job: Job job_1546422102410_0020 running in uber mode : false
19/01/02 13:58:18 INFO mapreduce.Job:  map 0% reduce 0%
19/01/02 13:58:22 INFO mapreduce.Job: Task Id : attempt_1546422102410_0020_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException: org.apache.tephra.TransactionSystemClient
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.apache.phoenix.transaction.TransactionFactory$Provider.<clinit>(TransactionFactory.java:27)
        at org.apache.phoenix.query.QueryServicesOptions.<clinit>(QueryServicesOptions.java:270)
        at org.apache.phoenix.query.QueryServicesImpl.<init>(QueryServicesImpl.java:36)
        at org.apache.phoenix.jdbc.PhoenixDriver.getQueryServices(PhoenixDriver.java:197)
        at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:235)
        at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:150)
        at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:221)
        at java.sql.DriverManager.getConnection(DriverManager.java:664)
        at java.sql.DriverManager.getConnection(DriverManager.java:208)
        at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:113)
        at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:58)
        at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:180)
        at org.apache.phoenix.mapreduce.PhoenixInputFormat.createRecordReader(PhoenixInputFormat.java:76)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:521)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

When I inspect my HBASE_CLASSPATH via the command ${HBASE_HOME}/bin/hbase classpath, I see that it contains the following JARs:

/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/conf
/usr/java/latest/lib/tools.jar
/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/lib/*.jar
/etc/hadoop/conf
$HBASE_HOME/lib/hadoop/lib/*
$HBASE_HOME/lib/hadoop/.//*
$HBASE_HOME/lib/hadoop-hdfs/./
$HBASE_HOME/hadoop-hdfs/lib/*
$HBASE_HOME/hadoop-hdfs/.//*
$HBASE_HOME/hadoop-yarn/lib/*
$HBASE_HOME/hadoop-yarn/.//*
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/lib/*
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/.//*
/etc/hadoop/conf
$HBASE_HOME/lib/hadoop/*
$HBASE_HOME/lib/hadoop/lib/*
$HBASE_HOME/lib/zookeeper/*
$HBASE_HOME/lib/zookeeper/lib/*
$HBASE_HOME/jars/hbase-common-1.2.0-cdh5.14.2.jar
$HBASE_HOME/jars/hbase-client-1.2.0-cdh5.14.2.jar
$HBASE_HOME/jars/hbase-hadoop-compat-1.2.0-cdh5.14.2.jar
$HBASE_HOME/jars/htrace-core-3.0.4.jar
$HBASE_HOME/jars/htrace-core-3.2.0-incubating.jar
$HBASE_HOME/jars/htrace-core4-4.0.1-incubating.jar
$HBASE_HOME/jars/hbase-protocol-1.2.0-cdh5.14.2.jar
$HBASE_HOME/jars/hbase-server-1.2.0-cdh5.14.2.jar
$HBASE_HOME/jars/metrics-core-2.2.0.jar
$HBASE_HOME/jars/metrics-core-3.1.2.jar
$PHOENIX_HOME/lib/phoenix/lib/tephra-hbase-compat-1.2-cdh-0.14.0-incubating.jar
$PHOENIX_HOME/lib/phoenix/lib/tephra-api-0.14.0-incubating.jar
$PHOENIX_HOME/lib/phoenix/lib/tephra-core-0.14.0-incubating.jar
$PHOENIX_HOME/lib/phoenix/lib/phoenix-core-4.14.0-cdh5.14.2.jar
$PHOENIX_HOME/lib/phoenix/lib/twill-zookeeper-0.8.0.jar
$PHOENIX_HOME/lib/phoenix/lib/twill-discovery-api-0.8.0.jar
$PHOENIX_HOME/lib/phoenix/lib/twill-discovery-core-0.8.0.jar
$PHOENIX_HOME/lib/phoenix/lib/joda-time-1.6.jar
$PHOENIX_HOME/lib/phoenix/lib/antlr-runtime-3.5.2.jar

When I checked the source code and its dependencies, I found that the missing class should be provided by $PHOENIX_HOME/lib/phoenix/lib/tephra-core-0.14.0-incubating.jar.

When I grep the contents of that JAR for the missing class, I can see that it is there:

# jar tf $PHOENIX_HOME/lib/phoenix/lib/tephra-core-0.14.0-incubating.jar | grep TransactionSystemClient
org/apache/tephra/TransactionSystemClient.class

Do you have any idea why the MR job cannot find this particular class?

In case it helps, my table and its secondary index are defined as follows:

0: jdbc:phoenix:localhost:2181/hbase> create table t1(v1 varchar, v2 varchar, v3 integer constraint primary_key primary key(v1)) immutable_rows=true, compression='SNAPPY';
1: jdbc:phoenix:localhost:2181/hbase> create index glb_idx on t1(v2) async;

I run the IndexTool with the following command:

${HBASE_HOME}/bin/hbase org.apache.phoenix.mapreduce.index.IndexTool -dt T1 -it GLB_IDX -op /tmp

When I create the index synchronously and upsert some data into the table, the index is populated correctly, so the Phoenix secondary-index configuration itself looks fine.
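
For comparison, here is a minimal sketch of the synchronous path that works (the index name glb_idx_sync and the sample row are made up for illustration); without the ASYNC keyword the index is built inline, and the EXPLAIN plan should show the query being served from GLB_IDX_SYNC:

0: jdbc:phoenix:localhost:2181/hbase> create index glb_idx_sync on t1(v2);
1: jdbc:phoenix:localhost:2181/hbase> upsert into t1 values ('row1', 'abc', 1);
2: jdbc:phoenix:localhost:2181/hbase> explain select v2 from t1 where v2 = 'abc';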

Answer (posted later by the same user):

It turned out to be a problem of missing JARs on the classpath of the map-reduce job that this command launches. Through trial and error I put together the following list of dependencies that IndexTool needs in order to run on CDH 5.14.2:

cat classpath.txt

/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/lib/hbase/bin/../conf
/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/hbase-common-1.2.0-cdh5.14.2.jar
/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/hbase-client-1.2.0-cdh5.14.2.jar
/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/hbase-hadoop-compat-1.2.0-cdh5.14.2.jar
/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/htrace-core-3.0.4.jar
/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/htrace-core-3.2.0-incubating.jar
/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/htrace-core4-4.0.1-incubating.jar
/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/hbase-protocol-1.2.0-cdh5.14.2.jar
/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/hbase-server-1.2.0-cdh5.14.2.jar
/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/metrics-core-2.2.0.jar
/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/metrics-core-3.1.2.jar
/opt/cloudera/parcels/APACHE_PHOENIX-4.14.0-cdh5.14.2.p0.3/lib/phoenix/lib/tephra-hbase-compat-1.2-cdh-0.14.0-incubating.jar
/opt/cloudera/parcels/APACHE_PHOENIX-4.14.0-cdh5.14.2.p0.3/lib/phoenix/lib/tephra-api-0.14.0-incubating.jar
/opt/cloudera/parcels/APACHE_PHOENIX-4.14.0-cdh5.14.2.p0.3/lib/phoenix/lib/tephra-core-0.14.0-incubating.jar
/opt/cloudera/parcels/APACHE_PHOENIX-4.14.0-cdh5.14.2.p0.3/lib/phoenix/lib/phoenix-core-4.14.0-cdh5.14.2.jar
/opt/cloudera/parcels/APACHE_PHOENIX-4.14.0-cdh5.14.2.p0.3/lib/phoenix/lib/twill-zookeeper-0.8.0.jar
/opt/cloudera/parcels/APACHE_PHOENIX-4.14.0-cdh5.14.2.p0.3/lib/phoenix/lib/twill-discovery-api-0.8.0.jar
/opt/cloudera/parcels/APACHE_PHOENIX-4.14.0-cdh5.14.2.p0.3/lib/phoenix/lib/twill-discovery-core-0.8.0.jar
/opt/cloudera/parcels/APACHE_PHOENIX-4.14.0-cdh5.14.2.p0.3/lib/phoenix/lib/joda-time-1.6.jar
/opt/cloudera/parcels/APACHE_PHOENIX-4.14.0-cdh5.14.2.p0.3/lib/phoenix/lib/antlr-runtime-3.5.2.jar
/usr/local/idx-tool-classpath/disruptor-3.3.6.jar
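
In case it is useful, here is a rough sketch of how a file like this could be regenerated from the parcel directories. The unversioned /opt/cloudera/parcels/CDH and /opt/cloudera/parcels/APACHE_PHOENIX symlinks, the hbase conf path, and the disruptor JAR location are assumptions based on my layout, and the globs may pick up slightly different JAR versions on other releases:

# Sketch only: collect the JARs listed above into classpath.txt.
CDH_JARS=/opt/cloudera/parcels/CDH/jars
PHX_LIB=/opt/cloudera/parcels/APACHE_PHOENIX/lib/phoenix/lib
{
  echo /opt/cloudera/parcels/CDH/lib/hbase/conf
  ls $CDH_JARS/hbase-{common,client,hadoop-compat,protocol,server}-*.jar
  ls $CDH_JARS/htrace-core*.jar $CDH_JARS/metrics-core-*.jar
  ls $PHX_LIB/tephra-*.jar $PHX_LIB/phoenix-core-*.jar
  ls $PHX_LIB/twill-zookeeper-*.jar $PHX_LIB/twill-discovery-*.jar
  ls $PHX_LIB/joda-time-*.jar $PHX_LIB/antlr-runtime-*.jar
  ls /usr/local/idx-tool-classpath/disruptor-*.jar
} > classpath.txt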

Then, using Cloudera Manager, I added all of these JARs to the mapreduce.application.classpath property in mapred-site.xml on every worker node.
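
For reference, the value I add to mapreduce.application.classpath is simply that list joined with colons (a sketch; I append it to the property's existing value rather than replacing the Hadoop defaults):

# Produce the colon-separated value for mapreduce.application.classpath
# in mapred-site.xml (set via Cloudera Manager on every worker node).
paste -s -d: classpath.txt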

Some of these dependencies are also needed to submit the MR job, so on the edge node from which I run the job I set HADOOP_CLASSPATH to include all of these JARs:

export HADOOP_CLASSPATH=$(paste -s -d: classpath.txt)
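
A quick sanity check that the client-side classpath now contains the previously missing Tephra classes (just a verification step, not part of the fix):

# tephra-core should now appear on the Hadoop client classpath.
hadoop classpath | tr ':' '\n' | grep tephra-core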

After that I can run the job with the following command:

hadoop jar /opt/cloudera/parcels/APACHE_PHOENIX-4.14.0-cdh5.14.2.p0.3/lib/phoenix/lib/phoenix-core-4.14.0-cdh5.14.2.jar org.apache.phoenix.mapreduce.index.IndexTool -s SCHEMA_NAME -dt TEST_TABLE -it INDEX_TABLE -op /tmp
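
Once the MR job completes, I would verify the result from sqlline: the index state is visible in SYSTEM.CATALOG, and a query filtering on the indexed column should then be served from the index (shown here with the T1/GLB_IDX example from the question):

0: jdbc:phoenix:localhost:2181/hbase> select table_name, index_state from system.catalog where index_state is not null;
1: jdbc:phoenix:localhost:2181/hbase> explain select v2 from t1 where v2 = 'abc';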
