Pig Connector

Posted: Mon Nov 12, 2012 2:59 pm
by martijn
Hello,

I am trying to load data from Vertica into HDFS using Pig:

Code:

grunt> register /usr/local/hadoop-vertica.jar
grunt> register /usr/local/pig-vertica.jar
grunt> A = LOAD 'sql://{select * from table LIMIT 100}' USING com.vertica.pig.VerticaLoader('192.168.55.48,192.168.55.13,192.168.55.173', 'baseline', '5433',  'dbadmin', 'db');
grunt> STORE A INTO '/user/test.txt';
I get the following error:

Code:

ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A
	at org.apache.pig.PigServer.openIterator(PigServer.java:862)
	at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:682)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
	at org.apache.pig.Main.run(Main.java:490)
	at org.apache.pig.Main.main(Main.java:111)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias A
	at org.apache.pig.PigServer.storeEx(PigServer.java:961)
	at org.apache.pig.PigServer.store(PigServer.java:924)
	at org.apache.pig.PigServer.openIterator(PigServer.java:837)
	... 12 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:322)
	at org.apache.pig.PigServer.launchPlan(PigServer.java:1275)
	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1260)
	at org.apache.pig.PigServer.storeEx(PigServer.java:957)
	... 14 more
Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
	at com.vertica.hadoop.VerticaUtil.getSplits(VerticaUtil.java:102)
	at com.vertica.hadoop.VerticaInputFormat.getSplits(VerticaInputFormat.java:140)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:273)
	at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1014)
	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1031)
	at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
	at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:318)
	at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:238)
	at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:269)
	at java.lang.Thread.run(Thread.java:662)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)

	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:631)
	at java.lang.Thread.dispatchUncaughtException(Thread.java:1874)

Correct me if I am wrong, but it looks like the Vertica connector expects JobContext to be a class, while in my Hadoop version it is an interface.
I am using Cloudera's CDH 4.1.1 Hadoop distribution.

Does anyone know what I need to do to make this work, or what I did wrong?

Greetings,
Martijn

Re: Pig Connector

Posted: Mon Nov 12, 2012 3:13 pm
by id10t
Hi!


Have you tried it locally (pig -x local)? Same error?

I can only think of two things:
* You have no permissions on the Pig temporary folder, or it is defined as the same folder that Hadoop/MapReduce uses.
* The Vertica splits failed (if Pig requires splits by partition, it should fail).

Re: Pig Connector

Posted: Mon Nov 12, 2012 4:50 pm
by martijn
sKwa wrote:
Hi!

Have you tried it locally (pig -x local)? Same error?

I can only think of two things:
* You have no permissions on the Pig temporary folder, or it is defined as the same folder that Hadoop/MapReduce uses.
* The Vertica splits failed (if Pig requires splits by partition, it should fail).

Yes, same error in local mode.

After some more research and a look at the Hadoop and Pig connector source code,
it looks like the connector is not compatible with Hadoop 2.0.0 (which I'm running).
In Hadoop 1.x, JobContext was a class, but it has since been changed to an interface.
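
One way to confirm which flavour of JobContext a given classpath provides is to ask the JVM directly. The following is only a minimal sketch: the class name JobContextCheck and the helper kindOf are made up for illustration, and it assumes the Hadoop jars you want to inspect are on the classpath when you run it.

```java
class JobContextCheck {
    /** Returns "interface", "class", or "missing" for a fully qualified type name. */
    static String kindOf(String className) {
        try {
            // initialize=false: only inspect the type, do not run its static initializers
            Class<?> c = Class.forName(className, false, JobContextCheck.class.getClassLoader());
            return c.isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException | LinkageError e) {
            // The type is not on the classpath, or it could not be linked
            return "missing";
        }
    }

    public static void main(String[] args) {
        // Defaults to the type named in the stack trace; pass another name to inspect it instead
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.mapreduce.JobContext";
        System.out.println(name + " is: " + kindOf(name));
    }
}
```

A connector jar compiled against a JobContext that was a class will throw IncompatibleClassChangeError when it is run against a classpath where this check reports "interface".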

Re: Pig Connector

Posted: Mon Nov 12, 2012 6:07 pm
by id10t
Hi!

Yep! You are right, I just found this:
Question:

The Vertica 6 Hadoop Connector supports the combinations of Apache Hadoop and Apache Pig listed below.


Solution:

Use the Vertica 6 Hadoop connector with only these version pairs:
• Hadoop 0.20.2 and Pig 0.7.0
• Hadoop 0.20.205.0 and Pig 0.9.1
• Hadoop 1.0.0 and Pig 0.9.2
Date of info: 6/25/2012

Re: Pig Connector

Posted: Mon Nov 12, 2012 6:18 pm
by id10t
BTW: I see you know Java, so take a look at the Hadoop connector source on GitHub. The current source in "trunk" is so terrible that I think it's a Halloween joke, and the deprecated code is much better. For a big vendor, I think it's a shame to put such code in trunk. I suggest you rewrite it; it's not so hard (the current code is mostly based on Cloudera's DBInputFile.java).

[Me? :-) I'm waiting for a new connector. So far I'm writing the connections and splits myself, because the `LIMIT`/`OFFSET` method will simply kill the DB for even a couple of million rows.]
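
The reason `LIMIT`/`OFFSET` splits hurt is that every split with a large offset forces the database to produce and discard all the rows before it. One common alternative is to split on a key range instead. The sketch below is not the code from this thread or from the connector; it assumes the table has a numeric key column (called id here purely for illustration) whose minimum and maximum are already known, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class RangeSplits {
    /**
     * Builds at most n queries whose key ranges are contiguous, non-overlapping,
     * and together cover [minId, maxId]. Assumes moderate key values
     * (no protection against long overflow).
     */
    static List<String> rangeQueries(String table, String keyColumn,
                                     long minId, long maxId, int n) {
        List<String> queries = new ArrayList<>();
        if (n <= 0 || maxId < minId) {
            return queries; // nothing to split
        }
        long total = maxId - minId + 1;
        long step = (total + n - 1) / n; // ceiling division: keys per split
        for (long lo = minId; lo <= maxId; lo += step) {
            long hi = Math.min(lo + step - 1, maxId);
            queries.add("SELECT * FROM " + table
                    + " WHERE " + keyColumn + " BETWEEN " + lo + " AND " + hi);
        }
        return queries;
    }

    public static void main(String[] args) {
        // Hypothetical table and column names, for illustration only
        for (String q : rangeQueries("my_table", "id", 0, 99, 4)) {
            System.out.println(q);
        }
    }
}
```

Each query can then be handed to its own mapper. Splits are only evenly sized if the key values are spread fairly uniformly, so a skewed key would need a different partitioning scheme.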

Re: Pig Connector

Posted: Tue Nov 13, 2012 12:47 pm
by martijn
I made a working version of the Hadoop Connector (both Hadoop and Pig) for Hadoop version 2.x. Basically I just rebuilt the JAR provided by Vertica against the Hadoop 2.x dependencies, and now it seems to work. It has not been extensively tested yet...

Download it here: http://dl.dropbox.com/u/122838/hadoop-vertica.jar

Re: Pig Connector

Posted: Mon Oct 27, 2014 6:05 am
by vijayrkadel
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. tried to access method org.apache.hadoop.mapred.TaskReport.downgradeArray([Lorg/apache/hadoop/mapreduce/TaskReport;)[Lorg/apache/hadoop/mapred/TaskReport; from class org.apache.hadoop.mapred.DowngradeHelper

I am using Hadoop 2.2.5 and Pig 0.13, and I have used the same "hadoop-vertica.jar" given above for Hadoop 2.x. In local mode this works, but otherwise I get the error above.

Please help me out.