Stream data to Vertica using COPY, Perl and variables

Moderator: NorbertKrupa

Canarchiste
Newbie
Posts: 7
Joined: Mon Aug 22, 2016 6:24 am

Stream data to Vertica using COPY, Perl and variables

Post by Canarchiste » Mon Aug 22, 2016 8:35 am

Hello everyone!

First of all, please excuse my poor English; let me know if there is anything you can't understand.

Hope you can help me with my problem; let me introduce the background:
I have to load huge amounts of data from an Oracle database into a Vertica database on a daily basis.
For performance reasons, I can't use Java or temporary files such as .csv or .txt (around 50 tables are involved, some with several million rows, every night).
I am restricted to Windows.

So the basic solution for this is the following:

Code:

echo @MyPath\Myscript.sql | sqlplus user/pwd@mydb | vsql -h host -d mydb -U user -w pwd -c "COPY Table(DATA) FROM LOCAL STDIN DELIMITER '|' DIRECT STREAM NAME 'Loading data from myscript' EXCEPTIONS 'exception.txt' REJECTED DATA 'rejected.txt';"
This works; no problems so far.
Of course, I also want to check my constraints. There are PKs, FKs, and UKs to check. I don't want to go for complex and fancy SELECT/INSERT/DELETE queries with temporary tables when I can use SELECT ANALYZE_CONSTRAINTS and the NO COMMIT option.

Here comes the core problem (not the core question; please read to the end).

If I do something like this:

Code:

vsql -h host -d mydb -w pwd -c "COMMIT;" | echo @MyPath\Myscript.sql | sqlplus user/pwd@mydb | vsql -h host -d mydb -U user -w pwd -c "COPY Table(DATA) FROM LOCAL STDIN DELIMITER '|' DIRECT STREAM NAME 'Loading data from myscript' EXCEPTIONS 'exception.txt' REJECTED DATA 'rejected.txt' NO COMMIT;"
This will not work, because as far as vsql is concerned there is nothing to commit: the COPY runs in another session. Fair enough.

Next:

Code:

# Set ON_ERROR_STOP so that the commit is skipped if a constraint is violated
vsql -h host -d mydb -U user -w pwd -c "\set ON_ERROR_STOP on"

echo @MyPath/Myscript.sql | sqlplus user/pwd@mydb | vsql -h host -d mydb -U user -w pwd -c "COPY Table(DATA) FROM LOCAL STDIN DELIMITER '|' DIRECT STREAM NAME 'Loading data from myscript' EXCEPTIONS 'exception.txt' REJECTED DATA 'rejected.txt' NO COMMIT; SELECT 1/(1-SIGN(COUNT(*))) FROM (SELECT ANALYZE_CONSTRAINTS('Table')) AS CHECK_CONSTRAINT; COMMIT;"
This will work, but it will not raise any message when an error occurs, and it is a bit nasty, since -c is supposed to execute only one command and returns the result of the first query (the COPY), which explains why no error message comes back. (I tried it: it does actually execute the other commands.)

You can't tell when an error is raised (there are 50 tables involved), nor which keys have been violated.
So it's a bad solution.

Next solution: using an SQL script launched by vsql.

Code:

echo @MyPath\Myscript.sql | sqlplus user/pwd@mydb | vsql -h host -d mydb -U user -w pwd -f myVertScript.sql
myVertScript.sql:

Code:

COPY Table(DATA) FROM LOCAL STDIN DELIMITER '|' DIRECT STREAM NAME 'Loading data from myscript' EXCEPTIONS 'exception.txt' REJECTED DATA 'rejected.txt' NO COMMIT;
SELECT ANALYZE_CONSTRAINTS('Table');
COMMIT;
Of course this will not work: vsql considers that you are putting the following into STDIN, AS A STRING:

Code:

SELECT ANALYZE_CONSTRAINTS('Table');
COMMIT;
So I was thinking about this: what about capturing the sqlplus result in a variable, using the -v option to pass the variable (holding the data to load) into the SQL script, and then putting it into STDIN?

I used a very simple Perl script to get this variable (I tried DBI first, but you can't use STDIN with it. There is a Postgres Perl module that lets you call pg_putcopydata() to load your data into the STDIN pipe, but there is nothing like that for Vertica. Once again, everything is built around .txt or .csv files):

Code:

# Run sqlplus (executing Myscript.sql) and store the result in $test1.
# Note: the @ signs must be escaped, otherwise Perl interpolates @mydb and @MyPath as arrays.
my $test1 = `sqlplus -s user/pwd\@mydb \@MyPath\\Myscript.sql`;

# Run vsql, passing the result as the variable "var", and store the output in $test2.
my $test2 = `vsql -h host -d mydb -U user -w pwd -v var="$test1" -f myVertScript.sql`;

print $test2;
myVertScript.sql:

Code:

COPY Table(DATA) FROM LOCAL STDIN DELIMITER '|' DIRECT STREAM NAME 'Loading data from myscript' EXCEPTIONS 'exception.txt' REJECTED DATA 'rejected.txt' NO COMMIT;
\\echo :var
\.
SELECT ANALYZE_CONSTRAINTS('table') as CHECK_CONSTRAINT;
COMMIT;
But it didn't work: vsql still considers \echo :var to be a string, despite the escape.
I also tried:
:var
\echo :var
I also tried to put the COPY statement into :var (concatenated with the data to load, separated by \r\n, all built in the Perl script), but vsql considers that the COPY and the data are on the same line despite the \r\n (an echo inside myVertScript.sql showed me that the \r\n is printed correctly).
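A side note on the glue script: instead of capturing the whole sqlplus output in a variable and trying to re-inject it with -v, the rows can be piped straight into the consumer's STDIN. A minimal sketch in POSIX-style shell, with the real vsql COPY command line replaced by a placeholder ($1), since the exact command is environment-specific:

```shell
#!/bin/sh
# Sketch: push rows straight into a consumer's STDIN through a pipe instead
# of buffering them in a variable.  $1 is a placeholder for the real
# vsql COPY command line; the remaining arguments are the rows to load.
stream_rows() {
    consumer=$1
    shift
    printf '%s\n' "$@" | eval "$consumer"
}
```

In the real chain the placeholder would be something like `vsql ... -c "COPY Table(DATA) FROM LOCAL STDIN ..."`, and the rows would come from reading the sqlplus output line by line rather than from an argument list.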

So here is the core question:
Is there any way to put the value of :var into STDIN from within a (v)sql script?

This is very frustrating. I have my data in my vsql session, but COPY just doesn't seem to have the tools to catch it directly, short of using slow-to-build-and-read heavy .txt files (some of the tables have several million rows to transfer every night).
Except STDIN, of course, which I can't use inside a script for the reason above.

Hope you can help me!
Best regards,


EDIT: I just saw the dedicated Data Load forum... Maybe this topic should be moved.

EDIT2:
I am aware of EnableNewPrimaryKeysByDefault and EnableNewUniqueKeysByDefault, but we still need ANALYZE_CONSTRAINTS for the FKs. So it is nice for double-checking your process, but it is not enough. Moreover, enforcement triggers an ERROR even with the NO COMMIT parameter, meaning you can't use it together with ANALYZE_CONSTRAINTS().


Re: Stream data to Vertica using COPY, Perl and variables

Post by Canarchiste » Mon Aug 29, 2016 7:17 am

Well, it appears that if you pipe the data through a txt file (piping, NOT spooling), the loading speed is about the same as piping through STDIN. You can even gain some performance by compressing the data.

I am not using Perl anymore; it now looks like this:

Code:

sqlplus -s %oracleInfo% @myscript.sql | gzip -f > myData.gz & vsql -E -h %vHost% -d %vDbName% -U %vUser% -w %vPwd% -v var=%batchId% -f myVertScript.sql
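The compress-on-the-fly hand-off can be sketched generically like this (POSIX-style shell; the helper names are made up, and generic commands stand in for sqlplus and vsql):

```shell
#!/bin/sh
# Producer side: compress the extracted rows on the fly into a .gz file,
# mirroring `sqlplus -s ... @myscript.sql | gzip -f > myData.gz`.
write_gz() {
    out=$1
    shift
    printf '%s\n' "$@" | gzip -c > "$out"
}

# Consumer side: inflate the file back into a stream that a COPY (or any
# other consumer) can read from its STDIN.
read_gz() {
    gzip -dc "$1"
}
```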
Hope it helps someone.

