Page 1 of 2

UDF CsvParser can't load Date

Posted: Wed Jul 24, 2013 10:58 pm
by SThak
I am trying to use TraditionalCsvParser from Vertica's SDK examples and have trouble loading data with a date in it. Other entries work fine. Has anyone experienced this problem? How to fix it?

Re: UDF CsvParser can't load Date

Posted: Thu Jul 25, 2013 7:43 am
by id10t
Hi!

Can you post a log of compilation? If there are any errors or warnings? Did you install libboost-dev?

Because I have no this problem:

Code: Select all

daniel=> CREATE LIBRARY TraditionalCsvParserLib AS '/opt/vertica/sdk/examples/ParserFunctions/CsvParser.so';
CREATE LIBRARY
daniel=> CREATE PARSER CSVParser AS LANGUAGE 'C++' NAME 'CsvParserFactory' LIBRARY TraditionalCsvParserLib;
CREATE PARSER FUNCTION
daniel=> create table csv (d date, i int, v varchar);
CREATE TABLE

Code: Select all

daniel=> copy csv from stdin with parser CsvParser();
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 2000-01-01,666,a
>> \.
daniel=> select * from csv;
     d      |  i  | v 
------------+-----+---
 2000-01-01 | 666 | a
(1 row)
My compilation log:

Code: Select all

daniel@synapse:~$ g++ -D HAVE_LONG_INT_64 \
> -I /opt/vertica/sdk/include \
> -I /opt/vertica/sdk/examples/HelperLibraries \
> -I /usr/include/boost \
> -Wall -shared -Wno-unused-value -fPIC \
>  -o CsvParser.so /opt/vertica/sdk/examples/ParserFunctions/TraditionalCsvParser.cpp \
>  /opt/vertica/sdk/include/Vertica.cpp
In file included from /opt/vertica/sdk/include/VerticaUDx.h:57:0,
                 from /opt/vertica/sdk/include/Vertica.h:76,
                 from /opt/vertica/sdk/examples/ParserFunctions/CoroutineHelpers.h:13,
                 from /opt/vertica/sdk/examples/ParserFunctions/ContinuousUDParser.h:11,
                 from /opt/vertica/sdk/examples/ParserFunctions/TraditionalCsvParser.cpp:3:
/opt/vertica/sdk/include/BasicsUDxShared.h:278:50: warning: narrowing conversion of ‘18444492273895866368ull’ from ‘long long unsigned int’ to ‘Vertica::vint {aka long long int}’ inside { } is ill-formed in C++11 [-Wnarrowing]
In file included from /opt/vertica/sdk/include/VerticaUDx.h:57:0,
                 from /opt/vertica/sdk/include/Vertica.h:76,
                 from /opt/vertica/sdk/include/Vertica.cpp:38:
/opt/vertica/sdk/include/BasicsUDxShared.h:278:50: warning: narrowing conversion of ‘18444492273895866368ull’ from ‘long long unsigned int’ to ‘Vertica::vint {aka long long int}’ inside { } is ill-formed in C++11 [-Wnarrowing]

Re: UDF CsvParser can't load Date

Posted: Thu Jul 25, 2013 5:02 pm
by SThak
Thanks for your reply. I do not have libboost-dev installed, but I don't get any warnings. What is libboost-dev?
Here is my compilation log.
g++ -D HAVE_LONG_INT_64 -I /opt/vertica/sdk/include -I/opt/vertica/sdk/examples/HelperLibraries -Wall -shared -Wno-unused-value -fPIC -o CsvParser.so: examples/ParserFunctions/TraditionalCsvParser.cpp /opt/vertica/sdk/include/Vertica.cpp
CREATE LIBRARY CsvParserLib AS '/home/dbadmin/CsvParser.so';
CREATE PARSER CsvParser AS LANGUAGE 'C++' NAME ‘CsvParserFactory' LIBRARY CsvParserLib;
CREATE TABLE GL.GL_BALANCES
( LEDGER_ID NUMBER(15,0) NOT NULL,
CODE_COMBINATION_ID NUMBER(15,0) NOT NULL,
CURRENCY_CODE VARCHAR2(15) NOT NULL,
PERIOD_NAME VARCHAR2(15) NOT NULL,
ACTUAL_FLAG VARCHAR2(1) NOT NULL,
LAST_UPDATE_DATE DATE NOT NULL,
LAST_UPDATED_BY NUMBER(15,0) NOT NULL,
BUDGET_VERSION_ID NUMBER(15,0),
ENCUMBRANCE_TYPE_ID NUMBER(15,0),
TRANSLATED_FLAG VARCHAR2(1)
);
copy GL.GL_BALANCES from '/home/dbadmin/Data/GLtest.csv' with parser CsvParser() no commit;
GLtest.csv >
289,110052,GBP,APR-01/02,B,1-Oct-03,1001406,1637,,
289,110053,GBP,APR-01/02,A,1-Oct-03,1001406,,,
.....

On a different note, the above compilation creates file *.so: instead of *.so and I have to rename it to *.so before creating library. If I put a space between .so and :, I get error. What am I doing wrong?

Re: UDF CsvParser can't load Date

Posted: Thu Jul 25, 2013 8:00 pm
by id10t
Hi!

hm... now I understand.

I think its a bug:

Lets check formats:

Code: Select all

daniel=> select '2018-01-21'::date;
  ?column?  
------------
 2018-01-21
(1 row)

Code: Select all

daniel=> select to_date('2018-01-21', 'yyyy-mm-dd');
  to_date   
------------
 2018-01-21
(1 row)
Now parser:

Code: Select all

daniel=> copy GL_BALANCES from stdin with parser CsvParser() abort on error  ;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 289,110052,GBP,APR-01/02,B,2018-01-21,1001406,1637,,
>> \.
No errors

Now will try to format our date:

Code: Select all

daniel=> copy GL_BALANCES from stdin with parser CsvParser(format='d-Mon-YY') abort on error  ;
ERROR 0:  Error calling prepare() in User Function CsvParserFactory at [./TraditionalCsvParser.cpp:372], error code: 0, message: Parameter format can only be used as a column option
hm... ok... will try column option:

Code: Select all

daniel=> copy GL_BALANCES column option (LAST_UPDATE_DATE format 'yyyy-mm-dd') from stdin with parser CsvParser() abort on error  ;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 289,110052,GBP,APR-01/02,B,2018-01-21,1001406,1637,, 
>> \.
ERROR 2035:  COPY: Input record 1 has been rejected (Invalid column: name=LAST_UPDATE_DATE, index=6, type=Date(-1), value=2018-01-21)
:shock: and now error

>> What is libboost-dev?

Without header tokenizer.hpp it wont compile. May be on your distro it called little different.
The Vertica SDK provides example parser functions in /opt/vertica/sdk/examples/ParserFunctions. Included in the examples are two CSV parsers; "Traditional CSV" and an RFC 4180 CSV parser. The traditional CSV parser uses the boost::tokenizer library to read the CSV output from common programs such as Microsoft Excel. The RFC 4180 parser parses CSV files written to the RFC 4180 standard and uses libcsv.

Code: Select all

cat TraditionalCsvParser.cpp
/* Copyright (c) 2005 - 2012 Vertica, an HP company -*- C++ -*- */

#include "ContinuousUDParser.h"
#include "StringParsers.h"
#include <pthread.h>

#include <boost/tokenizer.hpp>      //<-- you need boost for this header
...
>> On a different note, the above compilation creates file *.so: instead of *.so ...
A name of a lib doesn't matter. The best way to verify that everything is ok, that all linked:

Code: Select all

$ ldd CsvParser.so 
	linux-vdso.so.1 =>  (0x00007fffa4bc4000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f92fd52a000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f92fd314000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f92fcf54000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f92fcc58000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f92fdb20000)
You must resolve any "not found" or "none"


>> What am I doing wrong?
Why do you have ':' in command line? Did you copy it from make file? I think its come from make file syntax, you dont need it in the command line.

Re: UDF CsvParser can't load Date

Posted: Fri Jul 26, 2013 5:38 pm
by SThak
Thanks for all your help. Let me see if I can figure out where the bug is in the code and try to fix it.

Re: UDF CsvParser can't load Date

Posted: Tue Aug 06, 2013 5:36 pm
by SThak
Need help with CSVParser - If I change the format of the date to 10/1/2013, it works fine for small loads, but for bigger loads, it fails at last column. This is the error it logs in error.log.
COPY: Input record 1 has been rejected (Invalid column: name=PROJECT_TO_DATE_CR_BEQ, index=37, length=1,type=Numeric(38,0), value=
.). Please see /home/dbadmin/VMartDB/v_vmartdb_node0001_catalog/CopyErrorLogs/GL_BALANCES-GL2.csv-copy-from-rejected-data, record 1 for the rejected record.

Any idea what is going wrong here and why the date format 01-Oct-03 does not work but 10/1/2003 does?

Re: UDF CsvParser can't load Date

Posted: Tue Aug 06, 2013 7:21 pm
by JimKnicely
What is your delimiter? The error is actually saying you are referencing an invalid column name...