UDF CsvParser can't load Date

Moderator: NorbertKrupa

SThak
Newbie
Newbie
Posts: 10
Joined: Wed Jul 24, 2013 10:30 pm

UDF CsvParser can't load Date

Post by SThak » Wed Jul 24, 2013 10:58 pm

I am trying to use TraditionalCsvParser from Vertica's SDK examples and have trouble loading data with a date in it. Other entries work fine. Has anyone experienced this problem? How to fix it?

id10t
GURU
GURU
Posts: 732
Joined: Mon Apr 16, 2012 2:44 pm

Re: UDF CsvParser can't load Date

Post by id10t » Thu Jul 25, 2013 7:43 am

Hi!

Can you post a log of compilation? If there are any errors or warnings? Did you install libboost-dev?

Because I have no this problem:

Code: Select all

daniel=> CREATE LIBRARY TraditionalCsvParserLib AS '/opt/vertica/sdk/examples/ParserFunctions/CsvParser.so';
CREATE LIBRARY
daniel=> CREATE PARSER CSVParser AS LANGUAGE 'C++' NAME 'CsvParserFactory' LIBRARY TraditionalCsvParserLib;
CREATE PARSER FUNCTION
daniel=> create table csv (d date, i int, v varchar);
CREATE TABLE

Code: Select all

daniel=> copy csv from stdin with parser CsvParser();
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 2000-01-01,666,a
>> \.
daniel=> select * from csv;
     d      |  i  | v 
------------+-----+---
 2000-01-01 | 666 | a
(1 row)
My compilation log:

Code: Select all

daniel@synapse:~$ g++ -D HAVE_LONG_INT_64 \
> -I /opt/vertica/sdk/include \
> -I /opt/vertica/sdk/examples/HelperLibraries \
> -I /usr/include/boost \
> -Wall -shared -Wno-unused-value -fPIC \
>  -o CsvParser.so /opt/vertica/sdk/examples/ParserFunctions/TraditionalCsvParser.cpp \
>  /opt/vertica/sdk/include/Vertica.cpp
In file included from /opt/vertica/sdk/include/VerticaUDx.h:57:0,
                 from /opt/vertica/sdk/include/Vertica.h:76,
                 from /opt/vertica/sdk/examples/ParserFunctions/CoroutineHelpers.h:13,
                 from /opt/vertica/sdk/examples/ParserFunctions/ContinuousUDParser.h:11,
                 from /opt/vertica/sdk/examples/ParserFunctions/TraditionalCsvParser.cpp:3:
/opt/vertica/sdk/include/BasicsUDxShared.h:278:50: warning: narrowing conversion of ‘18444492273895866368ull’ from ‘long long unsigned int’ to ‘Vertica::vint {aka long long int}’ inside { } is ill-formed in C++11 [-Wnarrowing]
In file included from /opt/vertica/sdk/include/VerticaUDx.h:57:0,
                 from /opt/vertica/sdk/include/Vertica.h:76,
                 from /opt/vertica/sdk/include/Vertica.cpp:38:
/opt/vertica/sdk/include/BasicsUDxShared.h:278:50: warning: narrowing conversion of ‘18444492273895866368ull’ from ‘long long unsigned int’ to ‘Vertica::vint {aka long long int}’ inside { } is ill-formed in C++11 [-Wnarrowing]

SThak
Newbie
Newbie
Posts: 10
Joined: Wed Jul 24, 2013 10:30 pm

Re: UDF CsvParser can't load Date

Post by SThak » Thu Jul 25, 2013 5:02 pm

Thanks for your reply. I do not have libboost-dev installed, but I don't get any warnings. What is libboost-dev?
Here is my compilation log.
g++ -D HAVE_LONG_INT_64 -I /opt/vertica/sdk/include -I/opt/vertica/sdk/examples/HelperLibraries -Wall -shared -Wno-unused-value -fPIC -o CsvParser.so: examples/ParserFunctions/TraditionalCsvParser.cpp /opt/vertica/sdk/include/Vertica.cpp
CREATE LIBRARY CsvParserLib AS '/home/dbadmin/CsvParser.so';
CREATE PARSER CsvParser AS LANGUAGE 'C++' NAME ‘CsvParserFactory' LIBRARY CsvParserLib;
CREATE TABLE GL.GL_BALANCES
( LEDGER_ID NUMBER(15,0) NOT NULL,
CODE_COMBINATION_ID NUMBER(15,0) NOT NULL,
CURRENCY_CODE VARCHAR2(15) NOT NULL,
PERIOD_NAME VARCHAR2(15) NOT NULL,
ACTUAL_FLAG VARCHAR2(1) NOT NULL,
LAST_UPDATE_DATE DATE NOT NULL,
LAST_UPDATED_BY NUMBER(15,0) NOT NULL,
BUDGET_VERSION_ID NUMBER(15,0),
ENCUMBRANCE_TYPE_ID NUMBER(15,0),
TRANSLATED_FLAG VARCHAR2(1)
);
copy GL.GL_BALANCES from '/home/dbadmin/Data/GLtest.csv' with parser CsvParser() no commit;
GLtest.csv >
289,110052,GBP,APR-01/02,B,1-Oct-03,1001406,1637,,
289,110053,GBP,APR-01/02,A,1-Oct-03,1001406,,,
.....

On a different note, the above compilation creates file *.so: instead of *.so and I have to rename it to *.so before creating library. If I put a space between .so and :, I get error. What am I doing wrong?

id10t
GURU
GURU
Posts: 732
Joined: Mon Apr 16, 2012 2:44 pm

Re: UDF CsvParser can't load Date

Post by id10t » Thu Jul 25, 2013 8:00 pm

Hi!

hm... now I understand.

I think its a bug:

Lets check formats:

Code: Select all

daniel=> select '2018-01-21'::date;
  ?column?  
------------
 2018-01-21
(1 row)

Code: Select all

daniel=> select to_date('2018-01-21', 'yyyy-mm-dd');
  to_date   
------------
 2018-01-21
(1 row)
Now parser:

Code: Select all

daniel=> copy GL_BALANCES from stdin with parser CsvParser() abort on error  ;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 289,110052,GBP,APR-01/02,B,2018-01-21,1001406,1637,,
>> \.
No errors

Now will try to format our date:

Code: Select all

daniel=> copy GL_BALANCES from stdin with parser CsvParser(format='d-Mon-YY') abort on error  ;
ERROR 0:  Error calling prepare() in User Function CsvParserFactory at [./TraditionalCsvParser.cpp:372], error code: 0, message: Parameter format can only be used as a column option
hm... ok... will try column option:

Code: Select all

daniel=> copy GL_BALANCES column option (LAST_UPDATE_DATE format 'yyyy-mm-dd') from stdin with parser CsvParser() abort on error  ;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 289,110052,GBP,APR-01/02,B,2018-01-21,1001406,1637,, 
>> \.
ERROR 2035:  COPY: Input record 1 has been rejected (Invalid column: name=LAST_UPDATE_DATE, index=6, type=Date(-1), value=2018-01-21)
:shock: and now error

>> What is libboost-dev?

Without header tokenizer.hpp it wont compile. May be on your distro it called little different.
The Vertica SDK provides example parser functions in /opt/vertica/sdk/examples/ParserFunctions. Included in the examples are two CSV parsers; "Traditional CSV" and an RFC 4180 CSV parser. The traditional CSV parser uses the boost::tokenizer library to read the CSV output from common programs such as Microsoft Excel. The RFC 4180 parser parses CSV files written to the RFC 4180 standard and uses libcsv.

Code: Select all

cat TraditionalCsvParser.cpp
/* Copyright (c) 2005 - 2012 Vertica, an HP company -*- C++ -*- */

#include "ContinuousUDParser.h"
#include "StringParsers.h"
#include <pthread.h>

#include <boost/tokenizer.hpp>      //<-- you need boost for this header
...
>> On a different note, the above compilation creates file *.so: instead of *.so ...
A name of a lib doesn't matter. The best way to verify that everything is ok, that all linked:

Code: Select all

$ ldd CsvParser.so 
	linux-vdso.so.1 =>  (0x00007fffa4bc4000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f92fd52a000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f92fd314000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f92fcf54000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f92fcc58000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f92fdb20000)
You must resolve any "not found" or "none"


>> What am I doing wrong?
Why do you have ':' in command line? Did you copy it from make file? I think its come from make file syntax, you dont need it in the command line.

SThak
Newbie
Newbie
Posts: 10
Joined: Wed Jul 24, 2013 10:30 pm

Re: UDF CsvParser can't load Date

Post by SThak » Fri Jul 26, 2013 5:38 pm

Thanks for all your help. Let me see if I can figure out where the bug is in the code and try to fix it.

SThak
Newbie
Newbie
Posts: 10
Joined: Wed Jul 24, 2013 10:30 pm

Re: UDF CsvParser can't load Date

Post by SThak » Tue Aug 06, 2013 5:36 pm

Need help with CSVParser - If I change the format of the date to 10/1/2013, it works fine for small loads, but for bigger loads, it fails at last column. This is the error it logs in error.log.
COPY: Input record 1 has been rejected (Invalid column: name=PROJECT_TO_DATE_CR_BEQ, index=37, length=1,type=Numeric(38,0), value=
.). Please see /home/dbadmin/VMartDB/v_vmartdb_node0001_catalog/CopyErrorLogs/GL_BALANCES-GL2.csv-copy-from-rejected-data, record 1 for the rejected record.

Any idea what is going wrong here and why the date format 01-Oct-03 does not work but 10/1/2003 does?

User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

Re: UDF CsvParser can't load Date

Post by JimKnicely » Tue Aug 06, 2013 7:21 pm

What is your delimiter? The error is actually saying you are referencing an invalid column name...
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

Post Reply

Return to “Vertica User Defined Functions (UDFs)”