I am trying to load a CSV file which has a string column in quoted string. Unfortunately I have \n (new line) in the quoted string. Here is the example
Cust_id, cust_address, city,zip
1, "1289 cobb parkway
Bufford", "ATLANTA",34343
2, "1234 IVY lane
Decatur", "ATLANTA",23435
I am trying to use following copy command
COPY tempdb.test_cust
FROM LOCAL 'test.dat'
DELIMITER ','
ENCLOSED BY '"'
NO ESCAPE
SKIP 1
DIRECT;
Above copy script fails with error message
vsql:test.vsql:24: ERROR 2035: COPY(tempdb.test_cust): Input record 1 has been rejected (Too few columns found)
It appears to me line break in between quoted string is considered as record delimiter. Do you have any solution to load this type of delimited file using COPY?
Failed to load CSV file which as line break in quoted column
Moderator: NorbertKrupa
Re: Failed to load CSV file which as line break in quoted co
Hello Vijay,
Welcome to Vertica forums.
The solution to your problem can be in two steps.
Step 1
If you file is as mentioned pattern, then you need a record terminator on every alternate line
Either you can do this in Linux using a simple awk command
I have placed a record terminator ';' on every alternate line.
Step2
Now coming to Vertica, as you have record terminator defined in the csv file now, you can use the record terminator function with the COPY command this way.
This should do the trick for you.
Note - If doing this in two step might not be a good idea, then there is a package called "External Filter" on Github which we can use to do this in a single COPY command. With external filter you have full access to command line tools while using the COPY command.
Hope this helps..
Welcome to Vertica forums.
The solution to your problem can be in two steps.
Step 1
If you file is as mentioned pattern, then you need a record terminator on every alternate line
Either you can do this in Linux using a simple awk command
Code: Select all
1, "1289 cobb parkway
Bufford", "ATLANTA",34343
2, "1234 IVY lane
Decatur", "ATLANTA",23435
awk '!(NR%2){$(NF+1)=";"}1' filename
1, "1289 cobb parkway
Bufford", "ATLANTA",34343;
2, "1234 IVY lane
Decatur", "ATLANTA",23435;
Step2
Now coming to Vertica, as you have record terminator defined in the csv file now, you can use the record terminator function with the COPY command this way.
Code: Select all
COPY tempdb.test_cust
FROM LOCAL 'test.dat'
DELIMITER ','
RECORD TERMINATOR ';'
ENCLOSED BY '"'
NO ESCAPE
SKIP 1
DIRECT;
This should do the trick for you.
Note - If doing this in two step might not be a good idea, then there is a package called "External Filter" on Github which we can use to do this in a single COPY command. With external filter you have full access to command line tools while using the COPY command.
Hope this helps..
Re: Failed to load CSV file which as line break in quoted co
Thanks Navin. I looked at record set. Not all quoted strings have line break. Some records have multiple line break in quoted string. I thought "enclosed by" phrase would not consider \n as record delimiter. Is there any other alternative?
Re: Failed to load CSV file which as line break in quoted co
Hi!
May be too late, but my version (did just for fun):
May be too late, but my version (did just for fun):
- HELP for Python script (let script name 'remove_newline.py'):
You can change any csv option, for example delimiter, line terminator(in case of windows - '\r\n') and so on. See help.Code: Select all
daniel@synapse:/tmp$ ./replace_newline.py -h Usage: replace_newline.py [csv options] -f FILE [-a ACTION] Options: -h, --help show this help message and exit -f FILE, --file=FILE csv FILE to parse. -a ACTION, --action=ACTION 0 - escape "\n" to "\\n"; 1 - replace with space; 2 - remove [default: 1] -d DELIM, --delimiter=DELIM specifies a one-character string as field separator. [default: ,] -q QUOTE, --quote-char=QUOTE specifies a one-character as quoting character. [default: "] -e ESC, --escape-char=ESC specifies a one-character as escape charchter. [default: none] -D, --double-quote two consecutive quotes are interpreted as one. [default: False] -t, --trim removes initial whitespaces. [default: True] -l NEW_LINE, --line-terminator=NEW_LINE specifies the character sequence which rows terminates. [default: \n] -Q QUOTE_TYPE, --quoting=QUOTE_TYPE 0- only when required; 1- all; 2- quote non-numeric; 3- none [default: 0]
- Python script (let script name 'remove_newline.py'):
Code: Select all
#!/usr/bin/env python import csv from optparse import OptionParser # init command line arguments parser usage = "USAGE: %prog [csv options] -f FILE [-a ACTION]" parser = OptionParser(usage=usage) _ = parser.add_option _('-f', '--file', dest='file', metavar='FILE', help='csv FILE to parse.') _('-a', '--action', dest='action', type='int', metavar='ACTION', default=1,help='0 - escape "\\n" to "\\\\n"; 1 - replace with space; 2 - remove [default: %default]') _('-d', '--delimiter', dest='delimiter', default=',', metavar='DELIM',help='specifies a one-character string as field separator. [default: %default]') _('-q', '--quote-char', dest='quotechar', default='"', metavar='QUOTE',help='specifies a one-character as quoting character. [default: %default]') _('-e', '--escape-char', dest='escapechar', metavar='ESC', default=None,help='specifies a one-character as escape charchter. [default: %default]') _('-D', '--double-quote', dest='doublequote', action="store_true", default=False,help='two consecutive quotes are interpreted as one. [default: %default]') _('-t', '--trim', dest='skipinitialspace', action="store_true", default=True,help='removes initial whitespaces. [default: %default]') _('-l', '--line-terminator', dest='lineterminator', default='\n', metavar='NEW_LINE',help='specifies the character sequence which rows terminates. [default: \\n]') _('-Q', '--quoting', dest='quoting', type='int', default=0, metavar='QUOTE_TYPE',help='0- only when required; 1- all; 2- quote non-numeric; 3- none [default: %default]') # parse command line arguments params = parser.parse_args()[0] # validate command line arguments if not params.file: parser.error('option -f require') # init action: apply escape/replace with single space/remove on each new line char replace_char = ['\\\\n', ' ', ''][params.__dict__.pop('action')] action = lambda field: field.replace('\n', replace_char) # start with open(params.file, 'r') as csv_file: del params.file params.strict = True # errors will remain to Vertica csv_reader = csv.reader(csv_file, **params.__dict__) for row in csv_reader: print params.delimiter.join(map(action, row)) # end
- Csv file example:
Code: Select all
daniel@synapse:/tmp$ cat test.csv Cust_id, cust_address, city,zip 1, "1289 cobb parkway Bufford", "ATLANTA",34343 2, "1234 IVY lane Decatur", "ATLANTA",23435 3, "11223 XXX yyy Foo Bar","ATLANTA",555666
- Script in action:
Code: Select all
# set script executable daniel@synapse:/tmp$ chmod +x /tmp/replace_newline.py
Code: Select all
daniel@synapse:/tmp$ ./replace_newline.py -f /tmp/test.csv Cust_id,cust_address,city,zip 1,1289 cobb parkway Bufford,ATLANTA,34343 2,1234 IVY lane Decatur,ATLANTA,23435 3,11223 XXX yyy Foo Bar,ATLANTA,555666
Code: Select all
daniel@synapse:/tmp$ ./replace_newline.py -f /tmp/test.csv --action=0 Cust_id,cust_address,city,zip 1,1289 cobb parkway\\nBufford,ATLANTA,34343 2,1234 IVY lane\\nDecatur,ATLANTA,23435 3,11223 XXX yyy\\nFoo\\n\\nBar,ATLANTA,555666 daniel@synapse:/tmp$
- How to use with COPY:
Code: Select all
daniel@synapse:/tmp$ vsql -c "select * from customer" cust_id | cust_address | city | zip ---------+--------------+------+----- (0 rows) daniel@synapse:/tmp$ ./replace_newline.py -f /tmp/test.csv | vsql -c "copy customer from stdin direct delimiter ','" daniel@synapse:/tmp$ vsql -c "select * from customer" cust_id | cust_address | city | zip ---------+---------------------------+---------+-------- 1 | 1289 cobb parkway Bufford | ATLANTA | 34343 2 | 1234 IVY lane Decatur | ATLANTA | 23435 3 | 11223 XXX yyy Foo Bar | ATLANTA | 555666 (3 rows) [code] daniel@synapse:/tmp$ ./replace_newline.py -f /tmp/test.csv --action=0 | vsql -c "copy customer from stdin direct delimiter ','" daniel@synapse:/tmp$ vsql -c "select * from customer" cust_id | cust_address | city | zip ---------+----------------------------+---------+-------- 1 | 1289 cobb parkway\nBufford | ATLANTA | 34343 2 | 1234 IVY lane\nDecatur | ATLANTA | 23435 3 | 11223 XXX yyy\nFoo\n\nBar | ATLANTA | 555666 (3 rows)
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
- Contact:
Re: Failed to load CSV file which as line break in quoted co
sKwa,
That's a pretty cool solution!
That's a pretty cool solution!
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.