The Vertica Database Forums

Posted: **Tue May 23, 2017 5:44 pm**

I am working to transfer data from a SAP IQ database to a Vertica database.
The DDL contains a number of IDENTITY columns, and I need to preserve the data from the IQ server when it's pulled into Vertica.
The reason for this is to avoid having to re-key all the data in the child tables which depend on the identity column values in the parent.
So far, I have had some success by pulling the IQ data out along with dummy rows for the "missing" identity column values, and ordering it by the identity column. When I load the data sans the identity column value, I can then "delete" the dummy rows and end up with basically the correct set of data.
Where I run into trouble is that im trying to get the identity column to start at "1". I don't recall if it ever started at "1", but after several inserts/truncates the values are now starting around 3000001. I tried to subtract 3000000 from the column value, but am not able to update the identity column.
Can anyone suggest a way to get the identity column to start with "1" so that I can load the data with the gaps?

Posted: **Wed May 24, 2017 4:07 pm**

Hi,

Dealing with IDENTITY columns is a pain.

IDENTITY columns start at 1 by default. You are probably seeing such high numbers for the IDENTITY because each node caches 25,000 values by default.

One solution for your question could be to alter the sequence that the IDENTITY is using and make it restart at 1. But, as the following example shows, that wont work!

Example:

dbadmin=> CREATE TABLE pepsi (c1 IDENTITY, c2 VARCHAR(10));
CREATE TABLE

dbadmin=> \ds *pepsi*
List of Sequences
Schema | Sequence | CurrentValue | IncrementBy | Minimum | Maximum | AllowCycle | Comment
--------+--------------+--------------+-------------+---------+---------------------+------------+---------
public | pepsi_c1_seq | 0 | 1 | 1 | 9223372036854775807 | f |
(1 row)

dbadmin=> INSERT INTO pepsi (c2) VALUES ('TEST1');
OUTPUT
--------
1
(1 row)

dbadmin=> SELECT * FROM pepsi;
c1 | c2
----+-------
1 | TEST1
(1 row)

dbadmin=> ALTER SEQUENCE pepsi_c1_seq RESTART 1;
ALTER SEQUENCE

dbadmin=> INSERT INTO pepsi (c2) VALUES ('TEST2);
OUTPUT
--------
1
(1 row)

dbadmin=> SELECT * FROM pepsi;
c1 | c2
----+-------
1 | TEST1
2 | TEST2
(2 rows)

dbadmin=> TRUNCATE TABLE pepsi;
TRUNCATE TABLE

dbadmin=> ALTER SEQUENCE pepsi_c1_seq RESTART 1;
ALTER SEQUENCE

dbadmin=> INSERT INTO pepsi (c2) VALUES ('TEST1');
OUTPUT
--------
1
(1 row)

dbadmin=> SELECT * FROM pepsi;
c1 | c2
----+-------
3 | TEST1
(1 row)

If the new tables in Vertica require IDENTITY columns, below is how I handle keeping source IDENTITY values in Vertica, including gaps...

Note that this code works for preserving identity values, but only on a single node. The reason is because in a multi-node cluster each node is providing a cached IDENTITY value so the data can appear out of order when inserted on multiple nodes, even if each node only caches one value.

Here is an example on a single node:

dbadmin=> SELECT COUNT(*) FROM nodes;
COUNT
-------
1
(1 row)

dbadmin=> CREATE TABLE t1_temp (c1 INT, c2 VARCHAR); -- Load this table with client data
CREATE TABLE

dbadmin=> CREATE TABLE t1 (c1 IDENTITY, c2 VARCHAR); -- This is the new Vertica table
CREATE TABLE

dbadmin=> -- Fake data
dbadmin=> INSERT INTO t1_temp select 1, 'val1';
OUTPUT
--------
1
(1 row)

dbadmin=> INSERT INTO t1_temp select 2, 'val2';
OUTPUT
--------
1
(1 row)

dbadmin=> INSERT INTO t1_temp select 3, 'val3';
OUTPUT
--------
1
(1 row)

dbadmin=> INSERT INTO t1_temp select 5, 'val5';
OUTPUT
--------
1
(1 row)

dbadmin=> COMMIT;
COMMIT

dbadmin=> SELECT * FROM t1_temp ORDER BY 1;
c1 | c2
----+------
1 | val1
2 | val2
3 | val3
5 | val5
(4 rows)

dbadmin=> INSERT INTO t1 (c2)
dbadmin-> SELECT tab2.c2
dbadmin-> FROM (SELECT row_number() over () c1 FROM tables CROSS JOIN columns CROSS JOIN columns cols2) tab1
dbadmin-> LEFT JOIN (SELECT c1, c2 FROM t1_temp) tab2 ON tab2.c1 = tab1.c1
dbadmin-> WHERE tab1.c1 <= (SELECT MAX(c1) FROM t1_temp)
dbadmin-> ORDER BY tab1.c1;
OUTPUT
--------
5
(1 row)

dbadmin=> DELETE FROM t1 WHERE c2 IS NULL;
OUTPUT
--------
1
(1 row)

dbadmin=> SELECT * FROM t1 ORDER BY c1;
c1 | c2
----+------
1 | val1
2 | val2
3 | val3
5 | val5
(4 rows)

dbadmin=> DROP TABLE t1_temp;
DROP TABLE

Notice that the data in T1 now matches the original IDENTITY values. You can now INSERT okay using the IDENTITY:

dbadmin=> INSERT INTO t1 (c2) VALUES ('new-value');
OUTPUT
--------
1
(1 row)

dbadmin=> SELECT * FROM t1 ORDER BY c1;
c1 | c2
--------+-----------
1 | val1
2 | val2
3 | val3
5 | val5
250001 | new-value
(5 rows)

So if this is a new client, with three nodes for example, I can think of several options:

1. Create a three node cluster

Create a database on only one of the nodes
Load all of the data (including identities) using SQL like that was shown above
Add the other two nodes to the database
Alter the K-Safety to 1
Rebalance the data

2. Create a three node cluster

Create a database on only one of the nodes
Create a database on all three nodes
Lower the K-Safety to 1
Create tables having all the projections on a single node
Load all of the data (including identities) using SQL like that was shown above
Alter the K-Safety to 1
Rebalance the data

If the client doesn't care about using IDENTITY columns in Vertica, I think a a better solution is to use a sequence as a default value for an INT column.

Example Code:

dbadmin=> DROP TABLE IF EXISTS t1;
DROP TABLE

dbadmin=> CREATE TABLE t1 (c1 INT, c2 VARCHAR); -- Load this table with client data
CREATE TABLE

dbadmin=> -- Fake data load
dbadmin=> INSERT INTO t1 select 1, 'val1';
OUTPUT
--------
1
(1 row)

dbadmin=> INSERT INTO t1 select 2, 'val2';
OUTPUT
--------
1
(1 row)

dbadmin=> INSERT INTO t1 select 3, 'val3';
OUTPUT
--------
1
(1 row)

dbadmin=> INSERT INTO t1 select 5, 'val5';
OUTPUT
--------
1
(1 row)

dbadmin=> COMMIT;
COMMIT

dbadmin=> SELECT * FROM t1 ORDER BY 1;
c1 | c2
----+------
1 | val1
2 | val2
3 | val3
5 | val5
(4 rows)

dbadmin=> CREATE SEQUENCE t1_seq START WITH 6; -- Start with the max Identity value + 1
CREATE SEQUENCE

dbadmin=> ALTER TABLE t1 ALTER COLUMN c1 SET DEFAULT t1_seq.NEXTVAL;
ALTER TABLE

dbadmin=> INSERT INTO t1 (c2) SELECT 'new-value';
OUTPUT
--------
1
(1 row)

dbadmin=> SELECT * FROM t1 ORDER BY c1;
c1 | c2
----+-----------
1 | val1
2 | val2
3 | val3
5 | val5
6 | new-value
(5 rows)

The Vertica Database Forums

Changing the value of an IDENTITY column?

Changing the value of an IDENTITY column?

Re: Changing the value of an IDENTITY column?