"ERROR 5127" on Backup

Posted: Mon Oct 14, 2013 1:57 pm
by otw
Hi all,

We recently upgraded our Vertica from 5.0 to 6.1.
After that we added two new nodes to what had until then been a single instance (so three altogether now).

But now the backup fails with a rather strange error:

[dbadmin@n131 ~]$ /opt/vertica/bin/vbr.py -t backup --debug 2
Preparing...
Found Database port:  5433
[...]
v_x_node0001 10.x.x.131 /data/vertica_db/x/v_x_node0001_catalog
v_x_node0002 10.x.x.132 /data/vertica_db/x/v_x_node0002_catalog
v_x_node0003 10.x.x.133 /data/vertica_db/x/v_x_node0003_catalog
set([('10.x.x.133', True), ('10.x.x.193', False), ('10.x.x.131', True), ('10.x.x.132', True)])
/opt/vertica/bin/vsql  -dx -p5433 -Udbadmin -h 10.x.x.133 -X -q -c "select database_snapshot('test', true);
"
ERROR 5127:  Unable to create snapshot Could not link file [/data/vertica_db/x/v_x_node0003_catalog/Snapshots/test/data/vertica_db/x/procedures/dumpquerytotmp.sh] to [/data/vertica_db/x/procedures/dumpquerytotmp.sh]: File exists

When communicating with vertica, the process failed with code 1
backup failed!
I'm starting the backup from the original single-instance host (131), and the backup should go to a separate backup host (193).

I already tried deleting one or the other of the two files named in the error, but that didn't help.
With one deleted I get the same error; with the other deleted I get an error that the file is missing.

Any idea what else I can do?
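
For anyone hitting the same message: the two bracketed paths in the ERROR 5127 line are the useful part, so it can help to pull them out and look at what is actually on disk on the node that reports the error. Below is a minimal Python sketch of that, assuming the message always has the shape shown above. The names parse_link_error and describe_path are made up for illustration and are not part of vbr.py or Vertica.

```python
import os
import re

# Matches the "Could not link file [...] to [...]: reason" part of the message.
# The message does not say which path is the link and which is the target,
# so the groups are simply called "first" and "second".
LINK_ERROR = re.compile(
    r"Could not link file \[(?P<first>[^\]]+)\] to \[(?P<second>[^\]]+)\]: (?P<reason>.+)"
)


def parse_link_error(message):
    """Return (first_path, second_path, reason), or None if nothing matches."""
    match = LINK_ERROR.search(message)
    if match is None:
        return None
    return (
        match.group("first"),
        match.group("second"),
        match.group("reason").strip(),
    )


def describe_path(path):
    """Summarise what currently exists at path (hypothetical helper)."""
    if not os.path.lexists(path):
        return "missing"
    info = os.lstat(path)
    return "exists (inode %d, %d hard link(s))" % (info.st_ino, info.st_nlink)
```

Running describe_path on both extracted paths on node 3 shows whether either file is already present there before the snapshot is attempted.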

Re: "ERROR 5127" on Backup

Posted: Mon Oct 14, 2013 4:05 pm
by JimKnicely
Hi,

Not sure if this is relevant, but these are listed as assumptions for the Python script vbr.py:
    # 1. Python 2.7+ and rsync 3.0.5+ packaged by vertica exist on all hosts.
    # 2. It's ok to use backup hosts outside the cluster, as long as python and rsync with same version/path are installed.
    # 3. Current user can ssh on all hosts influenced without password prompt.
    # 4. Current user has write access on all target directories, and must be a Vertica super user.
    # 5. Only one vbr script instance is running on the same cluster & all backup nodes. Cancellation can be done by ^C.
    # 6. When doing backup, the cluster must be UP. Vbr script must be run on one of the cluster node.
    # 7. When restoring full cluster, cluster must exist and the nodes to be restored must be in DOWN state. When restoring objects, cluster must be UP. Vbr script must be run on one of the cluster node.
    # 8. When using copycluster, source cluster and all nodes must be UP, target cluster must be DOWN. Vbr script must be run on one of the source cluster node.
Do you meet all of these requirements?
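
The version requirements in assumption 1 are easy to check mechanically. Here is a rough Python sketch that compares version strings against the minimums quoted above; parse_version, meets_minimum and python_ok are hypothetical helper names, and the sketch assumes the version appears as the first dotted number in the tool's output (as it does in the first line printed by rsync --version).

```python
import re
import sys


def parse_version(text):
    """Extract the first dotted version number in text as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found, minimum):
    """Compare version tuples element by element, e.g. (3, 0, 6) vs (3, 0, 5)."""
    return found >= minimum


def python_ok(minimum=(2, 7)):
    """Check the interpreter running this script against the 2.7+ requirement."""
    return meets_minimum(tuple(sys.version_info[:3]), minimum)
```

To use it, collect the first line of rsync --version from each host (cluster nodes and the backup host) and pass it through parse_version and meets_minimum with (3, 0, 5).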

Re: "ERROR 5127" on Backup

Posted: Mon Oct 14, 2013 5:13 pm
by otw
Hi,

Yeah, I have seen those notes in the script and checked them.

- Package versions etc. are identical, as all hosts are running CentOS 5.9 with the latest updates.
- Passwordless access for 'dbadmin' also works fine in every direction, and 'dbadmin' has all filesystem permissions where needed.

I also tried running it from the other cluster nodes (even node3 where the error message originates) - no difference, same error :(

I have now also tried changing the snapshot name (a new one, never used before): same error.

Re: "ERROR 5127" on Backup

Posted: Mon Oct 14, 2013 5:58 pm
by JimKnicely
Hmm. I would try backing up a single table to see if at least that works.

Re: "ERROR 5127" on Backup

Posted: Mon Oct 14, 2013 8:55 pm
by scutter
Since you've already tried a new snapshot name with the same results, I wonder if there is a bug related to backing up external procedure scripts. If you have a test environment where you can create a new database for testing, try backing up a multi-node database that contains an external procedure just to make sure that this piece is working correctly.

--Sharon
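
Before setting up a test database, it may be worth listing which external procedures the affected database has. A small Python sketch that builds a vsql command line for this, reusing the flags that vbr.py prints in its debug output above. build_vsql_command is a made-up helper name, and querying the user_procedures system table is an assumption on my part; check the system table list for your Vertica version.

```python
# Assumption: external procedures are listed in the user_procedures system table.
LIST_PROCEDURES = "select * from user_procedures;"


def build_vsql_command(host, query, database="x", port=5433, user="dbadmin",
                       vsql="/opt/vertica/bin/vsql"):
    """Build an argument list for vsql using the flags seen in the vbr.py output.

    The defaults mirror the (anonymised) values from the debug output and are
    placeholders, not recommendations.
    """
    return [
        vsql,
        "-d", database,
        "-p", str(port),
        "-U", user,
        "-h", host,
        "-X",          # flag copied from the vbr.py invocation
        "-q",          # flag copied from the vbr.py invocation
        "-c", query,
    ]
```

The resulting list can be passed to subprocess.call on one of the cluster nodes; any procedure it shows is a candidate for the files under the procedures directory named in the error.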

Re: "ERROR 5127" on Backup

Posted: Tue Oct 15, 2013 11:48 am
by otw
scutter wrote:Since you've already tried a new snapshot name with the same results, I wonder if there is a bug related to backing up external procedure scripts. If you have a test environment where you can create a new database for testing, try backing up a multi-node database that contains an external procedure just to make sure that this piece is working correctly.

--Sharon
Hello Sharon,

thanks, that was just the right keyword :)
I thought the file in the "procedures" directory belonged to Vertica, but it is just a custom script my colleagues are using (sorry, I'm not the Vertica specialist, just trying to back it up ;))

The external procedure isn't doing anything very important in our case, so we dropped it for testing.
After that, the backup script runs like a charm!

We're letting the backup run for now, so we have an up-to-date backup again.

Afterwards we'll try to add the external procedure again and see what happens then.