Remove a node that's never coming back

Posted: Mon Aug 07, 2017 4:58 pm
by jaspot
Greetings,

We had an eight-node cluster on Vertica 6, and node002 is gone and will never come back after a catastrophic hardware failure. We've rebalanced the data away (which, as we understand it, means we're K-safe again), but would like to remove the failed node entirely. When we attempt to do so via admintools, it tells us that all nodes must be up before we can remove the node.

We'd really prefer not to stand up another node just so that we can remove the failed one.

Is that our best option going forward, given that we cannot remove the node cleanly through admintools unless all Vertica nodes are available?
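
The K-safety assumption above can be checked rather than inferred. A minimal sketch, assuming bash, vsql's -A (unaligned) and -t (tuples only) options, and the designed_fault_tolerance and current_fault_tolerance columns of the SYSTEM table; the helper name fault_tolerance_ok is hypothetical and not part of Vertica:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Vertica tool): read one unaligned
# "designed|current" row on stdin and succeed (exit 0) only if the
# current fault tolerance is at least the designed value.
fault_tolerance_ok() {
    awk -F'|' 'NF == 2 { ok = ($2 + 0 >= $1 + 0) } END { exit !ok }'
}

# Usage against a live database (assumes vsql is on PATH and can connect):
#   vsql -At -c "select designed_fault_tolerance, current_fault_tolerance from system;" \
#       | fault_tolerance_ok && echo "fault tolerance matches the design"
```

The parsing is kept separate from the vsql call so it can be tried on sample rows first; the column names should be confirmed against the documentation for the release in use.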

Re: Remove a node that's never coming back

Posted: Mon Aug 07, 2017 11:59 pm
by sKwa
Hi!

Removing a node from admintools.conf will not do what you need because your database is up.

Re: Remove a node that's never coming back

Posted: Tue Aug 08, 2017 2:44 pm
by JimKnicely
Hi,

The only way that I am aware of to completely remove a node that has gone AWOL is to use the catalog editor's DROP NODE command.

If you have Vertica support, I recommend that you open a case so that they can help you with this!

Thanks!

Re: Remove a node that's never coming back

Posted: Tue Aug 08, 2017 9:29 pm
by sKwa
Hi!

Vertica 6 Removing Hosts from a Cluster: https://my.vertica.com/docs/6.1.x/HTML/ ... #10282.htm

Procedure to Remove Hosts

From one of the hosts in the cluster, run update_vertica with the -R switch, where -R specifies a comma-separated list of hosts to remove from an existing HP Vertica cluster. A host can be specified by the hostname or IP address of the system:

/opt/vertica/sbin/update_vertica -R host

Important Tips:
  • A host does not need to be functional, or even exist, to be removed as long as the database design no longer includes a node on it. Specify the hostname or IP address that you used originally for the installation. Adding hosts to and removing them from VM-based clusters can lead to a situation in which a host doesn't exist.
Good Luck.
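
The condition in the tip ("the database design no longer includes a node on it") can be checked before running update_vertica -R. A minimal sketch, assuming bash, vsql's -A and -t options, and the node_name and node_address columns of the nodes table; the helper name host_still_in_design is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Vertica tool): read unaligned
# "node_name|node_address" rows on stdin and succeed (exit 0) if any row
# uses the given address, i.e. the database still defines a node on that host.
host_still_in_design() {
    awk -F'|' -v host="$1" '$2 == host { found = 1 } END { exit !found }'
}

# Usage against a live database (assumes vsql is on PATH and can connect):
#   if vsql -At -c "select node_name, node_address from nodes;" \
#          | host_still_in_design 192.168.2.203; then
#       echo "a node is still defined on that host; drop it from the database first"
#   fi
```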

Re: Remove a node that's never coming back

Posted: Tue Aug 08, 2017 10:44 pm
by JimKnicely
Not sure the 6.1 docs apply to later releases of Vertica (i.e. 8.1.x). From my understanding of the issue, there is no host (physical) to remove. So how do you run the "/opt/vertica/sbin/update_vertica -R host" command when the host no longer exists? ;)

For that matter, first you'll need to remove the node from the database ... which I believe you tried, and you saw that you can't do that 'cause the node is in a DOWN state...

Example:

I have 4 nodes:

[dbadmin@vertica01 ~]$ vsql -c "select node_name, node_address, node_state from nodes order by 1;"
    node_name    | node_address  | node_state
-----------------+---------------+------------
 v_test_node0001 | 192.168.2.200 | UP
 v_test_node0002 | 192.168.2.201 | UP
 v_test_node0003 | 192.168.2.202 | UP
 v_test_node0004 | 192.168.2.203 | UP
(4 rows)


Now I will simulate a failure by shutting down host 192.168.2.203:

[dbadmin@vertica01 ~]$ ssh root@192.168.2.203 shutdown -h now
root@192.168.2.203's password:
Connection to 192.168.2.203 closed by remote host.

[dbadmin@vertica01 ~]$ vsql -c "select node_name, node_address, node_state from nodes order by 1;"
WARNING 4539: Received no response from v_test_node0004 in roll back transaction
ERROR 4539: Received no response from v_test_node0004 in transaction bind

[dbadmin@vertica01 ~]$ vsql -c "select node_name, node_address, node_state from nodes order by 1;"
    node_name    | node_address  | node_state
-----------------+---------------+------------
 v_test_node0001 | 192.168.2.200 | UP
 v_test_node0002 | 192.168.2.201 | UP
 v_test_node0003 | 192.168.2.202 | UP
 v_test_node0004 | 192.168.2.203 | DOWN
(4 rows)


So the node is gone and I am not going to replace it. I want to remove it:

[dbadmin@vertica01 ~]$ admintools -t db_remove_node -s 192.168.2.203 -d test
connecting to 192.168.2.200
Error removing node(s) from database.
['All nodes must be UP or STANDBY before dropping a node']

Can't do that!
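
The precondition named in that error (all nodes UP or STANDBY) can be checked up front. A minimal sketch, assuming bash and vsql's -A (unaligned) and -t (tuples only) options; the helper name list_blocking_nodes is hypothetical and not part of Vertica or admintools:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Vertica tool): read unaligned
# "node_name|node_state" rows on stdin and print the name of every node
# whose state is neither UP nor STANDBY.
list_blocking_nodes() {
    awk -F'|' 'NF >= 2 && $2 != "UP" && $2 != "STANDBY" { print $1 }'
}

# Usage against a live database (assumes vsql is on PATH and can connect):
#   vsql -At -c "select node_name, node_state from nodes order by 1;" \
#       | list_blocking_nodes
```

Any node name it prints is one that would make db_remove_node refuse with the message shown above.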

And here is an example of running /opt/vertica/sbin/update_vertica -R, trying to remove the node:

[dbadmin@vertica01 ~]$ su - root
Password:
Last login: Tue Aug 8 17:26:30 EDT 2017 on pts/0
[root@vertica01 ~]# /opt/vertica/sbin/update_vertica -T -R 192.168.2.203
Vertica Analytic Database 8.1.1-2 Installation Tool


>> Validating options...


Mapping hostnames in --remove-hosts (-R) to addresses...

>> Starting installation tasks.
>> Getting system information for cluster (this may take a while)...

Enter password for root@192.168.2.201 (3 attempts left):
Warning: could not connect to 192.168.2.203:
Unable to SSH as root to 192.168.2.203: UNEXPECTED EOF: Could not login with ssh. ssh output: 'ssh: connect to host 192.168.2.203 port 22: No route to host\r\r\n'
Ignoring down host 192.168.2.203 because it is due to be removed.
Warning: 192.168.2.203 will not have it's cluster information correctly updated
Unable to SSH as root to 192.168.2.203: UNEXPECTED EOF: Could not login with ssh. ssh output: 'ssh: connect to host 192.168.2.203 port 22: No route to host\r\r\n'
Error connecting to hosts: {'192.168.2.203': "Unable to SSH as root to 192.168.2.203: UNEXPECTED EOF: Could not login with ssh. ssh output: 'ssh: connect to host 192.168.2.203 port 22: No route to host\\r\\r\\n'"}
Installation FAILED with errors.

Installation stopped before any changes were made.


Can't do that either!

Re: Remove a node that's never coming back

Posted: Tue Aug 08, 2017 11:50 pm
by sKwa
Hi!

Jim:
Not sure the 6.1 docs apply to later releases of Vertica (i.e. 8.1.x)...

jaspot:
We had an eight node cluster on Vertica 6,...

P.S.:
@Jim
I just quoted Vertica docs (with reference/link), nothing more.

Re: Remove a node that's never coming back

Posted: Wed Aug 09, 2017 1:29 pm
by JimKnicely
@sKwa: Yeah, no worries! I missed the part where jaspot said he was running Vertica 6 :oops: I wasn't questioning you or the docs for 6.1 but just wanted to point out that you can't remove a non-existent node in Vertica 8 the same way you may have in Vertica 6.1. Although, I think you should be able to! I had a colleague who had a client with the same situation. That is, the client could not drop a failed node from the database because there was no way to switch the node to an UP or STANDBY state. The only way we could figure out how to drop it was via the catalog editor. It's actually pretty simple that way, but too dangerous for just anyone to attempt, as it's very easy to corrupt a DB catalog with CE.

@jaspot: Since you are running Vertica 6 (Why?), sKwa's suggestion should work for you! I guess you can't open a support case now as Vertica 6 is no longer supported... Maybe it's time for an upgrade?!?!