Vertica node stuck in recovering state

Moderator: NorbertKrupa

scutter
Master
Posts: 302
Joined: Tue Aug 07, 2012 2:15 am

Re: Vertica node stuck in recovering state

Post by scutter » Wed Apr 09, 2014 5:38 pm

Another possible cause of long recoveries is having buddy projection pairs with different sort orders. This forces the entire data set to be re-sorted during recovery, rather than a simple binary copy of the data from the buddy node. Check the projection that is being recovered to see whether that might be the case.
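
A toy way to spot mismatched buddy pairs once you have their sort orders in hand is sketched below. It assumes (hypothetically) that you have already copied each projection's ORDER BY column list out of its DDL into a Python dict, and that buddies follow the common _b0/_b1 naming suffix. Neither the dict layout nor the helper name is a Vertica API.

```python
from collections import defaultdict


def mismatched_buddies(sort_orders: dict[str, list[str]]) -> list[str]:
    """Return base names whose buddy projections are not sorted identically.

    sort_orders maps a projection name (assumed to end in _b0, _b1, ...)
    to its ORDER BY column list, copied by hand from the projection DDL.
    """
    groups: dict[str, list[list[str]]] = defaultdict(list)
    for name, order in sort_orders.items():
        # Split "sales_b0" into base "sales" and buddy suffix "0".
        base, sep, suffix = name.rpartition("_b")
        if sep and suffix.isdigit():
            groups[base].append(order)
    # A pair needs a full re-sort on recovery if any buddy differs from the first.
    return sorted(
        base for base, orders in groups.items()
        if any(order != orders[0] for order in orders[1:])
    )
```

For example, mismatched_buddies({"sales_b0": ["customer_id", "sale_date"], "sales_b1": ["sale_date", "customer_id"]}) returns ["sales"].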

Otherwise, if you have an 8-hour recovery running and it’s stuck in replay delete, you should see a message to that effect in the vertica.log for the recovering node; search it for ‘replay’ (case-insensitive). If that’s the cause, go ahead with the “select make_ahm_now(true)” step. Advancing the AHM (Ancient History Mark) this way forces the node to recover from scratch and avoids the replay-delete issues; the true argument allows the AHM to advance even while a node is down.
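
The ‘replay’ search is just a case-insensitive grep (grep -i replay vertica.log). A minimal Python equivalent is below; pass in the path of your node's vertica.log (typically in the node's catalog directory, but check your own install). The function names are illustrative only.

```python
from typing import Iterable, Iterator


def replay_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield log lines that mention 'replay' in any letter case."""
    for line in lines:
        if "replay" in line.casefold():
            yield line.rstrip("\n")


def scan_log(path: str) -> list[str]:
    """Return every 'replay' line from the log file at path (hypothetical helper)."""
    # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
    with open(path, encoding="utf-8", errors="replace") as log:
        return list(replay_lines(log))
```

Usage: for hit in scan_log("/path/to/vertica.log"): print(hit)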

—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC

NorbertKrupa
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL

Re: Vertica node stuck in recovering state

Post by NorbertKrupa » Wed Apr 09, 2014 6:52 pm

zentavr wrote: "In fact, OS date(1) showed Wed Apr 9 16:13:02 UTC 2014"
NOW() returns the timestamp for the start of the current transaction rather than the current time, so it's not a precise way to measure the current system date and time. I would use SYSDATE() if I know I'm going to be using a session that will be open for a long time. I use it when calculating durations, e.g. SELECT MINUTE(SYSDATE() - session_start_timestamp) FROM system_sessions;
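
A hedged caveat on that duration query: if MINUTE() returns only the minutes field of the interval rather than the total (worth confirming in the MINUTE() documentation; DATEDIFF() may be the better fit for totals), a session open for 2 hours 5 minutes would report 5 rather than 125. This small Python sketch only illustrates the difference between the two readings, using made-up timestamps.

```python
from datetime import datetime, timedelta


def minutes_field(elapsed: timedelta) -> int:
    """Minutes component only (0-59), the 'field' reading of MINUTE()."""
    return (int(elapsed.total_seconds()) // 60) % 60


def total_minutes(elapsed: timedelta) -> int:
    """Whole minutes elapsed in total, which is what a duration query wants."""
    return int(elapsed.total_seconds()) // 60


# Made-up values: a session that started 2 h 5 min 30 s before "now".
session_start = datetime(2014, 4, 9, 14, 0, 0)
current_time = datetime(2014, 4, 9, 16, 5, 30)
elapsed = current_time - session_start
print(minutes_field(elapsed), total_minutes(elapsed))  # 5 125
```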
Check out vertica.tips for more Vertica resources.

dennisobrien_ig
Newbie
Posts: 6
Joined: Tue Apr 08, 2014 3:42 am

Re: Vertica node stuck in recovering state

Post by dennisobrien_ig » Wed Apr 09, 2014 6:56 pm

Hi Norbert,

NOW is 7 hours ahead of UTC:

Code:

select now();
-- 2014-04-10 00:51:38
SYSDATE and several other variations give the correct system time in UTC:

Code:

select now() AT TIME ZONE 'UTC';
select sysdate();
select getdate();
-- 2014-04-09 17:51:48
I don't think this is a new problem with our configuration, but it seems suspect now because it might be influencing the determination of the LGE (Last Good Epoch).
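
The two results are consistent with NOW() simply being displayed in a session time zone 7 hours ahead of UTC. A quick sketch converting the rendered value back to UTC, assuming a +07:00 offset (inferred from the gap, not from any setting shown in the thread):

```python
from datetime import datetime, timedelta, timezone

# Values copied from the two query results; the +07:00 offset is an assumption.
now_rendered = datetime(2014, 4, 10, 0, 51, 38, tzinfo=timezone(timedelta(hours=7)))
sysdate_utc = datetime(2014, 4, 9, 17, 51, 48, tzinfo=timezone.utc)

# Same instant, expressed in UTC instead of the session time zone.
now_in_utc = now_rendered.astimezone(timezone.utc)
print(now_in_utc.isoformat())    # 2014-04-09T17:51:38+00:00
print(sysdate_utc - now_in_utc)  # 0:00:10
```

The remaining 10 seconds is just the time between running the two queries.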

Thanks for your reply.

Dennis

Re: Vertica node stuck in recovering state

Post by scutter » Wed Apr 09, 2014 7:52 pm

The mapping of the LGE to a time isn’t going to affect how the cluster interprets the LGE. To the cluster, the LGE is just a number (an epoch), nothing more. It’s not going to affect recovery, for example.

—Sharon

Re: Vertica node stuck in recovering state

Post by dennisobrien_ig » Wed Apr 09, 2014 8:51 pm

Sharon,

Thanks again for your reply. This has been very helpful and instructive.

The first thing we are trying is to change the recovery resource pool settings to give more memory and limit the number of concurrent threads. I'm monitoring the recovery process using

Code:

select * from vs_recovery_status;
It is currently in recovery_phase 'historical pass 1', with historical progress at 80/155. I will give it some time and see how it progresses.

Looking at vs_projection_recoveries, I see a few rows with status='error-fatal' and method='incremental-replay-delete'. They all belong to projections of the same table. Judging by their end_time timestamps, the failures are from a previous recovery attempt, and there are queued recovery tasks for these same projections. So the optimist in me thinks this time might be different. :-)
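
The same check (which fatal failures used the replay-delete method, and whether they cluster on one table) can also be run over exported query rows. A minimal sketch, assuming each row is a dict keyed by column name; only the status and method values come from this thread, while the projection_name key and the sample rows are hypothetical. In SQL the equivalent is a WHERE filter plus GROUP BY.

```python
from collections import Counter


def failed_replay_deletes(rows: list[dict]) -> Counter:
    """Count fatal incremental-replay-delete recoveries per projection."""
    return Counter(
        row["projection_name"]
        for row in rows
        if row["status"] == "error-fatal"
        and row["method"] == "incremental-replay-delete"
    )


# Hypothetical sample rows, shaped like "select * from vs_projection_recoveries;"
rows = [
    {"projection_name": "events_b0", "status": "error-fatal", "method": "incremental-replay-delete"},
    {"projection_name": "events_b1", "status": "error-fatal", "method": "incremental-replay-delete"},
    {"projection_name": "users_b0", "status": "finished", "method": "incremental-replay-delete"},
]
print(failed_replay_deletes(rows))  # Counter({'events_b0': 1, 'events_b1': 1})
```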

If this approach does not work, I will try changing the AHM next.

Thanks once again.

Dennis

Re: Vertica node stuck in recovering state

Post by dennisobrien_ig » Thu Apr 10, 2014 10:57 pm

Thanks for all the help. We were able to recover the node using the make_ahm_now method Sharon outlined.

I found these queries to be very helpful to monitor the recovery process:

Code:

select * from vs_recovery_status;
select * from vs_projection_recoveries WHERE status not in ('finished', 'ignored');
Now I'm setting about the task of learning proper care and maintenance of a Vertica cluster. :-)

Re: Vertica node stuck in recovering state

Post by scutter » Fri Apr 11, 2014 12:11 am

Glad the recovery completed. If make_ahm_now() allowed recovery to complete, then you have projections that are suboptimal for DELETE or UPDATE. Your next task is to fix them :-) The root cause will be some combination of: unsegmented instead of segmented projections, no RLE or not enough RLE, and/or a missing high-cardinality column at the end of the ORDER BY column list. See the docs for specific optimization advice.
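
To see why sort order matters for RLE in general, here is a toy run-length encoder (not Vertica's actual storage format) applied to the same hypothetical rows under two ORDER BY choices. Sorting by the low-cardinality column first produces long runs, while putting the unique column first leaves almost nothing to compress.

```python
from itertools import groupby


def rle(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


# Hypothetical rows: (status, id). status has 2 distinct values, id is unique.
rows = [("open", 5), ("closed", 2), ("open", 1), ("closed", 6), ("open", 3), ("closed", 4)]

# ORDER BY status, id: low-cardinality column first, unique column last.
good = sorted(rows)
# ORDER BY id, status: unique column first.
bad = sorted(rows, key=lambda r: (r[1], r[0]))

print(len(rle([status for status, _ in good])))  # 2
print(len(rle([status for status, _ in bad])))   # 6
```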

—Sharon
