Another possible cause of long recoveries is having projection pairs with different sort orders. This forces the entire data set to be re-sorted during recovery, rather than just doing a binary copy from the buddy node. Check the projection that is being recovered to see if that might be the case.
Otherwise, if you have an 8-hour recovery running and it’s stuck in replay delete, you should see a message to that effect in the vertica.log for the recovering node; search for ‘replay’, case-insensitive. If that’s the problem, go ahead with the “select make_ahm_now(true)” step. This will force recovery from scratch and will avoid replay-delete issues.
—Sharon
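A rough sketch of the two checks described above, for anyone following along. The schema and table names are placeholders, not from this thread, and it assumes the `projection_columns` and `system` tables expose the columns shown; check the docs for your version. The first query lists the sort columns of every projection on one table so buddy projections can be compared side by side. The rest shows where the epochs stand and then runs the step from the post.

```sql
-- Compare sort orders across the projections of one table.
-- 'public' and 'my_table' are hypothetical placeholders.
SELECT projection_name, sort_position, projection_column_name
FROM v_catalog.projection_columns
WHERE table_schema = 'public'
  AND table_name = 'my_table'
ORDER BY projection_name, sort_position;

-- Check where the epochs stand before changing anything.
SELECT current_epoch, ahm_epoch, last_good_epoch
FROM v_monitor.system;

-- The step from the post: advance the AHM even though a node is down,
-- so the down node recovers from scratch instead of replaying deletes.
SELECT make_ahm_now(true);
```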
Vertica node stuck in recovering state
Moderator: NorbertKrupa
Re: Vertica node stuck in recovering state
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC
Re: Vertica node stuck in recovering state
zentavr wrote: in fact, OS date(1) showed Wed Apr 9 16:13:02 UTC 2014
NOW() returns the timestamp for the start of the current transaction, so it's not a precise way to measure the current system date and time. I would use SYSDATE() if I know I'm going to be using a session that will be open for a long time. I use it when calculating duration, i.e. SELECT MINUTE(SYSDATE() - session_start_timestamp) FROM system_sessions;
Check out vertica.tips for more Vertica resources.
Re: Vertica node stuck in recovering state
Hi Norbert,
NOW is 7 hours ahead of UTC:
Code: Select all
select now();
2014-04-10 00:51:38
SYSDATE and several other variations give the correct system time in UTC:
Code: Select all
select now() AT TIME ZONE 'UTC';
select sysdate();
select getdate();
2014-04-09 17:51:48
I don't think this is a new problem with our configuration, but the reason it seems suspect now is that perhaps it is influencing the determination of LGE.
Thanks for your reply.
Dennis
Re: Vertica node stuck in recovering state
The mapping of the LGE to a time isn’t going to affect any of the cluster interpretations of LGE. To the cluster, LGE is just a number - an epoch - that’s it. It’s not going to affect Recovery for example.
—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC
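To see the point in practice, a minimal sketch: the first query returns the LGE as the plain number the cluster works with, and the second shows the epoch-to-timestamp mapping, which is only there for display. It assumes the `epochs` system table has `epoch_number` and `epoch_close_time` columns; the LIMIT is arbitrary.

```sql
-- The LGE as the cluster sees it: just an epoch number.
SELECT get_last_good_epoch();

-- Wall-clock close times for the most recently closed epochs.
SELECT epoch_number, epoch_close_time
FROM epochs
ORDER BY epoch_number DESC
LIMIT 10;
```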
Re: Vertica node stuck in recovering state
Sharon,
Thanks again for your reply. This has been very helpful and instructive.
The first thing we are trying is to change the recovery resource pool settings to give more memory and limit the number of concurrent threads. I'm monitoring the recovery process using:
Code: Select all
select * from vs_recovery_status;
It is currently in recovery_phase 'historical pass 1' at 80/155 historical progress. I will give it some time and see how this progresses.
Looking at vs_projection_recoveries, I see a few cases of status='error-fatal' and method='incremental-replay-delete'. They are all with projections from the same table. Based on the end_time timestamp, the failures are from a previous recovery attempt, and there are queued recovery tasks for these same projections. So the optimist in me thinks this time might be different.
If this approach does not work, I will next try changing the AHM.
Thanks once again.
Dennis
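A hedged sketch of the kind of resource pool change mentioned above. The values are placeholders, not the settings used in this thread, and the built-in `recovery` pool has restrictions on which parameters and values are allowed, so check the docs for your version first.

```sql
-- Current settings of the built-in recovery pool.
SELECT name, memorysize, maxmemorysize, plannedconcurrency, maxconcurrency
FROM v_catalog.resource_pools
WHERE name = 'recovery';

-- Placeholder values: raise the memory ceiling and cap the number of
-- concurrent recovery threads.
ALTER RESOURCE POOL recovery MAXMEMORYSIZE '50%' MAXCONCURRENCY 2;
```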
Re: Vertica node stuck in recovering state
Thanks for all the help. We were able to recover the node using the make_ahm_now method Sharon outlined.
I found these queries to be very helpful to monitor the recovery process:
Code: Select all
select * from vs_recovery_status;
select * from vs_projection_recoveries WHERE status not in ('finished', 'ignored');
Now I'm setting about the task of learning proper care and maintenance of a Vertica cluster.
Re: Vertica node stuck in recovering state
Glad the recovery completed. If make_ahm_now() allowed recovery to complete, then you have projections that are suboptimal for DELETE or UPDATE. Your next task is to fix them. The root cause will be some combination of: unsegmented instead of segmented projections, no RLE, not enough RLE, and/or a missing high-cardinality column at the end of the ORDER BY columns. See the docs for specific optimization advice.
—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC
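One way to start on that, as a sketch only: `evaluate_delete_performance` flags projections likely to be slow for DELETE/UPDATE, and the `delete_vectors` table shows where delete vectors have piled up. The table name is a placeholder, not from this thread.

```sql
-- Flag projections of one table that may perform poorly for DELETE/UPDATE.
-- 'public.my_table' is a hypothetical placeholder.
SELECT evaluate_delete_performance('public.my_table');

-- Projections carrying the most delete vectors.
SELECT schema_name, projection_name,
       COUNT(*) AS delete_vector_count,
       SUM(deleted_row_count) AS deleted_rows
FROM v_monitor.delete_vectors
GROUP BY schema_name, projection_name
ORDER BY deleted_rows DESC;
```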