Backup Vertica to S3

chenshaulian
Intermediate
Posts: 51
Joined: Wed Sep 09, 2015 9:34 am

Backup Vertica to S3

Post by chenshaulian » Wed Aug 03, 2016 12:40 pm

Hi,

I want to start backing up my database (76 TB) to S3. Some questions:
1. How can I find out the size of the backup files (full backup)?
2. How can I estimate how long the backup will take? (The cluster isn't on AWS.)
3. Using the vbr.py utility, how do I configure the backup location to be on S3?
4. Do I need to reserve some space on the cluster hosts for the backup? (I know that the process creates a file before it is sent to S3.)
5. ... any more insights?

Thanks

Re: Backup Vertica to S3

Post by chenshaulian » Mon Aug 08, 2016 11:29 am

No one... ?

Victorgm
Beginner
Posts: 25
Joined: Fri Jul 17, 2015 2:22 pm

Re: Backup Vertica to S3

Post by Victorgm » Mon Aug 08, 2016 8:15 pm

I explored this last year, and here is what I found (keep in mind things may have changed since then):
I didn't try backing up to S3, but I did try loading from S3 (via the COPY command) and learned a few things. S3 is object storage, not a conventional file system like Linux's ext3 or ext4, so you will need a third-party tool to make Vertica treat an S3 mount point as if it were ext3 or ext4.

We used S3FS for this purpose but it only worked for small numbers of small files. Medium or large files caused S3FS to choke on the multiple threads that Vertica creates when doing COPY. I imagine the results would be no better for vbr.py backups.

As a result, our vbr.py backups get written to a stand-alone Linux box and then get copied via AWS CLI "cp" or "sync" commands to an S3 bucket, and that has been working very well.
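Roughly, the copy step looks like the sketch below. It only wraps the standard `aws s3 sync` command; the local directory, bucket name and prefix are made-up placeholders, and it assumes the AWS CLI is installed and has credentials on the backup box.

```python
import shlex
import subprocess


def build_s3_sync_command(local_dir, bucket, prefix=""):
    """Return the argv list for `aws s3 sync <local_dir> s3://<bucket>/<prefix>`."""
    prefix = prefix.strip("/")
    destination = f"s3://{bucket}/{prefix}" if prefix else f"s3://{bucket}"
    return ["aws", "s3", "sync", local_dir, destination]


def run_sync(local_dir, bucket, prefix="", dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    command = build_s3_sync_command(local_dir, bucket, prefix)
    print(" ".join(shlex.quote(part) for part in command))
    if not dry_run:
        # check=True raises CalledProcessError if the AWS CLI exits non-zero
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # Hypothetical paths: replace with your vbr.py backup directory and bucket.
    run_sync("/backups/vertica", "my-backup-bucket", "vertica/full")
```

Running it with the default `dry_run=True` just prints the command, so you can check the destination before anything is uploaded.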

For estimates I would just take a 10 GB file, 100 GB file & 1,000 GB file, load them into a dummy table, then back up the table and note the times and file sizes, since speed & compression depend on your data and your platform & network.
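To turn those test runs into a number for the full database, something like this sketch would do. It assumes backup size and time scale roughly linearly with data size, which may not hold on your platform, and the sample figures are invented placeholders, not measurements.

```python
def extrapolate_backup(samples, target_gb):
    """Estimate backup size (GB) and duration (hours) for target_gb of data.

    samples: list of (loaded_gb, backup_gb, backup_seconds) from test runs.
    Assumes linear scaling, so treat the result as a rough guide only.
    """
    total_loaded = sum(loaded for loaded, _, _ in samples)
    total_backup = sum(backup for _, backup, _ in samples)
    total_seconds = sum(seconds for _, _, seconds in samples)

    size_ratio = total_backup / total_loaded      # backup GB per loaded GB
    throughput = total_loaded / total_seconds     # loaded GB backed up per second

    estimated_size_gb = target_gb * size_ratio
    estimated_hours = target_gb / throughput / 3600
    return estimated_size_gb, estimated_hours


if __name__ == "__main__":
    # Placeholder measurements: substitute your own 10 GB / 100 GB / 1,000 GB runs.
    test_runs = [(10, 4, 90), (100, 38, 850), (1000, 390, 8700)]
    size_gb, hours = extrapolate_backup(test_runs, target_gb=76_000)
    print(f"estimated backup size: {size_gb:,.0f} GB, estimated time: {hours:,.1f} h")
```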

You didn't specify your use cases, so please forgive me if I am jumping to conclusions, but in cases where you still have the original data files, it may be faster to reload from those than to restore from a vbr backup.

Hope this helps!

- Victor

Re: Backup Vertica to S3

Post by chenshaulian » Tue Aug 09, 2016 11:54 am

Thanks Victor for your answer.

What about the free space on the cluster hosts for the backup?
I found an HP article that says: "HP Vertica recommends that each backup host has space for at least twice the database footprint size." …
(https://my.vertica.com/docs/7.1.x/HTML/ ... pHosts.htm).

I have a 76 TB database... so I need 152 TB of free space?
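Spelling out that arithmetic as a quick sketch: the factor of 2 comes from the quoted recommendation, while the link speed in the upload estimate is only an assumed placeholder (decimal units, no protocol overhead or compression accounted for).

```python
def required_backup_space_tb(footprint_tb, factor=2.0):
    """Free space suggested on the backup host: factor x database footprint."""
    return footprint_tb * factor


def upload_hours(data_tb, link_gbps):
    """Idealised time to push data_tb terabytes over a link_gbps link."""
    bits = data_tb * 1e12 * 8             # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9)    # gigabits per second to bits per second
    return seconds / 3600


if __name__ == "__main__":
    print(required_backup_space_tb(76), "TB of free space recommended")
    # 1 Gbps is an assumed figure; plug in the real uplink to AWS.
    print(f"{upload_hours(76, 1):.0f} hours to upload at 1 Gbps (best case)")
```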

Thanks
Chen

Re: Backup Vertica to S3

Post by Victorgm » Tue Aug 09, 2016 2:51 pm

Yes, that's a lot of disk space, which brings me back to my earlier point about using the original data files (the ones that were used to load the DW) as a recovery option rather than vbr.py. It's a very conventional recovery strategy for DWs. Is this not an option for some reason?

I only use vbr.py for the analysts' schemas. Their schemas are much smaller than the DW schema.

Re: Backup Vertica to S3

Post by chenshaulian » Wed Aug 10, 2016 7:47 am

It can be an option... what do you mean by using the original data files?
Just backing up the original data files (copy them) to S3?

Thanks
Chen

Re: Backup Vertica to S3

Post by Victorgm » Wed Aug 10, 2016 4:56 pm

Yes, exactly.
I believe you would still have to move the files to a Linux file system before using them for recovery.
It's a typical DW backup strategy.
Whichever strategy you choose, test it. I can't emphasize that enough.
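
As a rough sketch of that recovery path: pull a file back from S3 to a local Linux path with `aws s3 cp`, then reload it with COPY. The bucket, key, directory, table name and delimiter below are all hypothetical, and the COPY statement is the bare minimum with no escaping of the values, so adjust it to match how the file was originally loaded.

```python
import posixpath


def build_restore_steps(bucket, key, local_dir, table, delimiter="|"):
    """Return (download_command, copy_sql) for reloading one data file.

    download_command is an argv list for `aws s3 cp`; copy_sql is a minimal
    Vertica COPY statement reading the downloaded file from local disk.
    """
    local_path = posixpath.join(local_dir, posixpath.basename(key))
    download_command = ["aws", "s3", "cp", f"s3://{bucket}/{key}", local_path]
    copy_sql = f"COPY {table} FROM '{local_path}' DELIMITER '{delimiter}';"
    return download_command, copy_sql


if __name__ == "__main__":
    # Hypothetical names: substitute your own bucket, file and table.
    command, sql = build_restore_steps(
        "my-backup-bucket", "raw/sales_2016.csv", "/data/restore", "dw.sales", ","
    )
    print(" ".join(command))
    print(sql)
```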
