My assumptions about the differences and similarities between Vertica and Hadoop are:
1) Vertica handles structured data (I have heard Flex Zone can handle semi-structured data as well), while Hadoop can store and process any file format, especially unstructured data.
2) Real-time analytics is possible in Vertica with WOS and ROS, whereas Hadoop is batch oriented.
3) Hadoop is open source.
4) Both can run on commodity hardware.
5) Both can work with petabytes of data.
My question: with Apache Spark, Kafka, and similar tools gaining traction, it is now possible to do real-time analytics in Hadoop, and it can be done in an open stack. I am not clear on which one to choose if unstructured data is not the focus of the requirement.
I know it is a vast topic. Please share your thoughts.
Vertica and Hadoop
Moderator: NorbertKrupa
Re: Vertica and Hadoop
stefen054 wrote: Where it is possible to do real-time analytics in Hadoop, and it can be done in an open stack, I am not clear on which one to choose if unstructured data is not the focus of the requirement?
I'm very curious how this is accomplished. Given the nature of Hadoop, real-time analytics is very difficult.
stefen054 wrote: 1) Vertica handles structured data (heard Flex Zone could handle semi-structured data as well) and Hadoop can handle any file format in storage and processing, especially unstructured data
Both Vertica and Hadoop handle structured data. However, if you need real-time analysis, structured data could (and should) go into Vertica first, then off to cold storage (Hadoop). Flex Zone can handle semi-structured and potentially unstructured data, although you wouldn't store a video file in Vertica. Vertica also has an On Hadoop offering, which enables it to sit directly on Hadoop nodes.
stefen054 wrote: 2) Real-time analytics is possible with WOS and ROS, whereas in Hadoop it is batch oriented.
3) Hadoop is open source
4) Both can work on commodity hardware
Agreed.
stefen054 wrote: 5) Both can work on petabytes of data
The question you should be asking is what you want to do with that petabyte of data. Facebook uses Vertica and turns over 3-4 PB of data every 2 days. It would be extremely difficult to perform real-time analytics on this amount of data in a Hadoop environment.
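To put the 3-4 PB every 2 days figure in perspective, here is a rough back-of-the-envelope calculation of the average sustained rate it implies. This is only a sketch: it assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and a perfectly even load over the two days, and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_PB = 10**15  # assumption: decimal petabytes
BYTES_PER_GB = 10**9   # assumption: decimal gigabytes


def sustained_rate_gb_per_s(petabytes, days):
    """Average rate in GB/s needed to turn over `petabytes` of data in `days`."""
    total_bytes = petabytes * BYTES_PER_PB
    total_seconds = days * SECONDS_PER_DAY
    return total_bytes / total_seconds / BYTES_PER_GB


low = sustained_rate_gb_per_s(3, 2)
high = sustained_rate_gb_per_s(4, 2)
print(f"{low:.1f} to {high:.1f} GB/s")  # prints "17.4 to 23.1 GB/s"
```

Real workloads are bursty, so peak rates would be higher than this average. The point is only that the platform has to keep up with tens of gigabytes per second around the clock.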
Check out vertica.tips for more Vertica resources.
Re: Vertica and Hadoop
I assume real-time processing and analytics in a Hadoop cluster is possible with YARN framework based tools like Apache Kafka, Storm, and Spark in Hadoop 2.x.
Thanks for your reply. It really helped in certain areas where I was not very clear on using Vertica vs. Hadoop.