Correct multitenancy design

Moderator: NorbertKrupa

Post Reply
spektom
Newbie
Newbie
Posts: 3
Joined: Sun Apr 27, 2014 2:30 pm

Correct multitenancy design

Post by spektom » Wed May 28, 2014 1:18 pm

Hello!

I wonder what would be the correct way do design a multi-tenant system for hundreds (maybe thousands) of customers. As far as I can see there are two ways:

1. Have customer_id in every table. Add "WHERE customer_id=..." to every query.
2. Have multiple schemas - one per customer.

What would be advantages and pitfalls of these methods?

Thanks!
Michael

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: Correct multitenancy design

Post by NorbertKrupa » Wed May 28, 2014 1:47 pm

I don't think #2 could be considered a good design. I would say that schemas should be reserved for logical grouping of tables, so if anything, it should be reserved for something such as "customers." I would go with #1 and if possible, partition on that column.
Checkout vertica.tips for more Vertica resources.

scutter
Master
Master
Posts: 302
Joined: Tue Aug 07, 2012 2:15 am

Re: Correct multitenancy design

Post by scutter » Wed May 28, 2014 4:06 pm

You’ll need to consider some factors before deciding whether or not to partition by the customer id. If you will be heading towards thousands of customers, that would be too many partitions. Also bulk loading new data across all partitions would result in a lot of Tuple Mover mergeout maintenance and resulting I/O spikes.

With customer id as a predicate in every query, having it first in the projection sort order will be natural, so predicate evaluation will be efficient without partitioning by customer id.

Normally partitioning is based on a time-based expression, and then also used for implementation of a data retention policy. One possible down side of a single table including all customer data is that all customers would need to follow the same retention policy.

One advantage of having all of the customers’ data in a single table is ease of the analysis you can do across all customer data. This would be much more difficult with a data model where each customer has a separate schema/table.

Presumably you would create a view which automatically filters the data for each customer? With role-based access to the views, and no access to the underlying tables, you can guarantee that customers can’t view other customers’ data.

—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: Correct multitenancy design

Post by NorbertKrupa » Wed May 28, 2014 4:21 pm

Great explanation, Sharon. To avoid having many partitions, you could partition on a range.
Checkout vertica.tips for more Vertica resources.

spektom
Newbie
Newbie
Posts: 3
Joined: Sun Apr 27, 2014 2:30 pm

Re: Correct multitenancy design

Post by spektom » Thu May 29, 2014 6:31 am

Hello Sharon,

Thanks for your informative answer!

Actually we are not going to provide inter-customer analytics, at least at this point. I was leaning towards the second solution, since I thought that:

* Purging customer's data would be easier: just delete the relevant schema.
* Strong and native access control model: no customer could see other customer data.

The only concern I was thinking about is that creating many schemas (and thus multiplying the number of tables/projections by the number of customers per node) would not be possible because of ROS containers limit. Would the second method require that the number of Vertica nodes must be increased dramatically?

Thanks,
Michael

scutter
Master
Master
Posts: 302
Joined: Tue Aug 07, 2012 2:15 am

Re: Correct multitenancy design

Post by scutter » Thu May 29, 2014 7:10 pm

Hi Michael,

There’s no inherent limit on the number of schemas or projections. There’s also no system-wide limit on the number of ROS containers. The ROS container limit is per-projection. With data separated out into per-customer tables, your data probably won’t compress as well as it would if it were within a single table. Otherwise your number of nodes won’t be affected.

You will end up with a “large catalog” and might end up needing additional memory. Just do a Proof of Concept with your expected upper bound of customers - creating that many schemas and N number of tables per schema - to confirm that it all works as expected. I don’t think that you’ll have any issues as long as you have a moderate number of tables per schema.

Is Vertica going to be responding to real-time queries for all of these customers?

—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC

spektom
Newbie
Newbie
Posts: 3
Joined: Sun Apr 27, 2014 2:30 pm

Re: Correct multitenancy design

Post by spektom » Fri May 30, 2014 6:25 am

Hi Sharon,

Yes, all the customers will be able to access the analytics console simultaneously, if that's what you're asking. Is there a difference between the two methods in regards to this aspect?
scutter wrote: Is Vertica going to be responding to real-time queries for all of these customers?
Thanks!
Best regards,
Michael

Post Reply

Return to “Vertica Database Administration”