I removed a node from the database and now my query runs fas

Moderator: NorbertKrupa

Post Reply
matthewcornell
Beginner
Beginner
Posts: 30
Joined: Mon Dec 09, 2013 2:30 pm

I removed a node from the database and now my query runs fas

Post by matthewcornell » Tue Mar 11, 2014 4:16 pm

Hi folks. I'm trying a little scaling experiment where I'm removing a node to see how a test query's performance changes. When I removed a node from the database (bringing the DB Size (#Nodes) from 4 to 3, the query actually runs /faster/ than with 4: Instead of taking ~8 minutes it now runs in ~5. The query does lots of joins. I'd really appreciate someone explaining how this is possible. I've posted the query plus the two different PROFILE outputs at https://gist.github.com/anonymous/9487818 . I've also included the query below. Thank you.

Code: Select all

SELECT relVarTable0.id AS id, relVarTable1.val, relVarTable2.val 
FROM (SELECT id FROM rates) relVarTable0
LEFT JOIN
(SELECT rates1.id AS id, AVG(rates4.rating) AS val 
FROM rates rates1, movie movie1, rates rates2, ml_user ml_user1, rates rates3, movie movie2, rates rates4 
WHERE movie1.id = rates1.movie_id AND movie1.id = rates2.movie_id AND ml_user1.id = rates2.ml_user_id AND ml_user1.id = rates3.ml_user_id AND movie2.id = rates3.movie_id AND movie2.id = rates4.movie_id AND movie1.id <> movie2.id AND rates1.id <> rates2.id AND rates2.id <> rates3.id AND rates3.id <> rates4.id AND rates4.rating IS NOT NULL 
GROUP BY rates1.id) relVarTable1
     ON relVarTable0.id = relVarTable1.id
LEFT JOIN
(SELECT rates1.id AS id, rates1.rating AS val 
FROM rates rates1 
WHERE rates1.rating IS NOT NULL ) relVarTable2
     ON relVarTable0.id = relVarTable2.id;

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: I removed a node from the database and now my query runs

Post by NorbertKrupa » Tue Mar 11, 2014 4:37 pm

Could you additionally just post the EXPLAIN output? That should be enough to show what's happening with the JOINs. My initial reaction is that your data was segmented across more nodes, and lowering the number of nodes increases the performance since. Seeing the projections for each table would help as well.
Checkout vertica.tips for more Vertica resources.

matthewcornell
Beginner
Beginner
Posts: 30
Joined: Mon Dec 09, 2013 2:30 pm

Re: I removed a node from the database and now my query runs

Post by matthewcornell » Tue Mar 11, 2014 6:41 pm

Oops - sorry I forgot that. Here's the EXPLAIN -- matt

Code: Select all

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              QUERY PLAN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------
 
 EXPLAIN
 SELECT relVarTable0.id AS id, relVarTable1.val, relVarTable2.val
 FROM (SELECT id FROM rates) relVarTable0
 LEFT JOIN
 (SELECT rates1.id AS id, AVG(rates4.rating) AS val
 FROM rates rates1, movie movie1, rates rates2, ml_user ml_user1, rates rates3, movie movie2, rates rates4
 WHERE movie1.id = rates1.movie_id AND movie1.id = rates2.movie_id AND ml_user1.id = rates2.ml_user_id AND ml_user1.id = rates3.ml_user_id AND movie2.id = rates3.movie_id AND movie2.id = rates4.movie_id AND movie1.id <> movie2.id AND rates1.id <> rates2.id AND rates2.id <> rates3.id AND rates3.id <> rates4.id AND rates4.rating IS NOT NULL
 GROUP BY rates1.id) relVarTable1
      ON relVarTable0.id = relVarTable1.id
 LEFT JOIN
 (SELECT rates1.id AS id, rates1.rating AS val
 FROM rates rates1
 WHERE rates1.rating IS NOT NULL ) relVarTable2
      ON relVarTable0.id = relVarTable2.id;
 
 Access Path:
 +-JOIN MERGEJOIN(inputs presorted) [LeftOuter] [Cost: 6K, Rows: 76K (NO STATISTICS)] (PATH ID: 1) Inner (RESEGMENT)
 |  Join Cond: (relVarTable0.id = relVarTable2.id)
 |  Execute on: All Nodes
 | +-- Outer -> JOIN MERGEJOIN(inputs presorted) [LeftOuter] [Cost: 6K, Rows: 76K (NO STATISTICS)] (PATH ID: 2) Outer (RESEGMENT)
 | |      Join Cond: (relVarTable0.id = relVarTable1.id)
 | |      Execute on: All Nodes
 | | +-- Outer -> SELECT [Cost: 26, Rows: 76K (NO STATISTICS)] (PATH ID: 3)
 | | |      Execute on: All Nodes
 | | | +---> STORAGE ACCESS for rates [Cost: 26, Rows: 76K (NO STATISTICS)] (PATH ID: 4)
 | | | |      Projection: movielens_test.rates_b0
 | | | |      Materialize: rates.id
 | | | |      Execute on: All Nodes
 | | +-- Inner -> SELECT [Cost: 5K, Rows: 10K (NO STATISTICS)] (PATH ID: 5)
 | | |      Execute on: All Nodes
 | | | +---> GROUPBY HASH (SORT OUTPUT) (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 5K, Rows: 10K (NO STATISTICS)] (PATH ID: 6)
 | | | |      Aggregates: sum_float(<SVAR>), count(<SVAR>)
 | | | |      Group By: rates1.id
 | | | |      Execute on: All Nodes
 | | | | +---> JOIN HASH [Cost: 4K, Rows: 76K (NO STATISTICS)] (PATH ID: 7) Outer (BROADCAST)(LOCAL ROUND ROBIN)
 | | | | |      Join Cond: (movie2.id = rates4.movie_id)
 | | | | |      Join Filter: (rates3.id <> rates4.id)
 | | | | |      Execute on: All Nodes
 | | | | | +-- Outer -> STORAGE ACCESS for rates4 [Cost: 80, Rows: 76K (NO STATISTICS)] (PUSHED GROUPING) (PATH ID: 8)
 | | | | | |      Projection: movielens_test.rates_b0
 | | | | | |      Materialize: rates4.rating, rates4.id, rates4.movie_id
 | | | | | |      Filter: (rates4.rating IS NOT NULL)
 | | | | | |      Execute on: All Nodes
 | | | | | +-- Inner -> JOIN HASH [Cost: 2K, Rows: 76K (NO STATISTICS)] (PATH ID: 9) Outer (LOCAL ROUND ROBIN) Inner (BROADCAST)
 | | | | | |      Join Cond: (movie2.id = rates3.movie_id)
 | | | | | |      Join Filter: (movie1.id <> movie2.id)
 | | | | | |      Execute on: All Nodes
 | | | | | | +-- Outer -> JOIN HASH [Cost: 2K, Rows: 76K (NO STATISTICS)] (PATH ID: 10) Inner (RESEGMENT)
 | | | | | | |      Join Cond: (ml_user1.id = rates3.ml_user_id)
 | | | | | | |      Join Filter: (rates2.id <> rates3.id)
 | | | | | | |      Execute on: All Nodes
 | | | | | | |      Runtime Filter: (SIP1(HashJoin): rates3.movie_id)
 | | | | | | | +-- Outer -> JOIN HASH [Cost: 2K, Rows: 76K (NO STATISTICS)] (PATH ID: 11) Outer (RESEGMENT)(LOCAL ROUND ROBIN) Inner (RESEGMENT)
 | | | | | | | |      Join Cond: (ml_user1.id = rates2.ml_user_id)
 | | | | | | | |      Execute on: All Nodes
 | | | | | | | |      Runtime Filter: (SIP2(HashJoin): ml_user1.id)
 | | | | | | | | +-- Outer -> JOIN HASH [Cost: 949, Rows: 76K (NO STATISTICS)] (PATH ID: 12) Outer (RESEGMENT)(LOCAL ROUND ROBIN)
 | | | | | | | | |      Join Cond: (movie1.id = rates2.movie_id)
 | | | | | | | | |      Join Filter: (rates1.id <> rates2.id)
 | | | | | | | | |      Execute on: All Nodes
 | | | | | | | | | +-- Outer -> STORAGE ACCESS for rates2 [Cost: 78, Rows: 76K (NO STATISTICS)] (PATH ID: 13)
 | | | | | | | | | |      Projection: movielens_test.rates_b0
 | | | | | | | | | |      Materialize: rates2.id, rates2.ml_user_id, rates2.movie_id
 | | | | | | | | | |      Execute on: All Nodes
 | | | | | | | | | +-- Inner -> JOIN HASH [Cost: 359, Rows: 76K (NO STATISTICS)] (PATH ID: 14) Outer (RESEGMENT)(LOCAL ROUND ROBIN) Inner (RESEGMENT)
 | | | | | | | | | |      Join Cond: (movie1.id = rates1.movie_id)
 | | | | | | | | | |      Execute on: All Nodes
 | | | | | | | | | | +-- Outer -> STORAGE ACCESS for rates1 [Cost: 52, Rows: 76K (NO STATISTICS)] (PATH ID: 15)
 | | | | | | | | | | |      Projection: movielens_test.rates_b0
 | | | | | | | | | | |      Materialize: rates1.id, rates1.movie_id
 | | | | | | | | | | |      Execute on: All Nodes
 | | | | | | | | | | +-- Inner -> STORAGE ACCESS for movie1 [Cost: 9, Rows: 2K (NO STATISTICS)] (PATH ID: 16)
 | | | | | | | | | | |      Projection: movielens_test.movie_b0
 | | | | | | | | | | |      Materialize: movie1.id
 | | | | | | | | | | |      Execute on: All Nodes
 | | | | | | | | +-- Inner -> STORAGE ACCESS for ml_user1 [Cost: 9, Rows: 6K (NO STATISTICS)] (PATH ID: 17)
 | | | | | | | | |      Projection: movielens_test.ml_user_b0
 | | | | | | | | |      Materialize: ml_user1.id
 | | | | | | | | |      Execute on: All Nodes
 | | | | | | | +-- Inner -> STORAGE ACCESS for rates3 [Cost: 78, Rows: 76K (NO STATISTICS)] (PATH ID: 18)
 | | | | | | | |      Projection: movielens_test.rates_b0
 | | | | | | | |      Materialize: rates3.id, rates3.ml_user_id, rates3.movie_id
 | | | | | | | |      Execute on: All Nodes
 | | | | | | +-- Inner -> STORAGE ACCESS for movie2 [Cost: 9, Rows: 2K (NO STATISTICS)] (PATH ID: 19)
 | | | | | | |      Projection: movielens_test.movie_b0
 | | | | | | |      Materialize: movie2.id
 | | | | | | |      Execute on: All Nodes
 | +-- Inner -> SELECT [Cost: 54, Rows: 76K (NO STATISTICS)] (PATH ID: 20)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for rates1 [Cost: 54, Rows: 76K (NO STATISTICS)] (PATH ID: 21)
 | | |      Projection: movielens_test.rates_b0
 | | |      Materialize: rates1.rating, rates1.id
 | | |      Filter: (rates1.rating IS NOT NULL)
 | | |      Execute on: All Nodes
 
 
 ------------------------------



NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: I removed a node from the database and now my query runs

Post by NorbertKrupa » Tue Mar 11, 2014 6:59 pm

Few things I noticed
  • Missing statistics; you can fix this by running SELECT ANALYZE_STATISTICS('schema.table');
  • A RESEGMENT indicates that the projections are segmented and are likely performing a network join. If you have small tables being joined, consider replicating them to allow for local joins. Make sure to use PK and FK's where possible to help the optimizer. More on optimizing joins
  • See if you can optimize your GROUP BY HASH for a GROUP BY PIPE
  • Consider the encodings on your projections. There's a few columns that have to be materialized.
Checkout vertica.tips for more Vertica resources.

matthewcornell
Beginner
Beginner
Posts: 30
Joined: Mon Dec 09, 2013 2:30 pm

Re: I removed a node from the database and now my query runs

Post by matthewcornell » Fri Mar 14, 2014 6:49 pm

Very much obliged, norbertk. I'll check out your blog too. -- matt

Post Reply

Return to “Vertica SQL”