I truncated all tables and logs to start again. Below is all the information about the system at the moment the process stopped working.
vertica.log contains 4 "RESOURCE RESERVATION FAILURE" messages, and all of them show the same state in RESOURCE_QUEUES:
Code:
# grep "RESOURCE RESERVATION FAILURE" vertica.log
2013-10-18 11:17:25.033 Init Session:0x7fbb80013510-a00000001c4554 [ResourceManager] <INFO> RESOURCE RESERVATION FAILURE:Timedout waiting for resource request of [a00000001c4554,92] on general for Queries:1,Threads:4,File Handles:11,Memory(KB):959918,
2013-10-18 11:17:25.044 Init Session:0x7fbb80015680-a00000001c4597 [ResourceManager] <INFO> RESOURCE RESERVATION FAILURE:Timedout waiting for resource request of [a00000001c4597,73] on general for Queries:1,Threads:4,File Handles:0,Memory(KB):4544,
2013-10-18 11:17:25.044 Init Session:0x7fbb80010800-a00000001c453f [ResourceManager] <INFO> RESOURCE RESERVATION FAILURE:Timedout waiting for resource request of [a00000001c453f,115] on general for Queries:1,Threads:4,File Handles:0,Memory(KB):4544,
2013-10-18 11:17:25.055 Init Session:0x7fbb80010190-a00000001c44ef [ResourceManager] <INFO> RESOURCE RESERVATION FAILURE:Timedout waiting for resource request of [a00000001c44ef,176] on general for Queries:1,Threads:4,File Handles:7,Memory(KB):959918,
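To compare the four messages field by field rather than by eye, here is a minimal Python sketch that parses one of these lines. The regular expression is only inferred from the output above, and the field names (`txn`, `sub_id`, `memory_kb`, and so on) are labels I made up for illustration, not official Vertica names.

```python
import re

# Pattern inferred from the grep output above; the group names are
# hypothetical labels, not Vertica terminology.
FAILURE_RE = re.compile(
    r"RESOURCE RESERVATION FAILURE:Timedout waiting for resource request of "
    r"\[(?P<txn>[0-9a-f]+),(?P<sub_id>\d+)\] on (?P<pool>\w+) for "
    r"Queries:(?P<queries>\d+),Threads:(?P<threads>\d+),"
    r"File Handles:(?P<file_handles>\d+),Memory\(KB\):(?P<memory_kb>\d+)"
)

def parse_failure(line):
    """Return the fields of one failure line as a dict, or None if no match."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Keep the identifier and pool name as text; convert the counters to int.
    return {
        key: value if key in ("txn", "pool") else int(value)
        for key, value in fields.items()
    }

# First line of the grep output above, copied verbatim.
sample = (
    "2013-10-18 11:17:25.033 Init Session:0x7fbb80013510-a00000001c4554 "
    "[ResourceManager] <INFO> RESOURCE RESERVATION FAILURE:Timedout waiting "
    "for resource request of [a00000001c4554,92] on general for "
    "Queries:1,Threads:4,File Handles:11,Memory(KB):959918,"
)
parsed = parse_failure(sample)
```

Running `parse_failure` over every matching line of vertica.log would give one dict per failed request, which makes it easy to see that all four requests were waiting on the general pool.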
Code:
2013-10-18 11:17:25.033 Init Session:0x7fbb80013510-a00000001c4554 [ResourceManager] <INFO> RESOURCE RESERVATION FAILURE:Timedout waiting for resource request of [a00000001c4554,92] on general for Queries:1,Threads:4,File Handles:11,Memory(KB):959918,
*general(45035996273718906) Priority 0 QueueTimeout 300
Size Queries:10,Threads:8314,File Handles:39993,Memory(KB):10113024,
Reserved Queries:10,Threads:40,File Handles:82,Memory(KB):9607372,
> [a00000001c44db,36] - Queries:1,Threads:4,File Handles:7,Memory(KB):959918,
> [a00000001c44e6,92] - Queries:1,Threads:4,File Handles:7,Memory(KB):959918,
> [a00000001c44f8,71] - Queries:1,Threads:4,File Handles:9,Memory(KB):959918,
> [a00000001c4508,64] - Queries:1,Threads:4,File Handles:9,Memory(KB):959918,
> [a00000001c4517,22] - Queries:1,Threads:4,File Handles:9,Memory(KB):959918,
> [a00000001c452c,15] - Queries:1,Threads:4,File Handles:7,Memory(KB):959918,
> [a00000001c456b,36] - Queries:1,Threads:4,File Handles:9,Memory(KB):959918,
> [a00000001c457f,15] - Queries:1,Threads:4,File Handles:9,Memory(KB):959918,
> [a00000001c45b0,57] - Queries:1,Threads:4,File Handles:7,Memory(KB):959918,
> [a00000001c45c5,15] - Queries:1,Threads:4,File Handles:9,Memory(KB):959918,
sysquery(45035996273718908)
Size Queries:10000,Threads:53,File Handles:256,Memory(KB):65536,
sysdata(45035996273718910)
Size Queries:0,Threads:0,File Handles:0,Memory(KB):102400,
Reserved Queries:0,Threads:0,File Handles:0,Memory(KB):65536,
> [ffffffffffffffff,45035996273718910] - Queries:0,Threads:0,File Handles:0,Memory(KB):65536,
wosdata(45035996273718912)
Size Queries:0,Threads:0,File Handles:0,Memory(KB):0,
Overflow Queries:0,Threads:0,File Handles:0,Memory(KB):8192,
> [ffffffffffffffff,45035996273718912] - Queries:0,Threads:0,File Handles:0,Memory(KB):8192,
tm(45035996273718914)
Size Queries:3,Threads:166,File Handles:801,Memory(KB):204800,
refresh(45035996273718916)
Size Queries:10000,Threads:0,File Handles:0,Memory(KB):0,
recovery(45035996273718918)
Size Queries:7,Threads:0,File Handles:0,Memory(KB):0,
dbd(45035996273718920)
Size Queries:10000,Threads:0,File Handles:0,Memory(KB):0,
Queue priority 105 (pools: tm)
Q [a00000001c4664,1] - tm - Queries:1,Threads:4,File Handles:16,Memory(KB):213568,
Queue priority 0 (pools: general sysdata wosdata dbd)
Q [a00000001c4554,92] - general - Queries:1,Threads:4,File Handles:11,Memory(KB):959918,
Q [a00000001c4597,73] - general - Queries:1,Threads:4,File Handles:0,Memory(KB):4544,
Q [a00000001c453f,115] - general - Queries:1,Threads:4,File Handles:0,Memory(KB):4544,
Q [a00000001c44ef,176] - general - Queries:1,Threads:4,File Handles:7,Memory(KB):959918,
According to the QUERY_REQUESTS table, every query dumped in vertica.log took a little more than 300 s. I believe that happened because the process was stalled/locked and the queries were ended automatically by the timeout (the general pool shows QueueTimeout 300 in the dump above).
I don't believe the problem is queries genuinely running longer than 300 s, because when the system is configured with PLANNEDCONCURRENCY > 12 (the peak number of concurrent queries), the longest query takes about 30 s.
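As a rough sanity check of that reading, here is a small Python sketch of the arithmetic on the general pool, using only numbers copied from the dump above. The `can_admit` function is a hypothetical simplification (a request needs one free query slot and enough free memory); it is an assumption for illustration, not Vertica's actual admission logic.

```python
# Values copied from the "general" pool section of the dump above (memory in KB).
POOL_SIZE_KB = 10_113_024       # Size ... Memory(KB)
POOL_RESERVED_KB = 9_607_372    # Reserved ... Memory(KB)
POOL_SIZE_QUERIES = 10          # Size Queries
POOL_RESERVED_QUERIES = 10      # Reserved Queries

# The four queued requests on the general pool: (request id, memory in KB).
QUEUED = [
    ("a00000001c4554,92", 959_918),
    ("a00000001c4597,73", 4_544),
    ("a00000001c453f,115", 4_544),
    ("a00000001c44ef,176", 959_918),
]

def headroom(size, reserved):
    """Amount of a resource that is still unreserved."""
    return size - reserved

def can_admit(request_kb):
    """Hypothetical check: one free query slot and enough free memory."""
    free_slots = headroom(POOL_SIZE_QUERIES, POOL_RESERVED_QUERIES)
    free_kb = headroom(POOL_SIZE_KB, POOL_RESERVED_KB)
    return free_slots >= 1 and request_kb <= free_kb

free_kb = headroom(POOL_SIZE_KB, POOL_RESERVED_KB)
free_slots = headroom(POOL_SIZE_QUERIES, POOL_RESERVED_QUERIES)
blocked = [request_id for request_id, kb in QUEUED if not can_admit(kb)]

print(f"free memory: {free_kb} KB, free query slots: {free_slots}")
print(f"blocked requests: {len(blocked)} of {len(QUEUED)}")
```

Under these assumptions the pool has 505652 KB of memory and no query slots left, so the two 959918 KB requests do not fit in memory, and even the two 4544 KB requests have no slot available. That would be consistent with all four requests sitting in the queue until the 300 s QueueTimeout expired, rather than with queries that actually ran for 300 s.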