
Parallel Execution of UDFs

Posted: Thu Aug 22, 2013 4:15 pm
by Yonatan
Hi,
I have implemented a UDF in C++ and am trying to understand the performance I am seeing.
I am running Vertica 6.1 on a single-node server with 24 cores and 16 GB of RAM.
The function runs in fenced mode.

My query is relatively simple:

    select key1, key2, key3,
           AnalyzeMetric(timestamp, value)
               OVER (PARTITION BY key1, key2, key3
                     ORDER BY timestamp)
    from DB.yonatan.application_performance
    where application like 'p%'

AnalyzeMetric returns a single row per invocation, and the PARTITION BY clause should split the input into exactly 1000 invocations that could, in principle, run in parallel. However, Vertica appears to execute the query on a single thread and does not use the other available cores.
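To make the expectation concrete: the partitions produced by PARTITION BY are independent of each other, so in principle each one can be handed to a different core, with the ORDER BY applied only inside a partition. The sketch below is a conceptual illustration under that assumption only. It is not Vertica's execution engine and not the Vertica UDx SDK. `Row`, `PartitionKey`, `analyzePartition` and `runPartitioned` are hypothetical names, and `analyzePartition` is a toy stand-in for AnalyzeMetric (it returns the last value minus the first value).

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <thread>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical input row; (key1, key2, key3) identify the partition.
struct Row {
    std::string key1, key2, key3;
    long timestamp;
    double value;
};

using PartitionKey = std::tuple<std::string, std::string, std::string>;

// Hypothetical stand-in for AnalyzeMetric: sees one whole partition in
// timestamp order and returns a single value (last value - first value).
double analyzePartition(const std::vector<Row>& rows) {
    assert(std::is_sorted(rows.begin(), rows.end(),
                          [](const Row& a, const Row& b) { return a.timestamp < b.timestamp; }));
    if (rows.empty()) return 0.0;
    return rows.back().value - rows.front().value;
}

// Emulates PARTITION BY key1, key2, key3 ORDER BY timestamp and evaluates
// the partitions on a fixed pool of numWorkers threads.
std::map<PartitionKey, double> runPartitioned(const std::vector<Row>& input,
                                              unsigned numWorkers) {
    // PARTITION BY: group rows by their key triple.
    std::map<PartitionKey, std::vector<Row>> groups;
    for (const Row& r : input) groups[{r.key1, r.key2, r.key3}].push_back(r);

    std::vector<std::pair<PartitionKey, std::vector<Row>>> parts(groups.begin(), groups.end());
    std::vector<double> results(parts.size());
    std::atomic<std::size_t> next{0};

    auto worker = [&]() {
        // Each worker claims whole partitions, so one partition is never split.
        for (std::size_t i = next++; i < parts.size(); i = next++) {
            std::vector<Row>& rows = parts[i].second;
            // ORDER BY timestamp, applied within this partition only.
            std::sort(rows.begin(), rows.end(),
                      [](const Row& a, const Row& b) { return a.timestamp < b.timestamp; });
            results[i] = analyzePartition(rows);  // distinct index per thread: no race
        }
    };

    if (numWorkers == 0) numWorkers = 1;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < numWorkers; ++w) pool.emplace_back(worker);
    for (std::thread& t : pool) t.join();

    std::map<PartitionKey, double> out;
    for (std::size_t i = 0; i < parts.size(); ++i) out[parts[i].first] = results[i];
    return out;
}
```

With 1000 partitions and a pool sized to the machine (for example `std::thread::hardware_concurrency()`, which would report 24 on the server described here), all cores can stay busy while each partition is still processed sequentially in timestamp order. A bounded pool with an atomic work counter is used instead of one thread per partition so the thread count does not grow with the number of partitions.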

During execution, the number of threads (nlwp) of the UDx process goes from 2 to 3:

    ps -eo pid,comm,lstart,etime,time,nlwp | grep vertica-udx-C++
    10639 vertica-udx-C++ Thu Aug 22 14:03:43 2013 41:55 00:22:23 /opt/vertica/bin/vertica-ud 3

Should I expect parallel execution only on a multi-node cluster where the data is segmented?
What am I missing?

Re: Parallel Execution of UDFs

Posted: Fri Aug 23, 2013 8:43 am
by id10t
Hi!

This is a bug in Vertica. See this thread: https://community.vertica.com/vertica/t ... y_utilized

Regards, Daniel

Re: Parallel Execution of UDFs

Posted: Fri Aug 23, 2013 1:06 pm
by scutter
I think the linked Community discussion on fully utilizing the system is related, but not necessarily the same root issue. That discussion is about joins, which use only a subset of the cores.

The documentation states: "Analytic functions using the partition by clause automatically run in parallel when possible to improve query performance." It sounds like this may not hold for UDFs, i.e., they don't try to run in parallel. I'd ask this question in the Community or open a Support Ticket for clarification.

--Sharon