Finding the Number of String Occurrences

Moderator: NorbertKrupa

Post Reply
User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

Finding the Number of String Occurrences

Post by JimKnicely » Fri Oct 26, 2012 1:33 pm

Hi,

I'm unaware of any built-in functions in Vertica which allow us to directly calculate the number of occurrences of a string within another string, but there is a simple work around using the LENGTH function.

Say I have a table named TEST with the following data:

Code: Select all

dbadmin=> SELECT c1 FROM test;
                                            c1
-------------------------------------------------------------------------------------------
If I wasn't married to Jane, I'd marry Vertica.
Vertica 6 is better than Vertica 5 which is better than Vertica 4 which is better than...
Vertica tastes like chicken.
Learning how to use Vertica is fun because Vertica is cool.
Vertica Vertica Vertica Vertica I love Vertica
(5 rows)
I want to know the number of times “Vertica” appears in each row of data contained in the C1 column of the TEST table. Here’s one method to get the results:

Code: Select all

dbadmin=> SELECT c1 the_text,
dbadmin->        (length(c1) - length(REPLACE(c1, 'Vertica', ''))) / length('Vertica') vertica_appearances
dbadmin->   FROM test;
                                         the_text                                          | vertica_appearances
-------------------------------------------------------------------------------------------+---------------------
If I wasn't married to Jane, I'd marry Vertica.                                           |                   1
Vertica 6 is better than Vertica 5 which is better than Vertica 4 which is better than... |                   3
Vertica tastes like chicken.                                                              |                   1
Learning how to use Vertica is fun because Vertica is cool.                               |                   2
Vertica Vertica Vertica Vertica I love Vertica                                            |                   5
(5 rows)
That was easy.

Here’s another method:

Code: Select all

dbadmin=> SELECT c1 the_text,
dbadmin->        SUM(SIGN(instr(c1, 'Vertica', 1, rn))) vertica_appearances
dbadmin->   FROM (SELECT c1,
dbadmin(>                row_number() over (partition by c1) rn
dbadmin(>           FROM test
dbadmin(>          CROSS join tables) foo
dbadmin->  GROUP BY c1;
                                         the_text                                          | vertica_appearances
-------------------------------------------------------------------------------------------+---------------------
If I wasn't married to Jane, I'd marry Vertica.                                           |                   1
Learning how to use Vertica is fun because Vertica is cool.                               |                   2
Vertica Vertica Vertica Vertica I love Vertica                                            |                   5
Vertica 6 is better than Vertica 5 which is better than Vertica 4 which is better than... |                   3
Vertica tastes like chicken.                                                              |                   1
(5 rows)
However, since using a CROSS JOIN is less intuitive and introduces a slight performance hit, I recommend using the first method employing the LENGTH function in your queries!
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

Post Reply

Return to “Vertica Tips, Lessons and Examples”