MAPLOOKUP / MapJSONExtractor relative slowness

Moderator: NorbertKrupa

Post Reply
_pj_
Newbie
Newbie
Posts: 6
Joined: Wed Mar 29, 2017 1:14 pm

MAPLOOKUP / MapJSONExtractor relative slowness

Post by _pj_ » Fri Nov 09, 2018 3:07 pm

Hi, All

We have the case that Json is pushed from outer system to our Vertica database into tables with longvarchar columns. We do not have any possibility to change this process currently.

In general the Jsons are simple linear ones, so we could in theory just parse those with MAPLOOKUP/MapJSONExtractor like that:

Code: Select all

SELECT
    public.MAPLOOKUP(public.MapJSONExtractor(xJSON), 'id') AS id
  , public.MAPLOOKUP(public.MapJSONExtractor(xJSON), 'value') AS value
FROM /*EXAMPLE_DATA*/
    (
     SELECT
            '{"id":1,"value":1000}' AS xJSON
  UNION ALL
     SELECT
            '{"id":2,"value":2000}' AS xJSON ) EXAMPLE_DATA
WHERE
    1 = 1
/*
id	value
1	1000
2	2000
*/
The problem is that MapJSONExtractor seams really slow with high volumes.

The throughput we get:
1. Select the data out to our ETL tool (Pentaho Data Integration) and parse the JSON there and write back to database 10million records in 4 minutes.
2. Taking some risks (casting it to varchar so string functions can be applied, and then parsing the JSON with string functions like SPLIT_PART etc) then we get 10 million rows parsed in 2 minutes.
3. With MapJSONExtractor the 10 million rows take 6 minutes to be parsed.

So with billions of rows the difference is significant.

We do not like to take the risks of handling the JSON as string and parsing with string functions, but as it is so much faster then we really do not see an better option at the moment than to go with option 2. string parsing. The same example than above with SPLIT_PART:

Code: Select all

SELECT
    SPLIT_PART(SPLIT_PART(xJSON, '"id":',2),',',1) AS id
  , SPLIT_PART(SPLIT_PART(SPLIT_PART(xJSON, '"value":',2),',',1),'}',1) AS value
FROM /*EXAMPLE_DATA*/
    (
     SELECT
            '{"id":1,"value":1000}' AS xJSON
  UNION ALL
     SELECT
            '{"id":2,"value":2000}' AS xJSON ) EXAMPLE_DATA
WHERE
    1 = 1
/*
id	value
1	1000
2	2000
*/
So does anybody know a better option for this? Is there some way to make the MapJSONExtractor faster? Doesn't this sound like a bug in MapJSONExtractor if it is so much slower?

Br,
pj

Post Reply

Return to “Vertica SQL Functions”