Regular Expressions

Moderator: NorbertKrupa

Post Reply
ssrao
Newbie
Newbie
Posts: 14
Joined: Fri Jun 07, 2013 2:37 pm

Regular Expressions

Post by ssrao » Sun Jun 09, 2013 10:18 am

Hi,

select regexp_substr('10.3.4.211 - 65.123.101.114 - - [21/May/2013:15:54:32] "GET /V','"\w+\s(/[^/]+)',1,1,' ',1) from dual

The above query is /V but I want only V..

Can you please check and modify regular expression accordingly?

Thanks in Advance

ssrao
Newbie
Newbie
Posts: 14
Joined: Fri Jun 07, 2013 2:37 pm

Re: Regular Expressions

Post by ssrao » Sun Jun 09, 2013 1:38 pm

Hi,

I ran the below query
copy cdn.inbound_access_log1
(
record filler varchar(65000),
EndPoint_IP_Address as split_part(record,' ', 1),
Client_IP_Address as split_part(record,' - ', 2),
Request_Time as to_timestamp(regexp_substr(record, '\[([^]]+)]', 1,1,'',1), 'DD/Mon/YYYY:HH:MI:SS'),
HTTP_Request_Method as regexp_substr(record, '"([^\s]+)',1,1,'',1),
Delivery_Type as regexp_substr(record, '"\w+\s/([^w+])',1,1,'',1),
Delivery_Format as regexp_substr(record, '"\w+\s(/[^/]+)/([^/]+)',1,1,'',2),
Bucket_ID as regexp_substr(record,'"\w+\s(/[^/]+)(/[^/]+)/([^/]+)',1,1,'',3)::int
)
from '/home/cdn/spain_samples/SENT_access_2013-05-21_T15.59.31_213.4.28.244.log' direct abort on error;

The above query is working fine whenever the bucket_id number is there.

Input file is
213.4.28.244 - 95.123.101.114 - - [21/May/2013:15:57:05] "GET /V/0/11573/Granturismo1.mp4 HTTP/1.0 -" 200 11171317 1048576 2 1 "Wget/1.11.4"
[213.4.28.244 - 95.123.101.114 - - [21/May/2013:15:58:14] "GET /X/0/crossdomain.xml HTTP/1.1 -" 200 307 307 0 0 "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW6
4; Trident/6.0)"[/size]
213.4.28.244 - 95.123.101.114 - - [21/May/2013:15:58:24] "GET /V/0/11573/Granturismo1.mp4?start=0 HTTP/1.1 http://democdn.biz.tm/player-5.1.897.swf" 200 11125709 111257
09 10 0 "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)"
213.4.28.244 - 95.123.101.114 - - [21/May/2013:15:59:09] "GET /V/0/11467/The.Dark.Knight.Rises.2012.officialTrailer.ism/Manifest HTTP/1.0 -" 200 9436 9436 0 0 "Wget/1.1
1.4"

The above query is giving error because of this record is not having bucket_id.
213.4.28.244 - 95.123.101.114 - - [21/May/2013:15:58:14] "GET /X/0/crossdomain.xml HTTP/1.1 -" 200 307 307 0 0 "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW6
4; Trident/6.0)"[/size]

How to load these records which is having missing fields values ?
How to redirect these records to rejected file path?

I want to redirected these records to error file whever is not loaded and reload after that.

Post Reply

Return to “New to Vertica SQL”