REGEXP_REPLACE or SUBSTR to extract Twitter accounts from text

Post by kryszczyn » Thu Nov 23, 2017 9:02 am

In one of my projects I have a data sample that is a list of tweets. The task is to extract every account name into a separate field.

The sample data looks as follows (each line is a row - 1 column):

@msgerain [What the hell happened to you?] Appendicitis http://tr.im/dagerby
@msngregeain is @aragerekumar hiding from @yuvipergeeaa?

The result should be (each line is a row - 1 column):
@msgerain
@msngregeain @aragerekumar @yuvipergeeaa

I thought of using REGEXP_REPLACE, but that won't work because it acts only on the selected occurrence. I struggled to write a regex that would match everything other than the strings matched by \b@\w+\b. What would your approach be?

Kryszczyn
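
To make the matching logic concrete, here is a minimal sketch in Python rather than Vertica SQL. It collects every match and joins them, instead of trying to replace everything that is not an account name. The function name extract_accounts and the pattern (?<!\w)@\w+ are assumptions made for illustration and are not taken from the post. The sketch also shows that the pattern from the question, \b@\w+\b, finds nothing in the sample: \b needs a word character on one side of the boundary, and neither the start of the line or a space nor the @ itself is a word character. How this carries over to Vertica's regexp functions has not been verified here.

```python
import re

# Assumed pattern: an "@" that is not preceded by a word character
# (so e-mail addresses are skipped), followed by one or more word characters.
HANDLE = re.compile(r"(?<!\w)@\w+")


def extract_accounts(tweet: str) -> str:
    """Return all @handles found in the tweet, separated by single spaces."""
    return " ".join(HANDLE.findall(tweet))


# Sample rows copied from the question.
tweets = [
    "@msgerain [What the hell happened to you?] Appendicitis http://tr.im/dagerby",
    "@msngregeain is @aragerekumar hiding from @yuvipergeeaa?",
]

for tweet in tweets:
    print(extract_accounts(tweet))

# The pattern from the question matches nothing here, because "\b" before "@"
# requires a word character immediately in front of the "@".
print(re.findall(r"\b@\w+\b", tweets[1]))
```

On the two sample rows this prints the expected result from the question, followed by an empty list for the original pattern.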

Return to “Vertica SQL Functions”