Channel: Data Preparation & Blending discussions

Parsing unstructured addresses (Malaysia addresses)


Hi,

 

I tried, without success, some of the solutions suggested in the discussion for parsing unstructured addresses into State, City & Postal Code (aka Zip). I struggled to customize the Regex syntax to fit my addresses, shown below, and need some help extracting the State, City & Zip, which are usually the last three elements at the right-hand end of each address.

 

Examples (I need to extract only the trailing Zip, City & State, e.g. 93250, KUCHING, SARAWAK in the first address):

 

PARCEL 1.20, BLOCK C, LEVEL 1, THE GENESIS WALK, LOT 3772 & 3773, JALAN MATANG/BATU KAWA, 93250KUCHING, SARAWAK

 

 

No. 2, Lot 7137,  Countryland Commercial Centre,  Jalan Datuk Mohd Musa, 94300Kota Samarahan, Sarawak.

 

LOT 680, LIGHT INDUSTRIAL ESTATE, JALAN SULTAN ISKANDAR, P.O. BOX 3198, 97013BINTULU, SARAWAK.

 

Tried Solutions:

1) Tried splitting the addresses on a delimiter (added up to 26 columns). This produced many "Null" values; Zip, State & City are singled out, but they land in different columns from row to row. I need help eliminating the non-relevant columns OR identifying only those that pertain to Zip, City or State. We do have a list of Zip, City & State values for Malaysia, but I am not sure how to use it in Alteryx.
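To make the reference-list idea concrete, here is a minimal sketch in Python (not Alteryx) of what I have in mind: pull every standalone 5-digit number out of the address and keep the right-most one that appears in the reference list. The `POSTCODES` table and the `lookup_by_postcode` name are hypothetical, and the three entries are simply taken from the example addresses above, not from an official list.

```python
import re

# Hypothetical excerpt of a Zip -> (City, State) reference list.
# The three entries are copied from the example addresses, for illustration only.
POSTCODES = {
    "93250": ("KUCHING", "SARAWAK"),
    "94300": ("KOTA SAMARAHAN", "SARAWAK"),
    "97013": ("BINTULU", "SARAWAK"),
}

# Exactly five digits, not part of a longer run of digits.
FIVE_DIGITS = re.compile(r"(?<!\d)\d{5}(?!\d)")


def lookup_by_postcode(address):
    """Return (zip, city, state) for the right-most known postcode, else None."""
    for code in reversed(FIVE_DIGITS.findall(address)):
        if code in POSTCODES:
            city, state = POSTCODES[code]
            return code, city, state
    return None
```

Searching from the right reflects the assumption that the Zip sits near the end of the address, so an earlier 5-digit lot or box number would not be picked up unless it happened to be a valid postcode too.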

 

2) Tried using a Formula to create 2 new columns with the expressions below, but didn't quite get them to work either. (Reference thread: https://community.alteryx.com/t5/Data-Preparation-Blending/Parsing-Unstructured-Addresses/td-p/25386)

Output Field 1 "Zipcode": regex_replace([ADDRESS],"(.*)(\d{5}$|\d{5}-\d{4}$)","$2")

Output Field 2 "State": regex_replace([ADDRESS],"(.*\s)([[:alpha:]]{2})(\s\d{5}.*)","$2")
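As written, both expressions assume a US-style layout: the first expects the Zip at the very end of the string, and the second expects a 2-letter state followed by a space and then the Zip. In my addresses the Zip is fused to the City (e.g. 93250KUCHING) and the State is a full name at the very end, sometimes followed by a full stop. A minimal sketch of the pattern I am after, written in Python rather than as an Alteryx expression (the `TAIL` pattern and `parse_tail` name are hypothetical, and it assumes the address always ends in "Zip+City, State"):

```python
import re

# Anchored at the end of the address:
#   5-digit Zip, optional space, City (up to the next comma),
#   a comma, State, then an optional trailing full stop.
TAIL = re.compile(r"(\d{5})\s*([^,\d][^,]*?)\s*,\s*([^,.]+?)\s*\.?\s*$")


def parse_tail(address):
    """Return a dict with zip, city and state, or None if the tail does not match."""
    m = TAIL.search(address)
    if m is None:
        return None
    zip_code, city, state = m.groups()
    return {"zip": zip_code, "city": city, "state": state}


if __name__ == "__main__":
    sample = "LOT 680, LIGHT INDUSTRIAL ESTATE, JALAN SULTAN ISKANDAR, P.O. BOX 3198, 97013BINTULU, SARAWAK."
    print(parse_tail(sample))
```

Addresses that do not end in this shape return None, so they can be set aside for manual review instead of being parsed wrongly.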

 

3) Tried the RegEx tool with the expression shown in http://dataglut.blogspot.jp/2014/10/digging-for-data-creating-boston.html , but that didn't solve the issue either.

 

I need some help with any of the approaches mentioned above.

 

Thank you. 

 

