Click here to Skip to main content
15,395,065 members
Please Sign up or sign in to vote.
1.00/5 (1 vote)
See more:

Hi All,

Problem statment is as described in the headline
Using postgresql

Example value in "Address"column
4 drive rd phoneix Arizona 34445

Needs to be extracted into differaent columns as Addressline,State,City,Zipcode

Biggest challenge is the space delimeter as all rows are not identical for ex
Row1:4 drive rd phoneix Arizona 34445
Row2 :Alm dr missouri 65034

Looking forward to hearing back

Thanks!

What I have tried:

Tries splitpart function however since the data is inconsistent in different rows having issues getting accurate values
Posted
Updated 10-May-22 11:57am

Don't. You can't really automate that successfully without having a lookup table for each possible address in the country.
For example, where do you break "peach pine road lake havasu city arizona" that gives you three segments "peach pine road", "lake havasu city", "arizona" that doesn't cause problems with "three bears lake", "north little rock", "arkansas"?

The only way to break that info up correctly would be to use a lookup on the zipcode and then try to find street / house / apartment info from the results of that together with the original input.

And doing that in SQL? I wouldn't go there ... use a presentation language instead.

And if you are going to invest time in a presentation language instead, then why not do it more sensibly, and separate teh address info at source? instead of getting addresses entered as one line of text, get it broken up into apartment, street, town, county, state, zipcode at the point of data entry and store it separately. M
   
Comments
Member 14019423 9-May-22 12:25pm
   
Actally.I have a table that has derived city,State and Zip columns from the address column however it is based in eventname column called "in" and "out"


Example.
Id. "Address" city. State. . Zip. Event

1. 4 drive rd phoix Arizona 34445. Phoneix. Ar. 34445. IN


1. 5 drive rd almx Missiippi 34445. Phoneix. Ar. 34445. Out

Ids are same so have to pick the address based on eventnames
There's a chance to split address data into parts using regex[^].

Assuming that, last 3 pieces of address column are:
- postal abbrevation: 2 letters,
- city: one or more words,
- zipcode: 5 digits
you'll be able to split address into parts.

This regex pattern should do the job: ^(.*?)\s([a-z]{2})\s(.*?)\s(\d{5})$

See: PostgreSQL REGEXP_MATCHES: Extracting Text Based on a Regex[^]

[EDIT]

Example:
SQL
WITH MyData AS
(
  SELECT '4 drive rd phoneix Arizona 34445' AS address
  UNION ALL
  SELECT 'Alm dr missouri 65034'  AS address
)
SELECT parts[1] AS addressline, parts[2] AS state, parts[3] AS city, parts[4] AS zipcode 
FROM (SELECT regexp_matches(address, 
  '^(.*?)\s([a-z]{2})\s(.*?)\s(\d{5})$') parts FROM MyData) t;


Result:
addressline 	state 	city 	 	 	 	zipcode
4 drive 	 	rd 	 	phoneix Arizona 	34445
Alm 	 	 	dr 	 	missouri 	 	 	65034


Online demo: Postgres 10 | db<>fiddle[^]
   
v3
Quote:
Biggest challenge is the space delimeter as all rows are not identical for ex

It is not a challenge as space have multiple usage in your addresses, it is a bad idea, it is impossible.
You have devised this RegEx
^(.*?)\s([a-z]{2})\s(.*?)\s(\d{5})$

but think abou_t addresses like
4 drive rd phoneix Arizona 34445
Alm dr missouri 65034
4 drive rd phoneix New Mexico 34445
Alm dr New York New York 65034  // aka NY city in NY state
4 drive rd phoneix North Carolina 34445
4 drive rd phoneix South Dakota 34445
4 drive rd phoneix Washington District of Columbia 34445
4 drive rd Baton Rouge Louisiana 34445

You need a way to check every spelling of states and cities with multiple words naming (including wrong spellings).
The only way is to have a separator that can not be part of the address
like:
4 drive rd;phoneix;Washington District of Columbia;34445
4 drive rd;Baton Rouge;Louisiana;34445


As programmer, your job is to device solution that will support all cases. And adding cases not handled by first fast solution takes much more time than doing things right in first place.
   
v3
Comments
Maciej Los 12-May-22 6:56am
   
Sorry, but usage of Regex is not OP's idea, but mine...

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900