Validating Address

Multi-value Solutions

Nathan Rector

Natec Systems

nater@northcoast.com

http://www.northcoast.com/~nater/

The address is a key piece of information in anyone's database, but it can be extremely hard to validate, and due to free form input, even harder to search.

I had a client that wanted to use the Address as data for validation when checking for duplicate information. The client works in the building trades and keeps tracks of job sites. Since more than one of their customers could be working at any one job site, they wanted to make sure that duplicate job site records where not made.

Since every bid is keyed both with a customer number and a job site number, they had to keep their users from creating duplicating records or confusion would result. The address was the only common element in all the site records.

The client wanted a way to search for exact and/or like address, then display the results to the user for them to choose from if only like address where found.

If they were creating a new database, then the easiest way to do this is to require the user to break out the common information like the street name and address number. The problem was that the database already existed. The client wanted to retro-fit this database. They also wanted to keep the data entry as normal as possible, so the division of the address number and street name was not a viable solution.

An address can be input in several different formats, but they all follow a common pattern. Look at the following examples:

Wilson Street, #B

100 Wilson Street, #B

100 Wilson, Apt. b

100 Wilson Street

Pattern: {address number}{street name}{type of street}{apartment or suite}

Using this format as the guideline, I wrote a routine that would extract the address information following this pattern.

The first thing that the routine does is check to see if an address number has been input. Once it has found all of the address number, then it checks to see what the street name is.

Street names can be a little confusing. At times people input 'Wilson Street' when they really want to have 'Wilson Lane'. Other times, only 'Wilson' is input as the street name. Due to this, the routine separates the name of the street and the type of the street (road, lane, ave, etc) from each other. This allows matches on just 'Wilson' and then checks to

see if the street type matches up to create a more exact match.

For the same reason that 'Wilson' and 'Street' are separated, the apartment number is also separated from the address.

As displayed in the examples of the 'Wilson Street' address, the routine has to be able know when a part of the address is missing and move onto the next part. Instead of starting with one part of the address and finding that before moving onto the next part, the routine looks for all parts of the address as it spans the address string.

The end result of each example are:

Wilson Street, #B = Wilson;;B;;Street

100 Wilson Street, #B = Wilson;100;B;;Street

100 Wilson, Apt. b = Wilson;100;B;;

100 Wilson Street = Wilson;100;;;Street

Now that the address has be broken into common components, it can now be indexed and searched. It can even be validated to make sure that all the information needed has been input. For example, validating the user for an address number, or the requirement of the street type.

Next month, I'll discuss the searching and matching routine that can be used with this routine to find exact or near exact matches.

SUBROUTINE FORMAT.ADDR1(ADDRESS)

EQUATE AM TO CHAR(254), VM TO CHAR(253), SVM TO CHAR(252)

EQUATE BELL TO CHAR(7)

*CREATED BY NATHAN RECTOR, 10/08/96

* NATEC SYSTEMS

* nater@northcoast.com

* http://www.northcoast.com/~nater/

* D O C U M E N T A T I O N

* This program is used to create a value used to create a common

* format for an address.

* The program will isolate the different types of addresses and

* create a value that make the address easies to find.

* {street name};{address #};{apt#};{lot#};{street type}

* INPUT 'Y' to continue OR 'N' to return to Menu.

*********************************************************************

*OPEN FILES

*********************************************************************

STREET.SUFIX = ".STREET.LANE.AVENUE.WAY.COURT.CIRLE.PLACE.ROAD.DRIVE."

*** This var is used to convert abbrevations into thier long

*** names. This will allow the street names to be more accurate

*** when searching.

***

*** format:

*** ABBRV<1,-1> = abbreviation

*** <2,-1> = full name

ABBRV = ""

*********************************************************************

*PROGRAMMING LOGIC

*********************************************************************

STREET.NAME = "" ; ADDRESS = TRIM(ADDRESS) 'CU'

IF ADDRESS[1,3] = "C/O" THEN ADDRESS = ADDRESS[4,LEN(ADDRESS)]

*** takes care of the PO boxes

BEGIN CASE

CASE ADDRESS = "" ; GOTO 900

CASE ADDRESS[1,8] 'CU' = "P.O. BOX" ; ADDRESS.NUM = TRIM(ADDRESS[9,10]) ; STREET.NAME = "POBOX" ; GOTO 800

CASE ADDRESS[1,6] 'CU' = "PO BOX" ; ADDRESS.NUM = TRIM(ADDRESS[7,10]) ; STREET.NAME = "POBOX" ; GOTO 800

CASE ADDRESS[1,9] 'CU' = "P. O. BOX" ; ADDRESS.NUM = TRIM(ADDRESS[10,10]) ; STREET.NAME = "POBOX" ; GOTO 800

CASE ADDRESS[1,5] 'CU' = "POBOX" ; ADDRESS.NUM = TRIM(ADDRESS[6,10]) ; STREET.NAME = "POBOX" ; GOTO 800

CASE ADDRESS[1,3] 'CU' = "BOX" ; ADDRESS.NUM = TRIM(ADDRESS[4,10]) ; STREET.NAME = "POBOX" ; GOTO 800

CASE ADDRESS[1,7] 'CU' = "P.O.BOX" ; ADDRESS.NUM = TRIM(ADDRESS[8,10]) ; STREET.NAME = "POBOX" ; GOTO 800

CASE ADDRESS[1,4] 'CU' = "P.O." ; ADDRESS.NUM = TRIM(ADDRESS[5,10]) ; STREET.NAME = "POBOX" ; GOTO 800

CASE ADDRESS[1,7] 'CU' = "POSTBOX" ; ADDRESS.NUM = TRIM(ADDRESS[8,10]) ; STREET.NAME = "POBOX" ; GOTO 800

CASE ADDRESS[1,8] 'CU' = "P. O BOX" ; ADDRESS.NUM = TRIM(ADDRESS[9,10]) ; STREET.NAME = "POBOX" ; GOTO 800

CASE ADDRESS[1,7] 'CU' = "P O BOX" ; ADDRESS.NUM = TRIM(ADDRESS[8,10]) ; STREET.NAME = "POBOX" ; GOTO 800

CASE ADDRESS[1,6] 'CU' = "P O BX" ; ADDRESS.NUM = TRIM(ADDRESS[7,10]) ; STREET.NAME = "POBOX" ; GOTO 800

CASE ADDRESS[1,3] 'CU' = "P O" ; ADDRESS.NUM = TRIM(ADDRESS[4,10]) ; STREET.NAME = "POBOX" ; GOTO 800

CASE ADDRESS[1,4] 'CU' = "P. O." ; ADDRESS.NUM = TRIM(ADDRESS[5,10]) ; STREET.NAME = "POBOX" ; GOTO 800

CASE ADDRESS[1,4] 'CU' = "P. B." ; ADDRESS.NUM = TRIM(ADDRESS[5,10]) ; STREET.NAME = "POBOX" ; GOTO 800

END CASE

*** takes the Number part of the address out of the address list

ADDRESS.NUM = "" ; WORD = FIELD(ADDRESS," ",1) ; NEXT.WORD = FIELD(ADDRESS," ",2)

BEGIN CASE

CASE WORD[1,1] = "#"

* taken card of later

CASE NEXT.WORD = "1/2"

ADDRESS.NUM = FIELD(ADDRESS," ",1)

HALF = FIELD(ADDRESS," ",2)

ADDRESS = ADDRESS[COL2() + 1,LEN(ADDRESS)]

ADDRESS.NUM = ADDRESS.NUM :" 1/2"

CASE WORD MATCHES "1N0N1X'1/2'"

ADDRESS.NUM = FIELD(ADDRESS," ",1)

ADDRESS = ADDRESS[COL2() + 1,LEN(ADDRESS)]

ADDRESS.NUM = ADDRESS.NUM[1,LEN(ADDRESS.NUM) - 3] 'CN' :" 1/2"

CASE WORD MATCHES "0X'-'0X"

ADDRESS.NUM = FIELD(ADDRESS," ",1)

ADDRESS = ADDRESS[COL2() + 1,LEN(ADDRESS)]

STREET.NAME<2> = STREET.NAME<2> :" ": FIELD(ADDRESS.NUM,"-",2)

ADDRESS.NUM = FIELD(ADDRESS.NUM,"-",1)

CASE NUM(WORD)

ADDRESS.NUM = FIELD(ADDRESS," ",1)

ADDRESS = ADDRESS[COL2() + 1,LEN(ADDRESS)]

CASE WORD MATCHES "1N0X'TH'" OR WORD MATCHES "1N0X'ND'" OR WORD MATCHES "1N0X'ST'"

* no address numbers.

CASE WORD MATCHES "1N0N0X" OR WORD MATCHES "1A1N0N0X"

ADDRESS.NUM = FIELD(ADDRESS," ",1)

ADDRESS = ADDRESS[COL2() + 1,LEN(ADDRESS)]

STREET.NAME<2> = STREET.NAME<2> :" ": ADDRESS.NUM 'CA'

ADDRESS.NUM = ADDRESS.NUM 'CN'

END CASE

*** breaks the address down word for word, looking for abbrivations

*** or appartment#s

ADDRESS.COMPLETE = 0 ; PREV.WORD = "" ; PREV.DELIM = "" ; AFTER.STREET = 0

LOOP

UNTIL TRIM(ADDRESS = "") OR ADDRESS.COMPLETE DO

GOSUB 5000 ;* gets word

ADDRESS = ADDRESS[FN.POS + 1,LEN(ADDRESS)]

*** decides what to do with an 's. Check to see if it is actual

*** 's or 's'

IF NOT(WORD = "") AND DELIM = \'\ AND ADDRESS[1,1] = "S" THEN

WORD = WORD :\'S\

ADDRESS = ADDRESS[2,999]

END

BEGIN CASE

CASE WORD = ""

CASE WORD = "NO" AND STREET.NAME<1> = ""

* if there is not street name, and 'NO' is found, then it

* is an abbrevation for North

WORD = "NORTH"

CASE INDEX(VM: ABBRV<1> :VM,VM: WORD :VM,1)

LOCATE(WORD,ABBRV,1;MV) ELSE MV = ""

IF NOT(MV = "") THEN WORD = ABBRV<2,MV>

END CASE

BEGIN CASE

CASE WORD = ""

*** no value

CASE PREV.WORD = ""

*** no previous word to work with

CASE PREV.WORD = "AND"

*** the next char is part of the address

WORD = PREV.WORD :" ": WORD

PREV.WORD = ""

CASE INDEX(STREET.SUFIX,".": WORD :".",1)

*** the previous word is part of a street name

STREET.NAME<1> = STREET.NAME<1> :" ": PREV.WORD

PREV.WORD = ""

CASE STREET.NAME<1> = ""

*** no previous street name. convert the single chars

BEGIN CASE

CASE PREV.WORD = "W"

STREET.NAME<1> = STREET.NAME<1> : "WEST"

CASE PREV.WORD = "E"

STREET.NAME<1> = STREET.NAME<1> : "EAST"

CASE PREV.WORD = "N"

STREET.NAME<1> = STREET.NAME<1> : "NORTH"

CASE PREV.WORD = "S"

STREET.NAME<1> = STREET.NAME<1> : "SOUTH"

CASE 1

STREET.NAME<2> = STREET.NAME<2> : PREV.WORD

END CASE

PREV.WORD = ""

CASE 1

*** assume the pervious word is an appartment#

STREET.NAME<2> = STREET.NAME<2> :" ": PREV.WORD

PREV.WORD = ""

END CASE

*** gets word

BEGIN CASE

CASE DELIM = "#"

GOSUB 4000

STREET.NAME<2> = STREET.NAME<2> :" ": APT.LOT

CASE WORD = ""

CASE INDEX(".APT.NO.SUITE.UNIT.STE.SPACE.RM.",".": WORD :".",1)

GOSUB 4000

STREET.NAME<2> = STREET.NAME<2> :" ": APT.LOT

CASE WORD = "LOT"

GOSUB 4000

STREET.NAME<3> = STREET.NAME<3> :" ": APT.LOT

PREV.WORD = ""

CASE AFTER.STREET AND NUM(WORD)

* found a number after the street sufix had been defined

STREET.NAME<2> = STREET.NAME<2> :" ": WORD

CASE LEN(WORD) = 1

* just add to the previous word to make test with the next

* word

PREV.WORD = WORD

CASE WORD = "AND"

PREV.WORD = WORD

CASE WORD = "TH" AND NUM(TRIM(STREET.NAME<1>))

* the 'TH' in '12th' had a space in it

STREET.NAME<1> = STREET.NAME<1> : WORD

CASE INDEX(STREET.SUFIX,".": WORD :".",1)

* this a street sufix. Place in different area to keep the street

* name by itself

STREET.NAME<4> = STREET.NAME<4> :" ": WORD

AFTER.STREET = 1

CASE 1

STREET.NAME<1> = STREET.NAME<1> :" ": WORD

END CASE

REPEAT

BEGIN CASE

CASE PREV.WORD = ""

CASE STREET.NAME<1> = "" ; STREET.NAME<1> = PREV.WORD

CASE 1 ; STREET.NAME<2> = STREET.NAME<2> : PREV.WORD

END CASE

800*

ADDRESS.NUM = TRIM(ADDRESS.NUM)

STREET.NAME<1> = TRIM(STREET.NAME<1>) ;* Street name

STREET.NAME<2> = TRIM(STREET.NAME<2>) ;* appartment#

STREET.NAME<3> = TRIM(STREET.NAME<3>) ;* Lot Number

STREET.NAME<4> = TRIM(STREET.NAME<4>) ;* Street Type

ADDRESS = STREET.NAME<1> :";": ADDRESS.NUM :";": STREET.NAME<4> :";": STREET.NAME<2> :";": STREET.NAME<3>

900*

RETURN

*********************************************************************

*SUBROUTINE

*********************************************************************

4000*

APT.LOT = "" ; STOP.LOOP = 0

LOOP

GOSUB 5000 ; NEXT.WORD = WORD

ADDRESS = ADDRESS[FN.POS + 1,LEN(ADDRESS)]

*** takes out tailing dots

IF NEXT.WORD[LEN(NEXT.WORD),1] = "," THEN NEXT.WORD = NEXT.WORD[1,LEN(NEXT.WORD) - 1]

IF NEXT.WORD[LEN(NEXT.WORD),1] = "." THEN NEXT.WORD = NEXT.WORD[1,LEN(NEXT.WORD) - 1]

BEGIN CASE

CASE NEXT.WORD = ""

CASE NEXT.WORD = "#"

CASE 1

APT.LOT = APT.LOT : NEXT.WORD

*** checks to see if the next word is an 'AND', if so, then

*** get it, and look at the next appartment#

IF FIELD(ADDRESS," ",1) = "&" THEN

ADDRESS = ADDRESS[COL2() + 1,LEN(ADDRESS)]

APT.LOT = APT.LOT :"&"

END ELSE

STOP.LOOP = 1

END

END CASE

UNTIL ADDRESS = "" OR STOP.LOOP DO

REPEAT

RETURN

5000*

WORD = "" ; SEARCH = ADDRESS ; STOP.DELIM.LOOP = 0

FN.POS = 0

LOOP

DELIM = OCONV(SEARCH,"MC/A":VM:"MC/N")[1,1]

IF DELIM = "" THEN

POS = LEN(SEARCH) + 1

END ELSE

POS = INDEX(SEARCH,DELIM,1)

END

NEXT.CHAR = SEARCH[POS + 1,1]

STOP.DELIM.LOOP = 1 ; CHAR = ""

IF DELIM = "&" THEN CHAR = "&"

BEGIN CASE

CASE NOT(DELIM = ".")

CASE NUM(NEXT.CHAR) AND NOT(NEXT.CHAR = "")

CASE 1 ; STOP.DELIM.LOOP = 0

END CASE

BEGIN CASE

CASE NOT(DELIM = "-")

CASE INDEX(" .,",NEXT.CHAR,1)

CASE 1

CHAR = "-" ; STOP.DELIM.LOOP = 0

END CASE

WORD = WORD : SEARCH[1,POS - 1] : CHAR

SEARCH = SEARCH[POS + 1,LEN(SEARCH)]

FN.POS = FN.POS + POS

UNTIL STOP.DELIM.LOOP DO