Validating Address
Multi-value Solutions
Nathan Rector
Natec Systems
nater@northcoast.com
http://www.northcoast.com/~nater/
The address is a key piece of information in anyone's database, but it can be extremely hard to validate, and due to free form input, even harder to search.
I had a client that wanted to use the Address as data for validation when checking for duplicate information. The client works in the building trades and keeps tracks of job sites. Since more than one of their customers could be working at any one job site, they wanted to make sure that duplicate job site records where not made.
Since every bid is keyed both with a customer number and a job site number, they had to keep their users from creating duplicating records or confusion would result. The address was the only common element in all the site records.
The client wanted a way to search for exact and/or like address, then display the results to the user for them to choose from if only like address where found.
If they were creating a new database, then the easiest way to do this is to require the user to break out the common information like the street name and address number. The problem was that the database already existed. The client wanted to retro-fit this database. They also wanted to keep the data entry as normal as possible, so the division of the address number and street name was not a viable solution.
An address can be input in several different formats, but they all follow a common pattern. Look at the following examples:
Wilson Street, #B
100 Wilson Street, #B
100 Wilson, Apt. b
100 Wilson Street
Pattern: {address number}{street name}{type of street}{apartment or suite}
Using this format as the guideline, I wrote a routine that would extract the address information following this pattern.
The first thing that the routine does is check to see if an address number has been input. Once it has found all of the address number, then it checks to see what the street name is.
Street names can be a little confusing. At times people input 'Wilson Street' when they really want to have 'Wilson Lane'. Other times, only 'Wilson' is input as the street name. Due to this, the routine separates the name of the street and the type of the street (road, lane, ave, etc) from each other. This allows matches on just 'Wilson' and then checks to
see if the street type matches up to create a more exact match.
For the same reason that 'Wilson' and 'Street' are separated, the apartment number is also separated from the address.
As displayed in the examples of the 'Wilson Street' address, the routine has to be able know when a part of the address is missing and move onto the next part. Instead of starting with one part of the address and finding that before moving onto the next part, the routine looks for all parts of the address as it spans the address string.
The end result of each example are:
Wilson Street, #B = Wilson;;B;;Street
100 Wilson Street, #B = Wilson;100;B;;Street
100 Wilson, Apt. b = Wilson;100;B;;
100 Wilson Street = Wilson;100;;;Street
Now that the address has be broken into common components, it can now be indexed and searched. It can even be validated to make sure that all the information needed has been input. For example, validating the user for an address number, or the requirement of the street type.
Next month, I'll discuss the searching and matching routine that can be used with this routine to find exact or near exact matches.
SUBROUTINE FORMAT.ADDR1(ADDRESS)
EQUATE AM TO CHAR(254), VM TO CHAR(253), SVM TO CHAR(252)
EQUATE BELL TO CHAR(7)
*
*CREATED BY NATHAN RECTOR, 10/08/96
* NATEC SYSTEMS
* nater@northcoast.com
* http://www.northcoast.com/~nater/
*
* D O C U M E N T A T I O N
*
* This program is used to create a value used to create a common
* format for an address.
*
* The program will isolate the different types of addresses and
* create a value that make the address easies to find.
*
* {street name};{address #};{apt#};{lot#};{street type}
*
* INPUT 'Y' to continue OR 'N' to return to Menu.
*
*********************************************************************
*OPEN FILES
*********************************************************************
*
STREET.SUFIX = ".STREET.LANE.AVENUE.WAY.COURT.CIRLE.PLACE.ROAD.DRIVE."
*
*** This var is used to convert abbrevations into thier long
*** names. This will allow the street names to be more accurate
*** when searching.
***
*** format:
*** ABBRV<1,-1> = abbreviation
*** <2,-1> = full name
*
ABBRV = ""
*********************************************************************
*PROGRAMMING LOGIC
*********************************************************************
STREET.NAME = "" ; ADDRESS = TRIM(ADDRESS) 'CU'
*
IF ADDRESS[1,3] = "C/O" THEN ADDRESS = ADDRESS[4,LEN(ADDRESS)]
*
*** takes care of the PO boxes
*
BEGIN CASE
CASE ADDRESS = "" ; GOTO 900
CASE ADDRESS[1,8] 'CU' = "P.O. BOX" ; ADDRESS.NUM = TRIM(ADDRESS[9,10]) ; STREET.NAME = "POBOX" ; GOTO 800
CASE ADDRESS[1,6] 'CU' = "PO BOX" ; ADDRESS.NUM = TRIM(ADDRESS[7,10]) ; STREET.NAME = "POBOX" ; GOTO 800
CASE ADDRESS[1,9] 'CU' = "P. O. BOX" ; ADDRESS.NUM = TRIM(ADDRESS[10,10]) ; STREET.NAME = "POBOX" ; GOTO 800
CASE ADDRESS[1,5] 'CU' = "POBOX" ; ADDRESS.NUM = TRIM(ADDRESS[6,10]) ; STREET.NAME = "POBOX" ; GOTO 800
CASE ADDRESS[1,3] 'CU' = "BOX" ; ADDRESS.NUM = TRIM(ADDRESS[4,10]) ; STREET.NAME = "POBOX" ; GOTO 800
CASE ADDRESS[1,7] 'CU' = "P.O.BOX" ; ADDRESS.NUM = TRIM(ADDRESS[8,10]) ; STREET.NAME = "POBOX" ; GOTO 800
CASE ADDRESS[1,4] 'CU' = "P.O." ; ADDRESS.NUM = TRIM(ADDRESS[5,10]) ; STREET.NAME = "POBOX" ; GOTO 800
CASE ADDRESS[1,7] 'CU' = "POSTBOX" ; ADDRESS.NUM = TRIM(ADDRESS[8,10]) ; STREET.NAME = "POBOX" ; GOTO 800
CASE ADDRESS[1,8] 'CU' = "P. O BOX" ; ADDRESS.NUM = TRIM(ADDRESS[9,10]) ; STREET.NAME = "POBOX" ; GOTO 800
CASE ADDRESS[1,7] 'CU' = "P O BOX" ; ADDRESS.NUM = TRIM(ADDRESS[8,10]) ; STREET.NAME = "POBOX" ; GOTO 800
CASE ADDRESS[1,6] 'CU' = "P O BX" ; ADDRESS.NUM = TRIM(ADDRESS[7,10]) ; STREET.NAME = "POBOX" ; GOTO 800
CASE ADDRESS[1,3] 'CU' = "P O" ; ADDRESS.NUM = TRIM(ADDRESS[4,10]) ; STREET.NAME = "POBOX" ; GOTO 800
CASE ADDRESS[1,4] 'CU' = "P. O." ; ADDRESS.NUM = TRIM(ADDRESS[5,10]) ; STREET.NAME = "POBOX" ; GOTO 800
CASE ADDRESS[1,4] 'CU' = "P. B." ; ADDRESS.NUM = TRIM(ADDRESS[5,10]) ; STREET.NAME = "POBOX" ; GOTO 800
END CASE
*
*** takes the Number part of the address out of the address list
*
ADDRESS.NUM = "" ; WORD = FIELD(ADDRESS," ",1) ; NEXT.WORD = FIELD(ADDRESS," ",2)
BEGIN CASE
CASE WORD[1,1] = "#"
* taken card of later
CASE NEXT.WORD = "1/2"
ADDRESS.NUM = FIELD(ADDRESS," ",1)
HALF = FIELD(ADDRESS," ",2)
ADDRESS = ADDRESS[COL2() + 1,LEN(ADDRESS)]
*
ADDRESS.NUM = ADDRESS.NUM :" 1/2"
CASE WORD MATCHES "1N0N1X'1/2'"
ADDRESS.NUM = FIELD(ADDRESS," ",1)
ADDRESS = ADDRESS[COL2() + 1,LEN(ADDRESS)]
*
ADDRESS.NUM = ADDRESS.NUM[1,LEN(ADDRESS.NUM) - 3] 'CN' :" 1/2"
CASE WORD MATCHES "0X'-'0X"
ADDRESS.NUM = FIELD(ADDRESS," ",1)
ADDRESS = ADDRESS[COL2() + 1,LEN(ADDRESS)]
*
STREET.NAME<2> = STREET.NAME<2> :" ": FIELD(ADDRESS.NUM,"-",2)
ADDRESS.NUM = FIELD(ADDRESS.NUM,"-",1)
CASE NUM(WORD)
ADDRESS.NUM = FIELD(ADDRESS," ",1)
ADDRESS = ADDRESS[COL2() + 1,LEN(ADDRESS)]
CASE WORD MATCHES "1N0X'TH'" OR WORD MATCHES "1N0X'ND'" OR WORD MATCHES "1N0X'ST'"
* no address numbers.
CASE WORD MATCHES "1N0N0X" OR WORD MATCHES "1A1N0N0X"
ADDRESS.NUM = FIELD(ADDRESS," ",1)
ADDRESS = ADDRESS[COL2() + 1,LEN(ADDRESS)]
*
STREET.NAME<2> = STREET.NAME<2> :" ": ADDRESS.NUM 'CA'
ADDRESS.NUM = ADDRESS.NUM 'CN'
END CASE
*
*** breaks the address down word for word, looking for abbrivations
*** or appartment#s
*
ADDRESS.COMPLETE = 0 ; PREV.WORD = "" ; PREV.DELIM = "" ; AFTER.STREET = 0
LOOP
UNTIL TRIM(ADDRESS = "") OR ADDRESS.COMPLETE DO
GOSUB 5000 ;* gets word
ADDRESS = ADDRESS[FN.POS + 1,LEN(ADDRESS)]
*
*** decides what to do with an 's. Check to see if it is actual
*** 's or 's'
*
IF NOT(WORD = "") AND DELIM = \'\ AND ADDRESS[1,1] = "S" THEN
WORD = WORD :\'S\
ADDRESS = ADDRESS[2,999]
END
*
BEGIN CASE
CASE WORD = ""
CASE WORD = "NO" AND STREET.NAME<1> = ""
* if there is not street name, and 'NO' is found, then it
* is an abbrevation for North
WORD = "NORTH"
CASE INDEX(VM: ABBRV<1> :VM,VM: WORD :VM,1)
LOCATE(WORD,ABBRV,1;MV) ELSE MV = ""
IF NOT(MV = "") THEN WORD = ABBRV<2,MV>
END CASE
*
BEGIN CASE
CASE WORD = ""
*** no value
CASE PREV.WORD = ""
*** no previous word to work with
CASE PREV.WORD = "AND"
*** the next char is part of the address
WORD = PREV.WORD :" ": WORD
PREV.WORD = ""
CASE INDEX(STREET.SUFIX,".": WORD :".",1)
*** the previous word is part of a street name
STREET.NAME<1> = STREET.NAME<1> :" ": PREV.WORD
PREV.WORD = ""
CASE STREET.NAME<1> = ""
*** no previous street name. convert the single chars
BEGIN CASE
CASE PREV.WORD = "W"
STREET.NAME<1> = STREET.NAME<1> : "WEST"
CASE PREV.WORD = "E"
STREET.NAME<1> = STREET.NAME<1> : "EAST"
CASE PREV.WORD = "N"
STREET.NAME<1> = STREET.NAME<1> : "NORTH"
CASE PREV.WORD = "S"
STREET.NAME<1> = STREET.NAME<1> : "SOUTH"
CASE 1
STREET.NAME<2> = STREET.NAME<2> : PREV.WORD
END CASE
PREV.WORD = ""
CASE 1
*** assume the pervious word is an appartment#
STREET.NAME<2> = STREET.NAME<2> :" ": PREV.WORD
PREV.WORD = ""
END CASE
*
*** gets word
*
BEGIN CASE
CASE DELIM = "#"
GOSUB 4000
*
STREET.NAME<2> = STREET.NAME<2> :" ": APT.LOT
CASE WORD = ""
CASE INDEX(".APT.NO.SUITE.UNIT.STE.SPACE.RM.",".": WORD :".",1)
GOSUB 4000
*
STREET.NAME<2> = STREET.NAME<2> :" ": APT.LOT
CASE WORD = "LOT"
GOSUB 4000
*
STREET.NAME<3> = STREET.NAME<3> :" ": APT.LOT
PREV.WORD = ""
CASE AFTER.STREET AND NUM(WORD)
* found a number after the street sufix had been defined
STREET.NAME<2> = STREET.NAME<2> :" ": WORD
CASE LEN(WORD) = 1
* just add to the previous word to make test with the next
* word
PREV.WORD = WORD
CASE WORD = "AND"
PREV.WORD = WORD
CASE WORD = "TH" AND NUM(TRIM(STREET.NAME<1>))
* the 'TH' in '12th' had a space in it
STREET.NAME<1> = STREET.NAME<1> : WORD
CASE INDEX(STREET.SUFIX,".": WORD :".",1)
* this a street sufix. Place in different area to keep the street
* name by itself
STREET.NAME<4> = STREET.NAME<4> :" ": WORD
AFTER.STREET = 1
CASE 1
STREET.NAME<1> = STREET.NAME<1> :" ": WORD
END CASE
REPEAT
*
BEGIN CASE
CASE PREV.WORD = ""
CASE STREET.NAME<1> = "" ; STREET.NAME<1> = PREV.WORD
CASE 1 ; STREET.NAME<2> = STREET.NAME<2> : PREV.WORD
END CASE
800*
ADDRESS.NUM = TRIM(ADDRESS.NUM)
STREET.NAME<1> = TRIM(STREET.NAME<1>) ;* Street name
STREET.NAME<2> = TRIM(STREET.NAME<2>) ;* appartment#
STREET.NAME<3> = TRIM(STREET.NAME<3>) ;* Lot Number
STREET.NAME<4> = TRIM(STREET.NAME<4>) ;* Street Type
*
ADDRESS = STREET.NAME<1> :";": ADDRESS.NUM :";": STREET.NAME<4> :";": STREET.NAME<2> :";": STREET.NAME<3>
900*
RETURN
*********************************************************************
*SUBROUTINE
*********************************************************************
4000*
APT.LOT = "" ; STOP.LOOP = 0
LOOP
GOSUB 5000 ; NEXT.WORD = WORD
ADDRESS = ADDRESS[FN.POS + 1,LEN(ADDRESS)]
*
*** takes out tailing dots
*
IF NEXT.WORD[LEN(NEXT.WORD),1] = "," THEN NEXT.WORD = NEXT.WORD[1,LEN(NEXT.WORD) - 1]
IF NEXT.WORD[LEN(NEXT.WORD),1] = "." THEN NEXT.WORD = NEXT.WORD[1,LEN(NEXT.WORD) - 1]
*
BEGIN CASE
CASE NEXT.WORD = ""
CASE NEXT.WORD = "#"
CASE 1
APT.LOT = APT.LOT : NEXT.WORD
*** checks to see if the next word is an 'AND', if so, then
*** get it, and look at the next appartment#
*
IF FIELD(ADDRESS," ",1) = "&" THEN
ADDRESS = ADDRESS[COL2() + 1,LEN(ADDRESS)]
APT.LOT = APT.LOT :"&"
END ELSE
STOP.LOOP = 1
END
END CASE
UNTIL ADDRESS = "" OR STOP.LOOP DO
REPEAT
RETURN
5000*
WORD = "" ; SEARCH = ADDRESS ; STOP.DELIM.LOOP = 0
FN.POS = 0
LOOP
DELIM = OCONV(SEARCH,"MC/A":VM:"MC/N")[1,1]
IF DELIM = "" THEN
POS = LEN(SEARCH) + 1
END ELSE
POS = INDEX(SEARCH,DELIM,1)
END
*
NEXT.CHAR = SEARCH[POS + 1,1]
*
STOP.DELIM.LOOP = 1 ; CHAR = ""
IF DELIM = "&" THEN CHAR = "&"
BEGIN CASE
CASE NOT(DELIM = ".")
CASE NUM(NEXT.CHAR) AND NOT(NEXT.CHAR = "")
CASE 1 ; STOP.DELIM.LOOP = 0
END CASE
*
BEGIN CASE
CASE NOT(DELIM = "-")
CASE INDEX(" .,",NEXT.CHAR,1)
CASE 1
CHAR = "-" ; STOP.DELIM.LOOP = 0
END CASE
*
WORD = WORD : SEARCH[1,POS - 1] : CHAR
SEARCH = SEARCH[POS + 1,LEN(SEARCH)]
*
FN.POS = FN.POS + POS
UNTIL STOP.DELIM.LOOP DO
REPEAT
RETURN
END