Indexing and Searching Address
Mutli-value Solutions - Oct '97
Nathan Rector
Natec Systems
nater@northcoast.com
http://www.northcoast.com/~nater/
Last month, I provided a routine that would break out the common parts of an address to help indexing and validation of address input. If you missed that issue, it can be found on the web site for download at http://www.northcoast.com/~nater/solutions.html.
This routine takes an address and extracts the key elements from it. Using the following example address, it creates an output that can be easily checked or indexed:
Example Address Program Output
Wilson Street, #B = Wilson;;B;;Street
100 Wilson Street, #B = Wilson;100;B;;Street
100 Wilson, Apt. b = Wilson;100;B;;
100 Wilson Street = Wilson;100;B;;Street
To the human eye, these address all look similar, but a computer has a hard time finding the similarities, let alone close matches. The routine that I created uses percentages to return matches.
The main part of the routine deals with the indexing. The indexing used stores the formatted addresses in a separate file with '.INDEX' on the end of the file name; for example, CUST.INDEX for the CUST file. The records IDs are the same as in the main file and the formatted address is stored in attribute 1.
Since the street name can be misspelled, and often is, it makes it hard to find matches by comparing exact street name. By processing the street name through a soundex routine, a misspelled word matters less. Do keep in mind that soundex routines will not cure the problem of misspelled words completely.
Once the street name has been soundexed, it is written to the index file. Then this files is Btreed for faster searching. The routine displayed here uses Advanced Pick Btree, but any type of indexing can be used.
The search routine pulls all the records that match the soundex street name being searched. After pulling all the records that match the search, it checks the other parts of the address to see if there is a match.
When comparing the street type, it includes all items found that don't have a street type as well as all items that have the matching street type. For example, if you were searching for 'Wilson Street' the following would result:
Wilson Street, #B = Wilson;;B;;Street - Match
Wilson, Apt. b = Wilson;;B;; - Match
Wilson Lane = Wilson;;;;Lane - Not match
The routine follows the same logic with address numbers. If there is no address number, then it includes the item, but if the address numbers do not match, then the item is not included. Same follows with the apartment or suite numbers.
As the routine finds matches, it creates a top to bottom list of matches. You can tell the routine to display only those items with the closest matches to the search string, or set the variable LISTTYPE = 1 to display all matches found sorted top to bottom with the best matches found at the top.
INDEX.ADDR
001 SUBROUTINE INDEX.ADDR(FILENAME,ID,ADDRESS)
002 *
003 * CREATED BY NATHAN RECTOR
004 * NATEC SYSTEMS
005 * nater@northcoast.com
006 * http://www.northcoast.com/~nater/
007 *
008 OPEN FILENAME :".INDEX" TO INDEX.FILE ELSE STOP 201, FILENAME:".INDEX"
009 *
010 CALL FORMAT.ADDR1(ADDRESS)
011 WRITE ADDRESS ON INDEX.FILE, ID
012 RETURN
013 END
TEST
001 * TEST
002 *
003 SEARCH = "100 Wilson Street, #B"
004 LISTTYPE = 0
005 CALL SEARCH.ADDRESS(SEARCH,DISPLAY,LISTYPE)
006 *
007 NUM = DCOUNT(DISPLAY,CHAR(254))
008 FOR I = 1 TO NUM
009 CRT DISPLAY<I,1>
010 NEXT I
011 END
output:
{display output}
TEST
001 * TEST
002 *
003 SEARCH = "100 Wilson Street, #B"
004 LISTTYPE = 1
005 CALL SEARCH.ADDRESS(SEARCH,DISPLAY,LISTYPE)
006 *
007 NUM = DCOUNT(DISPLAY,CHAR(254))
008 FOR I = 1 TO NUM
009 CRT DISPLAY<I,1>
010 NEXT I
011 END
output:
Wilson;;B;;Street
Wilson;;B;;
Wilson;;;;Lane
CUST RECORDS:
Key Address.....
1 Wilson Street, #B
2 100 Wilson Street, #B
3 100 Wilson, Apt. b
4 100 Wilson Street
5 500 Johnson Ave, #B
TEST
001 * TEST - Shows example of closest match only
002 *
003 SEARCH = "100 Wilson Street, #B"
004 FOR LISTTYPE = 0 TO 1
005 CRT "LISTTYPE= ": LISTTYPE
006 CALL SEARCH.ADDRESS(SEARCH,DISPLAY,LISTTYPE)
007 *
008 NUM = DCOUNT(DISPLAY,CHAR(254))
009 FOR I = 1 TO NUM
010 CRT DISPLAY<I,1> :" ": OCONV(DISPLAY<I,1>,"TCUST;X;;2")
011 NEXT I
012 NEXT LISTTYPE
013 END
output:
:TEST
LISTTYPE= 0
2 100 Wilson Street, #B
LISTTYPE = 1
2 100 Wilson Street, #B
3 100 Wilson, Apt. b
1 Wilson Street, #B
4 100 Wilson Street
INDEX.ADDR
001 SUBROUTINE INDEX.ADDR(FILENAME,ID,ADDRESS)
002 *
003 OPEN FILENAME :".INDEX" TO INDEX.FILE ELSE STOP 201, FILENAME:".INDEX"
004 *
005 CALL FORMAT.ADDR1(ADDRESS)
006 WRITE ADDRESS ON INDEX.FILE, ID
007 RETURN
008 END
SEARCH.ADDR1
001 SUBROUTINE SEARCH.ADDR1(FILENAME,ADDRESS,LISTTYPE,LIST)
002 *
003 * CREATED BY NATHAN RECTOR
004 * NATEC SYSTEMS
005 * nater@northcoast.com
006 * http://www.northcoast.com/~nater/
007 *
008 *
009 * D O C U M E N T A T I O N
010 *
011 * This routine is used to return a list of address that match the address
012 * being looked for. This routine uses FORMAT.ADDR1.
013 *
014 * FILENAME = the file to search off of.
015 * ADDRESS = the address being looked for
016 * LISTTYPE = 1 - return all items that have a close match
017 * 0 - returns only the items that are closest
018 * LIST<-1> = list of address found that match
019 *
020 ************************************************************************
021 *** Open Files
022 ************************************************************************
023 OPEN FILENAME,"ADDRESS.INDEX" TO BTREE.FILE ELSE STOP 201, FILENAME :",ADDRESS.INDEX"
024 ************************************************************************
025 *** Main Program
026 ************************************************************************
027 DIM SORT.LIST(5) ; MAT SORT.LIST = ""
028 LIST = ""
029 *
030 *** converts the search string into the same format that is indexed.
031 *
032 CALL FORMAT.ADDR(ADDRESS)
033 *
034 *** searches the index to find the list of records that match the
035 *** street name
036 *
037 ROOT.FILE = FILENAME :",ADDRESS.INDEX"
038 ROOT.NAME = "A1"
039 CUR.SEARCH = SOUNDEX(FIELD(ADDRESS,";",1))
040
041 ROOT ROOT.FILE, ROOT.NAME TO C.ROOT ELSE GOTO 900
042 KEY("X",C.ROOT,CUR.SEARCH,FIRST.LIST) ELSE GOTO 900
043 *
044 *** loop through list to sort into closest match lists.
045 *
046 FOR I = 1 TO I + 1 UNTIL FIRST.LIST<I> = ""
047 READV BTREE.ADDR FROM BTREE.FILE, LIST<I>, 1 ELSE
048 * Item not found. Do not add to list.
049 GOTO 800
050 END
051 *
052 LIST.NO = 1
053 *** checks address number
054 BEGIN CASE
055 CASE FIELD(ADDRESS,";",2) = ""
056 CASE FIELD(BTREE.ADDR,";",2) = ""
057 CASE FIELD(BTREE.ADDR,";",2) = FIELD(ADDRESS,";",2) ; LIST.NO = LIST.NO + 1
058 END CASE
059 *** check street type
060 BEGIN CASE
061 CASE FIELD(ADDRESS,";",3) = ""
062 CASE FIELD(BTREE.ADDR,";",3) = "" ; LIST.NO = LIST.NO + 1
063 CASE FIELD(BTREE.ADDR,";",3) = FIELD(ADDRESS,";",3) ; LIST.NO = LIST.NO + 1
064 END CASE
065 *** check appartment#
066 BEGIN CASE
067 CASE FIELD(ADDRESS,";",4) = ""
068 CASE FIELD(BTREE.ADDR,";",4) = ""
069 CASE FIELD(BTREE.ADDR,";",4) = FIELD(ADDRESS,";",4) ; LIST.NO = LIST.NO + 1
070 END CASE
071 *** check lot#
072 BEGIN CASE
073 CASE FIELD(ADDRESS,";",5) = ""
074 CASE FIELD(BTREE.ADDR,";",5) = ""
075 CASE FIELD(BTREE.ADDR,";",5) = FIELD(ADDRESS,";",5) ; LIST.NO = LIST.NO + 1
076 END CASE
077 *
078 SORT.LIST(LIST.NO)<-1> = FIRST.LIST<I>
079 800*
080 NEXT I
081 *
082 *** creates the final list
083 *
084 EXIT.FOR.I = 0
085 FOR I = 5 TO 1 UNTIL EXIT.FOR.I
086 IF NOT(SORT.LIST(I) = "") THEN
087 LIST<-1> = SORT.LIST(I)
088 IF LISTTYPE = 0 THEN EXIT.FOR.I = 1
089 END
090 NEXT I
091 900*
092 RETURN
093 END