TITLE Searching Multiple Words

ISSUE Multi-value Solutions Jan '98

AUTHOR Nathan Rector

COMPANY Natec Systems

EMAIL nater@northcoast.com

HTTP www.northcoast.com/~nater/

In the December '97 issue of Multi-Value Solutions, I talked about breaking sentences down into individual words that could be searched, and I showed you a program that does the actual breakdown. In this article, I'll cover the program that does the search.

The BREAK.WORD2 program (covered last month) breaks all the words apart and places them into a multi-value array. This array of words is then written to an index file so it can be searched when needed. For the name of the index file, I commonly use the name of the file the value comes from, with '.INDEX' concatenated to it. The item id in the index file is the same as the item id in the original file.
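The index can be pictured as a table keyed by item id, where each entry holds that item's word list (the multi-value array). The sketch below is a minimal Python stand-in, not BREAK.WORD2 itself: it assumes a simple tokenizer (upper-case the text, split on anything that is not a letter or digit, drop repeated words), and the names `break_words` and `build_index` are hypothetical. The real program's word-splitting rules may differ.

```python
import re

def break_words(text: str) -> list[str]:
    """Split text into upper-case words, keeping the first occurrence of each."""
    words: list[str] = []
    for word in re.findall(r"[A-Z0-9]+", text.upper()):
        if word not in words:
            words.append(word)
    return words

def build_index(items: dict[str, str]) -> dict[str, list[str]]:
    """Map each item id to its word list, reusing the source file's item ids."""
    return {item_id: break_words(text) for item_id, text in items.items()}

inventory = {
    "1001": "Black and Decker Vacuum, 5 HP with 8' hose and dust nozzle.",
    "1002": "Black and Decker Dust Devil, with optional hose attachment.",
    "1003": "Black Milwaukee vacuum with dust collection.",
}
inventory_index = build_index(inventory)  # stands in for the .INDEX file
print(inventory_index["1003"])
# ['BLACK', 'MILWAUKEE', 'VACUUM', 'WITH', 'DUST', 'COLLECTION']
```

Upper-casing at index time lets a later search for "vacuum" match "Vacuum"; whether the original program folds case this way is an assumption.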

A simple approach would be to use an ACCESS statement to search the index file for the words you are looking for. This works fine if you are searching for one word, or for this word or that word. The problem arises when you want to find all the items that have both words in them.

Again, a simple ACCESS statement could join more than one WITH clause with AND. The problem is that AND returns only the items that contain every word, so if you search on several words and want every item that contains at least one of them, this syntax won't work. For example, if you wanted to find all the items with "Vacuum" or "Upright" in them, the AND syntax would return only the items that have both words.

For example, if you had an inventory file with the following descriptions:

1001

Black and Decker Vacuum, 5 HP with 8' hose and dust nozzle.

1002

Black and Decker Dust Devil, with optional hose attachment.

1003

Black Milwaukee vacuum with dust collection.

Let's say your user wants to search for all the 'Black and Decker' items in an inventory file. If you used an ACCESS statement of:

SELECT INDEX WITH WORDS = "Black" OR WORDS = "Decker"

You'd get items 1001, 1002, and 1003, but 1003 is not a 'Black and Decker' item. If you used the same SELECT statement, but used an AND instead of an OR, you'd get items 1001 and 1002.

The second statement works until another word, like 'vacuum', is added. If you use the same AND statement with the additional word, you get only 1001. That finds the vacuum, but you have now lost all the other Black and Decker items.
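To make the problem concrete, the sketch below models the two SELECT forms in Python over the three sample items. It is a stand-in under stated assumptions, not Pick ACCESS: the index is a hand-built dictionary of word sets, and `select_or` and `select_and` are hypothetical names. Each item either passes or fails the test, so neither form can rank partial matches.

```python
# Hypothetical word index: item id -> words in that item's description
# (a hand-built stand-in for the .INDEX file).
INDEX = {
    "1001": {"BLACK", "AND", "DECKER", "VACUUM", "5", "HP", "WITH", "8", "HOSE", "DUST", "NOZZLE"},
    "1002": {"BLACK", "AND", "DECKER", "DUST", "DEVIL", "WITH", "OPTIONAL", "HOSE", "ATTACHMENT"},
    "1003": {"BLACK", "MILWAUKEE", "VACUUM", "WITH", "DUST", "COLLECTION"},
}

def select_or(index, words):
    """Like WITH ... OR ...: items containing at least one search word."""
    wanted = {w.upper() for w in words}
    return sorted(item for item, item_words in index.items() if item_words & wanted)

def select_and(index, words):
    """Like WITH ... AND ...: items containing every search word."""
    wanted = {w.upper() for w in words}
    return sorted(item for item, item_words in index.items() if wanted <= item_words)

print(select_or(INDEX, ["Black", "Decker"]))             # ['1001', '1002', '1003']
print(select_and(INDEX, ["Black", "Decker"]))            # ['1001', '1002']
print(select_and(INDEX, ["Black", "Decker", "Vacuum"]))  # ['1001']
```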

What if you want to display all the 'Black and Decker' items, but want the vacuums displayed at the top of the list? A SELECT statement with ANDs would not work then.

The routine shown here lets you control this. I use an Advanced Pick B-tree to make the search faster. The program searches for each word and displays the results found. You can have the program either return all the items it finds, with the closest matches at the top of the list, or return only the items that meet a required number of matching words.

For example, you could tell the program to return only the items that match more than two words. Again, the list is returned with the closest matches at the top.
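The ranking described above can be sketched as counting, for each item, how many of the search words it contains, then sorting by that count. This is a minimal Python sketch, not the article's routine: a dictionary mapping each word to the set of item ids containing it stands in for the Advanced Pick B-tree, and `invert`, `ranked_search`, and `min_matches` are hypothetical names. Setting `min_matches=3` corresponds to "more than two words".

```python
from collections import Counter

# Hypothetical word lists for the sample items (stand-in for the .INDEX file).
ITEM_WORDS = {
    "1001": ["BLACK", "AND", "DECKER", "VACUUM", "5", "HP", "WITH", "8", "HOSE", "DUST", "NOZZLE"],
    "1002": ["BLACK", "AND", "DECKER", "DUST", "DEVIL", "WITH", "OPTIONAL", "HOSE", "ATTACHMENT"],
    "1003": ["BLACK", "MILWAUKEE", "VACUUM", "WITH", "DUST", "COLLECTION"],
}

def invert(item_words):
    """Build word -> set of item ids; a dict standing in for the B-tree."""
    tree = {}
    for item_id, words in item_words.items():
        for word in set(words):
            tree.setdefault(word, set()).add(item_id)
    return tree

def ranked_search(tree, words, min_matches=1):
    """Return (item id, match count) pairs, closest matches first."""
    hits = Counter()
    for word in {w.upper() for w in words}:  # one lookup per distinct search word
        for item_id in tree.get(word, ()):
            hits[item_id] += 1
    ranked = [(item_id, n) for item_id, n in hits.items() if n >= min_matches]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))  # most matches, then item id
    return ranked

tree = invert(ITEM_WORDS)
print(ranked_search(tree, ["Black", "Decker", "Vacuum"]))
# [('1001', 3), ('1002', 2), ('1003', 2)]
print(ranked_search(tree, ["Black", "Decker", "Vacuum"], min_matches=3))
# [('1001', 3)]
```

Items 1002 and 1003 tie at two matches, because 1003 contains "Black" and "vacuum". A plain match count cannot tell that only one of them is a Black and Decker item.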

This program is very handy for users. With a few modifications, you can set it up to accept a list of words that must be matched and a separate list of words that are optional matches. I've found this program useful for finding customer names as well as for inventory lookups.
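One way the required/optional variant could work is sketched below in Python: keep only the items that contain every required word, then rank those items by how many optional words they also contain. This is an illustrative assumption about the modification, not the author's code. `ITEM_WORDS` and `search` are hypothetical names, and the word sets are hand-built from the three sample descriptions.

```python
# Hypothetical word sets for the sample items (stand-in for the .INDEX file).
ITEM_WORDS = {
    "1001": {"BLACK", "AND", "DECKER", "VACUUM", "5", "HP", "WITH", "8", "HOSE", "DUST", "NOZZLE"},
    "1002": {"BLACK", "AND", "DECKER", "DUST", "DEVIL", "WITH", "OPTIONAL", "HOSE", "ATTACHMENT"},
    "1003": {"BLACK", "MILWAUKEE", "VACUUM", "WITH", "DUST", "COLLECTION"},
}

def search(item_words, required, optional=()):
    """Keep items containing every required word; rank by optional words matched."""
    must = {w.upper() for w in required}
    nice = {w.upper() for w in optional}
    results = []
    for item_id, words in item_words.items():
        if must <= words:  # every required word is present
            results.append((item_id, len(nice & words)))
    # Items with more optional matches first; ties broken by item id.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item_id for item_id, _ in results]

print(search(ITEM_WORDS, required=["Black", "Decker"], optional=["Vacuum"]))
# ['1001', '1002']
```

With "Black" and "Decker" required and "Vacuum" optional, both Black and Decker items are returned with the vacuum first, and the Milwaukee item is excluded. That is the ordering the earlier question about listing vacuums at the top was after.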
