View Light

HOWTO: SQL Weighted Best Match Query with Fuzzy Logic

Doing a definitive query using T-SQL is easy. When you know all the criteria in the query it's as easy as SELECT something WHERE Criteria1=x AND Criteria2=y. But in the case where you have some loosely defined criteria and want SQL to find the best possible match rather than an exact match, it gets to be a tricky problem.


There are two basic types of best matching approaches: A Best Match Substring Search and a Best Match Full Criteria Search. A Substring Search is for best matching in text searches. (There may be a more standard term for that but I am not aware of one so am making up my own term for it.) This is more along the lines of finding the best matches in a block of text, similar to a search engine (aka Google) search. A Criteria Search where you have the complete value or text of each of the parameters/criteria you are searching for, and need to find which entry in the database matches the highest number of criteria in that search. Some criteria may be undefined (null). You may wish to add "mass" to a certain criteria type to give it more "weight" in the search, such as "If you match on [Criteria A but not criteria B or criteria C], it's considered a better match than matching on [Criteria B and criteria C, but not criteria A]. This means that a higher weight has been assigned to criteria A than on criteria B and criteria C.

Note that this article deals with the second type of query, a Best Match Full Criteria Seach. If you are looking for info about Substring Searching in Text, try

Also note that this article details a best match Criteria Search using T-SQL (Microsoft SQL Server), but it should be easy to convert the logic of the examples to PL-SQL (Oracle), PostgreSQL, MySQL, or any other SQL-based database server that supports stored procedures. The logic is pretty straight-forward and doesn't use anything platform-specific.

Creating the Solution

I had an issue where I needed to fetch data based on the best match of four criteria: A Category Name, a Vendor ID, a Context, and a Service. Also, for a couple of them, I had some existing data that contained a "wildcard" word that meant it applied to any entry, as long as there was not a definitive entry to match on, which made it still more complicated to account for. I did a lot of searching on the net looking for a similar solution but only found articles about finding best matches when dealing with substrings of text. For my issue, each of my criteria, when it existed, would be a full word, I did not need to do substring searching. It seems there is a large wealth of substring best match searching articles out there, but nothing on the criteria best matching, thus the reason I decided to post this article to save the next person the hair loss this problem nearly caused me. :)

OK, shut up, quit blabbering and get on with the technical details of the article dude... 

Technical Details

I have a table called Lookup which contains the following columns:

Data	(Primary Key, varchar(200), not null) – The data to return from the best matching search.
Category (varchar(128), not null) – Consider this Criteria A
Vendor (varchar(6), not null) – Criteria B
Context (varchar(16), not null) – Criteria C
Service (varchar(16), not null)
– Criteria D
For purposes of this article lets just populate the table with five rows to keep it simple:


The stored procedure:

-- Does a Weighted Select against passed criteria in order to find
-- the record containing the best matching Data.
CREATE proc [dbo].[rspBestMatch]

@Category varchar(128),

@Vendor varchar(6),

@Context varchar(16),

@Service varchar(16)

DECLARE @TTemp TABLE (Data varchar(200), Mass int)
DECLARE @VendorMatch int
DECLARE @VendorCount int
-- Should never call this proc without a vendor, but if someone 
-- does, assume a VendorID of 0 so will match on any vendor.
-- (Vendor 0 matches any vendor unless a Vendor > 0 is passed)
SET @Vendor = 0
SET @Context = ''
 -- Make sure parameter data is in lower case.
SET @Category = Lower(@Category)
SET @Context = Lower(@Context)
SET @Service = Lower(@Service)
IF (@Vendor > 0)
SET @VendorMatch = (
SELECT Count(1)
FROM Lookup
WHERE (Lower(Category) = @Category)
AND (Vendor = @Vendor)) 
	IF @VendorMatch = 0
SET @Category = 0
-- Collect entries that match on Vendor
SELECT Data, 10 as Mass
FROM Lookup
WHERE Vendor = @Vendor
AND Lower(Category) = @Category
SET @VendorCount = (SELECT COUNT(1) FROM @TTemp)
-- Only if we found an Category/Vendor match should we look for 
-- other criteria, otherwise we want to return nothing.
IF (@VendorCount > 0) 
-- Collect entries that match on Context
IF (@Context > '')


SELECT Data, 1 as Mass

FROM Lookup

WHERE Lower(Context) = @Context

AND Lower(Category) = @Category


-- Collect entries that match on Service




SELECT Data, 2 as Mass

FROM Lookup

WHERE Lower([Service]) = @Service

AND Lower(Category) = @Category


-- If passed Service does not equal common,

-- then also attempt to match on Service='common'.

IF Lower(@Service) <> 'common'



SELECT Data, 1 as Mass

FROM Lookup

WHERE Lower([Service]) = 'common'

AND Lower(Category) = @Category




-- Find Data with highest weight

Now to experiment with this a bit:

Execute rspBestMatch with the following data:


  1. Category='Test1', Vendor='00001', Context='Dev', Service='common'

    You should get back ResultA from the query as it is the best match.

  2. Category='Test1', Vendor='00002', Context=NULL, Service=NULL

    You should get back ResultC from the best match query

  3. Category='Test2', Vendor=NULL, Context=NULL, Service=NULL

    You should get back ResultE this time, as the only thing we matches on was the category.

  4. To demonstrate the weighting on the Vendor criteria, use the following data; Category='Test1', Vendor='00002', Context='Dev', Service=NULL.

    You should get back ResultC in this case. Even though we had an match on Context='Dev', the Vendor='00002' won out as the best match because if you look in the sproc code above, when we get a match on Vendor, it gets assigned a mass of 10. Matches on Context are only assigned a mass of 1.


To get a little more understanding about what it going on behind the scenes with the query, change the final four lines (After 'Find Data with highest weight' to:

SELECT Data, Sum(Mass) as Weight
FROM @Ttemp

This will show you the sorting and grouping going on to find the best match.


11/25/2007 11:35:09 PM
This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available in our efforts to advance understanding of environmental, political, human rights, economic, democracy, scientific, and social justice issues, etc. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit to those who have expressed a prior interest in receiving the included information for research and educational purposes. For more information go to: . If you wish to use copyrighted material from this site for purposes of your own that go beyond 'fair use', you must obtain permission from the copyright owner.