Fear Not the Mighty TRIM() Function!

 

Jason Strate wrote up an excellent blog post dealing with the people asking for SQL Server to have a TRIM() function in addition to the LTRIM() and RTRIM() function and the possibility of performance issues creeping up.  You can check out his post below:

PLEASE, NO TRIM()?

Jason points out using two examples

RTRIM()

SELECT * FROM Person.Contact 

WHERE RTRIM(EmailAddress) = 'gustavo0@adventure-works.com' 

LTRIM() + RTIM() = TRIM() Equivalent

 

SELECT * FROM Person.Contact 

WHERE LTRIM(RTRIM(EmailAddress)) = 'gustavo0@adventure-works.com' 

Now on the face of things, this looks kind of bad….you’ve traded off a really good index scan and key lookup for a clustered index scan. Also if you look at the I/O counts as Jason does it bears out that this is the wrong thing to do …basically 178 reads for the RTRIM() version versus 570 reads for the TRIM() version…

However, the problem here is the LTRIM function and how it is handled by the system. Basically, every query that you send to the database goes through a multi-step process in order to get your results…

1. Parser: Yep, the system actually checks to ensure that you know WTH you are doing before it wastes its time on some poorly typed invalid T-SQL code.

2. Algebrizer: This piece is harder for people to understand. If your query is a DML then we have some extra work to do. Basically, it’s the same as a mechanic trying to work on an engine…. before he can actually get a plan together to do some work he needs to know what he is working with in order to pick the right tools. That’s what the Algebrizer’s job is. Resolve all of the various table and columns , do some aggregate binding(if there are any), and type check everything. It wraps it all up nice and neat into what is typically referred to as the Algebrizer Tree or the Query Processor Tree.

3. Optimizer: The optimizer is the thinker of the bunch, and it’s job is a lot like an insurance adjuster. Take a look at what we have to work on, take a look at the tools we have to use(like indexes), look at the costs involved(I/O, CPU, etc. )  and figure out a plan of attack in order to execute. Now, it’s job is NOT to come up with the best plan but to come up with the “meh-this look good enough for military work”, within the quickest timeframe possible, and the best use of resources.

 

You also have to understand that with all the glory and coolness that surrounds the optimizer, it is only a piece of dumb code…Yep, that’s right dumb code because code is only as smart as you program it to be. So obviously the optimizer is coded to treat the RTRIM function differently from the LTRIM function.  You can see this by simply removing the RTRIM function from out second example and looking at the resulting plan.

SELECT * FROM Person.Contact 

WHERE LTRIM(EmailAddress) = 'gustavo0@adventure-works.com' 

It’s the same plan! What’s more, if you look at the actual reads …. it’s actually saying that it will be the same amount of reads involved. Also we could “influence” the plan by giving the query a hint. Below we simply give it a hint that we would like for it to strongly consider using the index that is used by the RTRIM function query.  Look at what the result is.

SELECT * FROM Person.Contact WITH (INDEX(IX_Contact_EmailAddress)) 

WHERE LTRIM(RTRIM(EmailAddress)) = 'gustavo0@adventure-works.com' 

Basically, we have a nearly identical plan to original RTRIM statement that we used. Sure the percentages are skewed slightly but if we delve a little deeper to look at the counts we would find the following

RTRIM()

Table 'Contact'. Scan count 1, logical reads 178, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

HINTED LTRIM() + RTRIM()

Table 'Contact'. Scan count 1, logical reads 179, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

A difference of 1 in reads between to the two plans according to SQL Server. So the basic premise is that you should actually want SQL Server to add a TRIM function because when they do they will also program how the optimizer “looks” at this function in the whole scheme of things and will probably make it much closer to the RTRIM function than the LTRIM one. Plus, you could look at this as a bonus because if your developers do actually need to use something like TRIM then they would have the optimized function instead of relying on the crappy work around.

What do you think? Do you want SQL Server to add a TRIM function? Maybe you should vote it up or down on Microsoft’s Connect site

Add a Trim Function

 

Cheers!
AJ