lundi 15 juin 2015

Optimization issue with user defined function

I have a problem understanding why SQL server decides to call user defined function for every value in the table even though only one row should be fetched. The actual SQL is a lot more complex, but I was able to reduce the problem down to this:

select  
    S.GROUPCODE,
    H.ORDERCATEGORY
from    
    ORDERLINE L
    join ORDERHDR H on H.ORDERID = L.ORDERID
    join PRODUCT P  on P.PRODUCT = L.PRODUCT    
    cross apply dbo.GetGroupCode (P.FACTORY) S
where   
    L.ORDERNUMBER = 'XXX/YYY-123456' and
    L.RMPHASE = '0' and
    L.ORDERLINE = '01'

For this query, SQL Server decides to call GetGroupCode function for every single value that exists in PRODUCT Table, even though the estimate and actual number of rows returned from ORDERLINE is 1 (it's the primary key):

Query Plan

Same plan in plan explorer showing the row counts:

Plan explorer Tables:

ORDERLINE: 1.5M rows, primary key: ORDERNUMBER + ORDERLINE + RMPHASE (clustered)
ORDERHDR:  900k rows, primary key: ORDERID (clustered)
PRODUCT:   6655 rows, primary key: PRODUCT (clustered)

The index being used for the scan is:

create unique nonclustered index PRODUCT_FACTORY on PRODUCT (PRODUCT, FACTORY)

The function is actually slightly more complex, but the same thing happens with a dummy multi-statement function like this:

create function GetGroupCode (@FACTORY varchar(4))
returns @t table(
    TYPE        varchar(8),
    GROUPCODE   varchar(30)
)
as begin
    insert into @t (TYPE, GROUPCODE) values ('XX', 'YY')
    return
end

I was able to "fix" the performance by forcing SQL server to fetch the top 1 product, although 1 is max that can ever be found:

select  
    S.GROUPCODE,
    H.ORDERCAT
from    
    MILLORDERLINE M
    join ORDERHDR H
        on H.ORDERID = M.ORDERID
    cross apply (select top 1 P.MILLBU from PRODUCT P where P.PRODUCT = M.PRODUCT) P
    cross apply dbo.GetGroupCode (P.MILLBU) S
where   
    M.MILLORDNO = 'OFDE-500002' and
    M.MACHCHAINNO = '0' and
    M.ORDERLINENO = '01'

Then the plan shape also changes to be something I expected it to be originally:

Query Plan with top

I also though that the index PRODUCT_FACTORY being smaller than the clustered index PRODUCT_PK would have an affect, but even with forcing the query to use PRODUCT_PK, the plan is still the same as original, with 6655 calls to the function.

If I leave out ORDERHDR completely, then the plan starts with nested loop between ORDERLINE and PRODUCT first, and the function is called only once.

I would like to understand what could be the reason for this since all the operations are done using primary keys and how to fix it if it happens in a more complex query that can't be solved this easily.

Aucun commentaire:

Enregistrer un commentaire