We are trying to migrate ~650 million records to a new database on the same server and are seeing wildly different performance between the two approaches we are testing. The first script, shown below, uses a reference table of start/end time intervals and a cursor to drive the INSERT statements. The alternative uses the same reference table to generate ~26,000 individual INSERT statements and executes them one by one.
Option 1 (ran for 16 hours and only moved ~2% of total records):
--Declare scoped variables
DECLARE @migStart datetime, @migEnd datetime;
--Set up cursor for reference intervals
DECLARE migCur CURSOR FOR
SELECT dttm_interval_start, dttm_interval_end
FROM DBA.dbo.intervalReference
WHERE complete = 0
ORDER BY dttm_interval_start;
OPEN migCur;
--Initialize the cursor
FETCH NEXT FROM migCur
INTO @migStart, @migEnd;
--Iterate through the reference intervals
WHILE (@@FETCH_STATUS = 0)
BEGIN
    --Move the records as >= start and < end
    INSERT INTO NEW.dbo.new_log(field1, field2, field3, ......, field19)
    SELECT field1, field2, field3, ......, field19
    FROM OLD.dbo.old_log
    WHERE log_dt >= @migStart AND log_dt < @migEnd;

    --Mark the interval as done, capture record count
    UPDATE DBA.dbo.intervalReference
    SET complete = 1, records = @@ROWCOUNT
    WHERE dttm_interval_start = @migStart AND dttm_interval_end = @migEnd;

    --Fetch the next cursor item
    FETCH NEXT FROM migCur
    INTO @migStart, @migEnd;
END;
CLOSE migCur;
DEALLOCATE migCur;
GO
Option 2 (our calculations project this one to finish in ~65 hours total; there are ~26,000 of these statements, each covering a different time interval):
INSERT INTO new.dbo.new_log(field1, field2, field3, ......, field19)
SELECT field1, field2, field3, ......, field19
FROM OLD.dbo.old_log
WHERE log_dt >= '2013-02-17 12:00:00.000' and log_dt < '2013-02-17 14:00:00.000';
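For context, the individual statements are generated from the same reference table. The real generation script is different, but a simplified sketch of the idea looks like this:

--Simplified sketch (not our actual generation script): build one literal INSERT per interval
SELECT 'INSERT INTO NEW.dbo.new_log(field1, field2, field3, ......, field19) '
     + 'SELECT field1, field2, field3, ......, field19 '
     + 'FROM OLD.dbo.old_log '
     + 'WHERE log_dt >= ''' + CONVERT(varchar(23), dttm_interval_start, 121)
     + ''' and log_dt < '''  + CONVERT(varchar(23), dttm_interval_end, 121) + ''';'
FROM DBA.dbo.intervalReference
ORDER BY dttm_interval_start;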
I understand that cursors are not performant, but are they really so bad that I can't even use them to iterate through a reference table to control query bounds?
I have compared the execution plan for the cursor script against the plan for an individual INSERT statement: in the script's plan, the estimated row count at the insert step is ~190 million, far more than the ~140k-record batch size we sized the time intervals around. Could the statistics be misleading the engine about the best way to optimize the query?
Do queries in loops get optimized?
Why is Option 2 so much more performant than Option 1?
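If this turns out to be an estimation / plan-reuse problem, would forcing a per-iteration recompile inside the loop, so the optimizer sees the actual values of the variables the same way Option 2 sees literal dates, be expected to help? Something like this untested variant of the insert inside the cursor loop:

--Untested sketch: same INSERT as in the loop, but with a per-execution recompile
--so the optimizer can use the current values of @migStart / @migEnd
INSERT INTO NEW.dbo.new_log(field1, field2, field3, ......, field19)
SELECT field1, field2, field3, ......, field19
FROM OLD.dbo.old_log
WHERE log_dt >= @migStart AND log_dt < @migEnd
OPTION (RECOMPILE);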