Tuesday, September 10, 2013

CheckSum is pretty damn accurate

I am at a SSIS Masters course with Jorge Segarra and Brian Knight (Pragmatic Works). An attendee brought up an issue about checksum not being 100% accurate, which we all acknowledge; however, another attendee said it is only 99% accurate when checking for differences. I know it's not even close to 1% inaccurate so I wrote a quick script to attempt find an inaccurate checksum value via brute force. I ran the script below for 15 minutes, checking over a quarter of a billion random values in which I found 0 invalid matches. If someone wants to run this longer and change the bigint to a numeric, let me know how many attempts until you find a record.

Summary: checksum is pretty damn accurate (not 100%, but pretty close).


DECLARE @value1a VARCHAR(36)
DECLARE @value2a VARCHAR(36)
DECLARE @value1b VARCHAR(36)
DECLARE @value2b VARCHAR(36)

DECLARE @attempt bigint
SET @attempt = 0


  
SET @value1a = NEWID()
  
SET @value2a = NEWID()
  
SET @value1b = NEWID()
  
SET @value2b = NEWID()

WHILE checksum(@value1a, @value2a) <> checksum(@value1b, @value2b)
BEGIN
   SET
@attempt = @attempt + 1
  
SET @value1a = NEWID()
  
SET @value2a = NEWID()
  
SET @value1b = NEWID()
  
SET @value2b = NEWID()
  
IF @attempt%100000 = 0 PRINT @attempt
END

SELECT
@attempt
SELECT @value1a value1a, @value2a value2a, checksum(@value1a, @value2a) chk_a
SELECT @value1b value2a, @value2b value2b, checksum(@value1b, @value2b) chk_b

No comments:

Post a Comment