How many times have you seen VARCHAR columns holding INT (or NULL) values only ?
I recently had to deal with some columns like that, and decided it would be good to change them to INT. So I went and changed the columns, but after checking the table size, I noticed, table actually increased in size – well that was unexpected to me. I’ve expected table to keep its size, as in my case, column was VARCHAR(10) and all values were more than 4 chars each – between 7 and 9.
Let me walk you through this.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
CREATE TABLE large_table ( id INT NOT NULL IDENTITY(1,1) CONSTRAINT PK_large_table PRIMARY KEY, varchar_storing_int VARCHAR(10) ) GO INSERT large_table (varchar_storing_int) SELECT TOP 10000 '1234567' FROM sys.objects SO1 CROSS JOIN sys.objects SO2 UNION ALL SELECT TOP 10000 '123456789' FROM sys.objects SO1 CROSS JOIN sys.objects SO2 GO |
And now, when checking the size it says:
1 2 3 4 5 6 |
EXEC sp_spaceused 'large_table' GO /* name rows reserved data index_size unused large_table 20000 520 KB 496 KB 16 KB 8 KB */ |
Ok, so that’s 496 KB. Now if I change the column to be INT, let’s see how much space table will take:
1 2 3 4 5 6 7 8 9 10 |
ALTER TABLE large_table ALTER COLUMN varchar_storing_int INT GO EXEC sp_spaceused 'large_table' GO /* name rows reserved data index_size unused large_table 20000 1032 KB 984 KB 16 KB 32 KB */ |
Ok, that’s interesting, now data size almost doubled in size 984 KB, imagine what this can do to a table with 100 million rows, and if you are converting more than one column – your table will grow a lot in size, index fragmentation will occur and you may end up with a file growth you didn’t actually expected/planned for – the list might go on from here.
1 2 3 4 5 6 7 8 |
SELECT object_id, avg_fragmentation_in_percent FROM sys.dm_db_index_physical_stats( DB_ID(), OBJECT_ID('dbo.large_table'), NULL, NULL, NULL); /* object_id avg_fragmentation_in_percent 1919304451 99.1869918699187 */ |
And there’s the fragmentation.
It seems, internally, sql server adds another column to the table, then converts the data, and drops the column, which is exactly what it does:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
CREATE TABLE large_table ( id INT NOT NULL IDENTITY(1,1) CONSTRAINT PK_large_table PRIMARY KEY, varchar_storing_int VARCHAR(10) ) GO INSERT large_table (varchar_storing_int) SELECT TOP 10000 '1234567' FROM sys.objects SO1 CROSS JOIN sys.objects SO2 UNION ALL SELECT TOP 10000 '123456789' FROM sys.objects SO1 CROSS JOIN sys.objects SO2 GO ALTER TABLE large_table ADD int_column INT GO UPDATE large_table SET int_column = CAST(varchar_storing_int AS INT) GO EXEC sp_spaceused 'large_table' GO /* name rows reserved data index_size unused large_table 20000 1032 KB 984 KB 16 KB 32 KB */ |
Following this finding, it might be a good call to rebuild the clustered index after a column is converted, and of course, NEVER agree to this in the first place.
As a side note, this doesn’t apply when converting from CHAR to INT.
0 Comments.