Table size growth when converting columns – VARCHAR to INT

How many times have you seen VARCHAR columns holding only INT (or NULL) values?

I recently had to deal with some columns like that and decided it would be good to change them to INT. So I went ahead and changed the columns, but when I checked the table size afterwards, I noticed the table had actually grown, which was unexpected. I had expected the table to at least keep its size, since in my case the column was a VARCHAR(10) and every value was 7 to 9 characters long, well over the 4 bytes an INT takes.
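
Before converting a column like that, it is worth confirming that every non-NULL value really converts to an integer. A minimal sanity check, as a sketch against the demo table set up below (it assumes SQL Server 2012 or later for TRY_CONVERT):

SELECT COUNT(*) AS non_convertible_values
FROM large_table
WHERE varchar_storing_int IS NOT NULL
  -- rows whose non-NULL value does not convert cleanly to INT
  AND TRY_CONVERT(INT, varchar_storing_int) IS NULL
GO

A non-zero count means the ALTER COLUMN further down would fail.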

Let me walk you through this.

CREATE TABLE large_table (
    id INT NOT NULL IDENTITY(1,1)
        CONSTRAINT PK_large_table PRIMARY KEY,
    varchar_storing_int VARCHAR(10)
)
GO
-- 10,000 seven-character values and 10,000 nine-character values
INSERT large_table (varchar_storing_int)
SELECT TOP 10000 '1234567'
FROM sys.objects SO1
CROSS JOIN sys.objects SO2
UNION ALL
SELECT TOP 10000 '123456789'
FROM sys.objects SO1
CROSS JOIN sys.objects SO2
GO

And now, when checking the size, it says:

EXEC sp_spaceused 'large_table'
GO
/*
name          rows    reserved   data     index_size   unused
large_table   20000   520 KB     496 KB   16 KB        8 KB
*/

Ok, so that’s 496 KB. Now, if I change the column to INT, let’s see how much space the table will take:

ALTER TABLE large_table
ALTER COLUMN varchar_storing_int INT
GO

EXEC sp_spaceused 'large_table'
GO
/*
name          rows    reserved   data     index_size   unused
large_table   20000   1032 KB    984 KB   16 KB        32 KB
*/

Ok, that’s interesting: the data size almost doubled, to 984 KB. Imagine what this can do to a table with 100 million rows, and if you are converting more than one column, the table will grow a lot, the clustered index will become fragmented, and you may end up with file growth you never planned for. The list could go on from here.

SELECT object_id, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
DB_ID(), OBJECT_ID('dbo.large_table'), NULL, NULL, NULL);

/*
object_id    avg_fragmentation_in_percent
1919304451   99.1869918699187
*/

And there’s the fragmentation.
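
We can also peek at the physical row layout the ALTER left behind. This is just a sketch using the undocumented sys.system_internals_partition_columns view (being undocumented, its columns may change between versions, so treat the output as informational only):

-- list the physical columns stored in the table's partitions
SELECT pc.partition_column_id,
       pc.is_dropped,
       pc.system_type_id,
       pc.max_inrow_length,
       pc.leaf_offset
FROM sys.partitions p
JOIN sys.system_internals_partition_columns pc
    ON pc.partition_id = p.partition_id
WHERE p.object_id = OBJECT_ID('dbo.large_table');

If SQL Server really does keep the old VARCHAR data around after the conversion, this should list more physical columns than the two the table exposes through sys.columns.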

It seems that, internally, SQL Server adds a new column to the table, converts the data into it, and drops the old column without reclaiming its space. Doing the equivalent by hand produces exactly the same numbers:

-- re-create the same table to simulate what SQL Server does internally
DROP TABLE large_table
GO
CREATE TABLE large_table (
    id INT NOT NULL IDENTITY(1,1)
        CONSTRAINT PK_large_table PRIMARY KEY,
    varchar_storing_int VARCHAR(10)
)
GO
INSERT large_table (varchar_storing_int)
SELECT TOP 10000 '1234567'
FROM sys.objects SO1
CROSS JOIN sys.objects SO2
UNION ALL
SELECT TOP 10000 '123456789'
FROM sys.objects SO1
CROSS JOIN sys.objects SO2
GO

-- add a new INT column and copy the converted values into it,
-- leaving the original VARCHAR data in place
ALTER TABLE large_table
ADD int_column INT
GO
UPDATE large_table SET
    int_column = CAST(varchar_storing_int AS INT)
GO

EXEC sp_spaceused 'large_table'
GO
/*
name          rows    reserved   data     index_size   unused
large_table   20000   1032 KB    984 KB   16 KB        32 KB
*/

Following this finding, it is probably a good idea to rebuild the clustered index after a column is converted, and, of course, to never agree to storing integers in VARCHAR columns in the first place.
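
If the conversion has already happened, rebuilding the clustered index rewrites the rows and should reclaim the wasted space (and remove the fragmentation along with it). A minimal sketch, run right after the ALTER COLUMN from the first example:

-- rebuild the clustered index to rewrite the rows without the leftover data
ALTER INDEX PK_large_table ON large_table REBUILD
GO
EXEC sp_spaceused 'large_table'
GO

After the rebuild, sp_spaceused should report a data size much closer to what the converted table actually needs.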

As a side note, this doesn’t apply when converting from CHAR to INT.
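
If you want to verify that yourself, the same pattern can be reused; this is only a sketch with the column swapped to a hypothetical CHAR(10) variant, not something measured here:

-- hypothetical variant of the demo table, storing the values in CHAR(10)
DROP TABLE large_table
GO
CREATE TABLE large_table (
    id INT NOT NULL IDENTITY(1,1)
        CONSTRAINT PK_large_table PRIMARY KEY,
    char_storing_int CHAR(10)
)
GO
INSERT large_table (char_storing_int)
SELECT TOP 10000 '123456789'
FROM sys.objects SO1
CROSS JOIN sys.objects SO2
GO
EXEC sp_spaceused 'large_table'
GO
ALTER TABLE large_table
ALTER COLUMN char_storing_int INT
GO
EXEC sp_spaceused 'large_table'
GO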
