String_Agg Replacement

I ran into a problem where a client needed a comma delimited field from the database. In post 2016 versions of SQL Server, this is easy. You use String_Agg. This seemed easy at first. Then I realized that they were using 2016 and it was not present in that version. DOH! So what does someone do when you need a String_Agg Replacement?

First let me say this is freaking annoying. It is some crazy SQL we are about to embark upon in this little journey.

Before we dig into it, lets get it setup first.

We need a couple of table to be created within a SQL Server database. Note that I am using version 14 because I am showing the String_Agg function too. The replacement should work on most if not all versions of SQL Server.

Company Table

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[Company](
	[CompanyID] [int] IDENTITY(1,1) NOT NULL,
	[Name] [nchar](50) NULL,
	[BillingAddress] [nchar](30) NULL,
	[BillingCity] [nchar](10) NULL,
	[BillingZipCode] [nchar](10) NULL,
	[BillingCountry] [nchar](10) NULL,
	[ShippingAddress] [nchar](30) NULL,
	[ShippingCity] [nchar](10) NULL,
	[ShippingZipCode] [nchar](10) NULL,
	[ShippingCountry] [nchar](10) NULL,
	[WSUserName] [nchar](10) NULL,
	[WSPassword] [nchar](10) NULL,
	[Type] [nchar](10) NULL,
	[IBAN] [nchar](10) NULL,
	[PaymentDays] [int] NULL,
 CONSTRAINT [PK_Company] PRIMARY KEY CLUSTERED 
(
	[CompanyID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

In this example, we really only care about the CompanyID and Name. Create a few companies that just have their name. I used JimCo and BobCo as my two companies.

With that table done we need to add another.

SET ANSI_NULLS ON
 GO
 SET QUOTED_IDENTIFIER ON
 GO
 CREATE TABLE [dbo].[States](
     [StateID] [int] IDENTITY(1,1) NOT NULL,
     [Abbreviation] varchar NOT NULL,
  CONSTRAINT [PK_States] PRIMARY KEY CLUSTERED 
 (
     [StateID] ASC
 )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
 ) ON [PRIMARY]
 GO

This one is nothing more than an ID and a state abbreviation. You don’t have to do all 50. Just import a couple or some 2 digit BS values.

Finally lets get the many to many table built for this example that will connect the companies to states.

USE [OrderIT]
 GO
 / Object:  Table [dbo].[CompanyState]    Script Date: 1/2/2021 2:33:05 PM /
 SET ANSI_NULLS ON
 GO
 SET QUOTED_IDENTIFIER ON
 GO
 CREATE TABLE [dbo].[CompanyState](
     [CompanyStateID] [int] IDENTITY(1,1) NOT NULL,
     [CompanyID] [int] NOT NULL,
     [StateID] [int] NOT NULL,
  CONSTRAINT [PK_CompanyState] PRIMARY KEY CLUSTERED 
 (
     [CompanyStateID] ASC
 )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
 ) ON [PRIMARY]
 GO
 ALTER TABLE [dbo].[CompanyState]  WITH CHECK ADD  CONSTRAINT [FK_CompanyState_Company] FOREIGN KEY([CompanyID])
 REFERENCES [dbo].Company
 GO
 ALTER TABLE [dbo].[CompanyState] CHECK CONSTRAINT [FK_CompanyState_Company]
 GO
 ALTER TABLE [dbo].[CompanyState]  WITH CHECK ADD  CONSTRAINT [FK_CompanyState_States] FOREIGN KEY([StateID])
 REFERENCES [dbo].States
 GO
 ALTER TABLE [dbo].[CompanyState] CHECK CONSTRAINT [FK_CompanyState_States]
 GO

This one is even easier take the CompanyID and the StateID and enter combinations together. These should all be unique.

I have included some CSV’s with the data in them. Remember to process the Company and state before the CompanyState file.

Company Download

States Download

CompanyState-1 Download

With all the setup out of the way lets get into it.

String_Agg example

Here is how you do this example in the modern world.

SELECT  c.[CompanyID]
   ,[Name]
   ,STRING_AGG(Abbreviation,',') AS [States]
   FROM [dbo].[Company] AS c
   INNER JOIN [dbo].[CompanyState] AS cs ON c.[CompanyID] = cs.[CompanyID]
   INNER JOIN [dbo].[States] AS s ON cs.[StateID] = s.[StateID]
   GROUP BY c.[CompanyID], [Name]

Join all the tables that have any needed data together and use the String_Agg function to merge the abbreviation fields into a comma delimited file. You want dashes then swap out the comma character for a dash. Either way you get the states each company is located in.

Now lets look at the pre 2016 way of doing it.

SELECT [CompanyID]
 ,[Name]
     , States= STUFF((SELECT N', ' + [Abbreviation]
                         FROM dbo.CompanyState AS cs
                         INNER JOIN dbo.States AS s ON cs.StateID = s.StateID
                         WHERE cs.CompanyID = c.CompanyID
                         ORDER BY [Abbreviation]
                         FOR XML PATH(N''), TYPE).value(N'.[1]', N'nvarchar(max)'), 1, 2, N'')
   FROM [dbo].[Company] AS c
   GROUP BY [CompanyID], [Name]

The first time, I saw this I was flabbergasted by the complexity of this script. Yet every single knowledgeable DBA, I know in person or found online said that is the way to do it. Even experienced developers have no idea what the devil this thing is doing.

With that in mind lets break it down next to its simpler modern variant.

Breakdown of the String_Agg Replacement

First off we need to look at what we are after. In both we pull a CompanyID, Name, CSV Field called States. So we state with getting the information we are after. Let look at the shell.

SELECT [CompanyID]
 ,[Name]
 ,[States] = ''
 FROM [dbo].[Company] AS c
 GROUP BY [CompanyID], [Name]

This will produce the dataset we want but it is missing the states.

To get that we get the wacky part.

, States= STUFF((SELECT N', ' + [Abbreviation]                     
FROM dbo.CompanyState AS cs                     
INNER JOIN dbo.States AS s ON cs.StateID = s.StateID                     
WHERE cs.CompanyID = c.CompanyID                     
ORDER BY [Abbreviation]                     
FOR XML PATH(N''), TYPE).value(N'.[1]', N'nvarchar(max)'), 1, 2, N'')

Lets first dig into the first weird thing for the average developer that dabbles in SQL, mainly the STUFF.

STUFF

This is pretty simple. You take a string, in our case the big SQL statement, and between the first and second character stuff an empty string into it. This is functionally giving us our delimited list. Even with the definition, it makes no sense when we look at the SQL.

That’s because we need to look into the next weird bit.

FOR XML PATH

This part of the XML is saying we are turning the results of our STUFF-ed SQL statement into XML with the element having no name.

It’s making even less sense know. Right?

Hang with me.

The back half is the root element we are going to put the file into xml which is a Type).value(N’.[1]’, N’nvarchar(max)’. This piece takes the xml node we just created and put it into a nvarchar datatype.

So the Stuff, crams everything into the comma delimited format. The For XML Path puts it into a unnamed XML node and the .Value ending changes it to a useful datatype for our purposes. Now we have a comma delimited format.

Make you want to drink right?

It is easier to think of it as a pattern.

You setup all the information you want in the XML field leaving out our list.

That gives you your shell.

In the element that needs to the list we will drop in a sub select. This one needs to be formatted like this. Note the lower case value statement. It won’t work in all uppercase.

(SELECT N', ' + [FieldToCSV]
    FROM [tableName]
    WHERE [value1] = [selectionValue]
    ORDER BY [FieldToCSV]
    FOR XML PATH(N''), TYPE).value(N'.[1]', N'NVARCHAR(MAX)')

Walkthrough using the String_Agg Replacement

So lets walk through what we are after.

First step is to build our shell.

SELECT [CompanyID]
 ,[Name]
 ,[States] = ''
 FROM [dbo].[Company] AS c
 GROUP BY [CompanyID], [Name]

Next, we add the template and replace the empty string above.

SELECT [CompanyID]
 ,[Name]
 ,[States] = (SELECT N', ' + [FieldToCSV]
                 FROM [tableName]
                 WHERE [value1] = [selectionValue]
                 ORDER BY [FieldToCSV]
                 FOR XML PATH(N''), TYPE).value(N'.[1]', N'NVARCHAR(MAX)')
 FROM [dbo].[Company] AS c
 GROUP BY [CompanyID], [Name]

After that we insert into the FROM [tableName] section the table or tables were the data is located. In our case it is two tables so we have a join.

SELECT [CompanyID]
 ,[Name]
 ,[States] = (SELECT N', ' + [FieldToCSV]
                 FROM dbo.CompanyState AS cs
                 INNER JOIN dbo.States AS s ON cs.StateID = s.StateID
                 WHERE [value1] = [selectionValue]
                 ORDER BY [FieldToCSV]
                 FOR XML PATH(N''), TYPE).value(N'.[1]', N'NVARCHAR(MAX)')
 FROM [dbo].[Company] AS c
 GROUP BY [CompanyID], [Name]

With that done, lets choose what we want in the list. For us that is Abbreviation so replace the [FieldToCSV] with [Abbreviation].

SELECT [CompanyID]
 ,[Name]
 ,[States] = (SELECT N', ' +  [Abbreviation]
                 FROM dbo.CompanyState AS cs
                 INNER JOIN dbo.States AS s ON cs.StateID = s.StateID
                 WHERE [value1] = [selectionValue]
                 ORDER BY [FieldToCSV]
                 FOR XML PATH(N''), TYPE).value(N'.[1]', N'NVARCHAR(MAX)')
 FROM [dbo].[Company] AS c
 GROUP BY [CompanyID], [Name]

Lets add out where clause which is how we connect the sub select with the main select. [Value1] becomes cs.[CompanyID] and [selectionValue] becomes c.[CompanyID] referencing the outside select.

SELECT [CompanyID]
 ,[Name]
 ,[States] = (SELECT N', ' +  [Abbreviation]
                 FROM dbo.CompanyState AS cs
                 INNER JOIN dbo.States AS s ON cs.StateID = s.StateID
                 WHERE cs.[CompanyID] = c.[CompanyID]
                 ORDER BY [FieldToCSV]
                 FOR XML PATH(N''), TYPE).value(N'.[1]', N'NVARCHAR(MAX)')
 FROM [dbo].[Company] AS c
 GROUP BY [CompanyID], [Name]

Finally, we will control the order that the abbreviations will display by ordering by [Abbreviation]. We will also take this moment to bracket everything up that is not already done to improve the readability and possibly the performance.

SELECT [CompanyID]
 ,[Name]
 ,[States] = (SELECT N', ' +  [Abbreviation]
                 FROM [dbo].[CompanyState] AS cs
                 INNER JOIN [dbo].[States] AS s ON cs.[StateID] = s.[StateID]
                 WHERE cs.[CompanyID] = c.[CompanyID]
                 ORDER BY [Abbreviation]
                 FOR XML PATH(N''), TYPE).value(N'.[1]', N'NVARCHAR(MAX)')
 FROM [dbo].[Company] AS c
 GROUP BY [CompanyID], [Name]

So hopefully that will make life easier for the next dev that suffers through this problem. I had not found a proper guide in how to use the bloody thing and had to suffer through it.

Anyway, leave any comments below and have a great day.