SSIS Hints: Getting a Row Count within the Data Flow

One of the main hindrances of the RowCount component in SSIS is that the variable that stores the row count is not updated till the data flow component has finished processing. What if you need to have the row count within the data flow component? I ran into this exact problem when I had to utilize an API for a third party CRM implementation that required me to get IDs for new contacts through a process that wanted me to tell it how many IDs I would need up front. I could have accomplished this by merely requesting 1 ID at a time but this proved to be much too slow..not SSIS’s fault but the API 🙂 .

So are you just hosed then? Of course not! You just need to understand why the RowCount component does what it does. The RowCount component does not update the variable until the end because it has no idea when it is coming to the end of the data stream….and what if another component asked for the row count before it had completed counting all the rows? Total disaster! However, we know that we could instead just use a custom script component to run through the rows and count them for us. That takes care of problem #1..Now we have a variable that is filled as it is being counted….

Now for #2 it is a little trickier. How do we ensure that we count all rows before continuing on to the next set of components that may use them? Well, SSIS actually has a simple control that is going to help us. The Sort control. Yes, this control is going to keep out custom script piece in check. That is because SSIS transformation components are actually one of two types…partial blocking and full blocking. The partial blocking ones are the ones that you see that take in a batch of data from the flow and then systematically pass it down the line before taking in another batch. The sort control is the second type, fully blocking. Since the sort control needs to have all of the data from the data stream before allowing it to continue..obviously in order to make sure things are sorted properly. So if we place this control in downstream from our script component then it will ensure that it has all the rows before allowing the data to continue.

So now we have a simple solution that allows us to access the row count information within the data flow process. I have added an image of the type of setup you are looking at below.

Cheers,
AJ

image