Katalin Pap - 29 days ago
SQL Question

SSIS performance in this scenario

Can this kind of logic be implemented in SSIS, and is it possible to do it in near-real time?

Users are submitting tables with hundreds of thousands of records and, with the current implementation, waiting up to 1 hour for the results when the starting table has about 500,000 rows (after STEP 1 and STEP 2 we have millions of records). In the future the amount of data and the user base may grow drastically.

STEP 1



We have a table (A) of around 500,000 rows with the following main columns:
ID, AMOUNT

We also have a table (B) with the prop.steps and the following main columns:
ID_A, ID_B, COEF


TABLE A:

id amount
a 1000
b 2000


TABLE B:

id_a,id_b,coef
a,a1,2
a1,a2,2
b,b1,5


We are creating new records from all the 500,000 records that we have in table A by multiplying the AMOUNT by the COEF; note that the steps chain, so a derived record (e.g. a1) can itself be the source of a further step:

OUTPUT TABLE:

id, amount
a,1000
a1,2000
a2,4000
b,2000
b1,10000
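
To make the chained multiplication concrete, here is a minimal stand-alone C# sketch of STEP 1 (illustrative names only, not the actual package objects); it assumes the id_a -> id_b links never form a cycle, which the example data suggests:

    // STEP 1 sketch: derive new rows by multiplying AMOUNT by COEF,
    // following the id_a -> id_b chain until no new rows can be derived.
    using System;
    using System.Collections.Generic;

    class Step1Sketch
    {
        static void Main()
        {
            // Table A: id -> amount
            var amounts = new Dictionary<string, decimal>
            {
                ["a"] = 1000m,
                ["b"] = 2000m
            };

            // Table B: the prop. steps (id_a, id_b, coef)
            var steps = new List<(string IdA, string IdB, decimal Coef)>
            {
                ("a", "a1", 2m),
                ("a1", "a2", 2m),
                ("b", "b1", 5m)
            };

            // Repeat passes until a full pass derives nothing new,
            // so chained steps (a -> a1 -> a2) are resolved as well.
            bool added = true;
            while (added)
            {
                added = false;
                foreach (var s in steps)
                {
                    if (amounts.ContainsKey(s.IdA) && !amounts.ContainsKey(s.IdB))
                    {
                        amounts[s.IdB] = amounts[s.IdA] * s.Coef;
                        added = true;
                    }
                }
            }

            // Prints the five rows of the OUTPUT TABLE (order not guaranteed):
            // a,1000  b,2000  a1,2000  a2,4000  b1,10000
            foreach (var row in amounts)
                Console.WriteLine($"{row.Key},{row.Value}");
        }
    }

With the steps sorted in dependency order a single pass would be enough; the repeated passes just keep the sketch short.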


STEP 2



Following custom logic, we assign the amount of every record calculated in STEP 1 to some other items, as follows:

TABLE A

ID,AMOUNT
a,1000
a1,2000
a2,4000
b,2000
b1,10000


TABLE B

ID,WC,COEF
a,wc1,0.1
a,wc2,1
a,wc3,0.1
a1,wc4,1
a2,wc5,1
b,wc1,1
b1,wc1,1
b1,wc2,1


OUTPUT TABLE:

ID,WC,AMOUNT
a,wc1,100
a,wc2,1000
a,wc3,100
a1,wc4,2000
a2,wc5,4000
b,wc1,2000
b1,wc1,10000
b1,wc2,10000
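
STEP 2 itself is a plain inner join on ID followed by a multiplication; here is a minimal C# sketch of the same calculation (again, all names and values are just taken from the example above):

    // STEP 2 sketch: one output row per (id, wc, coef) row in table B,
    // with amount taken from the STEP 1 result and multiplied by coef.
    using System;
    using System.Collections.Generic;

    class Step2Sketch
    {
        static void Main()
        {
            // Result of STEP 1: id -> amount
            var step1 = new Dictionary<string, decimal>
            {
                ["a"] = 1000m, ["a1"] = 2000m, ["a2"] = 4000m,
                ["b"] = 2000m, ["b1"] = 10000m
            };

            // Table B of STEP 2: (id, wc, coef)
            var assignments = new List<(string Id, string Wc, decimal Coef)>
            {
                ("a", "wc1", 0.1m), ("a", "wc2", 1m), ("a", "wc3", 0.1m),
                ("a1", "wc4", 1m),  ("a2", "wc5", 1m),
                ("b", "wc1", 1m),   ("b1", "wc1", 1m), ("b1", "wc2", 1m)
            };

            // Inner join on id (O(1) dictionary lookup per row), then multiply.
            foreach (var r in assignments)
                if (step1.TryGetValue(r.Id, out var amount))
                    Console.WriteLine($"{r.Id},{r.Wc},{amount * r.Coef}");
            // One line per row of the OUTPUT TABLE above (a,wc1,100.0 ... b1,wc2,10000).
        }
    }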


The other steps are just joins and arithmetic operations on the tables, and the overall number of records can't be reduced (the tables also have other fields with metadata).

Answer

In my personal experience, that kind of logic can be implemented entirely in SSIS.

I would do it in a Script Task or Component for two reasons:

  • First, if I understood correctly, you need an asynchronous transformation that outputs more rows than it receives as input. Scripts can handle multiple, different outputs.
  • Second, in the script you can implement all of those calculations in one place, whereas building them from other components would require a lot of them and of the relationships between them. Most importantly, the algorithmic complexity stays tied to your own algorithmic design, which can be a huge boost to performance and scalability if you achieve a good complexity; if I understand correctly, those two aspects are fundamental here (see the sketch after this list).
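
To illustrate the two points above, here is a plain C# sketch of how the per-row logic could be shaped inside an asynchronous Script Component. It is deliberately not the auto-generated SSIS buffer code (Input0Buffer/Output0Buffer depend on your metadata); the names and numbers are assumptions taken from the question's example:

    // Shape of the per-row logic: lookups pre-loaded once (as you would do in
    // PreExecute), then each incoming row emits zero or more outgoing rows
    // (the Output0Buffer.AddRow() pattern), keeping the whole pass linear.
    using System;
    using System.Collections.Generic;

    class ScriptComponentShape
    {
        // id -> list of (wc, coef), pre-loaded once from the STEP 2 table B.
        static readonly Dictionary<string, List<(string Wc, decimal Coef)>> WcSteps =
            new Dictionary<string, List<(string Wc, decimal Coef)>>
            {
                ["a"]  = new List<(string Wc, decimal Coef)> { ("wc1", 0.1m), ("wc2", 1m), ("wc3", 0.1m) },
                ["a1"] = new List<(string Wc, decimal Coef)> { ("wc4", 1m) }
            };

        // Stand-in for Input0_ProcessInputRow: one row in, several rows out.
        static IEnumerable<(string Id, string Wc, decimal Amount)> ProcessRow(string id, decimal amount)
        {
            if (WcSteps.TryGetValue(id, out var steps))        // O(1) hash lookup per input row
                foreach (var s in steps)
                    yield return (id, s.Wc, amount * s.Coef);  // "add a row to the output buffer"
        }

        static void Main()
        {
            foreach (var row in ProcessRow("a", 1000m))
                Console.WriteLine($"{row.Id},{row.Wc},{row.Amount}");  // a,wc1,100.0  a,wc2,1000  a,wc3,100.0
        }
    }

With table B cached in memory like this, the 500,000-row input is processed in a single streaming pass, which is where the complexity argument pays off.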

There are, though, some professionals who have a bad opinion of "complex" scripts and...

The downside of this approach is that you need some ability with .NET and programming; also, most of your package logic will be focused there, and script debugging can be more complex than with other components. But once you get used to the .NET features of SSIS, there is no turning back.

Getting near-real-time results out of SSIS is usually tricky for big data sets, and sometimes you need to integrate other tools (e.g. StreamInsight) to achieve it.