
Data processing efficiencies


Hi all, 

 

I currently have a somewhat complex workflow with 186 steps. It writes the full tables to SQL at two different points during the workflow. In total, it takes 32 hours to run through 170m records. The input file has around 30 columns. Writing directly to SQL takes the most time; see below for the processing times (over 13 hours for those two outputs alone). There are also 19 joins along the way, which take around 7 hours in total.

 

Output Data (509) - 8.1 hours

Output Data (158) - 5.11 hours

 

My question is: is there something I'm missing here that could greatly improve the processing time? 32 hours seems like a lot to get through 170m records. Is it more efficient to break the workflow up into a couple of smaller workflows? Or maybe outputting to .txt and bulk loading is quicker than writing straight to SQL? I've put a rough sketch of that idea below.
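
For what it's worth, here's a rough sketch of the .txt route I'm considering, assuming SQL Server as the target and a pipe-delimited export with a header row (the table and file names are just placeholders):

    -- Hypothetical bulk load of the workflow's text export into a staging
    -- table whose columns match the file layout.
    BULK INSERT dbo.Staging_WorkflowOutput
    FROM 'C:\exports\workflow_output.txt'
    WITH (
        FIELDTERMINATOR = '|',
        ROWTERMINATOR = '\n',
        FIRSTROW = 2,         -- skip the header row
        TABLOCK,              -- allows minimally logged inserts
        BATCHSIZE = 10000000  -- commit in 10m-row batches
    );

I don't know yet whether the export-plus-bulk-load round trip would actually beat the direct write for 170m rows, which is partly why I'm asking.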

 

Any suggestions would be greatly appreciated!

 

Thanks

