Channel: Data Preparation & Blending discussions

Use reference file to cut, test data


I have about 250k rows of data, and I need to cut it up according to client standards (approx. 20 cuts). Looking for some advice on how to do this using a reference table vs. 20 filters. :-)

 

My complete dataset looks like this:

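| Employee | Employee Type | Grade | Org | Sub-Org | Region |
|----------|---------------|-------|-------|---------|--------|
| EE1 | Associate | 5 | Org 1 | Org 1.1 | US |
| EE2 | Associate | 4 | Org 1 | Org 1.1 | APAC |
| EE3 | Associate | 6 | Org 1 | Org 1.1 | APAC |
| EE4 | Associate | 4 | Org 2 | Org 2.1 | US |
| EE6 | Associate | 6 | Org 2 | Org 2.2 | APAC |
| EE7 | Associate | 4 | Org 2 | Org 2.2 | EMEA |
| EE8 | Associate | 8 | Org 3 | Org 3.3 | US |
| EE9 | Associate | 5 | Org 3 | Org 3.2 | APAC |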

 

I need to provide the following cuts in separate Excel files:

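| Cut | Org | Sub-Org | Region |
|-----|-----|---------|--------|
| Org 1_Org 1.1_US | Org 1 | Org 1.1 | US |
| Org 1_Org 1.1_APAC | Org 1 | Org 1.1 | APAC |
| Org 2_Org 2.1 | Org 2 | Org 2.1 | Any |
| Org 2_Org 2.2 | Org 2 | Org 2.2 | Any |
| Org 3 | Org 3 | Any | Any |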

 

I solved this by creating a filter and copying/pasting 20 times over. But the cuts will change frequently, hence my desire to use a reference table of some sort.
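To make the reference-table idea concrete, here is a rough sketch in plain Python rather than as a workflow. Each row of the cut table is treated as a rule, and "Any" acts as a wildcard for that column. The function name `matching_cuts` is made up for illustration, and the sketch assumes the "Org 2_Org 2.2" cut is meant to filter on Org 2. It is not a definitive implementation, only one way the matching could work.

```python
# Reference table: one rule per cut. "Any" matches every value in that column.
CUTS = [
    {"Cut": "Org 1_Org 1.1_US",   "Org": "Org 1", "Sub-Org": "Org 1.1", "Region": "US"},
    {"Cut": "Org 1_Org 1.1_APAC", "Org": "Org 1", "Sub-Org": "Org 1.1", "Region": "APAC"},
    {"Cut": "Org 2_Org 2.1",      "Org": "Org 2", "Sub-Org": "Org 2.1", "Region": "Any"},
    {"Cut": "Org 2_Org 2.2",      "Org": "Org 2", "Sub-Org": "Org 2.2", "Region": "Any"},
    {"Cut": "Org 3",              "Org": "Org 3", "Sub-Org": "Any",     "Region": "Any"},
]

# Columns that a rule is compared on.
KEYS = ("Org", "Sub-Org", "Region")


def matching_cuts(row, cuts=CUTS):
    """Return the names of every cut whose rule matches this data row."""
    return [
        cut["Cut"]
        for cut in cuts
        if all(cut[key] in ("Any", row[key]) for key in KEYS)
    ]


# A few rows from the sample dataset.
employees = [
    {"Employee": "EE1", "Org": "Org 1", "Sub-Org": "Org 1.1", "Region": "US"},
    {"Employee": "EE4", "Org": "Org 2", "Sub-Org": "Org 2.1", "Region": "US"},
    {"Employee": "EE8", "Org": "Org 3", "Sub-Org": "Org 3.3", "Region": "US"},
]

for employee in employees:
    print(employee["Employee"], matching_cuts(employee))
```

Returning a list rather than a single label means a row can land in more than one cut if the rules overlap, and a row that matches nothing gets an empty list. Changing the cuts then only means editing the table, not the logic.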

 

As an additional consideration, I have to test the data within each cut (for the main dataset, I use the test tool to compare values in one column to values in another column... if it fails, the workflow errors out and stops).
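The per-cut test could look something like the following Python sketch. It loops over the cuts, applies a row-level check to each, and raises on the first cut that fails, which mirrors the "error out and stop" behaviour described above. The column names "Value A" and "Value B" are placeholders, since the actual columns being compared are not given here, and `run_cut_tests` is a hypothetical helper name.

```python
def run_cut_tests(rows_by_cut, check):
    """Apply check(row) to every row of every cut.

    Raises ValueError at the first cut containing a failing row,
    otherwise returns the number of cuts that were tested.
    """
    for cut, rows in rows_by_cut.items():
        failures = [row for row in rows if not check(row)]
        if failures:
            raise ValueError(
                f"Cut {cut!r} failed: {len(failures)} of {len(rows)} rows"
            )
    return len(rows_by_cut)


# Placeholder check: two hypothetical columns must hold equal values.
def values_agree(row):
    return row["Value A"] == row["Value B"]


sample = {
    "Org 3": [
        {"Employee": "EE8", "Value A": 10, "Value B": 10},
        {"Employee": "EE9", "Value A": 7, "Value B": 7},
    ],
}
print(run_cut_tests(sample, values_agree))
```

Passing the check in as a function keeps the loop independent of whichever comparison the test actually needs.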

 

With all of that in mind, do I want to:

 

  1. Add a column to the main dataset, signifying what each row's cut should be? And how do I do this using a reference table?
  2. Run a test per cut? How do I do this? And then loop the test through the rest of the cuts?
  3. Summarize (?) based on those cuts (along with all other fields)?
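For options 1 and 3, once every row carries a cut label, splitting into separate files is a group-and-write step. A minimal Python sketch follows, assuming a "Cut" column already exists on each row and that all rows share the same columns. It writes CSV rather than Excel because the standard library has no Excel writer, and `write_cut_files` is an illustrative name only.

```python
import csv
from collections import defaultdict
from pathlib import Path


def write_cut_files(rows, out_dir, cut_column="Cut"):
    """Group rows by their cut label and write one CSV per cut.

    Returns a dict mapping each cut name to the path that was written.
    """
    # Collect the rows belonging to each cut.
    groups = defaultdict(list)
    for row in rows:
        groups[row[cut_column]].append(row)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    paths = {}
    for cut, members in groups.items():
        path = out_dir / f"{cut}.csv"
        with path.open("w", newline="") as handle:
            # Assumes every row in the group has the same keys as the first one.
            writer = csv.DictWriter(handle, fieldnames=list(members[0].keys()))
            writer.writeheader()
            writer.writerows(members)
        paths[cut] = path
    return paths
```

Calling `write_cut_files(rows, "cuts")` would produce files such as `cuts/Org 3.csv`, one per distinct value in the cut column.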

Any thoughts would be much appreciated!

