I have about 250k rows of data, and I need to cut it up according to client standards (approx. 20 cuts). Looking for some advice on how to do this using a reference table vs. 20 filters. :-)
My complete dataset looks like this:
| Employee | Employee Type | Grade | Org | Sub-Org | Region |
|----------|---------------|-------|-------|---------|--------|
| EE1 | Associate | 5 | Org 1 | Org 1.1 | US |
| EE2 | Associate | 4 | Org 1 | Org 1.1 | APAC |
| EE3 | Associate | 6 | Org 1 | Org 1.1 | APAC |
| EE4 | Associate | 4 | Org 2 | Org 2.1 | US |
| EE6 | Associate | 6 | Org 2 | Org 2.2 | APAC |
| EE7 | Associate | 4 | Org 2 | Org 2.2 | EMEA |
| EE8 | Associate | 8 | Org 3 | Org 3.3 | US |
| EE9 | Associate | 5 | Org 3 | Org 3.2 | APAC |
I need to provide the following cuts in separate Excel files:
| Cut | Org | Sub-Org | Region |
|-----|-----|---------|--------|
| Org 1_Org 1.1_US | Org 1 | Org 1.1 | US |
| Org 1_Org 1.1_APAC | Org 1 | Org 1.1 | APAC |
| Org 2_Org 2.1 | Org 2 | Org 2.1 | Any |
| Org 2_Org 2.2 | Org 2 | Org 2.2 | Any |
| Org 3 | Org 3 | Any | Any |
I solved this by creating a filter and copying/pasting it 20 times over. But the cuts will change frequently, hence my desire to drive the whole thing from a reference table of some sort.
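To make the reference-table idea concrete, here's a rough sketch of the matching logic I'm after, written as Python/pandas pseudocode (the file names and the "first matching rule wins" behaviour are only assumptions for the sketch, not how my real workflow is built):

```python
import pandas as pd

# Placeholder file names -- my real inputs aren't actually Excel files named like this.
data = pd.read_excel("main_dataset.xlsx")    # ~250k employee rows
cuts = pd.read_excel("cut_reference.xlsx")   # ~20 rows: Cut | Org | Sub-Org | Region

def find_cut(row):
    """Return the first cut whose Org/Sub-Org/Region rule matches this row; 'Any' acts as a wildcard."""
    for _, rule in cuts.iterrows():
        if (rule["Org"] in ("Any", row["Org"])
                and rule["Sub-Org"] in ("Any", row["Sub-Org"])
                and rule["Region"] in ("Any", row["Region"])):
            return rule["Cut"]
    return None  # row isn't covered by any cut -- worth flagging

# Tag every row with the cut it belongs to.
data["Cut"] = data.apply(find_cut, axis=1)
```

(If the rules ever overlap, the order of rows in the reference table would matter under a first-match approach like this; unmatched rows come back as blank, which at least makes gaps visible.)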
As an additional consideration, I have to test the data within each cut (for the main dataset, I use the test tool to compare values in one column to values in another column... if it fails, the workflow errors out and stops).
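Continuing the same sketch, the per-cut version of that test would look something like the below. The two column names in the comparison are made up; my real test compares two different columns, and the point is just that a failure in any one cut should stop everything:

```python
# Hypothetical check: "Grade" vs. "Approved Grade" stands in for whatever
# column-to-column comparison the real test performs.
for cut_name, cut_df in data.groupby("Cut"):
    failures = cut_df[cut_df["Grade"] != cut_df["Approved Grade"]]
    if not failures.empty:
        raise ValueError(f"Test failed for cut '{cut_name}': {len(failures)} rows don't match")
```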
With all of that in mind, do I want to:
- Add a column to the main dataset, signifying what each row's cut should be? And how do I do this using a reference table?
- Run a test per cut? How do I do this? And then loop the test through the rest of the cuts?
- Summarize (?) based on those cuts (along with all other fields)? (A rough sketch of the per-cut output loop I'm picturing is below.)
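For reference, here's the output side of the same sketch: one Excel file per cut plus a quick coverage summary. The folder and file naming are just placeholders:

```python
import os

os.makedirs("cut_outputs", exist_ok=True)  # placeholder output folder

# One Excel file per cut; file names come straight from the Cut label.
for cut_name, cut_df in data.groupby("Cut"):
    cut_df.to_excel(os.path.join("cut_outputs", f"{cut_name}.xlsx"), index=False)

# Quick summary to sanity-check that the reference table covers everyone.
print(data["Cut"].value_counts(dropna=False))
```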
Any thoughts would be much appreciated!