Channel: Data Preparation & Blending discussions
Viewing all 4999 articles

Select rows greater than


I'm looking to select all rows for processing where one of the variables is greater than a set amount. I tried the Filter tool, but no luck.

The database uses 24-hour times, and I just want times greater than 1600.


Input/Output in Tools Properties missing

Output csv file split into 2 files suddenly, how to output only 1 file?


My workflow had been outputting a single CSV file (400k rows). After I added an Alteryx Database output as an additional output, all my output files suddenly split into 2 files, including the Alteryx Database.

 

I tried to remove the Alteryx Output, and the CSV is still outputting 2 files (split into 2 sets). Any idea why that is happening? What should I do to output a single CSV file now?

 

Attached pic shows my settings and the extra files

 

Thank you. 

 

 

RegEx question: match all strings containing range of characters


I've used trial and error on this and even RTFM, but I can't figure out how to solve this RegEx problem. It seems extremely simple but I've reached the hubris threshold and actually need a solution.

 

Problem:

I need to match all of the strings that contain a range of characters. In the example below, I need to find all of the strings that only contain letters a through d.

 

Constraint:

I want to put this in a macro that would accept the last character and the Action would change only the last letter in the range. Therefore, RegEx that specifies each character won't work for my need.

 

Current Attempt:

I've tried various combinations of the following:

REGEX_Match([Testval], "[a-d]")

REGEX_Match([Testval], "^[a-d]")

REGEX_Match([Testval], "/[a-d]/")

 

and all of them return zero records.

 

Extra Credit:

Ideally, I'd also love to have a RegEx that would allow me to match all of the records where each character in that range only appeared once. (e.g. match abcd, but not aabd)

 

I've attached a test workflow and would love your help in solving this problem.

 

Thanks!
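
For illustration outside Alteryx, here is a minimal Python sketch of the two patterns being asked for. It assumes the zero-record results come from the pattern `[a-d]` describing only a single character while the match is tested against the whole string, which is worth confirming against the REGEX_Match documentation. The helper names `range_pattern` and `matches` are hypothetical, and the last letter of the range is passed in as a parameter, as the macro constraint requires.

```python
import re

def range_pattern(last_char, unique=False):
    """Build a pattern for strings made only of letters a..last_char."""
    # One or more characters, all inside the range a-<last_char>.
    body = f"[a-{re.escape(last_char)}]+"
    if unique:
        # Negative lookahead: reject the string if any character repeats.
        body = r"(?!.*(.).*\1)" + body
    return re.compile(body)

def matches(value, last_char, unique=False):
    # fullmatch requires the whole string to fit the pattern.
    return range_pattern(last_char, unique).fullmatch(value) is not None

print(matches("abcd", "d"))               # only letters a-d
print(matches("abce", "d"))               # contains e, outside the range
print(matches("aabd", "d", unique=True))  # a appears twice
```

The same pattern text, for example `[a-d]+` or `(?!.*(.).*\1)[a-d]+`, may carry over to other regex engines that support lookahead and backreferences, but that is an assumption to test rather than a guarantee.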

Exponentiation?


I'm sure there is an easy answer to this, but I can't seem to figure it out. I am using the formula tool and I would like to raise x to the power of y. In most programs this would be written as x^y or x**y, unfortunately both of those notations throw errors.

 

Any suggestions?

Joining numbers with multiple criteria fields


Hi, I'm struggling with a problem that I thought would be quite easy to solve. However, I haven't found the solution, so I'm trying this great medium again.

 

Situation:

I have 2 different data sheets.

 

The first contains 4 columns:

  • Name of playlist
  • Name of song
  • Nr of plays
  • Date

 

 

The second contains these columns:

  • Name of playlist
  • Name of song
  • Position
  • Date

 

So the obvious question is: how can I add the Position to the first sheet? To do that, it has to match on the Name of playlist, the Name of song and the Date.

 

I tried the Join and Union functions here, but it looks like they fail to match on multiple criteria.

 

Thanks again for your help.

 

Best
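
The matching rule described above (playlist, song and date must all agree) amounts to a join on a composite key. A minimal Python sketch of that logic follows. The sample rows and the function name `add_position` are made up for illustration, and this is not the Alteryx Join tool itself.

```python
def add_position(plays, positions):
    """Left-join positions onto plays using (playlist, song, date) as the key."""
    # Index the second sheet by the three matching fields.
    lookup = {
        (p["playlist"], p["song"], p["date"]): p["position"]
        for p in positions
    }
    # Keep every row of the first sheet; position is None when nothing matches.
    return [
        {**row, "position": lookup.get((row["playlist"], row["song"], row["date"]))}
        for row in plays
    ]

# Hypothetical sample rows, invented for this example.
plays = [
    {"playlist": "Morning", "song": "Song A", "plays": 120, "date": "2016-05-01"},
    {"playlist": "Morning", "song": "Song A", "plays": 95, "date": "2016-05-02"},
    {"playlist": "Evening", "song": "Song B", "plays": 40, "date": "2016-05-01"},
]
positions = [
    {"playlist": "Morning", "song": "Song A", "position": 3, "date": "2016-05-01"},
    {"playlist": "Morning", "song": "Song A", "position": 5, "date": "2016-05-02"},
]

joined = add_position(plays, positions)
for row in joined:
    print(row)
```

If a multi-field match returns nothing, one thing worth checking is whether the key fields really are identical on both sides (same data type for the date, no stray spaces, same capitalisation), since an exact-match join treats any difference as a mismatch.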

 

 

 

 

 

Parsing HTML tables


Many thanks to the weekly challenge on this idea. 

  

One of the weekly challenges was to parse HTML and extract table data, which got me thinking about building a generic workflow (and eventually an application) to get table data from any page.

 

I hope to publish further improvements, since web scraping is a passion of mine. The next step would be to add a multi-page feature.

 

Looking for community feedback.
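
As a point of comparison only, the idea of a generic table extractor can be sketched with Python's standard-library HTML parser. This is not the attached workflow; the names `TableParser` and `extract_tables` are hypothetical, and the sketch assumes tables are not nested inside one another.

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect every <table> on a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table seen so far
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None

def extract_tables(html):
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.tables

page = (
    "<p>Intro text</p>"
    "<table><tr><th>Region</th><th>Sales</th></tr>"
    "<tr><td>Region A</td><td>50</td></tr></table>"
)
print(extract_tables(page))
```

A multi-page version would run the same extraction over each page in turn and tag the rows with their source page.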

 

 

Need to replace select values in one field with values from another field


I am currently working on a workflow where I am combining data from multiple sources into one Alteryx file. The main issue with this project is that the data I am using is very sparse and somewhat inaccurate. I am trying to blend two columns that are the same field type and category, except that one of them is a "revised" column. I want the revised column to populate the "UNKNOWN" rows of the original column, so that in the end I will have one column with data from both sources. I am trying to use a Formula tool to write an expression that will find all of the "UNKNOWN" rows in the original column and replace them with the good values from the "revised" column. This is what my current expression looks like:

 

Replace([Customer Division], "UNKNOWN" , "[rev Customer division]")

 

I have tried many IF statements as well but can't seem to get it right. I think my issue is trying to replace with values in the field rather than a simple "insert text" replacement.

 

Any advice would be great
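
The intended rule is a row-by-row conditional: keep the original value unless it is "UNKNOWN", in which case take the revised value. A minimal Python sketch of that logic follows, with made-up sample rows and a hypothetical helper name `fill_unknown`. In the expression above, the quotes around the revised field name probably turn it into literal text rather than a field reference, which is worth checking first.

```python
def fill_unknown(original, revised, placeholder="UNKNOWN"):
    """Return the revised value only where the original is the placeholder."""
    return revised if original == placeholder else original

# Hypothetical rows: (Customer Division, rev Customer division)
rows = [
    ("Retail", "Retail"),
    ("UNKNOWN", "Wholesale"),
    ("Industrial", "Wholesale"),
]

blended = [fill_unknown(original, revised) for original, revised in rows]
print(blended)
```

Note that this replaces whole cell values; a text-replace function would instead substitute the word wherever it appears inside a longer string.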

 

 


Finding Certain Dates


I have a large data file containing customer numbers, corresponding revenue and date of transactions.  How can I analyze the data to compile the first date of transaction for each customer number? (i.e., a particular customer number may have multiple transaction dates, however I'm trying to compile the first date of transaction for each customer).  

 

Thanks,

Joe
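
Finding the first transaction date per customer is a group-by on customer number that keeps the minimum date. A minimal Python sketch of that logic, using invented sample data and a hypothetical function name `first_transaction_dates`:

```python
from datetime import date

def first_transaction_dates(transactions):
    """Map each customer number to its earliest transaction date."""
    first = {}
    for customer, tx_date in transactions:
        # Keep the date if the customer is new or this date is earlier.
        if customer not in first or tx_date < first[customer]:
            first[customer] = tx_date
    return first

# Hypothetical (customer number, transaction date) pairs.
transactions = [
    (1001, date(2016, 3, 14)),
    (1002, date(2016, 1, 5)),
    (1001, date(2015, 11, 2)),
    (1002, date(2016, 2, 20)),
]

print(first_transaction_dates(transactions))
```

This only works if the dates are real date values (or ISO-formatted strings that sort chronologically); dates stored as text in other formats would need converting first.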

If Contains and Equals Then Statement


So I am horrible with If/Then Statements. I am trying to make a new column (vstring) that flags rows that need to be looked at based on what is in two columns.

The Top ZIP Sales ADI column is a vstring and the info in that column will look like this: Palm Springs, CA - 186, Los Angeles, CA - 812, etc.

The Percent of Total Circ by Market column is a double and would read like this: 1 or .022 etc

 

 

Here is what I came up with:

IF ([Top ZIP Sales ADI] contains "812") && ([Percent of Total Circ by Market] = 1) THEN "Ok" ELSE "Check" ENDIF
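
Written out in Python purely to pin down the intended rule (this is not Alteryx syntax, and `flag_row` is a hypothetical name): a row is "Ok" only when the ADI text contains "812" and the percentage equals 1, otherwise it is "Check". In the Formula tool the substring test is, as far as I know, written as a function call such as Contains([Top ZIP Sales ADI], "812") rather than as an infix operator, so that is worth checking against the function reference.

```python
def flag_row(top_zip_sales_adi, pct_of_total_circ):
    """Flag a row as 'Ok' or 'Check' based on the two columns."""
    # Substring test on the text column, equality test on the numeric column.
    if "812" in top_zip_sales_adi and pct_of_total_circ == 1:
        return "Ok"
    return "Check"

print(flag_row("Los Angeles, CA - 812", 1))
print(flag_row("Palm Springs, CA - 186", 1))
print(flag_row("Los Angeles, CA - 812", 0.022))
```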

Dynamic Append Fields


Hi Everyone, 

 

I am trying to do a dynamic append fields but can't figure out how to do it in Alteryx:

 

The source looks like this: 

dynamicappend.png

The column field determines which column each name belongs in.

 

And desired results: 

results.JPG

 

 

Challenge: The number of columns that need to be created is not known in advance, as the source varies - so I cannot place a fixed number of Append Fields or Text to Columns tools in advance.

How can I create the same result dynamically?

 

Thank you in advance for your help. 

 

 

Add column "Percent of total"


Dear colleagues,

 

Could you please help me with the following (I believe, pretty basic) question.

 

In Alteryx I have a table like this:

Region      Sales
Region A    50
Region B    50
Region C    30
Region D    70

 

And I need somehow to make a new column with percentage of total sales. So, basically I need output like this:

Region      Sales    % of total
Region A    50       25%
Region B    50       25%
Region C    30       15%
Region D    70       35%

 

How can I make this in Alteryx?

 

Thanks a lot in advance!
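
The calculation itself is each region's sales divided by the grand total (here 50 + 50 + 30 + 70 = 200). A minimal Python sketch using the figures from the table above; it illustrates the arithmetic only, not a specific Alteryx tool configuration.

```python
sales = {"Region A": 50, "Region B": 50, "Region C": 30, "Region D": 70}

# Step 1: one grand total for the whole table.
total = sum(sales.values())

# Step 2: attach that total to every row and divide.
percent_of_total = {region: 100 * value / total for region, value in sales.items()}

for region, pct in percent_of_total.items():
    print(f"{region}: {pct:.0f}%")
```

The two steps map onto a common pattern: compute the total once, append it to every row, then divide in a formula.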

Execution of multiple run command tools


Hi all,

 

I have created a few macros which make use of the Run Command tool to create directories. However, when using many of these I begin to get errors saying that the files produced do not exist or that access is denied. I assume this is happening because they are all trying to do the same thing at once, thus clashing and creating errors. My question is whether it is possible to avoid this concurrency issue, either by ensuring only one runs at a time or otherwise.

 

Thanks in advance,

 

Timur_O

Inconsistent Date format (string)


I have a field that contains date values, currently a string field, with multiple formats that I need to convert to an actual date field. The current values are "yyyy-mm-dd" and "mm/dd/yyyy". I have tried the DateTime parse tool, but with multiple formats this isn't successful. Has anyone run across this?
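
One general approach to mixed formats is to try each known format in turn and keep the first one that parses. A minimal Python sketch of that idea follows. It assumes the second format uses a four-digit year (mm/dd/yyyy), and `parse_mixed_date` is a hypothetical helper name.

```python
from datetime import datetime, date

# Formats to try, in order. The second one assumes a four-digit year.
FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

def parse_mixed_date(text):
    """Return a date for the first format that fits, or None if none do."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None

print(parse_mixed_date("2016-03-07"))
print(parse_mixed_date("03/07/2016"))
print(parse_mixed_date("not a date"))
```

The same idea can be expressed as a conditional that tests which separator the string contains and applies the matching format to each branch.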

 

Something like Excel's "Solver" - to find adjustment coefficient for market share


Dear friends, could you please help me with the following. Let's suppose I have a table with:

  • Total market size by regions
  • Market share of our company, that we get from market agency

 

Regions     Market size    Reported Market Share
Region A    1,400          8%
Region B    1,500          14%
Region C    1,800          7%
Region D    700            10%

 

But I also know that the reported market shares are not precise enough, because our real total market share in the country (which consists of these 4 regions) is 12%, whereas the figures in the table above work out to 9.6%.

 

In Excel I would make a cell called "Adjustment coefficient", add a new column "Adjusted market share" (reported market share multiplied by the adjustment coefficient) and run Solver. It would tell me that if the market share in each region is proportionally increased by a factor of 1.2510, our total market share in the country will match.

 

But what is the best way to do it in Alteryx?

 

My goal is to get result like this:

 

Regions     Market size    Reported Market Share    Adjusted Market Share
Region A    1,400          8%                       10%
Region B    1,500          14%                      18%
Region C    1,800          7%                       9%
Region D    700            10%                      13%
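
Because the adjustment is a single multiplier applied to every region, it can be computed directly instead of searched for: the coefficient is the target total share divided by the size-weighted reported share. A minimal Python sketch using the figures from the tables above (variable names are illustrative):

```python
# (market size, reported market share) per region, from the table above.
regions = {
    "Region A": (1400, 0.08),
    "Region B": (1500, 0.14),
    "Region C": (1800, 0.07),
    "Region D": (700, 0.10),
}
target_share = 0.12  # known real total market share

total_market = sum(size for size, _ in regions.values())
reported_sales = sum(size * share for size, share in regions.values())

# Size-weighted total share implied by the reported figures (about 9.6%).
reported_share = reported_sales / total_market

# Single multiplier that scales the total to the target (about 1.2510).
coefficient = target_share / reported_share

adjusted = {name: share * coefficient for name, (_, share) in regions.items()}

print(f"coefficient = {coefficient:.4f}")
for name, share in adjusted.items():
    print(f"{name}: {share:.0%}")
```

In workflow terms this is a total (sum of size and sum of size times share), one division, and a multiplication back onto each row, so no iterative solver is needed for this particular case.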

Filtering multiple workflow queries with single filter


In a single yxmd, I have 15 query workflows fired up creating a tde file. All of them use a unique column called ‘ABO_YEAR_NUM’ (string type of character in the database) which represents year (e.g. 2016), which is a common filter for all of them.

 

Currently I have 2 ways of putting year filter:

  1. Manually code it in visual query builder
  2. Create a small separate workflow for year and join it to each of the 15 workflows by creating a tangled web of joins throughout the file like in attached file:

 

Both of the above solutions work. However, I am looking for a cleaner and more efficient automated solution (maybe a macro or an app, where I can code it once and pass its reference to all these workflows in the SQL query builder or in any other way available in Alteryx) so that

  1. workflow file does not look like a tangled web of joins like the one in attached file, which could be very difficult to debug
  2. it can be updated with a single instance update instead of opening 15 query workflows.

 

Please let me know if this makes sense.

thank you.

Using fuzzy match to bring in unique identity numbers from different source


I have two large data sets, each with a list of client information and a different set of unique identifiers for each file. Some of the client names aren't exact matches, which is where I think the fuzzy matching tool could come in. I would like to match the data together so I can eventually move the UENs from one file to the other.

 

I'm new to the fuzzy matching tool, but I understand the basics of creating matches. I'm just not sure what I should do afterwards to move the UENs.

 

I made up some data here so you know what I mean to do. This could save me countless hours thanks!!

UEN sample.PNG

Tableau TWB Audit Workflow - Sharing and Asking for Input


Hi,

 

I set up a workflow to audit a Tableau .twb file.  I got the idea after reading this post but found that that macro didn't quite get me the output I was looking for.  Being new to XML and RegEx, this current version took me quite a few brain cycles and I suspect there are multiple ways to improve it, which is why I'm attaching it here.  It works, but there may be ways to arrive at the same (or better) results with fewer or more robust/elegant steps. :-)

 

What it does:

 

  • Reads in a .twb file
  • Identifies all fields used in the workbook
  • Cycles each calculated field through an iterative macro that replaces generic field name references like [Calculation_0021112080235996] with the actual field name, which makes for much easier reading and auditing.
  • Joins the fields with a list of all worksheets that use them
  • Outputs the data to a TDE for exploration

 

What it doesn't do (yet):

 

  • I can't figure out how to identify fields in the XML that are only associated to a worksheet because of an Action filter remnant.  This isn't a huge deal, but I've noticed that some fields are linked to a worksheet simply because at one point an action filter with that field was applied to that sheet and that action filter remained on the filter shelf.
  • Process more than one .twb file (so that I can check across workbooks whether the same fields and calculations are used).  I suspect this is just a matter of turning this into a macro that leverages a Directory tool to pull in all .twb files in a particular folder.  That, or an iterative macro that allows you to specify exactly which .twb files you want to feed in.

If this module works for you as-is, feel free to use it.  If there are things I can do to it to make it run better/easier, I'm open to all feedback. :-)

twb audit.png

In-DB Browse tool halting workflow?


First off, sorry if this isn't the correct place to ask this question. I've searched around and couldn't really find a conclusive answer to my problem, so I figured I'd start here.

 

I have a workflow that's mostly in-database, but I'm having a problem with the "browse" tool within the in-db suite of tools. I can run a simple query with a formula tool attached to it that adds a RowID (again, in-db) that will run in 15 seconds without any output or browse tool on the end of it.

 

Adding a browse tool for the first 100 records changes that by what seems to be forever. I actually haven't let it run all the way to completion, that's how long it takes (over 5 minutes). However, if I add the data stream out tool with a browse connected to that, it runs a little slower than before, but gets basically back to the original run time of around 17 seconds.

 

Can anyone explain why this is? Are there server settings that could have an impact on the browse tool? I don't understand how using an in-db browse for 100 records stops the workflow from completing, but streaming out into a standard workflow browse tool has virtually no effect.

Parsing Comma Separated Values Within in a Single Column Based on a Separate Count


Hello all, I've been struggling with parsing some comma-separated values within columns imported to Alteryx from a CSV. The data pertains to a messaging site my company uses, and we want to know how many times different emojis are attached to messages within the messaging platform. So I have a table filled with data similar to the following row (ex):

 

count    emoji                  userID
5,1,2    tada,clap,sparkles     1,2,3,4,5,1,1,6

 

That applies to one message. Somebody must have done something good....

 

 

So what this is saying is:

  • 5 people used a 'tada' and their userIDs were 1,2,3,4,5
  • 1 person used a 'clap' and their userID was 1
  • 2 people used a 'sparkles' and their userIDs were 1,6

 

How I tried to approach this project: 

  • I used the 'Text to Columns' parsing tool and selected to split to rows rather than columns.
  • This worked well for the 'count' and 'emojis' columns, but not the 'userID' column.
    • As I'm sure you all have figured out by now, the parsing tool created rows for each of the userIDs... not what I wanted. 

I am wondering if there is a way or tool in Alteryx that can be used to parse the userID column so that it creates a new row with (for example) the 5 userIDs who attached a 'tada' to a particular message, rather than a different row for each listed ID in the userID column. So, ideally my final output for the above example would appear as:

 

count    emoji       userID
5        tada        1,2,3,4,5
1        clap        1
2        sparkles    1,6

 

 

Thanks in advance for  the help! 
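
The logic being asked for is to walk through the userID list and hand each emoji the next `count` IDs. A minimal Python sketch of that slicing, using the example row above; `split_reactions` is a hypothetical name and this shows the algorithm rather than a particular Alteryx tool chain.

```python
def split_reactions(counts, emojis, user_ids):
    """Turn one message row into one row per emoji with its own userIDs."""
    count_list = [int(c) for c in counts.split(",")]
    emoji_list = emojis.split(",")
    id_list = user_ids.split(",")

    # The three columns must agree before the IDs can be handed out.
    if len(count_list) != len(emoji_list) or sum(count_list) != len(id_list):
        raise ValueError("counts, emojis and userIDs do not line up")

    rows, start = [], 0
    for count, emoji in zip(count_list, emoji_list):
        # Take the next `count` IDs for this emoji, then move the cursor on.
        rows.append((count, emoji, ",".join(id_list[start:start + count])))
        start += count
    return rows

rows = split_reactions("5,1,2", "tada,clap,sparkles", "1,2,3,4,5,1,1,6")
for row in rows:
    print(row)
```

The key quantity is the running total of the counts: each emoji's IDs occupy positions from the previous running total up to its own, which is the piece a row-by-row split on commas loses.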

 

 

 

 


 
