Hi there!
I have a CEF (Common event format) file that looks like this:
CEF:0|Incapsula|SIEMintegration|1|1|Illegal Resource Access|9| fileid=3412341160002518171 sourceServiceName=site123.abcd.info siteid=1509732 suid=50005477 requestClientApplication=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0 cs2=true cs2Label=Javascript Support cs3=true cs3Label=CO Support src=12.12.12.12 caIP=[13.13.13.13] ccode=[IL] tag=www.elvis.com cn1=200 in=54 xff=44.44.44.44 cs1=NOT_SUPPORTED cs1Label=Cap Support cs4=c2e72124-0e8a-4dd8-b13b-3da246af3ab2 cs4Label=VID cs5=de3c633ac428e0678f3aac20cf7f239431e54cbb8a17e8302f53653923305e1835a9cd871db32aa4fc7b8a9463366cc4 cs5Label=clappsigdproc=Browser cs6=Firefox cs6Label=clapp ccode=[IL] cicode=Rehovot cs7=31.8969 cs7Label=latitude cs8=34.8186 cs8Label=longitude Customer=CEFcustomer123 start=1453290121336 request=site123.abcd.info/ requestmethod=GET qstr=p\=%2fetc%2fpasswd app=HTTP act=REQ_CHALLENGE_CAPTCHA deviceExternalID=33411452762204224 filetype=30037,1001, filepermission=2,1, cs9=Block Malicious User,High Risk Resources, cs9Label=Rule name
This one line record btw.
Any ideas on how to best parse this? The first step was using "|" but everything else got complicated. I've tried using text to column tool with "=" as delimiter but then I encountered records that had the "=" character in the field content as well. I am using the regex parse with '([^=\s]+)=((?:[\\]=|[^=])+)(?:\s|$)' to match single key=value pair but then it gets ugly... because I have to repeat this like 40 times for each field. Is there a prettier way to do this in Alteryx? Like a "for i in a" ?
Any ideas would be much appreciated :)
Thanks!