Channel: Data Preparation & Blending discussions

Parsing CEF Files (from Incapsula web application security)


Hi there!

 

I have a CEF (Common Event Format) file that looks like this:

 

CEF:0|Incapsula|SIEMintegration|1|1|Illegal Resource Access|9| fileid=3412341160002518171 sourceServiceName=site123.abcd.info siteid=1509732 suid=50005477 requestClientApplication=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0 cs2=true cs2Label=Javascript Support cs3=true cs3Label=CO Support src=12.12.12.12 caIP=[13.13.13.13] ccode=[IL] tag=www.elvis.com cn1=200 in=54 xff=44.44.44.44 cs1=NOT_SUPPORTED cs1Label=Cap Support cs4=c2e72124-0e8a-4dd8-b13b-3da246af3ab2 cs4Label=VID cs5=de3c633ac428e0678f3aac20cf7f239431e54cbb8a17e8302f53653923305e1835a9cd871db32aa4fc7b8a9463366cc4 cs5Label=clappsigdproc=Browser cs6=Firefox cs6Label=clapp ccode=[IL] cicode=Rehovot cs7=31.8969 cs7Label=latitude cs8=34.8186 cs8Label=longitude Customer=CEFcustomer123 start=1453290121336 request=site123.abcd.info/ requestmethod=GET qstr=p\=%2fetc%2fpasswd app=HTTP act=REQ_CHALLENGE_CAPTCHA deviceExternalID=33411452762204224 filetype=30037,1001, filepermission=2,1, cs9=Block Malicious User,High Risk Resources, cs9Label=Rule name

 

This is a single one-line record, by the way.

Any ideas on how best to parse this? Splitting the header on "|" was the easy first step, but everything after that got complicated. I tried the Text to Columns tool with "=" as the delimiter, but some records have the "=" character inside the field content as well. I am now using regex parse with '([^=\s]+)=((?:[\\]=|[^=])+)(?:\s|$)' to match a single key=value pair, but it gets ugly because I have to repeat the pattern about 40 times, once for each field. Is there a prettier way to do this in Alteryx, something like a "for i in a" loop?
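To make the "for i in a" idea concrete, here is roughly what I have in mind, written as a minimal Python sketch rather than as an Alteryx workflow. It rests on a few assumptions: the header is always the first seven "|"-separated fields with no escaped pipes in it, a new key always starts after whitespace and consists only of letters, digits and underscores, and "=" inside a value is escaped as "\=" (as in the qstr field above). The names parse_cef, KEY_RE and HEADER_NAMES are made up for this illustration. It returns a list of pairs instead of a dictionary because a key can repeat within one record (ccode appears twice in the sample).

```python
import re

# A key starts at the beginning of the extension or after whitespace and is
# followed directly by "=". An escaped "\=" never matches, because the
# backslash sits between the candidate key characters and the "=".
KEY_RE = re.compile(r"(?:^|\s)([A-Za-z0-9_]+)=")

# Hypothetical column names for the seven header fields.
HEADER_NAMES = ["version", "vendor", "product", "device_version",
                "signature_id", "name", "severity"]


def parse_cef(line):
    """Split one CEF record into (header dict, list of (key, value) pairs)."""
    parts = line.split("|", 7)  # 7 header fields + the extension
    if len(parts) < 8 or not parts[0].startswith("CEF:"):
        raise ValueError("not a CEF record")

    header = dict(zip(HEADER_NAMES, parts[:7]))
    header["version"] = parts[0][len("CEF:"):]  # drop the "CEF:" prefix

    ext = parts[7]
    matches = list(KEY_RE.finditer(ext))
    pairs = []
    for i, m in enumerate(matches):
        # A value runs from just after "key=" to the start of the next key.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(ext)
        value = ext[m.end():end].strip().replace("\\=", "=")
        pairs.append((m.group(1), value))
    return header, pairs


# Shortened version of the record above, used only as a demonstration.
sample = (r"CEF:0|Incapsula|SIEMintegration|1|1|Illegal Resource Access|9| "
          r"src=12.12.12.12 cs2=true cs2Label=Javascript Support "
          r"qstr=p\=%2fetc%2fpasswd app=HTTP")

header, pairs = parse_cef(sample)
for key, value in pairs:
    print(key, "->", value)
```

The point is that the keys are located once with a single pattern and each value is simply whatever sits between one key and the next, so values containing spaces (such as "Javascript Support") need no special handling and nothing has to be repeated per field. A value that itself contains something shaped like " word=" would still be split incorrectly.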

 

Any ideas would be much appreciated :)

 

Thanks!

