I am trying to extract only certain properties from a Suricata rule: each content and its related within and distance values. The output should look like this, for example:
[[content1, distance1, within1], [content2, none, within2], [content3, none, none], [content4, distance4, none] ]
In other words, not every content has all of these properties. I also need the contents in the same order as they appear in the rule.
This is what I tried:
import re
# Sample rule
selected_rule = '''alert tcp $EXTERNAL_NET any -> $HOME_NET 445 (msg:"ET EXPLOIT DOS Microsoft Windows SRV.SYS MAILSLOT"; flow:to_server,established; content:"|00|"; depth:1; content:"|FF|SMB%"; within:5; distance:3; byte_test:1,!&,128,6,relative; pcre:"/^.{27}/sR"; content:"|03|"; distance:21; content:"|01 00 00 00 00 00|"; distance:1; within:6; byte_test:2,=,17,0,little,relative; content:"|5C|MAILSLOT|5C|"; within:10; distance:2; reference:url,www.milw0rm.com/exploits/2057; reference:url,www.microsoft.com/technet/security/bulletin/MS06-035.mspx; reference:url,doc.emergingthreats.net/bin/view/Main/2003067; classtype:attempted-dos; sid:2003067; rev:5; metadata:created_at 2010_07_30, former_category EXPLOIT, updated_at 2010_07_30;)'''
content_with_distances_within = re.findall(r'content:\s*"([^"]+)";\s*(?:within:(\d+))?\s*;\s*(?:distance:(\d+))?',selected_rule)
content_with_within_distances = re.findall(r'content:\s*"([^"]+)";\s*(?:distance:(\d+))?;\s*(?:within:(\d+))?\s*',selected_rule)
content_only = re.findall(r'content:\s*"([^"]+)";(?!\s*(?:within|distance|;|$))', selected_rule)
content_distance_within_list = []
for match in content_with_distances_within:
    content = match[0]
    within = int(match[1]) if match[1] else None
    distance = int(match[2]) if match[2] else None
    content_distance_within_list.append([content, distance, within])
content_within_distance_list = []
for match in content_with_within_distances:
    content = match[0]
    within = int(match[2]) if match[2] else None
    distance = int(match[1]) if match[1] else None
    content_within_distance_list.append([content, distance, within])
content_only_list = [[content, None, None] for content in content_only]
merged_list = content_distance_within_list + content_within_distance_list + content_only_list
print(merged_list)
But my output is [['|FF|SMB%', 3, 5], ['|5C|MAILSLOT|5C|', 2, 10], ['|03|', 21, None], ['|01 00 00 00 00 00|', 1, 6], ['|00|', None, None]], which is in the wrong order.
The correct output should be: [['|00|', None, None], ['|FF|SMB%', 3, 5], ['|03|', 21, None], ['|01 00 00 00 00 00|', 1, 6], ['|5C|MAILSLOT|5C|', 2, 10]]
The problem is that content is not always followed by distance; sometimes it is followed by within and only then by distance.
Does anyone have an idea how to do this? This example also does not handle other properties. I also need to extract the depth and offset related to each content, so that the output looks like [[content1, distance1, within1, depth1, offset1], [content2, none, within2, none, offset2], [content3, none, none, none, none], [content4, distance4, none, depth4, offset4]] etc.
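One possible direction, shown only as a rough sketch: rather than running one regex per ordering and merging the results, walk through the rule's options once, from left to right, and attach every distance, within, depth, or offset to the most recent content. That would keep the rule order and would not care which modifier comes first. The names `OPTION_RE`, `MODIFIERS`, `to_int_or_text`, and `extract_contents` are made up for this illustration, and the short rule at the bottom is a simplified, invented variant of the sample rule. The sketch assumes that options have the form `keyword:value;` and does not treat negated contents (`content:!"..."`) specially.

```python
import re

# One "keyword:value;" option. The value is either a quoted string
# (which may contain escaped characters such as \; or \") or any text
# up to the next semicolon.
OPTION_RE = re.compile(r'(\w+)\s*:\s*("(?:[^"\\]|\\.)*"|[^;]*)\s*;')

# Output column order after the content itself.
MODIFIERS = ("distance", "within", "depth", "offset")


def to_int_or_text(value):
    """Convert plain integers; keep anything else (e.g. a variable name) as text."""
    value = value.strip()
    return int(value) if re.fullmatch(r"-?\d+", value) else value


def extract_contents(rule):
    """Return [content, distance, within, depth, offset] lists in rule order."""
    # Only scan the option block between the outer parentheses.
    options = rule[rule.find("(") + 1:rule.rfind(")")]
    results = []
    current = None  # the content that later modifiers belong to
    for m in OPTION_RE.finditer(options):
        key, value = m.group(1), m.group(2)
        if key == "content":
            text = value[1:-1] if value.startswith('"') else value
            current = {"content": text}
            results.append(current)
        elif key in MODIFIERS and current is not None:
            current[key] = to_int_or_text(value)
    return [[c["content"]] + [c.get(k) for k in MODIFIERS] for c in results]


# Simplified, invented rule used only to demonstrate the function.
rule = ('alert tcp any any -> any 445 (msg:"demo"; content:"|00|"; depth:1; '
        'content:"|FF|SMB%"; within:5; distance:3; content:"|03|"; distance:21; sid:1;)')
print(extract_contents(rule))
# [['|00|', None, None, 1, None], ['|FF|SMB%', 3, 5, None, None], ['|03|', 21, None, None, None]]
```

Because the state is just "the last content seen", supporting another modifier would only mean adding its name to `MODIFIERS`.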