Leveling up with osquery for Workloads: Using regular expressions (REGEX) in queries
Audit and Remediation provide direct access to osquery functionality within the VMware Carbon Black Cloud console to enable security, compliance, and IT teams to query over 2,000 individual attributes across endpoints and workloads. osquery can help teams with gathering information at scale across environments for IT and help desk operations, compliance and M&A reporting, incident response, and security investigations.
Audit and Remediation allow administrators to ask questions about the environment across hardware, software, and network variables at scale. VMware Carbon Black Cloud Workload customers have access to the full Audit and Remediation capabilities. In this blog series, we’ll be laying out relevant queries for VMware Carbon Black Cloud Workload customers to use to achieve a variety of use cases. All queries are available in the VMware Carbon Black Cloud console, or in the VMware Carbon Black User Exchange.
Requirements:
This blog series is intended for readers that have a basic understanding of SQLite and have an osquery test environment. If neither of these things is true for you, please take a moment to read the Audit and Remediation Best Practices Guide before reading the rest of the blog.
Resources
-
https://osquery.io/schema/ (Refer to the Best Practices guide to determine the version currently installed)
-
VMware Carbon Black Query Exchange (*Requires you sign up for a free account in the VMware Carbon Black User Community unless you are a customer)
Using regular expressions (REGEX) in queries
I was recently asked about a use case regarding finding Emotet malware artifacts using Live Query. The artifacts in question looked like:
C:\WINDOWS\System32\Random Folder Name\Random File Name.Random File Type C:\windows\syswow64\Random Folder Name\Random File Name.Random File Type
The actual file paths look like:
C:\WINDOWS\System32\Kefcbfwztlxk\ppfiimnvbwkgw.nvg C:\windows\syswow64\txrsryjkrhlvwvve\sgnzjys.dyw
I initially thought of using a query to look at the file creation times in order to identify outliers, but the person asking wanted to try to use regular expressions (REGEX) to try to find the files, so for the purpose of this blog post, we will explore that route. This post is not a discussion of how to write a REGEX, but how to use them in osquery. osquery uses the Java variant of REGEX, so please see this site for information on creating your own REGEX.
In osquery we have a function called regex_match() that we can use, but there are some caveats. rRegex_match() as defined in the osquery documentation is:
regex_match(COLUMN, PATTERN, INDEX): Runs regex match across the column, and returns matched subgroups. (The 0 index is the full match; subsequent numbers are the groups).
If you use regex_match() in a SELECT clause, then it will return what is matched. Here is an example without grouping:
osquery> select regex_match('foo bar',"\w+\ \w+",0);
+---------------------------------------+
| regex_match('foo bar',"\w+\ \w+",0) |
+---------------------------------------+
| foo bar |
+---------------------------------------+
Here is an example with grouping (Note that I changed the index as well):
osquery> select regex_match('foo bar',"(\w+)\ \w+",1);
+---------------------------------------+
| regex_match('foo bar',"(\w+)\ \w+",1) |
+---------------------------------------+
| foo |
+---------------------------------------+
When you use it in a WHERE clause then it will return a non-NULL value on a hit, so you have to tell the WHERE clause that is what you are looking for:
osquery> select name from processes
where regex_match(name,"carbon\S?black",0) is not null;
+----------------------------------------+
| name |
+----------------------------------------+
| com.carbonblack.es-loader.es-extension |
+----------------------------------------+
So, let's look at the query they were trying to use (we chose to focus on the SysWOW64 directory because there are fewer files and directories than System32):
select * FROM file
WHERE path like regex_match(path,"C\:\\Windows\\SysWOW64\\[A-Z0-9]{4,16}\\[A-Z0-9]{3,15}\.[A-Z][0-9]{3}",0);
First, there is not a is not null after the regex_match(), but there are also a few other issues with the REGEX. To troubleshoot this query, I find it easy to not use a specific table, but just a typical value that we are trying to match against: C:\Windows\SysWOW64\asdUf567\qdqwd.df3. If we use that data in just a basic SELECT we get no results:
osquery> select regex_match('C:\Windows\SysWOW64\asdUf567\qdqwd.df3',"C\:\\Windows\\SysWOW64\\[A-Z0-9]{4,16}\\[A-Z0-9]{3,15}\.[A-Z][0-9]{3}",0) as 'match';
+-------+
| match |
+-------+
| |
+-------+
If we look at the character classes that they used you will see that you are only looking for uppercase characters: A-Z, but the path is made up of both upper and lower case. However, if we add lower case to the regex, we still get no results:
osquery> select regex_match('C:\Windows\SysWOW64\asdUf567\qdqwd.df3',"C\:\\Windows\\SysWOW64\\[a-zA-Z0-9]{4,16}\\[a-zA-Z0-9]{3,15}\.[a-zA-Z][0-9]{3}",0) as 'match';
+-------+
| match |
+-------+
| |
+-------+
If you look at the portion for the file extension the character class is split in two and is looking for one character followed by exactly three numbers. When we fix that, we get a match!:
osquery> select regex_match('C:\Windows\SysWOW64\asdUf567\qdqwd.df3',"C\:\\Windows\\SysWOW64\\[a-zA-Z0-9]{4,16}\\[a-zA-Z0-9]{3,15}\.[a-zA-Z0-9]{3}",0) as 'match';
+----------------------------------------+
| match |
+----------------------------------------+
| C:\Windows\SysWOW64\asdUf567\qdqwd.df3 |
+----------------------------------------+
We are making progress, but we are not done yet. Let’s use our modified REGEX for the full query:
osquery> select * FROM file WHERE path like regex_match(path,"C\:\\Windows\\SysWOW64\\[a-zA-Z0-9]{4,16}\\[a-zA-Z0-9]{3,15}\.[a-zA-Z0-9]{3}",0) is not null;
W0201 12:09:28.400182 4156 virtual_table.cpp:965] Table file was queried without a required column in the WHERE clause
W0201 12:09:28.400182 4156 virtual_table.cpp:976] Please see the table documentation: https://osquery.io/schema/#file
Error: constraint failed
The way we are trying to use regex_match() here is incorrect as I explained initially, so let’s fix the WHERE clause, but even though it runs we get no results:
osquery> select * FROM file WHERE path like '\windows\syswow64\%\%' and regex_match(path,"C\:\\Windows\\SysWOW64\\[a-zA-Z0-9]{4,16}\\[a-zA-Z0-9]{3,15}\.[a-zA-Z0-9]{3}",0) is not null;
osquery>
Let's now take a look at the actual data we are trying to match against:
osquery> select path from file where path like '\windows\syswow64\%' and type = 'regular' limit 1;
+--------------------------------+
| path |
+--------------------------------+
| \windows\syswow64\12520437.cpx |
+--------------------------------+
So, the path does not have C: and it is in all lowercase. Let's use the lower() function to force the path to all lower case and change our regex to be all lowercase (I am not sure how to make regex_match() case insensitive) and remove the C:. Now we get results, but 202 of them.
select * FROM file WHERE path like '\windows\syswow64\%\%' and regex_match(lower(path),"\\windows\\syswow64\\[a-zA-Z0-9]{4,16}\\[a-zA-Z0-9]{3,15}\.[a-zA-Z0-9]{3}",0) is not null;
Since we are looking for file extensions that are random, they will most likely not have common file extensions, so let's look at what the most common file extensions are. To do this we are using another function: substr(). This function allows us to extract a substring from a larger string. This syntax for this function is:
substr(column, start, length)
The column value is self-explanatory. The start value is an index that can be positive or negative. The positive numbers count from the left, and the negative numbers count from the right. The length value is the number of characters that we want to capture. Since we are looking for three-character file extensions, we want to start at the third to last character in the string and grab three characters. Here is the query:
osquery> select
substr(filename,-3,3),
count(substr(filename,-3,3))
FROM file
WHERE path like '\windows\syswow64\%\%' and
regex_match(lower(path),"\\windows\\syswow64\\[a-z0-9]{4,16}\\[a-zA-Z0-9]{3,15}\.[a-zA-Z0-9]{3}",0) is not null
group by substr(filename,-3,3);
+-----------------------+------------------------------+
| substr(filename,-3,3) | count(substr(filename,-3,3)) |
+-----------------------+------------------------------+
| dll | 75 |
| doc | 1 |
| exe | 7 |
| inf | 1 |
| mof | 88 |
| nmd | 16 |
| ppt | 1 |
| sys | 1 |
| tlb | 1 |
| txt | 1 |
| xls | 1 |
| xml | 6 |
| xsl | 3 |
+-----------------------+------------------------------+
So, we can remove mof, nmd, dll, exe, xsl and xml to reduce the number of hits we have to sift through:
select * FROM file
WHERE path like '\windows\syswow64\%\%'
and regex_match(lower(path),"\\windows\\syswow64\\[a-z0-9]{4,16}\\[a-zA-Z0-9]{3,15}\.[a-zA-Z0-9]{3}",0) is not null
and substr(filename,-3,3) not in ('mof','nmd','dll','exe','xsl','xml')
group by substr(filename,-3,3);
We still have seven results so we should take a look at other attributes that might help. At this point I would suggest that you look at the actual files to try to determine which will be helpful. I would suggest that you look at: size, file_version, hard_links, attributes, uid, and gid.
There is also a more obvious column to filter on... directory! If we look at that column for the seven we have left, we see that we could filter them out quite easily:
osquery> select directory FROM file
...> WHERE path like '\windows\syswow64\%\%'
...> and regex_match(lower(path),"\\windows\\syswow64\\[a-z0-9]{4,16}\\[a-zA-Z0-9]{3,15}\.[a-zA-Z0-9]{3}",0) is not null
...> and substr(filename,-3,3) not in ('mof','nmd','dll','exe','xsl','xml')
...> group by substr(filename,-3,3);
+---------------------------+
| directory |
+---------------------------+
| \windows\syswow64\MSDRM |
| \windows\syswow64\sppui |
| \windows\syswow64\MSDRM |
| \windows\syswow64\drivers |
| \windows\syswow64\wbem |
| \windows\syswow64\drivers |
| \windows\syswow64\MSDRM |
+---------------------------+
So, the query would be:
select directory FROM file
WHERE path like '\windows\syswow64\%\%'
and regex_match(lower(path),"\\windows\\syswow64\\[a-z0-9]{4,16}\\[a-zA-Z0-9]{3,15}\.[a-zA-Z0-9]{3}",0) is not null
and substr(filename,-3,3) not in ('mof','nmd','dll','exe','xsl','xml')
and directory not in ('\windows\syswow64\MSDRM','\windows\syswow64\sppui','\windows\syswow64\drivers','\windows\syswow64\wbem')
group by substr(filename,-3,3);
This blog post has shown how to effectively use the regex_match() function in osquery to solve an advanced use case in searching for malware artifacts. There are many alternative ways to search for malware artifacts, but next time the opportunity presents itself, try regular expressions using the process above in osquery.
To learn more about how Audit and Remediation capabilities can help you gather data at scale across your environments, check out the resources below and read the other blogs in this series:
Carbon Black Cloud Workload TestDrive
VMware Carbon Black Tech Zone: Mastering Audit & Remediation
Query Exchange (Carbon Black User Exchange)
Leveling Up with osquery for Workloads blogs