Searching code with regex
Posted by Matt at 2:15 PM
7 comments - Categories: regex | ColdFusion
Ever had trouble finding where a variable is set, displayed or manipulated throughout a large application?
I remember often struggling to find code before I knew any regex.
Here is an example of a problem I used to face several years ago.
I would inherit a large mess of an application, and in the code I would find a variable #abc123#. Not knowing where it was set, I would search for the following string in Eclipse:
<cfset abc123
I expected this to find the variable. However, when I eventually tracked it down, it was often written something like
<cfset  abc123 (notice the two spaces between cfset and abc123)
Another example: I wanted to find where abc123 is set to session.crap, so
I'd search for
<cfset abc123 = session.crap >
and
<cfset  abc123 = session.crap (notice the two spaces)
and I'd have no luck. It would turn out that one developer wrote code like
<cfset abc123 = session.crap >
and the other
<cfset abc123=session.crap> (notice the lack of spacing)
I've worked on some nasty apps over the years and have lost my fair share of hair hunting stuff down. These days I simply use regex whenever I search through code.
Now, let's get started on a 5-minute tutorial. Open Eclipse, hit Ctrl+F, tick the Regular expressions check box and enter the following regex:
<cfset[\t\s]+abc123[\t\s]*?\=(.*?)\>
The dialog should then look like the screenshot below.

If you now hit Find, it should find where the variable is set in the page you have open. Most likely, though, you'll want to search the entire site, so use the Search dialog (Search > File..., or Ctrl+H), which has its own Regular expression check box.
To adapt this for another variable, simply replace the text "abc123" in the regex above with the name of your variable.
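If you want to try the pattern outside Eclipse, here is a minimal Python sketch (Python's `re` module is my choice for the demo, not something this post relies on) that builds the same regex for any variable name and runs it against the spacing styles described above. The helper name `cfset_pattern` is hypothetical; it uses `re.escape` so that a dot in a name like `session.crap` only matches a literal dot.

```python
import re


def cfset_pattern(var_name):
    """Build the <cfset ...> search regex from the post for one variable name.

    Hypothetical helper. re.escape() turns "session.crap" into
    "session\\.crap", so a dot in the name only matches a literal dot.
    """
    return re.compile(r"<cfset[\t\s]+" + re.escape(var_name) + r"[\t\s]*?\=(.*?)\>")


samples = [
    "<cfset abc123 = session.crap >",
    "<cfset  abc123 = session.crap >",  # two spaces after cfset
    "<cfset abc123=session.crap>",      # no spaces at all
    "<cfset xyz = 1>",                  # a different variable
]

pattern = cfset_pattern("abc123")
for line in samples:
    match = pattern.search(line)
    # group(1) captures everything between "=" and the first ">"
    print(repr(line), "->", repr(match.group(1)) if match else "no match")
```

Because the pattern insists on whitespace before the name and an `=` after it (with optional whitespace in between), it also skips look-alikes such as `abc1234` or `xabc123`.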
Let's consider the following scenario.
Your site has a variable session.crap, and you want to know every location where it's set in the giant, messy application you've just inherited. You search on session.crap, the search returns 500 results in 150 files, and you instantly realise two things:
1. This codebase contains a lot of crap.
2. The session has crap all through it (pun intended).
Some of the results for this variable are:
<cfset crud = session.crap + mrVariable />
#session.crap#
<cfset session.crap.crud.fcku.whacked = "yo!" />
<cfscript>
session.crap = 1 + 1;
session.crap= session.crap;
session.crap = session.crap;
</cfscript>
You get my point: there could be a bazillion different ways this could appear, whether you simply search on the name or search for <cfset session.crap, which of course could have any amount of whitespace.
The first search I would try in the hunt is
<cfset[\t\s]+session\.crap[\t\s]*?\=(.*?)\>
and for cfscript, the following should do:
[\t\s]+session\.crap[\t\s]*?\=(.*?);
Remember to escape the dot with a backslash (\.); otherwise the dot will match any character, e.g. sessionscrap would be found just as session.crap would (for more info, see my previous blog tutorial).
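As a quick sanity check of both patterns and of the escaping rule, the Python sketch below (again my own choice of tool) runs the cfset and cfscript regexes over the sample results listed earlier, then shows an unescaped dot matching `sessionscrap`. Two assumptions about the samples: the cfscript lines are indented, because the cfscript pattern needs leading whitespace, and they end in semicolons, because the pattern stops at `;`.

```python
import re

# The two patterns from the post, unchanged.
cfset_re = re.compile(r"<cfset[\t\s]+session\.crap[\t\s]*?\=(.*?)\>")
cfscript_re = re.compile(r"[\t\s]+session\.crap[\t\s]*?\=(.*?);")

samples = [
    "<cfset crud = session.crap + mrVariable />",       # reads session.crap
    "#session.crap#",                                    # output only
    '<cfset session.crap.crud.fcku.whacked = "yo!" />',  # nested key, not matched
    "    session.crap = 1 + 1;",                         # indented, as inside <cfscript>
    "    session.crap= session.crap;",
    "    session.crap = session.crap;",
]

# Keep only the lines where session.crap itself is assigned.
flagged = [line for line in samples if cfset_re.search(line) or cfscript_re.search(line)]
for line in flagged:
    print(line.strip())

# An unescaped dot matches any character, so it also finds "sessionscrap".
print(bool(re.search(r"session.crap", "<cfset sessionscrap = 1>")))   # True
print(bool(re.search(r"session\.crap", "<cfset sessionscrap = 1>")))  # False
```

Only the three cfscript assignments are flagged; reads, output and the nested-key assignment are left out, which is what narrows 500 raw hits down to the lines that actually set the variable.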
Of course, the above isn't perfect, but it is one of many ways to save time without having to run and debug code.
Regex can be used in many other ways. Say we simply wanted to know how many cfsets are in our code base; we could run
<cfset[\t\s]+[a-zA-Z_][\w\.]*[\t\s]*?\=(.*?)\> (of course, we could also just search on cfset)
But what if we wanted to know how many of these cfsets are in an XML-compliant (self-closing) format, for whatever reason?
<cfset[\t\s]+[a-zA-Z_][\w\.]*[\t\s]*?\=(.*?)\/\>
will accomplish this to a degree.
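Here is a small Python sketch of the two counts over a made-up four-line snippet (the snippet is illustrative, not from a real app). The name part is written `[a-zA-Z_][\w\.]*`: a pattern like `[a-zA-Z_]+[\w\.]+` needs at least two characters and so skips one-letter variables such as `x`.

```python
import re

# Name part: a letter or underscore, then any word characters or dots.
any_cfset = re.compile(r"<cfset[\t\s]+[a-zA-Z_][\w\.]*[\t\s]*?\=(.*?)\>")
xml_cfset = re.compile(r"<cfset[\t\s]+[a-zA-Z_][\w\.]*[\t\s]*?\=(.*?)\/\>")

# Illustrative snippet; "." does not match newlines, so each match stays on one line.
code = """
<cfset x = 1>
<cfset total = x + 1 />
<cfset  name="Matt">
<cfset session.user = name />
"""

print(len(any_cfset.findall(code)))  # 4
print(len(xml_cfset.findall(code)))  # 2
```

The first count includes the self-closing tags too, since `(.*?)\>` happily absorbs the `/`; that is why the XML-compliant search only finds them "to a degree" rather than partitioning the results.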
If I were assigned to clean up a large code base, I could search through it to see which variables have been scoped, for example:
<cfset[\t\s]+[a-zA-Z_][\w\.]*\.[a-zA-Z_][\w\.]*[\t\s]*?\=(.*?)\>
or scoped and XML-compliant:
<cfset[\t\s]+[a-zA-Z_][\w\.]*\.[a-zA-Z_][\w\.]*[\t\s]*?\=(.*?)\/\>
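A minimal Python check of the scoped patterns, assuming "scoped" simply means the name contains a dot (as the regex itself assumes). Each name part is written to allow a single character, so `session.x` is found; with `[a-zA-Z_]+[\w\.]+` on both sides of the dot it would be missed.

```python
import re

# "Scoped" here means a dotted name such as variables.total.
scoped = re.compile(r"<cfset[\t\s]+[a-zA-Z_][\w\.]*\.[a-zA-Z_][\w\.]*[\t\s]*?\=(.*?)\>")
scoped_xml = re.compile(r"<cfset[\t\s]+[a-zA-Z_][\w\.]*\.[a-zA-Z_][\w\.]*[\t\s]*?\=(.*?)\/\>")

lines = [
    "<cfset total = 0>",            # unscoped
    "<cfset variables.total = 0>",  # scoped
    "<cfset session.x = 1 />",      # scoped and XML compliant
]

for line in lines:
    # Prints the line, then whether each pattern finds it.
    print(line, bool(scoped.search(line)), bool(scoped_xml.search(line)))
```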
If I wanted to follow some basic OO principles, I could start by making sure my CFCs are encapsulated.
A simple first step would be to search through all CFC files for variables belonging to the request scope:
<cfset[\t\s]+request\.[a-zA-Z_][\w\.]*[\t\s]*?\=(.*?)\>
By now you know what to do to search for application variables: simply change "request" above to "application".
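Outside Eclipse, the same "search all CFC files" step can be sketched in Python. This is an assumption-laden illustration, not the post's method: `find_scope_sets` is a hypothetical helper, it checks one line at a time, it only looks at files ending in `.cfc`, and it matches case-insensitively because CFML tag and scope names are not case sensitive.

```python
import re
from pathlib import Path


def find_scope_sets(root, scope="request"):
    """Return (file, line number, line) for each <cfset scope.name = ...> in .cfc files.

    Hypothetical helper. The scope name is escaped and matched
    case-insensitively, since CFML tag and scope names are not case sensitive.
    """
    pattern = re.compile(
        r"<cfset[\t\s]+" + re.escape(scope) + r"\.[a-zA-Z_][\w\.]*[\t\s]*?\=(.*?)\>",
        re.IGNORECASE,
    )
    hits = []
    for path in sorted(Path(root).rglob("*.cfc")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

For example, `find_scope_sets("/path/to/site", "application")` (the path is a placeholder) would list application-scope assignments instead of request-scope ones.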
Some further examples of regex for searching through or counting CF code:
#[a-zA-Z_][\w\.]*\.[a-zA-Z_][\w\.]*#   (output scoped variables)
#[a-zA-Z_]\w*#   (output unscoped variables)
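To see how the two output patterns split a line of template code, here is a short Python sketch on a made-up line. Written this way, the unscoped pattern also catches one-letter names like `#x#`, which `#[a-zA-Z_]+[\w]+#` would miss.

```python
import re

# Each name part may be a single character.
scoped_output = re.compile(r"#[a-zA-Z_][\w\.]*\.[a-zA-Z_][\w\.]*#")
unscoped_output = re.compile(r"#[a-zA-Z_]\w*#")

# Illustrative template line.
template = "<p>#session.crap# and #randomVar# cost #x#</p>"

print(scoped_output.findall(template))    # ['#session.crap#']
print(unscoped_output.findall(template))  # ['#randomVar#', '#x#']
```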
Searching for variables through code can be much faster than debugging. I hope others can use some of the above regular expressions in their daily development. If you have some helpful regex you use often, or if you think you can improve on mine, please post it below.

Tony wrote on 06/07/10 7:38 AM
Looks like great stuff Mat. Just what I was looking for. Just one thing: we have some really 'unique' coding styles to deal with. One of my main bugbears is unscoped variables. Someone will call #randomVar# somewhere in the code, but it will actually be coming from something like #session.randomVar#. Can you update your example to show how you might ignore any scoping of the variable names?