Problem
You want to parse and analyze a text-based logfile using PowerShell’s standard object management commands.
Solution
Use the Convert-TextObject script to work with text-based logfiles. With your assistance, it converts streams of text into streams of objects, which you can then easily work with using PowerShell’s standard commands.
The Convert-TextObject script primarily takes two arguments:
- A regular expression that describes how to break the incoming text into groups
- A list of property names that the script then assigns to those text groups
As an example, you can use patch logs from the Windows directory. These logs track the patch installation details from updates applied to the machine (except for Windows Vista). Among other details, these logfiles include the names and versions of the files modified by that specific patch, as shown in Example 71.
Example 71. Getting a list of files modified by hotfixes
PS >cd $env:WINDIR
PS >$parseExpression = "(.*): Destination:(.*) \((.*)\)"
PS >$files = dir kb*.log -Exclude *uninst.log
PS >$logContent = $files | Get-Content | Select-String $parseExpression
PS >$logContent
(...)
- Destination:C:\WINNT\system32\shell32.dll (6.0.3790.205)
- Destination:C:\WINNT\system32\wininet.dll (6.0.3790.218)
- Destination:C:\WINNT\system32\urlmon.dll (6.0.3790.218)
- Destination:C:\WINNT\system32\shlwapi.dll (6.0.3790.212)
- Destination:C:\WINNT\system32\shdocvw.dll (6.0.3790.214)
- Destination:C:\WINNT\system32\digest.dll (6.0.3790.0)
- Destination:C:\WINNT\system32\browseui.dll (6.0.3790.218)
(...)
Like most logfiles, the format of the text is very regular but hard to manage. In this example, you have:
- A number (the number of seconds since the patch started)
- The text, “: Destination:”
- The file being patched
- An open parenthesis
- The version of the file being patched
- A close parenthesis
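To see how those pieces map onto the capture groups of the parse expression, it can help to test the expression against a single line. The sketch below assumes a made-up sample line (the time value 0.640 is invented for illustration) and reads the groups from `$Matches`, the automatic variable that the `-match` operator populates.

```powershell
# Hypothetical sample line; the time value 0.640 is invented for illustration
$line = '0.640: Destination:C:\WINNT\system32\shell32.dll (6.0.3790.205)'
$parseExpression = '(.*): Destination:(.*) \((.*)\)'

if ($line -match $parseExpression) {
    $time    = $Matches[1]   # text before ": Destination:"
    $file    = $Matches[2]   # path between "Destination:" and " ("
    $version = $Matches[3]   # text inside the parentheses
}
```

The literal text and the parentheses sit outside the capture groups, so they are matched but never stored.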
You don’t care about any of the literal text, but the time, file, and file version are useful properties to track:
PS >$properties = "Time","File","FileVersion"
So now, you use the Convert-TextObject script to convert the text output into a stream of objects:
PS >$logObjects = $logContent |
>>     Convert-TextObject -ParseExpression $parseExpression -PropertyName $properties
>>
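The idea behind that conversion can be sketched in a few lines. The function below is an illustrative stand-in, not the Convert-TextObject script itself: the name `ConvertTo-ParsedObject` is hypothetical, the sample lines use invented time values, and features such as property types are left out.

```powershell
# Illustrative stand-in for the regex-to-object idea; not the actual script
function ConvertTo-ParsedObject {
    param(
        [Parameter(Mandatory)] [string] $ParseExpression,
        [Parameter(Mandatory)] [string[]] $PropertyName,
        [Parameter(ValueFromPipeline)] [string] $InputObject
    )
    process {
        # Lines that don't match the expression are skipped
        if ($InputObject -match $ParseExpression) {
            # Capture group N becomes the Nth property name
            $properties = [ordered]@{}
            for ($i = 0; $i -lt $PropertyName.Count; $i++) {
                $properties[$PropertyName[$i]] = $Matches[$i + 1]
            }
            [PSCustomObject] $properties
        }
    }
}

# Hypothetical sample lines (time values invented)
$sample = @(
    '0.640: Destination:C:\WINNT\system32\shell32.dll (6.0.3790.205)'
    '0.656: Destination:C:\WINNT\system32\wininet.dll (6.0.3790.218)'
)
$params = @{
    ParseExpression = '(.*): Destination:(.*) \((.*)\)'
    PropertyName    = 'Time', 'File', 'FileVersion'
}
$objects = $sample | ConvertTo-ParsedObject @params
```

Each resulting object carries Time, File, and FileVersion properties as strings, which is enough for grouping and sorting by name.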
You can now easily query those objects using PowerShell’s built-in commands. For example, you can find the files most commonly affected by patches and service packs, as shown by Example 72.
Example 72. Finding files most commonly affected by hotfixes
PS >$logObjects | Group-Object file | Sort-Object -Descending Count |
>>     Select-Object Count,Name | Format-Table -Auto
>>
Count Name
152 C:\WINNT\system32\shdocvw.dll
147 C:\WINNT\system32\shlwapi.dll
128 C:\WINNT\system32\wininet.dll
116 C:\WINNT\system32\shell32.dll
92 C:\WINNT\system32\rpcss.dll
92 C:\WINNT\system32\olecli32.dll
92 C:\WINNT\system32\ole32.dll
84 C:\WINNT\system32\urlmon.dll
(...)
Using this technique, you can work with most text-based logfiles.
Discussion
In Example 72, you got all the information you needed by splitting the input text into groups of simple strings. The time offset, file, and version information served their purposes as is. In addition to the features used by Example 72, however, the Convert-TextObject script also supports a parameter that lets you control the data types of those properties. If one of the properties should be treated as a number or a DateTime, you may get incorrect results if you work with that property as a string. For more information about this functionality, see the description of the -PropertyType parameter in the Convert-TextObject script.
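The difference shows up as soon as you sort or compare values. The sketch below uses made-up time offsets (the values are assumptions) and a plain PowerShell cast rather than the script's -PropertyType mechanism, simply to show why the type matters.

```powershell
# Made-up time offsets as a regex capture would return them: strings
$times = '9.5', '10.25', '100.0'

# As strings, sorting compares character by character, so '9.5' lands last
$asText = $times | Sort-Object

# Cast to [double] first so that sorting is numeric
$asNumbers = $times | ForEach-Object { [double] $_ } | Sort-Object
```

With the cast in place, 9.5 sorts first and 100 sorts last, which is the order you would expect for elapsed time.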
Although most logfiles have entries designed to fit within a single line, some span multiple lines. When a logfile contains entries that span multiple lines, it includes some sort of special marker to separate log entries from each other. Take, for example:
PS >Get-Content AddressBook.txt
Name: Chrissy
Phone: 5551212
Name: John
Phone: 5551213
The key to working with this type of logfile comes from two places. The first is the -Delimiter parameter of the Get-Content cmdlet, which makes it split the file based on that delimiter instead of newlines. The second is to write a ParseExpression regular expression that ignores the newline characters that remain in each record.
PS >$records = gc AddressBook.txt -Delimiter ""
PS >$parseExpression = "(?s)Name: (\S*).*Phone: (\S*).*"
PS >$records | Convert-TextObject -ParseExpression $parseExpression
Property1 Property2
Chrissy 5551212
John 5551213
The parse expression in this example uses the single-line option (?s) so that the .* portions of the regular expression match newline characters as well.
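The same approach can be tried without a file on disk. The sketch below rests on several assumptions: it uses an in-memory string, a hypothetical "----" line as the record marker, and the `-split` operator in place of Get-Content with -Delimiter. The property names Name and Phone are chosen here for illustration.

```powershell
# Hypothetical address book; "----" as the record marker is an assumption
$text = @'
Name: Chrissy
Phone: 5551212
----
Name: John
Phone: 5551213
----
'@

# Split on the marker and drop empty or whitespace-only chunks
$records = $text -split '----' | Where-Object { $_.Trim() }

# (?s) lets . match newlines, so .* can skip the line breaks inside a record
$parseExpression = '(?s)Name: (\S*).*Phone: (\S*).*'

$people = @(foreach ($record in $records) {
    if ($record -match $parseExpression) {
        [PSCustomObject]@{ Name = $Matches[1]; Phone = $Matches[2] }
    }
})
```

Without the (?s) option, the .* between the two fields would stop at the end of the first line and the expression would fail to match any record.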
For extremely large logfiles, handwritten parsing tools may not meet your needs. In those situations, specialized log management tools can prove helpful. One example is Microsoft’s free Log Parser (http://www.logparser.com). Another common alternative is to import the log entries into a SQL database and then perform ad hoc queries on the database tables instead.