
Program: Convert Text Streams to Objects in Windows PowerShell

One of the strongest features of PowerShell is its object-based pipeline. You don’t waste your energy creating, destroying, and recreating the object representation of your data. In other shells, you lose the full-fidelity representation of data when the pipeline converts it to pure text. You can regain some of it through excessive text parsing, but not all of it.

However, you still often have to interact with low-fidelity input that originates from outside PowerShell. Text-based data files and legacy programs are two examples.

PowerShell offers great support for two of the three text-parsing staples:

Sed

Replaces text. For that functionality, PowerShell offers the -replace operator.

Grep

Searches text. For that functionality, PowerShell offers the Select-String cmdlet, among others.
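As a minimal sketch of those two equivalents (the sample lines are made up for illustration):

```powershell
# Hypothetical sample input; any array of strings would do.
$lines = "error: disk full", "info: backup started", "error: network down"

# Sed-style replacement: -replace takes a regular expression and a
# replacement string, and applies to each element of an array.
$replaced = $lines -replace "^error", "ERROR"

# Grep-style search: Select-String emits MatchInfo objects; the Line
# property holds the text of each matching line.
$found = $lines | Select-String "error" | ForEach-Object { $_.Line }

$replaced
$found
```

Both -replace and Select-String are case-insensitive by default, so the pattern "error" would also match "ERROR" in the input.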

The third traditional text-parsing tool, Awk, lets you chop a line of text into more intuitive groupings. PowerShell offers the Split() method on strings, but that lacks some of the power you usually need to break a string into groups.
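One way to see the gap, sketched with a hypothetical column-aligned line such as a legacy tool might print: Split() treats each separator character on its own, while a regular expression can treat a whole run of whitespace as one delimiter.

```powershell
# Hypothetical line with runs of four and three spaces between columns.
$line = "svchost" + (" " * 4) + "1024" + (" " * 3) + "Running"

# String.Split() treats every single space as a separator, so repeated
# spaces produce empty elements between the real fields.
$naive = $line.Split(" ")

# Splitting on a regular expression collapses each run of whitespace.
$fields = [Regex]::Split($line, "\s+")

"Split() elements: " + $naive.Length
"Regex elements:   " + $fields.Length
```

The regular-expression split is the same call the script in this section uses for its Delimiter parameter.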

The Convert-TextObject script presented in Example 5-7 lets you convert text streams into a set of objects that represent those text elements according to the rules you specify. From there, you can use all of PowerShell’s object-based tools, which gives you even more power than you would get with the text-based equivalents.

Example 5-7. Convert-TextObject.ps1

##############################################################################
##
## Convert-TextObject.ps1: Convert a simple string into a custom PowerShell
## object.
##
## Parameters:
##
##    [string] Delimiter
##        If specified, gives the .NET Regular Expression with which to
##        split the string. The script generates properties for the
##        resulting object out of the elements resulting from this split.
##        If not specified, defaults to splitting on the maximum amount
##        of whitespace: "\s+", as long as ParseExpression is not
##        specified either.
##
##    [string] ParseExpression
##        If specified, gives the .NET Regular Expression with which to
##        parse the string. The script generates properties for the
##        resulting object out of the groups captured by this regular
##        expression.
##
##    ** NOTE ** Delimiter and ParseExpression are mutually exclusive.
##
##    [string[]] PropertyName
##        If specified, the script will pair the names from this object
##        definition with the elements from the parsed string. If not
##        specified (or the generated object contains more properties
##        than you specify), the script uses property names in the
##        pattern of Property1,Property2,...,PropertyN
##
##    [type[]] PropertyType
##        If specified, the script will pair the types from this list with
##        the properties from the parsed string. If not specified (or the
##        generated object contains more properties than you specify), the
##        script sets the properties to be of type [string]
##
##
## Example usage:
##    "Hello World" | Convert-TextObject
##    Generates an Object with "Property1=Hello" and "Property2=World"
##
##    "Hello World" | Convert-TextObject -Delimiter "ll"
##    Generates an Object with "Property1=He" and "Property2=o World"
##
##    "Hello World" | Convert-TextObject -ParseExpression "He(ll.*o)r(ld)"
##    Generates an Object with "Property1=llo Wo" and "Property2=ld"
##
##    "Hello World" | Convert-TextObject -PropertyName FirstWord,SecondWord
##    Generates an Object with "FirstWord=Hello" and "SecondWord=World"
##
##    "123 456" | Convert-TextObject -PropertyType $([string],[int])
##    Generates an Object with "Property1=123" and "Property2=456"
##    The second property is an integer, as opposed to a string
##
##############################################################################

param(
    [string] $delimiter,
    [string] $parseExpression,
    [string[]] $propertyName,
    [type[]] $propertyType
)

function Main(
    $inputObjects, $parseExpression, $propertyType,
    $propertyName, $delimiter)
{
    $delimiterSpecified = [bool] $delimiter
    $parseExpressionSpecified = [bool] $parseExpression

    ## If they've specified both ParseExpression and Delimiter, show usage
    if($delimiterSpecified -and $parseExpressionSpecified)
    {
        Usage
        return
    }

    ## If they enter no parameters, assume a default delimiter of whitespace
    if(-not $($delimiterSpecified -or $parseExpressionSpecified))
    {
        $delimiter = "\s+"
        $delimiterSpecified = $true
    }

    ## Cycle through the $inputObjects, and parse it into objects
    foreach($inputObject in $inputObjects)
    {
        if(-not $inputObject) { $inputObject = "" }
        foreach($inputLine in $inputObject.ToString())
        {
            ParseTextObject $inputLine $delimiter $parseExpression `
                $propertyType $propertyName
        }
    }
}

function Usage
{
    "Usage: "
    " Convert-TextObject"
    " Convert-TextObject -ParseExpression parseExpression " +
        "[-PropertyName propertyName] [-PropertyType propertyType]"
    " Convert-TextObject -Delimiter delimiter " +
        "[-PropertyName propertyName] [-PropertyType propertyType]"
    return
}

## Function definition: ParseTextObject.
## Performs the heavy lifting: parses a string into its components.
## For each component, adds it as a note to the object that we return.
function ParseTextObject
{
    param(
        $textInput, $delimiter, $parseExpression,
        $propertyTypes, $propertyNames)

    $parseExpressionSpecified = -not $delimiter

    $returnObject = New-Object PSObject

    $matches = $null
    $matchCount = 0
    if($parseExpressionSpecified)
    {
        ## Populates the matches variable by default
        [void] ($textInput -match $parseExpression)
        $matchCount = $matches.Count
    }
    else
    {
        $matches = [Regex]::Split($textInput, $delimiter)
        $matchCount = $matches.Length
    }

    ## When parsing by expression, skip $matches[0] (the entire match)
    $counter = 0
    if($parseExpressionSpecified) { $counter++ }
    for(; $counter -lt $matchCount; $counter++)
    {
        $propertyName = "None"
        $propertyType = [string]

        ## Parse by Expression
        if($parseExpressionSpecified)
        {
            $propertyName = "Property$counter"

            ## Get the property name
            if($counter -le $propertyNames.Length)
            {
                if($propertyNames[$counter - 1])
                {
                    $propertyName = $propertyNames[$counter - 1]
                }
            }

            ## Get the property type
            if($counter -le $propertyTypes.Length)
            {
                if($propertyTypes[$counter - 1])
                {
                    $propertyType = $propertyTypes[$counter - 1]
                }
            }
        }
        ## Parse by delimiter
        else
        {
            $propertyName = "Property$($counter + 1)"

            ## Get the property name
            if($counter -lt $propertyNames.Length)
            {
                if($propertyNames[$counter])
                {
                    $propertyName = $propertyNames[$counter]
                }
            }

            ## Get the property type
            if($counter -lt $propertyTypes.Length)
            {
                if($propertyTypes[$counter])
                {
                    $propertyType = $propertyTypes[$counter]
                }
            }
        }

        AddNote $returnObject $propertyName `
            ($matches[$counter] -as $propertyType)
    }

    $returnObject
}

## Add a note to an object
function AddNote ($object, $name, $value)
{
    $object | Add-Member NoteProperty $name $value
}

Main $input $parseExpression $propertyType $propertyName $delimiter
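Once text becomes objects, the usual object-based commands apply. The following is a minimal, self-contained sketch of that idea rather than a use of the script above: it assumes a hypothetical two-column input (a name and a size), splits each line on whitespace, and casts the second field with -as so that it sorts numerically.

```powershell
# Hypothetical input: a name followed by a size, separated by whitespace.
$lines = "report.txt 900", "notes.txt 12000", "todo.txt 75"

$objects = foreach ($line in $lines)
{
    $fields = [Regex]::Split($line, "\s+")

    # Build one object per line, with a string and an integer property.
    $item = New-Object PSObject
    $item | Add-Member NoteProperty Name $fields[0]
    $item | Add-Member NoteProperty Size ($fields[1] -as [int])
    $item
}

# Because Size is an [int], sorting is numeric rather than alphabetical.
$largest = $objects | Sort-Object Size -Descending | Select-Object -First 1
$largest.Name
```

Had Size stayed a string, "900" would sort after "12000", which is the kind of mistake that typed properties avoid.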

Generate Large Reports and Text Streams in Windows PowerShell

Problem

You want to write a script that generates a large report or a large amount of data.

Solution

The best approach to generating a large amount of data is to take advantage of PowerShell’s streaming behavior whenever possible. Opt for solutions that pipeline data between commands:

Get-ChildItem C:\ *.txt -Recurse | Out-File c:\temp\AllTextFiles.txt

rather than collect the output at each stage:

$files = Get-ChildItem C:\ *.txt -Recurse
$files | Out-File c:\temp\AllTextFiles.txt

If your script generates a large text report (and streaming is not an option), use the StringBuilder class:

$output = New-Object System.Text.StringBuilder
Get-ChildItem C:\ *.txt -Recurse |
    Foreach-Object { [void] $output.Append($_.FullName + "`n") }
$output.ToString()

rather than simple text concatenation:

$output = ""
Get-ChildItem C:\ *.txt -Recurse | Foreach-Object { $output += $_.FullName }
$output

Discussion

In PowerShell, combining commands in a pipeline is a fundamental concept. As scripts and cmdlets generate output, PowerShell passes that output to the next command in the pipeline as soon as it can. In the solution, the Get-ChildItem commands that retrieve all text files on the C: drive take a very long time to complete. However, since they begin to generate data almost immediately, PowerShell can pass that data on to the next command as soon as the Get-ChildItem cmdlet produces it. This is true of any commands that generate or consume data and is called streaming. The pipeline completes almost as soon as the Get-ChildItem cmdlet finishes producing its data and uses memory very efficiently as it does so.

The second Get-ChildItem example (which collects its data) prevents PowerShell from taking advantage of this streaming opportunity. It first stores all the files in an array, which, because of the amount of data, takes a long time and an enormous amount of memory. Then, it sends all those objects into the output file, which takes a long time as well.

However, most commands can consume data produced by the pipeline directly, as illustrated by the Out-File cmdlet. For those commands, PowerShell provides streaming behavior as long as you combine the commands into a pipeline. For commands that do not support data coming from the pipeline directly, the Foreach-Object cmdlet (with the aliases of foreach and %) still lets you work with each piece of data as the previous command produces it, as shown in the StringBuilder example.
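The streaming behavior described above can be observed directly. In this small sketch (the event labels are illustrative, not part of any PowerShell API), each stage records when it handles an item; the resulting order shows that the second stage receives each item before the first stage produces the next one.

```powershell
# Records the order in which the two pipeline stages touch each item.
$events = New-Object System.Collections.ArrayList

1..3 |
    ForEach-Object { [void] $events.Add("produce $_"); $_ } |
    ForEach-Object { [void] $events.Add("consume $_") }

# Interleaved entries mean the items streamed one at a time.
$events -join ", "
```

If the first stage had been collected into a variable before the second stage ran, all of the "produce" entries would come first.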

Creating large text reports

When you generate large reports, it is common to store the entire report in a string, and then write that string out to a file once the script completes. You can usually accomplish this most effectively by streaming the text directly to its destination (a file or the screen), but sometimes this is not possible.

Since PowerShell makes it so easy to add more text to the end of a string (as in $output += $_.FullName), many initially opt for that approach. This works great for small-to-medium strings, but causes significant performance problems for large strings.

As an example of this performance difference, compare the following:

PS >Measure-Command {
>>     $output = New-Object Text.StringBuilder
>>     1..10000 |
>>         Foreach-Object { $output.Append("Hello World") }
>> }
>>

(...)
TotalSeconds      : 2.3471592

PS >Measure-Command {
>>     $output = ""
>>     1..10000 | Foreach-Object { $output += "Hello World" }
>> }
>>

(...)
TotalSeconds      : 4.9884882

In the .NET Framework (and therefore PowerShell), strings never change after you create them. When you add more text to the end of a string, PowerShell has to build a new string by combining the two smaller strings. This operation takes a long time for large strings, which is why the .NET Framework includes the System.Text.StringBuilder class. Unlike normal strings, the StringBuilder class assumes that you will modify its data, an assumption that allows it to adapt to change much more efficiently.
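Both approaches produce identical text; only the amount of intermediate work differs. A minimal sketch with a handful of made-up lines:

```powershell
# Hypothetical report lines.
$parts = 1..5 | ForEach-Object { "Line $_" }

# Concatenation: each += builds a brand-new string from the old string
# plus the new text.
$concatenated = ""
foreach ($part in $parts) { $concatenated += $part + "`n" }

# StringBuilder: Append adds to an internal buffer, and a string is created
# only when ToString() is called. [void] discards the StringBuilder that
# Append returns.
$builder = New-Object System.Text.StringBuilder
foreach ($part in $parts) { [void] $builder.Append($part + "`n") }
$built = $builder.ToString()

# Case-sensitive comparison of the two results.
$built -ceq $concatenated
```

With five lines the difference is negligible; it grows with the size of the report because every concatenation copies everything written so far.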
