PowerShell and arrays as a performance problem in scripts
2024-11-25
"My PowerShell script takes so long to run!" That's a sentence I hear again and again. Very often I find at least one line like this in the script: $myArray += $Item
You might be wondering what that has to do with it.
It has a lot to do with it, and I'll explain why here.
Arrays in PowerShell
The performance killer
The following construct is often seen in PowerShell scripts, and ChatGPT or GitHub Copilot often generate code like it:
$myArray = @()
foreach ($object in $objects) {
    # do some stuff with the object
    if ($object.SomeProperty -match '^regular expression') {
        $myArray += $object
    }
}
If you are only working with a few objects, you can use this without any noticeable negative effects. However, if you are working with many objects, this use of an array is terrible.
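This is because a standard array in PowerShell has a fixed size; it is not a dynamic data type. Every time you use +=, PowerShell has to create a new array that is one element larger, copy all the data from the existing array into it, and then add the new object. The more elements the array already holds, the longer each single append takes, so the total time grows much faster than the number of objects.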
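You can check this behavior yourself. The following is a minimal sketch; the variable names ($numbers, $before, $sameObject) are only chosen for illustration, and it relies on nothing more than standard .NET array behavior.

```powershell
# A PowerShell array literal is a fixed-size .NET array (System.Object[]).
$numbers = @(1, 2, 3)
$numbers.IsFixedSize          # True

# Keep a second reference to the original array object.
$before = $numbers

# += cannot grow the array in place; it builds a new, larger array
# and assigns that new array to the variable.
$numbers += 4

$sameObject = [object]::ReferenceEquals($before, $numbers)
$sameObject                   # False: $numbers now points to a new array
$before.Count                 # 3: the old array is unchanged
$numbers.Count                # 4
```

The old array still exists with its three elements, which shows that += did not extend it but replaced it with a copy.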
A real-life example
As a real-life example, let's take a pile of firewood. Now you want to add a piece of firewood to the pile. But you can't because there's no room for the extra piece.
So, you put the new piece in a new location and then take the existing pile and move it to the new location. And now you repeat that for the next 10,000 pieces of firewood. Would you really do that?
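To put a number on the firewood example, here is a small sketch that counts how many pieces you would carry in total. It assumes that adding one piece to a pile of k pieces means moving all k existing pieces, which is what += does with the elements of an array; the variable names are made up for this illustration.

```powershell
# Count how many element copies 10,000 appends with += cause in total.
# Assumption: appending to an array that holds k elements copies those k elements.
$pieces = 10000
$totalCopies = 0
for ($k = 0; $k -lt $pieces; $k++) {
    # The k existing elements are copied before the new one is added.
    $totalCopies += $k
}
$totalCopies    # 49995000
```

That is almost 50 million copy operations to store only 10,000 elements.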
The performance boost
If you need a dynamic array where you can add and remove elements, use a generic list (System.Collections.Generic.List) instead of a static array.
If you don't need a dynamic list, you can get a performance boost by changing the above code to:
$myArray = @(foreach ($object in $objects) {
    # do some stuff with the object
    if ($object.SomeProperty -match '^regular expression') {
        $object # you can use Write-Output $object instead
    }
})
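Here is the same pattern as a small runnable sketch. The sample objects, the Name property, and the '^web' pattern are hypothetical stand-ins so that the snippet works on its own.

```powershell
# Hypothetical sample data standing in for $objects.
$objects = @(
    [pscustomobject]@{ Name = 'srv-web01'; SomeProperty = 'web frontend' }
    [pscustomobject]@{ Name = 'srv-db01';  SomeProperty = 'database' }
    [pscustomobject]@{ Name = 'srv-web02'; SomeProperty = 'web backend' }
)

# Everything the loop body outputs is collected and assigned once, at the end.
$myArray = @(foreach ($object in $objects) {
    if ($object.SomeProperty -match '^web') {
        $object
    }
})

$myArray.Count    # 2
$myArray.Name     # srv-web01 and srv-web02
```

The surrounding @( ) makes sure $myArray is an array even if the loop returns only one object or none at all.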
You don’t believe me?
Test it yourself with the following code:
Measure-Command {
    $myArray1 = @()
    foreach ($i in $(1..10000)) {
        $myArray1 += $i
    }
}
Measure-Command {
    $myArray2 = @(foreach ($z in $(1..10000)) {
        $z
    })
}
If I run this 100 times in a loop and calculate how much faster variant 2 is on average compared to variant 1, I get the following results:
On 100 runs:
Variant2 is 1399.76556 milliseconds faster than Variant1 on average
the average runtime for Variant1 is 1407.03501 milliseconds
the average runtime for Variant2 is 7.26945 milliseconds
However, keep in mind that this test only fills the array with integers. In real scripts the loop usually handles richer objects such as Active Directory users or groups, and often many more of them. Because every += copies the whole array again, the gap grows with the number of elements.
In PowerShell, you can create a Generic.List for strings like this:
$myDynamicArray = New-Object "System.Collections.Generic.List[System.String]"
Or, if you cannot specify the data type for the list, use:
$myDynamicArray = New-Object "System.Collections.Generic.List[System.Object]"
Alternatively, you can write:
$myDynamicArray = New-Object System.Collections.Generic.List``1[System.String]
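Once the list exists, you work with it through its methods instead of +=. The following is a minimal sketch with made-up sample values; it creates the list with the ::new() syntax, which assumes PowerShell 5.0 or later (on older versions, create it with New-Object as shown above).

```powershell
# Create a list of strings ([string] is the PowerShell alias for System.String).
# Assumption: PowerShell 5.0 or later for the ::new() syntax.
$myDynamicArray = [System.Collections.Generic.List[string]]::new()

foreach ($name in 'alpha', 'beta', 'gamma') {
    # Add() appends to the existing list; the list only enlarges its
    # internal storage occasionally, not on every call.
    $myDynamicArray.Add($name)
}

# Remove() returns $true or $false; cast to [void] to keep that out of the output.
[void]$myDynamicArray.Remove('beta')

$myDynamicArray.Count        # 2
$myDynamicArray -join ', '   # alpha, gamma

# Convert to a regular array if another command expects one.
$asArray = $myDynamicArray.ToArray()
$asArray.GetType().Name      # String[]
```

A typed list such as List[string] rejects values it cannot convert to that type, while List[System.Object] accepts anything, so pick the typed variant when you know what the list will hold.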
The whole testing code:
$runs = 100
$numberOfObjects = 10000
$objectsForTest = $(1..$numberOfObjects)
# Time both variants once per run and collect one result object per run.
$TimediffArray = @(
    for ($t = 0; $t -lt $runs; $t++) {
        # Variant 1: append every element with +=
        $CmdResult1 = Measure-Command {
            $myArray1 = @()
            foreach ($i in $objectsForTest) {
                $myArray1 += $i
            }
            Remove-Variable myArray1 -Force
        }
        # Variant 2: collect the loop output in a single assignment
        $CmdResult2 = Measure-Command {
            $myArray2 = @(foreach ($z in $objectsForTest) {
                $z
            })
            Remove-Variable myArray2 -Force
        }
        New-Object psobject -Property @{
            'Variant1'   = $CmdResult1.TotalMilliseconds
            'Variant2'   = $CmdResult2.TotalMilliseconds
            'Difference' = $CmdResult1.TotalMilliseconds - $CmdResult2.TotalMilliseconds
        }
    }
)
# Sum up the measurements of all runs to calculate the averages.
[double]$difference = 0.0
[double]$Variant1 = 0.0
[double]$Variant2 = 0.0
foreach ($diff in $TimediffArray) {
    $difference += $diff.Difference
    $Variant1 += $diff.Variant1
    $Variant2 += $diff.Variant2
}
Write-Host "On $($TimediffArray.Count) runs:`r`nVariant2 is $($difference/$TimediffArray.Count) milliseconds faster than Variant1 on average`r`nthe average runtime for Variant1 is $($Variant1/$TimediffArray.Count) milliseconds`r`nthe average runtime for Variant2 is $($Variant2/$TimediffArray.Count) milliseconds"