PowerShell – UTF8 and BOM

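When you save something as UTF-8 in Windows PowerShell (5.1 and earlier), it will most likely be written as UTF-8 with a BOM. This can be a problem: many tools do not expect a BOM and will not handle the file properly. The stakes are highest when the content contains characters outside the 7-bit ASCII set (accented Latin, non-Latin, Chinese, …), because that is where an encoding mismatch corrupts the text.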

The good news is that from PowerShell 6 onward, the default encoding is UTF-8 without BOM.
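As a minimal sketch of the PowerShell 6+ behaviour, the snippet below writes a short string with the utf8NoBOM encoding value and reads back the first bytes. The path in $DemoPath is a hypothetical temp file chosen only for illustration, and the version guard is there because utf8NoBOM is not accepted by Windows PowerShell 5.1.

```powershell
# Hypothetical demo path, used only for illustration.
$DemoPath = Join-Path ([System.IO.Path]::GetTempPath()) 'utf8-nobom-demo.txt'

if ($PSVersionTable.PSVersion.Major -ge 6) {
    # "caf" followed by e-acute, built from a code point so the script itself stays ASCII.
    $Text = "caf$([char]0xE9)"

    # Explicit BOM-less UTF-8; a plain Set-Content does the same in PowerShell 6+.
    $Text | Set-Content -Path $DemoPath -Encoding utf8NoBOM

    # The file starts directly with the text bytes, not with EF BB BF.
    $FirstBytes = [System.IO.File]::ReadAllBytes($DemoPath)[0..2]
    ($FirstBytes | ForEach-Object { $_.ToString('X2') }) -join ' '
}
```

On Windows PowerShell 5.1 the block is skipped, and the .NET approach described in the rest of this post applies instead.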

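What is a BOM?

BOM stands for Byte Order Mark. It is an optional Unicode character (U+FEFF) placed at the very start of a file. It indicates the endianness of the encoding. UTF-8 has no byte-order ambiguity, so there the BOM (the three bytes EF BB BF) only acts as a signature saying "this file is UTF-8".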
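To check whether a given file starts with a UTF-8 BOM, one option is to look at its first three bytes. The function below is a sketch under that assumption; Test-Utf8Bom is a hypothetical helper name, not a built-in cmdlet, and it loads the whole file into memory, which is fine for small files only.

```powershell
# Hypothetical helper: returns $true when the file begins with the bytes EF BB BF.
function Test-Utf8Bom {
    param(
        [Parameter(Mandatory = $true)]
        [string] $Path
    )

    # .NET resolves relative paths against the process directory, so resolve first.
    $FullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $Bytes = [System.IO.File]::ReadAllBytes($FullPath)

    return ($Bytes.Length -ge 3 -and
            $Bytes[0] -eq 0xEF -and
            $Bytes[1] -eq 0xBB -and
            $Bytes[2] -eq 0xBF)
}
```

Usage would look like Test-Utf8Bom -Path .\settings.xml, where settings.xml stands for any file you want to inspect.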

How to save in UTF-8 without BOM

In a nutshell, you need to pass $false to the UTF8Encoding constructor. Its argument (encoderShouldEmitUTF8Identifier) controls whether the BOM is written:

New-Object System.Text.UTF8Encoding($false)
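A quick way to see what the constructor argument changes is to compare the preamble (the bytes an encoding writes at the start of a file) of both variants. The variable names below are illustrative only.

```powershell
# Same encoding class, two settings for the BOM flag.
$WithBom    = New-Object System.Text.UTF8Encoding($true)
$WithoutBom = New-Object System.Text.UTF8Encoding($false)

# GetPreamble() returns the bytes written before the content.
$WithBomLength    = $WithBom.GetPreamble().Length      # 3 (EF BB BF)
$WithoutBomLength = $WithoutBom.GetPreamble().Length   # 0

"With BOM: $WithBomLength byte(s), without BOM: $WithoutBomLength byte(s)"
```

The static [System.Text.Encoding]::UTF8 instance behaves like the $true variant, which is why passing it to a writer also produces a BOM.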

After that, it’s a matter of passing that encoding to whatever writes the file. Here are two examples: an XML file and a text file. As a bonus, the text file example manipulates a JSON file.

XML Edition

[XML] $XmlDocument = ( Select-Xml -Path $FilePath  -XPath / ).Node

[System.Xml.XmlWriterSettings] $XmlSettings = New-Object System.Xml.XmlWriterSettings

#Preserve Windows formatting
$XmlSettings.Indent = $true

#Keeping UTF-8 without BOM
$XmlSettings.Encoding = New-Object System.Text.UTF8Encoding($false)

[System.Xml.XmlWriter] $XmlWriter = [System.Xml.XmlWriter]::Create($FilePath, $XmlSettings)

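#Write the document through the writer so the encoding setting is applied
$XmlDocument.Save($XmlWriter)

#Close Handle and flush
$XmlWriter.Flush()
$XmlWriter.Close()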

Text file Edition

Here is a JSON example.

$FilePath = "C:\MyFolder\textfile.json"
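#Read the whole file as one UTF-8 string (Windows PowerShell reads a BOM-less file as ANSI otherwise)
$FileContent = Get-Content -Path $FilePath -Raw -Encoding UTF8 | ConvertFrom-Json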

$UTF8Only = New-Object System.Text.UTF8Encoding($false)
[System.IO.File]::WriteAllLines($FilePath, @($FileContent | ConvertTo-Json), $UTF8Only)