File is rejected by a third-party system when it is UTF-8 encoded with a BOM

Recently, I had a chance to refactor the code for a system that sends files to DreamMail. Because we did not touch any of the FTP functionality, we were complacent and only tested up to the point where files are exported and sent via FTP. Development was completed and the application was deployed to a testing environment. Thankfully, and luckily, Jon (a teammate) and I happened to notice that every new file generated by the refactored code was being rejected by the DreamMail FTP server.

At first we thought the files failed because we were using a testing account, but it was worrying that all of them failed, so we started investigating. We picked one file that had been successfully processed and uploaded it again. It worked without any problem. Then we compared the two files, the successful one and a failed one, yet we could see no difference. I assumed that something was different but not visible, so I downloaded Ultra-Edit, which can display a file in hex, and checked the two files. The difference was the first 3 bytes of the failed file, which are called the BOM, or byte order mark.

The BOM is the Unicode character U+FEFF, the zero-width no-break space, and is therefore not visible in most text editors. "It is conventionally used as a marker to indicate that text is encoded in UTF-8, UTF-16 or UTF-32." (Wikipedia) Encoded in UTF-8, it is the three bytes EF BB BF.
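
A hex editor is not strictly needed to spot this. As a minimal sketch, the check below reads the first three bytes of a file and compares them with EF BB BF. The class and method names (BomCheck, StartsWithUtf8Bom, FileStartsWithUtf8Bom) are made up for illustration and are not part of the original application.

[sourcecode language="C#"]
using System;
using System.IO;

// Hypothetical helper, not part of the original application.
public static class BomCheck
{
    // UTF-8 encoding of U+FEFF, the byte order mark.
    private static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };

    // Returns true when the buffer starts with EF BB BF.
    public static bool StartsWithUtf8Bom(byte[] bytes)
    {
        if (bytes == null || bytes.Length < Utf8Bom.Length)
        {
            return false;
        }
        for (int i = 0; i < Utf8Bom.Length; i++)
        {
            if (bytes[i] != Utf8Bom[i])
            {
                return false;
            }
        }
        return true;
    }

    // Reads at most the first three bytes of the file and checks them.
    public static bool FileStartsWithUtf8Bom(string path)
    {
        byte[] head = new byte[Utf8Bom.Length];
        int total = 0;
        using (FileStream fs = File.OpenRead(path))
        {
            // Read may return fewer bytes than requested, so loop until
            // the buffer is full or the end of the file is reached.
            while (total < head.Length)
            {
                int n = fs.Read(head, total, head.Length - total);
                if (n == 0)
                {
                    break;
                }
                total += n;
            }
        }
        return total == head.Length && StartsWithUtf8Bom(head);
    }

    // Prints the result for every file path given on the command line.
    public static void Main(string[] args)
    {
        foreach (string path in args)
        {
            Console.WriteLine("{0}: {1}", path,
                FileStartsWithUtf8Bom(path) ? "BOM" : "no BOM");
        }
    }
}
[/sourcecode]

Running it against an accepted file and a rejected file should report the difference that was invisible in a normal text editor.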

If the file starts with a BOM, the DreamMail client rejects it for some reason, apparently thinking it is malformed. My code that prepended the BOM to the file was this:

[sourcecode language="C#"]
using (StreamWriter sw = new StreamWriter(fs, Encoding.UTF8))
[/sourcecode]

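The fix is simple: do not specify an encoding in the StreamWriter constructor. Without one, StreamWriter still writes UTF-8 but does not emit a BOM, whereas Encoding.UTF8 is configured to emit one.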

[sourcecode language="C#"]
using (StreamWriter sw = new StreamWriter(fs))
[/sourcecode]
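
If you would rather state the intent in code than rely on the default, one option is to pass a UTF8Encoding constructed with false, which asks it not to emit the BOM. The sketch below writes the same text to a MemoryStream (instead of the FileStream used in the real code) with both encodings and prints the resulting bytes. The class and method names are hypothetical.

[sourcecode language="C#"]
using System;
using System.IO;
using System.Text;

// Hypothetical demo class, not part of the original application.
public static class BomlessWriter
{
    // Writes the text through a StreamWriter with the given encoding
    // and returns the raw bytes that ended up in the stream.
    public static byte[] Write(string text, Encoding encoding)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            using (StreamWriter sw = new StreamWriter(ms, encoding))
            {
                sw.Write(text);
            }
            // ToArray still works after the writer has closed the stream.
            return ms.ToArray();
        }
    }

    public static void Main()
    {
        // Encoding.UTF8 emits the BOM before the text.
        byte[] withBom = Write("abc", Encoding.UTF8);
        // UTF8Encoding(false) writes UTF-8 without the BOM.
        byte[] withoutBom = Write("abc", new UTF8Encoding(false));

        Console.WriteLine(BitConverter.ToString(withBom));    // EF-BB-BF-61-62-63
        Console.WriteLine(BitConverter.ToString(withoutBom)); // 61-62-63
    }
}
[/sourcecode]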

There is a post on Expert Exchange that tells you to use Encoding.ASCII. That fixes the problem, but all text will be written as ASCII, so you will lose every character outside the 7-bit ASCII range: each one is replaced with a question mark.
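
To see what Encoding.ASCII does to such characters, here is a small sketch that encodes a string as ASCII and decodes it again, which is roughly what a reader of the exported file would see. The sample text and the names AsciiLossDemo and RoundTripThroughAscii are made up for illustration.

[sourcecode language="C#"]
using System;
using System.Text;

// Hypothetical demo class, not part of the original application.
public static class AsciiLossDemo
{
    // Encodes the text as ASCII and decodes the bytes back to a string.
    public static string RoundTripThroughAscii(string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(bytes);
    }

    public static void Main()
    {
        // \u00E9 is e with an acute accent, \u20AC is the euro sign.
        // Neither exists in ASCII, so both come back as '?'.
        Console.WriteLine(RoundTripThroughAscii("Caf\u00E9 \u20AC5")); // Caf? ?5
    }
}
[/sourcecode]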