Convert a text document with a Korean character set to Unicode UTF-8

I'm working on a Korean website about Zen breathing and meditation. I already had all the documents I wanted to upload, so I read them from disk and displayed them on the web page, hoping to see them as nicely formatted pages.


All the characters were broken. After a couple of hours of digging, it turns out that the documents were saved in ANSI, with Korean charset. It may work on Korean windows, but didn't on my machine, and wouldn't on most of hosting environment. So, I had to convert the encoding to utf-8.

What is the code page for Korean? Though I'm Korean and a developer, I didn't know. After some trial and error, I discovered that the code page is 949.
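In hindsight, the code page can probably be looked up instead of guessed. Here is a minimal sketch, assuming the `ko-KR` culture data is available on the machine. The class and method names (`CodePageLookup`, `AnsiCodePageFor`) are made up for illustration.

```csharp
using System;
using System.Globalization;

class CodePageLookup
{
    // Returns the legacy "ANSI" code page that .NET associates with a culture.
    public static int AnsiCodePageFor(string cultureName)
    {
        return CultureInfo.GetCultureInfo(cultureName).TextInfo.ANSICodePage;
    }

    static void Main()
    {
        // "ko-KR" is the culture name for Korean (Korea).
        Console.WriteLine(AnsiCodePageFor("ko-KR"));
    }
}
```

On a Windows machine this should print 949, which matches what I found by trial and error.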

This is my conversion code in C#:

[sourcecode language="csharp"]
using System;
using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        var folder = new DirectoryInfo(@"c:\temp\SundoWeb\Content\html");
        var files = folder.GetFiles();
        foreach (var file in files)
        {
            Console.WriteLine("Converting {0} ...", file.Name);
            Convert(file);
        }
    }

    // Reads the file as code page 949 (Korean) and writes a UTF-8 copy to c:\temp.
    private static void Convert(FileInfo file)
    {
        var bytes = File.ReadAllBytes(file.FullName);
        var encodedBytes = Encoding.Convert(Encoding.GetEncoding(949), Encoding.UTF8, bytes);
        File.WriteAllBytes(@"c:\temp\" + file.Name, encodedBytes);
    }
}
[/sourcecode]

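To check the conversion without touching any files, the same `Encoding.Convert` call can be run on a few known bytes. This is only a sketch, assuming .NET Core 3.0 or later (or .NET 5+), where legacy code pages such as 949 have to be registered through `CodePagesEncodingProvider` before `Encoding.GetEncoding(949)` works. The class name `Cp949Sample` and the sample word are mine.

```csharp
using System;
using System.Text;

class Cp949Sample
{
    // Re-encodes bytes from code page 949 (Korean) to UTF-8.
    public static byte[] ToUtf8(byte[] cp949Bytes)
    {
        // Assumption: running on .NET Core / .NET 5+, where code page 949
        // is not available until the code pages provider is registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.Convert(Encoding.GetEncoding(949), Encoding.UTF8, cp949Bytes);
    }

    static void Main()
    {
        // The word "한글" (Hangul) as stored in a code page 949 file.
        var cp949Bytes = new byte[] { 0xC7, 0xD1, 0xB1, 0xDB };

        var utf8Bytes = ToUtf8(cp949Bytes);

        // Print the UTF-8 bytes as hex, so the result does not depend on the console font.
        Console.WriteLine(BitConverter.ToString(utf8Bytes)); // ED-95-9C-EA-B8-80

        // Decoding the UTF-8 bytes gives back the original two characters.
        Console.WriteLine(Encoding.UTF8.GetString(utf8Bytes) == "\uD55C\uAE00"); // True
    }
}
```

Four bytes go in and six come out, because each Korean syllable takes two bytes in code page 949 but three bytes in UTF-8.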
Displaying a Unicode file on the web is easy. You just declare the charset in a meta element within the head element.

[sourcecode language="html"]
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
[/sourcecode]