Convert a text document with Korean character set to Unicode UTF-8

I’m working on a Korean website about Zen breathing and meditation. I had all the documents I wanted to upload, so I read them from the files and displayed them on a web page, hoping to see them as nicely formatted pages.

boom!

All the characters were broken. After a couple of hours of digging, it turned out that the documents had been saved as ANSI with the Korean character set. That may work on Korean Windows, but it didn’t on my machine, and it wouldn’t in most hosting environments. So I had to convert the encoding to UTF-8.

What is the code page for Korean? Though I’m Korean and a developer, I didn’t know. After some trial and error, I discovered that the code page was 949.
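A quicker way to check a candidate code page than converting whole files is to decode a few known bytes and compare the result. Here is a minimal sketch, assuming the sample bytes C7 D1 B1 DB (the word "한글" as stored in the Korean ANSI encoding). The class and method names are made up for illustration, and the RegisterProvider line assumes .NET Core or .NET 5+, where legacy code pages are not registered by default.

```csharp
using System;
using System.Text;

static class CodePageProbe
{
	// Hypothetical helper: returns true when decoding sampleBytes with the
	// given code page yields the expected text.
	public static bool Decodes(int codePage, byte[] sampleBytes, string expected)
	{
		// Assumption: running on .NET Core / .NET 5+, where legacy code pages
		// must be registered first. On .NET Framework this line can be dropped.
		Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

		return Encoding.GetEncoding(codePage).GetString(sampleBytes) == expected;
	}
}
```

For example, `CodePageProbe.Decodes(949, new byte[] { 0xC7, 0xD1, 0xB1, 0xDB }, "한글")` should be true, while the same bytes decoded with code page 1252 come out as mojibake.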

This is my conversion code in C#:

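using System;
using System.IO;
using System.Text;

class Program
{
	static void Main(string[] args)
	{
		// On .NET Core / .NET 5+, code page 949 is only available after calling
		// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
		var folder = new DirectoryInfo(@"c:\temp\SundoWeb\Content\html");
		var files = folder.GetFiles();

		foreach (var file in files)
		{
			Console.WriteLine("Converting {0} ...", file.Name);
			Convert(file);
		}
	}

	// Reads the file as code page 949 (Korean) bytes and writes a UTF-8 copy to c:\temp.
	private static void Convert(FileInfo file)
	{
		var bytes = File.ReadAllBytes(file.FullName);
		var encodedBytes = Encoding.Convert(Encoding.GetEncoding(949), Encoding.UTF8, bytes);

		File.WriteAllBytes(@"c:\temp\" + file.Name, encodedBytes);
	}
}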
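One thing the converter above does not guard against is being run twice: a file that is already UTF-8 would be decoded as code page 949 and come out garbled. Here is a minimal sketch of a guard, assuming that "decodes cleanly with a strict UTF-8 decoder" is a good enough test. The class and method names are hypothetical and not part of my original code. Pure ASCII files pass the check too, which is harmless because ASCII bytes are the same in both encodings.

```csharp
using System;
using System.Text;

static class Utf8Check
{
	// Strict decoder: emits no BOM and throws on invalid byte sequences.
	private static readonly UTF8Encoding StrictUtf8 = new UTF8Encoding(false, true);

	// Hypothetical helper: returns true when the bytes decode cleanly as UTF-8.
	public static bool IsValidUtf8(byte[] bytes)
	{
		try
		{
			StrictUtf8.GetString(bytes);
			return true;
		}
		catch (DecoderFallbackException)
		{
			return false;
		}
	}
}
```

In Convert, a line such as `if (Utf8Check.IsValidUtf8(bytes)) return;` before the call to Encoding.Convert would skip files that have already been converted.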

Displaying a Unicode file on the web is easy. You just declare the charset in a meta tag within the head element:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />