Tuesday, October 23, 2007

Convert UTF-8 to Unicode

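private static string Utf8ToUnicode(string utf8)
{
    // Note: this encodes the string to UTF-8 bytes, converts them to UTF-16,
    // and decodes again, so for well-formed input the result equals the input.
    return Encoding.Unicode.GetString(
        Encoding.Convert(
            Encoding.UTF8,
            Encoding.Unicode,
            Encoding.UTF8.GetBytes(utf8)));
}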

LINQ solution:
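// Requires: using System.Linq; using System.Text;
private static string Utf8ToUnicode(string utf8)
{
    // Treat each char as one raw UTF-8 byte, then decode the byte sequence.
    return Encoding.UTF8.GetString(
        utf8.Select(item => (byte)item).ToArray());
}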
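
For anyone wondering what kind of input the LINQ version expects, here is a minimal sketch. It assumes the string was produced by reading UTF-8 bytes one byte per char (as an ISO-8859-1 decoder would), so every char holds a value from 0 to 255. The names `Utf8Repair` and `DecodeUtf8Bytes` are made up for illustration and are not part of any library.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Utf8Repair
{
    // Hypothetical helper: interprets each char of the input as one raw
    // UTF-8 byte and decodes the resulting byte sequence.
    public static string DecodeUtf8Bytes(string misdecoded)
    {
        // A char above 255 cannot be a single byte, so the input does not
        // match the assumption; return it unchanged.
        if (misdecoded.Any(c => c > 255))
            return misdecoded;

        byte[] bytes = misdecoded.Select(c => (byte)c).ToArray();
        return Encoding.UTF8.GetString(bytes);
    }
}
```

With this sketch, `Utf8Repair.DecodeUtf8Bytes("H\u00C3\u00B4tel")` returns "Hôtel", because the two chars U+00C3 and U+00B4 are read as the bytes C3 B4, which is the UTF-8 encoding of "ô". A string that is already correct, or one that was misread with a different code page, does not fit the assumption and should not be passed through it.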

5 comments:

Anonymous said...

Thank you for this little code snippet!!! I was looking for it.

Anonymous said...

Hello

this code does not work.

Here is working code:

public static string DecodeUtf8(string s_Input)
{
    byte[] u8_Utf = new byte[s_Input.Length];

    for (int i=0; i<s_Input.Length; i++)
    {
        // If there are characters above 255 it is IMPOSSIBLE that it is an UTF8 string.
        // It is already in Unicode format, there is nothing to do!
        if (s_Input[i] > 255)
            return s_Input;

        u8_Utf[i] = (byte)s_Input[i];
    }

    return Encoding.UTF8.GetString(u8_Utf);
}

Anonymous said...

Anonymous, sorry to disillusion you, but your code doesn't work either.
In UTF-8, byte values above 127 are always part of a multi-byte sequence, and each sequence needs to be converted to a single Unicode character. UTF-8 is the same as old ASCII only from 0 to 127.

So I don't know what encoding standard you are decoding (it wouldn't be a valid conversion for extended ASCII either), but it ain't utf8. Sorry.
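
The multi-byte point can be checked directly. A small sketch, using a hypothetical helper named `ToUtf8Hex` (not a library function), that shows the UTF-8 bytes of a string as hex:

```csharp
using System;
using System.Text;

public static class Utf8ByteDemo
{
    // Hypothetical helper: returns the UTF-8 encoding of a string as
    // hyphen-separated hex bytes.
    public static string ToUtf8Hex(string text)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(text));
    }
}
```

`Utf8ByteDemo.ToUtf8Hex("A")` gives `41` (one byte, same as ASCII), while `Utf8ByteDemo.ToUtf8Hex("ô")` gives `C3-B4` (two bytes, both above 127).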

Jyotsna said...

Hi, I just tried your code. But somehow it is not working for me.

utf8 = "Hôtel";
string output = Encoding.Unicode.GetString(
    Encoding.Convert(
        Encoding.UTF8,
        Encoding.Unicode,
        Encoding.UTF8.GetBytes(utf8)));

return output;

I have a conversion function in C++ which works wonders. I am looking for an equivalent C# to do the same.

The output of C++ for this input string is Hôtel. But in C# I am getting the same as the input.

What are your thoughts on this problem?

Thanks for any inputs and help
Jyotsna

Gever said...

Hi,
Use the LINQ solution.