String.Compare with Unicode characters

I am trying to sort a list of Greek Unicode strings, but I can't get the .NET Framework to sort or compare the accented vowels correctly. For example,
String.Compare("\u1f00", "\u1f30"); // ἀ vs. ἰ
always returns 0, and
String.Compare(Greek Extended character, "α")
always returns -1, regardless of the culture information (current, invariant, or Greek) I pass in the other arguments of String.Compare. Is there a way to make the first comparison return a negative number?




  • zu35926

    Do you think it is related to the implementation of the CLI? I haven't tried it, but what does a native API call return?

  • yanyee

    Or you can use the suggestion I made, if you are running Vista. :-)

    If you are running .NET 2.0, you can also use its normalization support to strip the diacritics...
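A minimal sketch of the normalization route, assuming .NET 2.0 or later (String.Normalize, NormalizationForm.FormD and CharUnicodeInfo.GetUnicodeCategory all exist there) and assuming the runtime's normalization data include the canonical decompositions the Unicode standard defines for the Greek Extended letters. The idea: decompose each precomposed letter into its base letter plus combining marks, then drop the combining marks. The class name Diacritics is illustrative only.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class Diacritics
{
    // Removes accents by decomposing the text (FormD) and dropping the
    // combining marks (UnicodeCategory.NonSpacingMark) the decomposition produces.
    public static string Strip(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                builder.Append(c);
        }
        // Recompose whatever is left so the result is in composed form again.
        return builder.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

For example, Diacritics.Strip("\u1f00") should give "\u03b1" (plain alpha), and Diacritics.Strip("\u1f96") should give "\u03b7" (plain eta).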


  • Hasan ACAR

    Did someone say my name? :-)

    The answer to this question can be found in the following blog post:

    The city elders won't give this string weight, either (aka On being consistently dead wrong, aka Ordinal or bust)

    Michael [MSFT]
    NLS Collation/Locale/Keyboard Technical Lead
    Globalization Infrastructure and Font Technologies


  • lightiv

    1. String.Compare("\u1f00", "\u1f30", StringComparison.Ordinal);
    2. String.CompareOrdinal("\u1f00", "\u1f30");
    3. CompareInfo compareInfo = CompareInfo.GetCompareInfo("el");
       Console.WriteLine(compareInfo.Compare("\u1f00", "\u1f30", CompareOptions.Ordinal));



  • datahook

    I don't know. What API calls can I use to do such string comparisons?

    I did check the Latin Extended Additional characters, and they work. It just seems to be a problem with the Greek Extended characters.

  • KingKarter

    Thanks, everyone, for the help. At least I now know where not to look for a solution. It is not quite "Ordinal or bust" sorting, however: I can use the good old pre-.NET, pre-Unicode solution of writing my own compare function (implementing the IComparer interface in .NET). Each string is stripped of all its accents (with a big switch statement that converts every alpha-plus-accents character to a plain alpha, and so on), and the stripped strings are then compared alphabetically. If they are the same, the two original strings are compared with an ordinal comparison. That at least works, even though I would have liked a one-step solution with String.Compare() in .NET.
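The two-step comparer described above might look like the following sketch. One deliberate substitution: instead of a big switch statement, the accents are stripped with Unicode normalization (FormD, then dropping NonSpacingMark characters), which assumes the runtime's normalization data cover the Greek Extended letters. GreekAccentComparer is a hypothetical name, and the CompareInfo used for the alphabetical step is left to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical comparer: compare accent-stripped strings first, then break
// ties with an ordinal comparison of the original strings.
public sealed class GreekAccentComparer : IComparer<string>
{
    private readonly CompareInfo compareInfo;

    public GreekAccentComparer(CompareInfo compareInfo)
    {
        this.compareInfo = compareInfo;
    }

    // Normalization-based stand-in for the hand-written switch statement.
    private static string StripAccents(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                builder.Append(c);
        }
        return builder.ToString();
    }

    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        // Step 1: compare the unaccented base letters alphabetically.
        int result = compareInfo.Compare(StripAccents(x), StripAccents(y), CompareOptions.None);
        if (result != 0) return result;

        // Step 2: same base letters, so order the accented originals ordinally.
        return string.CompareOrdinal(x, y);
    }
}
```

For Greek rules one would pass CultureInfo.GetCultureInfo("el-GR").CompareInfo. With any culture that orders α before ι, sorting { "\u1f30", "\u1f70", "\u1f00" } with this comparer should put both alphas first (\u1f00, then \u1f70) and the iota last.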

  • Sharath paleri

    Gabriel Lozano-Moran wrote:

    I don't know Greek, but isn't it possible that these string compare functions are correct when they return 0, meaning that the compared characters are linguistically equal?

    String.Compare(Greek Extended character, "α")

    CompareStringOrdinal in Vista can be used to check for binary equality. I am not running Vista, so I can't try this out.



    Actually,
    String.Compare(Greek Extended character, "α")
    returns -1 when, as you said, it should return 0. And that is a different error from the one I mentioned, since α is in the normal Greek block whereas the other character is in the Greek Extended block. So it seems to me that

    String.Compare(Greek Extended character, Greek Extended character)=0
    String.Compare(Greek normal character, Greek Extended character)=1

    always, regardless of what the actual Greek letters are. At least
    String.Compare(Greek normal character, Greek normal character)
    is correct, giving -1, 0 or 1 depending on the two letters.

    CompareStringOrdinal doesn't help: ordinal comparisons already work in .NET, and in any case they do not order the Greek Extended letters alphabetically. As someone else pointed out, the problem seems to be in the Windows API.
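For reference, the Vista CompareStringOrdinal API mentioned above can be called from C# through P/Invoke. A minimal sketch, Windows Vista or later only; the wrapper class NativeOrdinal is a hypothetical name, while the signature and the CSTR_* return values follow the documented Win32 function. As the post says, it only gives a binary (code point) ordering, not an alphabetical one.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

public static class NativeOrdinal
{
    private const int CSTR_EQUAL = 2; // CSTR_LESS_THAN = 1, CSTR_GREATER_THAN = 3

    // kernel32 export, Vista and later; there is only a Unicode version.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, ExactSpelling = true, SetLastError = true)]
    private static extern int CompareStringOrdinal(
        string lpString1, int cchCount1,
        string lpString2, int cchCount2,
        [MarshalAs(UnmanagedType.Bool)] bool bIgnoreCase);

    // Maps CSTR_* onto the negative/zero/positive convention of String.Compare.
    public static int Compare(string a, string b, bool ignoreCase)
    {
        int result = CompareStringOrdinal(a, a.Length, b, b.Length, ignoreCase);
        if (result == 0) // 0 signals failure; details come from GetLastError
            throw new Win32Exception(Marshal.GetLastWin32Error());
        return result - CSTR_EQUAL;
    }
}
```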

  • georgeob

    I will probably switch to Vista when it comes out, but I cannot rely on all the users of my program having Vista (in fact, they won't).

    I am (and all my users will be) using .NET 2.0, but normalization does not remove the accents: the accents are not separate Unicode characters; instead, a single Unicode character represents the letter and its accents together. So
    "\u1f96".Normalize()
    equals
    "\u1f96"
    for all the normalization forms.
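One way to check what each normalization form actually produces is to print the UTF-16 code units instead of comparing strings with String.Compare, whose culture-sensitive equality is the behavior under suspicion in this thread. A minimal sketch; NormalizationCheck and its methods are illustrative names, and the code uses C# 6 syntax.

```csharp
using System;
using System.Linq;
using System.Text;

public static class NormalizationCheck
{
    // Formats each UTF-16 code unit as U+XXXX, so the result does not depend
    // on console fonts or on culture-sensitive comparison.
    public static string CodeUnits(string s) =>
        string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

    // Prints the code units produced by each of the four normalization forms.
    public static void PrintForms(string s)
    {
        var forms = new[] {
            NormalizationForm.FormC, NormalizationForm.FormD,
            NormalizationForm.FormKC, NormalizationForm.FormKD };
        foreach (NormalizationForm form in forms)
            Console.WriteLine($"{form}: {CodeUnits(s.Normalize(form))}");
    }
}
```

Calling NormalizationCheck.PrintForms("\u1f96") lists the code units for FormC, FormD, FormKC and FormKD; a decomposed form shows up as more than one code unit.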

  • GShap

    I forgot to say that an ordinal comparison does not help me. It works in this case, but not with the rest of the Unicode Greek alphabet. For example, the Unicode characters \u1f00 and \u1f70 are both alphas with different accents, whilst \u1f30 is an iota. So whilst an ordinal comparison puts \u1f00 (ἀ) and \u1f30 (ἰ) in the correct order, it does not put \u1f30 and \u1f70 in the correct (i.e. alphabetical) order.
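The ordering problem described above can be reproduced with a plain ordinal sort, which orders strings by raw UTF-16 code unit value. A minimal sketch; OrdinalOrderDemo and SortOrdinal are illustrative names.

```csharp
using System;

public static class OrdinalOrderDemo
{
    // Returns a sorted copy, ordered by UTF-16 code unit value (ordinal).
    public static string[] SortOrdinal(params string[] items)
    {
        var copy = (string[])items.Clone();
        Array.Sort(copy, StringComparer.Ordinal);
        return copy;
    }
}
```

SortOrdinal("\u1f70", "\u1f30", "\u1f00") returns "\u1f00", "\u1f30", "\u1f70", that is alpha, iota, alpha: the iota ends up between the two alphas.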

  • winstonSmith

    Comparing *any* character from the Greek Extended block \u1f00-\u1fff returns 0, regardless of culture or case sensitivity. Sounds like a bug by omission; you might want to report this at Product Feedback. I don't know enough about the .nlp tables in Rotor to verify. This is the test program I used:

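    // Needs: using System; using System.Globalization;
    // Compares every pair of distinct characters in the Greek Extended block (U+1F00..U+1FFF).
    CultureInfo greek = CultureInfo.GetCultureInfo("el-GR");
    for (char ch1 = '\u1f00'; ch1 < '\u1fff'; ++ch1)
        for (char ch2 = (char)(ch1 + 1); ch2 <= '\u1fff'; ++ch2)
            if (0 != String.Compare(ch1.ToString(), ch2.ToString(), false, greek))
                Console.WriteLine("found one");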



  • JackG

    IMHO, if anyone can help you, it would be Michael Kaplan from MSFT, or maybe one of the MVPs can post this on the internal channels...

  • voltagefreak

    Yes, I know for sure that if someone has a Unicode problem, there is one man to call: he is known as the Unicode man :-p

  • Jason D. Camp

    The native API function is CompareString(). Same problem, though: it returns CSTR_EQUAL for all the characters. Windows Vista has CompareStringEx(), but I can't try it yet.
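To repeat the native-API test from .NET, CompareString can be called through P/Invoke. A minimal sketch, Windows only; the wrapper class NativeCollation is a hypothetical name, 0x0408 is the LCID for Greek (Greece), and the CSTR_* values are the ones documented for CompareString. With flags 0 it performs the same kind of linguistic comparison discussed in this thread, so on the systems described here it would presumably report the same CSTR_EQUAL results.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

public static class NativeCollation
{
    private const int CSTR_EQUAL = 2;             // CSTR_LESS_THAN = 1, CSTR_GREATER_THAN = 3
    public const uint LCID_GREEK_GREECE = 0x0408; // el-GR

    // Unicode entry point of the Win32 CompareString function.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, ExactSpelling = true, SetLastError = true)]
    private static extern int CompareStringW(
        uint locale, uint dwCmpFlags,
        string lpString1, int cchCount1,
        string lpString2, int cchCount2);

    // Linguistic comparison (no flags) mapped onto negative/zero/positive.
    public static int Compare(string a, string b, uint lcid)
    {
        int result = CompareStringW(lcid, 0, a, a.Length, b, b.Length);
        if (result == 0) // 0 signals failure; details come from GetLastError
            throw new Win32Exception(Marshal.GetLastWin32Error());
        return result - CSTR_EQUAL;
    }
}
```

For example, NativeCollation.Compare("\u1f00", "\u1f30", NativeCollation.LCID_GREEK_GREECE) returning 0 would reproduce the CSTR_EQUAL result reported above.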

