StreamReader encoding autodetect, and fallback default problems, plz help

Hi, I'm having a problem getting StreamReader to behave "correctly".

Take a look:

fileStream = new StreamReader(fileInfo.OpenRead(), Encoding.GetEncoding(1252), true);

As I understand it, this checks the first bytes of the file for the UTF-8 mark and, if it isn't there, falls back to the default encoding 1252.

But no matter which file I open, I always get fileStream.CurrentEncoding = 1252. I tried many different files and checked that EF BB BF is in place, but CurrentEncoding never returns UTF-8.

Thanks for helping!





  • xion.truth


    // Requires: using System.IO; using System.Text;
    private static Encoding detectEncoding(string filename)
    {
        // Read the raw leading bytes; a file shorter than three bytes leaves the rest as zeros.
        byte[] data = new byte[3];
        using (FileStream fs = File.OpenRead(filename))
        {
            fs.Read(data, 0, data.Length);
        }

        if (data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return Encoding.UTF8;
        else if (data[0] == 0xFE && data[1] == 0xFF)
            return Encoding.BigEndianUnicode;
        else if (data[0] == 0xFF && data[1] == 0xFE)
            return Encoding.Unicode;
        else if (data[0] == 0x2B && data[1] == 0x2F && data[2] == 0x76)
            return Encoding.UTF7;
        else
            return Encoding.Default; // no BOM recognized
    }

  • sticksnap

    You want to use the StreamReader(String, Boolean) constructor that tells it to autodetect the BOM. Here's an example:

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim sw As New StreamWriter("c:\temp\utf8.txt", False, System.Text.Encoding.UTF8)
        sw.WriteLine("Hello world")
        sw.Close()
    End Sub

    Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        Dim sr As New StreamReader("c:\temp\utf8.txt", True)
        Debug.Print("Read '{0}' in encoding {1}", sr.ReadLine, sr.CurrentEncoding.EncodingName)
        sr.Close()
    End Sub



  • webflier

    That's a problem. In code page 1252, the BOM bytes represent legitimate characters, so there is no way StreamReader could autodetect the encoding with 100% guaranteed accuracy. If you can live with 99.9% accuracy, you could read the BOM yourself by "pre-opening" the file and reading the first few bytes. The BOM encodings are as follows (in hex):
    EF-BB-BF: UTF-8
    FE-FF: UTF-16, big endian
    FF-FE: UTF-16, little endian
    00-00-FE-FF: UTF-32, big endian
    FF-FE-00-00: UTF-32, little endian
    2B-2F-76-xx: UTF-7
    and several really obscure ones...
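
The byte patterns listed above can be checked by hand before a reader is opened. What follows is only a minimal sketch: the class name `BomSniffer` and both method names are made up for illustration, UTF-7 and the obscure marks are deliberately left out, and a UTF-16 little-endian file whose first character is U+0000 would be misread as UTF-32.

```csharp
using System;
using System.IO;
using System.Text;

static class BomSniffer
{
    // Maps leading bytes to an encoding, or returns null when none of the
    // BOMs handled here is present. (Hypothetical helper, not a framework API.)
    public static Encoding DetectBom(byte[] b)
    {
        // UTF-32 LE (FF-FE-00-00) has to be tested before UTF-16 LE (FF-FE),
        // because the shorter mark is a prefix of the longer one.
        if (b.Length >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
            return Encoding.UTF32;                  // UTF-32, little endian
        if (b.Length >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
            return new UTF32Encoding(true, true);   // UTF-32, big endian
        if (b.Length >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
            return Encoding.UTF8;
        if (b.Length >= 2 && b[0] == 0xFE && b[1] == 0xFF)
            return Encoding.BigEndianUnicode;       // UTF-16, big endian
        if (b.Length >= 2 && b[0] == 0xFF && b[1] == 0xFE)
            return Encoding.Unicode;                // UTF-16, little endian
        return null;
    }

    // "Pre-opens" the file, reads at most four bytes, and inspects them.
    public static Encoding DetectBomInFile(string path)
    {
        byte[] head = new byte[4];
        int read;
        using (FileStream fs = File.OpenRead(path))
        {
            read = fs.Read(head, 0, head.Length);
        }
        Array.Resize(ref head, read); // keep only the bytes that were actually read
        return DetectBom(head);
    }

    static void Main(string[] args)
    {
        foreach (string path in args)
        {
            Encoding found = DetectBomInFile(path);
            Console.WriteLine("{0}: {1}", path, found == null ? "no BOM" : found.WebName);
        }
    }
}
```

A caller that wants the fallback from the original question could write `BomSniffer.DetectBomInFile(path) ?? Encoding.GetEncoding(1252)` and hand the result to both the reader and the writer.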



  • Rotte2

    Thank you very much for your answer.

    You are correct, I'm trying to use BOM autodetection. The problem is that the system needs to process two types of text files:

    Windows-1252 encoded (no BOM) and UTF-8. I need to read the file, get some information from it, and write it to a new file, which must use the same encoding.

    When I use StreamReader(String, true) I always get UTF-8, because when I open a Windows-1252 encoded file there is no BOM and StreamReader defaults to UTF-8 (so I always get UTF-8, whether the file is UTF-8 or Windows-1252).

    The problem begins when I need to create the new file and store the information I read, because then I don't know for sure which encoding the original file used. When I read the fileStream.CurrentEncoding property and pass it as a parameter to the StreamWriter, the newly created file is always UTF-8.

    Because of that I tried to use:

    StreamReader(fileInfo.OpenRead(), Encoding.GetEncoding(1252), true);

    As I understand it, when I create an instance with this constructor it should default to cp1252 when no BOM is found, but the strange thing is that now I always get cp1252, whether the file is 1252 or UTF-8!
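
One likely explanation, going by the documented behavior of StreamReader: BOM detection does not run until the first read, so CurrentEncoding still shows the fallback encoding when it is inspected right after the constructor. The sketch below reads once and only then looks at the property. The class, method, and file names are invented for illustration, and the `RegisterProvider` line assumes .NET Core or .NET 5+; on .NET Framework code page 1252 is available without it and that line should be dropped.

```csharp
using System;
using System.IO;
using System.Text;

static class CurrentEncodingDemo
{
    // Opens the file with Windows-1252 as the fallback, performs one read so
    // that BOM detection can run, and returns what the reader then reports.
    // (Hypothetical helper for illustration.)
    public static Encoding DetectAfterFirstRead(string path)
    {
        // Assumption: .NET Core / .NET 5+, where code page 1252 must be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        using (var reader = new StreamReader(File.OpenRead(path), Encoding.GetEncoding(1252), true))
        {
            // At this point CurrentEncoding is still the fallback passed above.
            reader.Read(); // the first read fills the buffer and inspects the BOM
            return reader.CurrentEncoding;
        }
    }

    static void Main()
    {
        // Two throwaway sample files: UTF-8 with a BOM, and raw bytes without one.
        string withBom = Path.Combine(Path.GetTempPath(), "demo-utf8-bom.txt");
        string noBom = Path.Combine(Path.GetTempPath(), "demo-no-bom.txt");
        File.WriteAllText(withBom, "Hello world", new UTF8Encoding(true));
        File.WriteAllBytes(noBom, new byte[] { 0x48, 0xE9, 0x6C, 0x6C, 0x6F });

        Console.WriteLine(DetectAfterFirstRead(withBom).WebName);
        Console.WriteLine(DetectAfterFirstRead(noBom).WebName);
    }
}
```

If this holds for the files in question, the encoding returned after the first read can be passed to the StreamWriter so that the new file matches the original.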


