Ultra-fast deserialization/serialization

Hi,

I'm wondering if there is a way to speed this up:
I have objects that implement the following methods:

public void Serialize(System.IO.BinaryWriter writer)
{
    writer.Write(this.price);
    writer.Write(this.size);
    writer.Write(this.dateTime.Ticks);
}

public MyObject Deserialize(System.IO.BinaryReader reader)
{
    return new MyObject(reader.ReadDouble(), reader.ReadInt32(), new DateTime(reader.ReadInt64()));
}

That's what I use for serialization and deserialization. Currently I manage to write 10,000,000 objects in 5.5 seconds. However, my RAID 0 drives can write up to 140 MB/s, so I think there should be some room for improvement (10 million objects result in a 195 MB file, which works out to roughly 35 MB/s). I guess the problem is that the Write method is called so often (30 million times in this case). So I was wondering if anybody has ideas or techniques for improving the performance a bit.

Thanks,

Tom


  • Callum

    I usually serialize objects like this:

    XmlSerializer theSerializer = new XmlSerializer(typeof(MyObject));

    theSerializer.Serialize(someStream, classInstance);

    which is fast.

    I don't know if you are truly serializing objects or just writing down the values (like a plain, basic XML document).

    Still, I think that writing such a large number of objects/values in that time frame is pretty fast, and you would expect some performance issues with that much data to write. XML can be a bit expensive to write at times, so I guess this is where the hit is.

    Sorry I could not be of further help; I just thought I would chip in my thoughts.



  • rperreta

    Here are some screenshots of the profiling:

    http://www.kingvest.de/q/storage1.png
    http://www.kingvest.de/q/storage2.png
    http://www.kingvest.de/q/storage3.png

  • Fogel

    Probably the first read after positioning reads enough bytes to fill the whole buffer, and the following reads are just served from that buffer. That would be why the first read after positioning is a lot slower.

    I may be wrong, but I have a feeling that Microsoft didn't special-case stream.Position = x when x is already equal to stream.Position.
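
    If that suspicion holds, one cheap workaround is to skip the assignment when the stream is already at the requested offset. A minimal sketch follows; the helper name SeekIfNeeded is hypothetical and not from the thread, and whether it actually helps depends on how the concrete stream implements Position, so it should be measured rather than assumed.

```csharp
using System.IO;

// Hypothetical helper (not from the thread): only assigns Position when
// the stream is not already at the requested offset.
public static class StreamSeekHelper
{
    // Returns true if a seek was actually performed, false if it was skipped.
    public static bool SeekIfNeeded(Stream stream, long target)
    {
        if (stream.Position == target)
        {
            return false; // already there, so leave the stream untouched
        }

        stream.Position = target;
        return true;
    }
}
```

    In the indexer this would replace the unconditional stream.Position = ... line, so that a loop over consecutive indexes only seeks once.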


  • imanish11111

    The problem I have left now is that the deserialization performance stinks compared to the serialization performance. One big problem seems to be the deserialization of DateTimes.
    writer.Write(this.dateTime.Ticks); is about 10 times faster than this.dateTime = new DateTime(reader.ReadInt64());
    Is there anything I can do about that? Also, in general, read operations seem to be slower than write operations (I did some performance profiling).

    That's how I deserialize:

    for (int i = 0; i < 10000000; i++)
    {
        Trade tr = trades[i];
    }

    ---->

    public T this[int index]
    {
        get
        {
            stream.Position = objectSize * index + HEADER_SIZE;
            T obj = new T();
            obj.Deserialize(reader);
            return obj;
        }
        set
        {
            throw new Exception("The method or operation is not implemented.");
        }
    }

  • drasko982

    No, in another sample run I replaced the new DateTime with long l = reader.ReadInt64() and it was just as slow, so it's not the DateTime construction that is so expensive.
    I read that FileStream already has buffering built in, so wrapping it in a BufferedStream is pretty much useless (http://blogs.msdn.com/brada/archive/2004/04/15/114329.aspx).

  • NickNotYet

    I think you're right, because otherwise the stream positioning wouldn't be so slow in my sample, where the position already matches the new position.

  • hackmonkey

    Hm, actually I just found out that the bottleneck is not new DateTime(reader.ReadInt64()) but reader.ReadInt64() itself. Why is
    reader.ReadInt64() about 10 times slower than
    reader.ReadDouble()? They're both 8 bytes.

  • Alex Farber

    What can I say, Tom. Something is wrong with your test if it shows that ReadInt64 is 10 times slower than ReadDouble, because it is clear that they do the same thing. Nice find about FileStream/BufferedStream :)


  • CET PRG455

    Hm, but look at that output from the ANTS profiler: http://www.kingvest.de/q/storage3.png. It shows ReadInt64 to be a lot slower.

    I need an indexer for the objects when I'm looking for certain ones, though I will implement a sequential read for the enumeration.

    Any other tips or hints on what I could do? Would it make a big difference to do block reads?

  • IgorP

    It shows that new DateTime(reader.ReadInt64()) is a lot slower.

    The reading is not the slow part. If you want to test only the performance of reading, just do a simple loop and read everything without creating the objects (T, DateTime).

    Then do another simple loop that creates 10 million T and DateTime instances, and you will see where your program is spending most of its time.

    If you already use BufferedStream, that class is doing block reads. The default block size is 4 KB, I believe. You can increase it, but I don't think you will see a big difference.
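
    The two measurement loops described above could be sketched roughly like this. It assumes the 20-byte record layout from the question (double, int, long ticks); SampleTrade, TimeReadsOnly and TimeCreationOnly are hypothetical names used only for illustration.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical stand-in for the thread's T / MyObject type.
public class SampleTrade
{
    public double Price;
    public int Size;
    public DateTime Time;
}

public static class ReadVersusCreate
{
    // Loop 1: read all three fields of each record, create no objects.
    // Returns the elapsed time in milliseconds.
    public static long TimeReadsOnly(Stream stream, int count)
    {
        BinaryReader reader = new BinaryReader(stream);
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            reader.ReadDouble();
            reader.ReadInt32();
            reader.ReadInt64();
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    // Loop 2: create the objects, perform no I/O.
    // Returns the elapsed time in milliseconds.
    public static long TimeCreationOnly(int count)
    {
        Stopwatch watch = Stopwatch.StartNew();
        SampleTrade last = null;
        for (int i = 0; i < count; i++)
        {
            last = new SampleTrade();
            last.Price = 1.0;
            last.Size = i;
            // DateTime is a struct, so this does not allocate on the heap by itself.
            last.Time = new DateTime(i);
        }
        watch.Stop();
        GC.KeepAlive(last); // keep the loop's result observable
        return watch.ElapsedMilliseconds;
    }
}
```

    Comparing the two numbers for the same count should show whether the time goes into the reads or into creating the objects.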


  • Wizzie

    If you want raw performance, custom binary serialization is the only way to go (like you did).

    A couple of hints:

    - try BufferedStream, if you aren't using it already; it will improve the performance

    - serialize several objects into a buffer and write them with a single Write call

    - preallocate the file if you can estimate the size, and expand it in big chunks

    - check for disk fragmentation

    - if you want the best performance, use interop with the native I/O API and implement unbuffered I/O (see CreateFile and the FILE_FLAG_NO_BUFFERING flag)
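
    The first two hints might look roughly like the sketch below. The class and method names, the 64 KB buffer size and the array-based signature are assumptions made for illustration; the record layout (double, int, long ticks, 20 bytes in total) follows the Serialize method in the question.

```csharp
using System;
using System.IO;

public static class BatchedWriter
{
    public const int RecordSize = 20; // 8 (double) + 4 (int) + 8 (long ticks)

    // Hint 1: wrap the file in a BufferedStream (the 64 KB size is an arbitrary choice).
    public static Stream OpenBuffered(string path)
    {
        FileStream file = new FileStream(path, FileMode.Create, FileAccess.Write);
        return new BufferedStream(file, 64 * 1024);
    }

    // Hint 2: serialize a whole batch into memory, then hand it to the
    // output stream with a single Write call.
    public static void WriteBatch(Stream output, double[] prices, int[] sizes, long[] ticks)
    {
        MemoryStream buffer = new MemoryStream(prices.Length * RecordSize);
        BinaryWriter writer = new BinaryWriter(buffer);
        for (int i = 0; i < prices.Length; i++)
        {
            writer.Write(prices[i]);
            writer.Write(sizes[i]);
            writer.Write(ticks[i]);
        }
        writer.Flush();

        // GetBuffer returns the underlying array; only the first Length bytes are valid.
        output.Write(buffer.GetBuffer(), 0, (int)buffer.Length);
    }
}
```

    The batch size does not need to match the total number of objects: a fixed-size batch can be flushed whenever it fills up, which sidesteps not knowing the final count in advance.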


  • daverage

    ReadInt64 and ReadDouble do the same thing. I don't know why you think that this is the problem. I'm sure that it isn't.

    Deserialization is slower because you need to create the objects that receive the deserialized data: new T, new DateTime. No surprise. You are creating millions of objects.

    Setting the position inside the stream also slows things down. Just read the objects sequentially. Why do you need to set the position?
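
    A sequential read along those lines could be sketched as follows: position the stream once, then read records back to back. TradeRecord, ReadAll and the headerSize parameter are hypothetical names standing in for the thread's Trade type and HEADER_SIZE constant; the record layout is the one from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical record type mirroring the three serialized fields.
public struct TradeRecord
{
    public double Price;
    public int Size;
    public DateTime Time;
}

public static class SequentialReader
{
    public const int RecordSize = 20; // 8 (double) + 4 (int) + 8 (long ticks)

    // Seeks once past the header, then yields every record in file order.
    public static IEnumerable<TradeRecord> ReadAll(Stream stream, long headerSize)
    {
        stream.Position = headerSize; // the only seek
        BinaryReader reader = new BinaryReader(stream);
        long count = (stream.Length - headerSize) / RecordSize;
        for (long i = 0; i < count; i++)
        {
            TradeRecord record = new TradeRecord();
            record.Price = reader.ReadDouble();
            record.Size = reader.ReadInt32();
            record.Time = new DateTime(reader.ReadInt64());
            yield return record;
        }
    }
}
```

    The indexer can stay for random lookups; this only covers the case where every object is visited in order.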


  • une

    Hi Dumitru,

    Thanks for the BufferedStream tip. This improved performance to 3.28 seconds per 10 million serialized objects (down from 5.5).
    As for collecting objects before writing and preallocating: this is tough, because I do not know how many objects are going to be written. It could be one or 100 million.

    Thanks again,

    Tom

  • MingMa

    Hmm, this is odd. I just switched the fields, so it reads/writes the double first and then the Int64, with the result that ReadDouble is now 10 times slower than ReadInt64.
    Could it be that the first read after the position is set is slower than the subsequent sequential ones for some reason?
