Hi,
I'm wondering if there is a way to speed this up:
I have objects that implement the following methods:
public void Serialize(System.IO.BinaryWriter writer)
{
    writer.Write(this.price);
    writer.Write(this.size);
    writer.Write(this.dateTime.Ticks);
}
public MyObject Deserialize(System.IO.BinaryReader reader)
{
    return new MyObject(reader.ReadDouble(), reader.ReadInt32(), new DateTime(reader.ReadInt64()));
}
That's what I use for serialization and deserialization. Currently I manage to write 10,000,000 objects in 5.5 seconds; however, my RAID 0 drives can write up to 140 MB/s, so I think there should be some upside left (10 million objects result in a 195 MB file). I guess the problem is that the .Write method is called so often (30 million times in this case). So I was wondering if anybody has ideas or techniques for improving the performance a bit.
Thanks,
Tom

Ultra-fast deserialization/serialization
Callum
I usually serialize objects like this:
XmlSerializer theSerializer = new XmlSerializer(typeof(MyObject));
theSerializer.Serialize(someStream, classInstance);
which does it quickly.
I don't know whether you are truly serializing objects or just writing down the values (like a plain XML document).
Still, I think that writing such a large number of objects/values in that time frame is pretty fast, and you would expect some performance issues with that much data to write. XML can be a bit expensive to write at times, so I guess this is where the hit is.
Sorry I could not be of further help; I just thought I would chip in my thoughts.
rperreta
http://www.kingvest.de/q/storage1.png
http://www.kingvest.de/q/storage2.png
http://www.kingvest.de/q/storage3.png
Fogel
Probably the first read after positioning reads enough bytes to fill the whole buffer, and the following reads just read from the buffer. That would be why the first read after positioning is a lot slower.
I may be wrong, but I have a feeling that Microsoft didn't implement a special case for stream.Position = x when x is already equal to stream.Position.
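One cheap way to test that guess is to skip the assignment whenever the stream is already at the target offset, so that back-to-back reads never touch Position. This is only a sketch: the helper name SeekIfNeeded is made up for illustration, and whether it changes anything depends on how the underlying stream treats a redundant seek, so measure before and after.

```csharp
using System;
using System.IO;

public static class StreamHelpers
{
    // Hypothetical helper: only assigns Position when the stream is not
    // already at the requested offset.
    public static void SeekIfNeeded(Stream stream, long target)
    {
        if (stream.Position != target)
        {
            stream.Position = target;
        }
    }
}
```

In the indexer posted in this thread, the line that assigns stream.Position would become a call such as StreamHelpers.SeekIfNeeded(stream, objectSize * index + HEADER_SIZE).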
imanish11111
writer.Write(this.dateTime.Ticks); is about 10 times faster than this.dateTime = new DateTime(reader.ReadInt64());
Is there anything I can do about that? Also, in general, read operations seem to be slower than write operations (I did some performance profiling).
That's how I deserialize:
for (int i = 0; i < 10000000; i++)
{
    Trade tr = trades[i];
}
---->
public T this[int index]
{
    get
    {
        stream.Position = objectSize * index + HEADER_SIZE;
        T obj = new T();
        obj.Deserialize(reader);
        return obj;
    }
    set
    {
        throw new Exception("The method or operation is not implemented.");
    }
}
drasko982
I read that FileStream already has a BufferedStream built in, so using BufferedStream on top of it is pretty much useless (http://blogs.msdn.com/brada/archive/2004/04/15/114329.aspx)
NickNotYet
hackmonkey
reader.ReadInt64() is about 10 times slower than:
reader.ReadDouble() ... even though they're both 8 bytes
Alex Farber
What can I say, Tom. Something is wrong in your test if it shows that ReadInt64 is 10 times slower than ReadDouble, because it is clear that they do the same thing. Nice find on the FileStream/BufferedStream thing :)
CET PRG455
I need an indexer for the objects when I'm looking for certain ones, though I will implement a sequential read for the enumeration.
Any other tips or hints on what I could do? Would it make a big difference to do block reads?
IgorP
It shows that new DateTime(reader.ReadInt64()) is a lot slower.
The reading is not the slow part. If you want to test only the read performance, just do a simple loop and read everything without creating the objects (T, DateTime).
Then do another simple loop that creates 10 million T and DateTime instances, and you will see where your program is spending most of its time.
If you already use BufferedStream, that class does block reads. The default block size is 4k, I believe. You can increase it, but I don't think you will see a big difference.
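The two loops described above could look roughly like this. It is a sketch under assumptions: Sample is a hypothetical stand-in for T, the record layout (double, int, long) is taken from the Serialize method in the original post, and Stopwatch is used for timing.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical stand-in for the T used in the thread.
public class Sample
{
    public double Price;
    public int Size;
    public DateTime Time;
}

public static class ReadVsCreate
{
    // Loop 1: only pull the raw values out of the reader, no objects created.
    // The checksum is returned so the reads have an observable result.
    public static long TimeReads(BinaryReader reader, int count, out TimeSpan elapsed)
    {
        Stopwatch sw = Stopwatch.StartNew();
        long checksum = 0;
        for (int i = 0; i < count; i++)
        {
            reader.ReadDouble();
            checksum += reader.ReadInt32();
            checksum += reader.ReadInt64();
        }
        elapsed = sw.Elapsed;
        return checksum;
    }

    // Loop 2: only create the objects, no I/O at all.
    public static int TimeCreates(int count, out TimeSpan elapsed)
    {
        Stopwatch sw = Stopwatch.StartNew();
        int created = 0;
        for (int i = 0; i < count; i++)
        {
            Sample s = new Sample();
            s.Time = new DateTime(i);
            created++;
        }
        elapsed = sw.Elapsed;
        return created;
    }
}
```

Comparing the two elapsed values for the same count shows whether the time goes into the reads or into object construction.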
Wizzie
If you want raw performance, custom binary serialization is the only way to go (like you did).
A couple of hints:
- use BufferedStream, if you don't already, which will improve performance
- serialize several objects into a buffer and write them in one Write call
- preallocate the file if you can estimate the size, and expand it in big chunks
- check for disk fragmentation
- if you want the best performance, use interop with the native I/O API and implement unbuffered I/O (see CreateFile and the FILE_FLAG_NO_BUFFERING flag)
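The second hint (several serialized objects per Write call) might be sketched as follows. The Trade struct, the BatchedWriter class and the recordsPerChunk parameter are all made up for illustration; only the field layout (double, int, long ticks) comes from the original post. The total number of objects does not need to be known in advance, because each chunk is flushed as soon as it is full.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical record type mirroring the fields in the original post.
public struct Trade
{
    public double Price;
    public int Size;
    public DateTime Time;
}

public static class BatchedWriter
{
    public const int RecordSize = 20; // 8 (double) + 4 (int) + 8 (long ticks)

    // Serializes trades into an in-memory chunk and hands each full chunk
    // to the output stream with a single Write call.
    public static void WriteAll(Stream output, IEnumerable<Trade> trades, int recordsPerChunk)
    {
        MemoryStream chunk = new MemoryStream(RecordSize * recordsPerChunk);
        BinaryWriter writer = new BinaryWriter(chunk);
        int inChunk = 0;
        foreach (Trade t in trades)
        {
            writer.Write(t.Price);
            writer.Write(t.Size);
            writer.Write(t.Time.Ticks);
            if (++inChunk == recordsPerChunk)
            {
                FlushChunk(output, chunk);
                inChunk = 0;
            }
        }
        FlushChunk(output, chunk); // write the final, possibly partial, chunk
    }

    private static void FlushChunk(Stream output, MemoryStream chunk)
    {
        output.Write(chunk.GetBuffer(), 0, (int)chunk.Length);
        chunk.SetLength(0); // empties the chunk and moves its position back to 0
    }
}
```

The output stream can be a plain FileStream or a BufferedStream around one; which combination is fastest is something to measure rather than assume.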
daverage
ReadInt64 and ReadDouble do the same thing. I don't know why you think that this is the problem. I'm sure that it isn't.
Deserialization is slower because you need to create the objects that hold the deserialized data: new T, new DateTime. No surprise: you are creating millions of objects.
Setting the position inside the stream also slows things down; just read the objects sequentially. Why do you need to set the position?
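A sequential read along those lines could look like the sketch below: Position is set once to skip the header, and after that the records are read back to back. TradeRecord, SequentialReader and the headerSize parameter are hypothetical names; the 20-byte record layout is assumed from the Serialize method in the original post.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical record type mirroring the fields in the original post.
public struct TradeRecord
{
    public double Price;
    public int Size;
    public DateTime Time;
}

public static class SequentialReader
{
    public const int RecordSize = 20; // 8 (double) + 4 (int) + 8 (long ticks)

    // Positions the stream once, then yields every record in file order.
    public static IEnumerable<TradeRecord> ReadAll(Stream stream, long headerSize)
    {
        stream.Position = headerSize;
        BinaryReader reader = new BinaryReader(stream);
        long count = (stream.Length - headerSize) / RecordSize;
        for (long i = 0; i < count; i++)
        {
            TradeRecord record = new TradeRecord();
            record.Price = reader.ReadDouble();
            record.Size = reader.ReadInt32();
            record.Time = new DateTime(reader.ReadInt64());
            yield return record;
        }
    }
}
```

An indexer that seeks can still exist next to this for random lookups; the enumerator only covers the read-everything case.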
une
Thanks for the BufferedStream tip; this cut the time to 3.28 seconds per 10 million serialized objects (down from 5.5).
As for collecting objects before writing and preallocating: this is tough because I do not know how many objects are going to be written. It could be one or 100 million.
Thanks again,
Tom
MingMa
Could it be that the first read after the position is set is slower than the sequential ones for some reason?