Fast searching of a byte array

Can anyone provide guidance on what would be the fastest method of searching a large byte array for some particular sequence of bytes I know I could convert it to a string and do an IndexOf but I would like to avoid that. Any ideas

Answer this question

Fast searching of a byte array

  • arielito

    You can opt to come out the code of KMP pattern matching algorithm. That will allows you to match almost anything you want. KMP => Knuth Morris Pratt algorithm.


  • Denvas

    if you were looking for the bytes 03 02 01 in the following array: 01 01 02 02 03 03 03 02 01 01 01 02 02 02 03 03, if you sorted that array looking for (also sorted) 01 02 03 you'd now not find any matches in: 01 01 01 01 01 02 02 02 02 02 02 03 03 03 03 03

  • SpeedOfSPin

    For big byte array (20 more items maybe), you can copy it and use quick-sort to sort it then use binary-search...

    Even if you were just looking for a single byte, I'd guess that the array would have to be much large (several thousand, I'd say), before sorting becomes faster than a linear search. (actually, as the search would be O(N) and the sort O(N log N), unless you can get several searches out of one sort, it would never be faster)



  • Barthi

    Oh, I made a mistake, sorting a byte array seems not reasonable...

  • Will_M

    For big byte array (20 more items maybe), you can copy it and use quick-sort to sort it then use binary-search...

    But it depends..



  • kirupa

    Thanks SteveDrake - you have been a good help...

    I was wondering if you how to handle searching a byte array within a byte array which my span several blocks. For example : I receive a buffer (byte array) which I need to search for a sequence of bytes (works great with your example) but the problem is that the I can recieve multiple blocks of byte array's and the sequence of bytes which I am searching might span one or more blocks - which makes it difficult to search without somehow concatenating the blocks and then searching - that would incur a performance overhead which I would like to avoid. Any thoughts


  • lyntonS

    Hello,

    Peter Ritchie comments are spot on, you need to check each byte.

     

    I just knocked up this :

     

    using System;
    using System.Collections.Generic;
    using System.Text;

    namespace FindByteArrayinByteArray
    {
        class Program
        {
            static int indexOf(byte[] array, byte[] value)
            {
                int found = 0;
                for (int i = 0; i < array.Length; i++)
                {
                    if (array[ i ] == value[found])
                    {
                        if (++found == value.Length)
                        {
                            return i - found + 1;
                        }
                    }
                    else
                    {
                        found = 0;
                    }
                            }
                return -1;
            }

            public static void Main()
            {
                System.Console.WriteLine(indexOf(new byte[] { 1, 2, 3, 4 }, new byte[] { 1, 2, 3 }));
                System.Console.WriteLine(indexOf(new byte[] { 1, 2, 3, 4 }, new byte[] { 3, 2, 3 }));
                System.Console.WriteLine(indexOf(new byte[] { 1, 2, 3, 4 }, new byte[] { 2, 3, 4 }));
                System.Console.WriteLine(indexOf(new byte[] { 1, 2, 3, 4 }, new byte[] { 4 }));

                System.Console.ReadKey();
            }
        }
    }

     

    Thanks

    Steve


     


  • billb59

    If it's un-ordered data you're pretty much stuck checking each byte--which is what IndexOf would do...

  • Fast searching of a byte array