Dr Neil first mentioned this on the via virtual earth site recently and now I need to use them. Just though I would put my ideas out here and see what you all think and can add.
To be clear this is a concept and I’m looking for ideas / help before I build it.
Basically the issue is you have X number of points to put on the map where X is very large.
1. If you go and add them all it is going to slow down to a crawl.
2. If you have points at a close proximity at certain zoom levels they are going to sit on top of each other and be unhoverable.
The solution is to have a mega cluster that provides information about the points it represents.
Firstly we need to look at how we calculate what points need to be merged to become a mega cluster, secondly we need to look at what a mega cluster shows on hover to be useful.
AJAX
So straightaway with this many points we need to make an asynchronous call to a web service to get out points in XML. This can then be bound to the VE in JavaScript (or Atlas) in various ways. What we want is every time the map is panned or zoomed to get a new set of data. The parameters we will need to pass are: top left Lat Lon, bottom right Lat Long (so we know the area that is visible) and the zoom level.
Calculations
1. So I assume you have a collection of all the points in some way. We need to get only the ones that are visible. If the points are stored in the database we could query to get only these. If we had a more common set of points perhaps these are cached, if so we need to get a subset of just the visible ones.
2. Now we need to identify which points are so close to each other they should be made into a mega cluster. We know the zoom level, we should have a constant value that indicates how big our icons are in pixels so what we need is a value in degrees for our current zoom level that indicates the radius of our icon. Using this we can group our points but there are some complications.
If you have a set of evenly spaced points all within the range of each other to be clustered which ones get clustered with which points
My algorithm:
I’m sure someone out there is a maths genius (I hope so) so I wonder if there is an algorithm that already exists to do this in the most optimal format.
For now I propose we simply have a sorted list of our visible points, sorted by longitude.
We go through the list in order, for each point if we:
look backwards in the list for any points within the range that are not already grouped, as the points are in order we exit as soon as it exceeds the range.
WE then look forwards in the list for any points within the range, again we short out.
For each value we find that longitude is in the range we check to see if the latitude is in the range also, if it is we group this. Perhaps we add a value to indicate a new point group id, or remove it from the list and build a new list of points, I’m undecided and up for ideas.
Importantly we end up with a list of points, some of them grouped, all of them visible in the current view.
Looks
So we could just pass this information out to VE but it doesn’t know what to do with our new grouping. We could potentially still have a large number of points all grouped together.
Now points have a balloon that appears when you hover. It could contain anything but most commonly it has a title, description, an image.
Option 1 – “VE Windows” (Popup over the top of VE – I assume these can be made I have never made one).
I’m thinking for a mega cluster we provide a brief list of links to the points’ balloons with a more link. This way if there is say less then 5 you can see the titles and go to the one you are after immediately else you need to see the whole list.
For the whole list we could open up a new “VE window” and show the complete list of titles as links.
When you click on a link either from the initial 5 or from the list it would open another “VE window” (replacing the one that is up, or minimising it) showing the balloon information.
Option 2 – Zoom
Same concept with the lists of titles but instead on being individual links there is a single “zoom to expand” link, perhaps also invoked by clicking on the point. It simply centres on the point and zooms in a specified amount.
This would force a new request and the points would effectively be separated into smaller groups or individual points. Icons would help here.
The issue here is what happens when you have Y number of points at exactly the same location. These can never be separated. How could this happen you ask Say you took 10 photos from the same spot and are plotting photos of the world…
Hey it was long, you made it through it, share you thoughts.
John.

Mega Cluster logic
Sri_Prad
My next steps are:
1) create a grid that compensates for the curve of the earth.
2) create a function that snaps points to that grid server side (basic within bounds of grid square).
3) calulate the required grid for each zoomlevel and specified icon size.
4) display grouped points as mega clusters
5) optimise the algorithm
I'll try to put each step up on my site for those that are interested.
John.
vijay1
I tried using some funky algorithims to draw a 500Km grid:
http://soulsolutions.com.au/MegaClusterEx2.htm
(ie only, Fair bit of CPU work here so be patient)
Immediatly you can see that this was silly as it follows the same issue as using degrees.
I then found the very nice map.PixelToLatLong(x,y) function and ploted a 100 pixel grid
http://soulsolutions.com.au/MegaClusterEx3.htm
(ie only again - need to look at polylines in FF problem)
So I have the peices of the puzzle but now to put it together on the server side...
John.
Sen_p_kumar
have a look at this PPT http://dimacs.rutgers.edu/Workshops/MiningTutorial/pindyk-slides.ppt,
maybe this "Algorithms for Nearest Neighbor Search" can be a starting point for a datastructure or anything else related to your algorithm... found it yesterday looking for s/t similar!
hope it can help you ;-)
cheers
DavidThi808
http://viavirtualearth.com/VVE/Articles/Clustering.ashx
All the code is there.
John.
Soul Solutions - Virtual Earth and Dotnetnuke
Andrew Sims
http://soulsolutions.com.au/MegaClusterEx1.htm
(IE only sorry)
Note the curvature of the earth.
John.
EGShaw
It seems I was a little too simple in my algorithm for clustering. It appears you need to factor in the curve of the earth and use something like the "Haversine formula" to calulate the distance between two points:
dlon = lon2 - lon1;
dlat = lat2 - lat1;
a = (sin(dlat/2)) ^2 + cos(lat1) * cos(lat2) * (sin(dlon/2)) ^2;
c = 2 * atan2(sqrt(a), sqrt(1-a));
d = R * c;
(Where R is the Radius of the earth)
Then we create a matrix with the distance between every point.
I can see this being very slow, the first thing we can do is eliminate points are are clearly two far away based on my simple evaluation of dlon and dlat. The code we write should be mulitheaded utilising a divide and conquer algorithm as suggested by nside.
I will run up a few tests over the next week.
John.
tevisB
teqmem
Did you ever post the article I am very interested in this approach. I would like to modify it to work with a c. 30,000 point polyline to display a high-resolution track (at high zoom levels) without overwhelming the client or the server (at low zoom levels).
Regards,
- John
IonGuy
airwalker2000
You should use the excellent Cluster 3.0 clustering engine which is perfect for this kind of task. It provides k-means clustering (that's the hard math part of the problem) capabilities that you can improve by using a simple divide and conquer algorithm. This way you get very natural and precise clustering (feeding all the points to the k-means algorithm gives strange results and is very slow). It works best with static data but I could imagine someone using it for dynamic datasets.
Regards,
Holy Seraph
After reading some of the articles on clustering I came across a simply idea. Instead of calculating distances between points etc why not just lay down a grid of points that do not overlap for the size of the icon and current zoom level and then simply snap the points to that grid.
The calculation for the grid will be a little complex to factor in the curvature of the earth but then only a single pass over the visible points will be required to group them.
My first steps will be to create the grid and confirm it will work. I can already see that this could be optimised based on a sorted set of points to only calculate the grid points where necessary over two passes of the data, one for latitude, one for longitude.
Any thoughts
idoprz
You're the first to ask.
I'll put something together and send it to Dr Neil over at viavirtualearth.com within the next few days.
I'm using it to plot map replays of thousands of point so adding some polylines shouldn't be in issue.
Its all vb.net based on the AJAX example from channel 9. Maybe a good place to start is to watch their articles on VE while I put this together.
John.
ANB_149
Jay Williams
With a bit of help from the logic in the WorldWind Hack on Viavirtualearth (thanks Casey) i got clustering working server side.
Checkout the screen shot of 100,000 points clustered over the middle east.
http://soulsolutions.com.au/megacluster.jpg
Lots of tweaking to go buts it is incredibly fast server side already. Client side is the killer, the javascript rendering time for the above screen shot was > 30sec. Turning off the labels helped but i think VE just won't work with too many points.
So let me know if your interested and i'll turn this into an article for Viavirtualearth. Otherwise I'll assume no-one else wants to plot 100,000 points in VE ;)
John.
FahimatMicrosoftForum