Architecting aggregation of data

We have a group of very similar sql server databases running on our server. We are plannning to create an application which will need to get information from all these similar databases.

What do you all think is the best way to go about doing it

I am thinking about creating a realtime or periodically updated central database aggregating the info from all the others. How can I do this DTS Or create a data cache in memory

A web application, web service and windows forms will be running off this database.

Should be able to do a fast search. Is caching the database the only way to achieve this

How about XML databases

thanks a lot in advance.




Answer this question

Architecting aggregation of data

  • Nick Sheng

    If I understand you correctly you are looking for creating an ODS - Operational Data Store. The correct way to go about aggregating the data depends on the overall system architecture. if you have an SOA or other proper separation of the databases than it is probably better to go through the owning services (esp. in an SOA) otherwise SSIS can do a good job

    You can read a little about ODS in a SOA pattern I published sometimes ago called "Aggregated Reporting"

    Arnon



  • tdtdtd

    Thanks for your suggestions and for giving some direction Arnon. I will explore this further.

  • swathi_challa

    Thanks you all for your guidance. For our situation, I am currently looking at SSIS to understand it better before considering other alternatives. The reason being, direct aggregating ( new application/process) at the database level looks more appealing to me compared to options requiring either modifying the applications/services around the current systems to accomodate other types of aggregation. I will post back once I have decided if this SSIS thing really works for us or not. thanks.




  • Karl Erickson

    The best aggregation approach will depend on the data you are aggregting and what your latency requirements are. If it's OK for the aggregate to be behand the source databases then using SSIS jobs to update the aggregate is efficient and pretty flexible. If you want the aggregate to be close to up to date all the time then something like replication, triggers with Service Broker messages, or BizTalk is more appropriate. If each source system is the authoritative source for a subset of the data then replication is probably a good approach. For example, if each source system covers an issolated subset of customers or a unique set of inventory items then just replicating to the same table in the aggregate database will aggregate the data. If more than one of the sources contains the same data - the same customers on two systems for example - then your aggregation process will need to have enough intelligence to handle conflicting updates from different source systems.
  • AronW

    forthangol,

    Just want to recommend two articles, both from the last MS Architecture Journal (devoted to data access strategies)



  • Architecting aggregation of data