XML or a database?

I'm creating a program that parses text files and stores them in a database for easier manipulation, so that I can run queries against a DB instead of parsing and comparing raw text on the fly.

Currently I am using an Access database to store everything, but now that I realize how good .NET's XML support is, I figure XML may be a better option.

I'm thinking of storing each file in its own XML file, then loading the selected file into a DataSet and pulling info from it. This is basically what I'm doing now, but with XML in place of the Access DB.

Does anyone have any advice on which method is better?

 EDIT:

Wow, I forgot to mention something important. What I wanted to know is: how do I handle "auto-numbering" if I don't have Access to take care of that for me?




  • Hirén

    Personally, I understand what you're getting at; I just think you're thinking of XML in the wrong context. XML is a great way of handling structured data, but you're starting from text files, so you'll need to transform them into a structured format first. You can create a schema for the XML document and use it to validate the document once you have parsed the text file. That might be a little bit of work, but it will pay off in the end.

    As for storing the information, using Access isn't a bad idea if you have a column for each bit of important data. If you're storing the text file as one chunk, that isn't such a good idea. I'm assuming you're not.

    In regard to your DataSet idea: yes, a DataSet holds your data in memory and reads and writes XML natively, so the two fit together well. A DataSet can be serialized to your hard disk in XML format for persistence, then loaded back, and the information inside can then be saved to the DB.
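
    The round trip described above can be sketched as follows. This is only a minimal illustration, not anyone's actual code: the table name "Record", its columns, and the file name "records.xml" are made up for the example.

    ```vb
    Imports System.Data

    Module DataSetRoundTrip
        Sub Main()
            ' Hypothetical table layout, for illustration only
            Dim ds As New DataSet("Files")
            Dim table As DataTable = ds.Tables.Add("Record")
            table.Columns.Add("Company", GetType(String))
            table.Columns.Add("Amount", GetType(Decimal))
            table.Rows.Add("Acme", 12.5D)

            ' Write the rows together with an inline schema so column types survive
            ds.WriteXml("records.xml", XmlWriteMode.WriteSchema)

            ' Load the file back into a fresh DataSet
            Dim loaded As New DataSet()
            loaded.ReadXml("records.xml", XmlReadMode.ReadSchema)
            Console.WriteLine(loaded.Tables("Record").Rows.Count)
        End Sub
    End Module
    ```

    Writing the schema inline is a choice made here so that the Decimal column does not come back as a string; leaving it out would also work if everything is text.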

    If you need me to clarify, let me know.

    Hope that helps.


  • Skapol

    Somewhat related, but possibly overkill: SQL Server 2005 Express supports XML as a native data type, so there may be (but probably isn't) some benefit in storing the XML files in the DB. You can query the XML data structure using SQL, and that might work to your advantage. Straight XML files could certainly be more than sufficient; I just figured I'd give you a heads-up so you can decide for yourself whether there is any functionality there you could use.

  • Chris Honcoop

    I would add that using a database makes synchronization much easier if you ever have multiple instances of your application running that need access to the same data. Having the rich data management tools available for Access (and other databases) is a plus as well.

    In terms of auto-numbering, for XML, I think you need to handle that yourself, since XML doesn't have a notion of an auto-increment key field.

  • iccle

    I was a bit vague on what the text files actually are. These are fixed-length files that have to adhere to certain standards. The table each line is inserted into depends on the first few characters of the line.

    Once the user selects a file, the program parses it into a typed dataset.  It's then put into a listview (not datagrid) so the user can confirm it parsed correctly.  After they press the confirm button, it will write it to an XML file so they can load it whenever they want (into the dataset) and look it over/manipulate it/etc.

    As for the auto-numbering, I may not even need it anymore.  I was using a bunch of related tables to not only hold the contents of the file, but also to store what company it belongs to and a little info about the file.  I may just be able to keep my files and companies table, and store the actual data in XML. 

    The reason I'm steering away from an Access database is that this program needs to constantly query a separate database for a different reason.

    None of this is shared, so I don't have to worry about multiple users accessing the same resources.

     

     



  • Nikola Atanasov

    Start with the schema and let everything else flow from it. Define a schema and you can generate a typed DataSet. Once you have the typed DataSet's source, you can populate the DataSet directly from the file-parsing process. Then you can use the DataSet to save its contents out to a file in a necessarily valid format, which the DataSet will also be able to read back in. With that method you don't have to bother writing any XmlWriter code.

    As a side note, you can generate schemas from .NET classes, and vice versa, using xsd.exe, the "XML Schema Definition Tool" that's included as part of .NET. It may prove useful. Read about it here:

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cptools/html/cpconXMLSchemaDefinitionToolXsdexe.asp

    Visual Studio has a schema designer tool. However, if you aren't comfortable writing out a schema by hand, I would recommend learning how to construct an untyped DataSet at runtime. The DataSet has methods for creating tables, rows, columns, and relations, for associating them with each other, and for populating this ad-hoc DataSet with ad-hoc data. Once you have created this in-memory construct, you can call the untyped DataSet's WriteXmlSchema method to output a properly formatted schema, which you can then use to generate a typed DataSet.
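
    As a rough sketch of that approach, the following builds an untyped DataSet at runtime and writes its schema out. The table names, column names, and the file name "Files.xsd" are assumptions picked for the example, loosely based on the files/companies tables mentioned earlier in the thread.

    ```vb
    Imports System.Data

    Module SchemaFromDataSet
        Sub Main()
            Dim ds As New DataSet("Files")

            ' Hypothetical parent table
            Dim companies As DataTable = ds.Tables.Add("Company")
            companies.Columns.Add("CompanyId", GetType(Integer))
            companies.Columns.Add("Name", GetType(String))

            ' Hypothetical child table holding the parsed lines
            Dim records As DataTable = ds.Tables.Add("Record")
            records.Columns.Add("CompanyId", GetType(Integer))
            records.Columns.Add("Data", GetType(String))

            ' Relate the two tables on CompanyId
            ds.Relations.Add("CompanyRecords", _
                             companies.Columns("CompanyId"), _
                             records.Columns("CompanyId"))

            ' Emit the XSD that describes this structure
            ds.WriteXmlSchema("Files.xsd")
        End Sub
    End Module
    ```

    The resulting file can then be fed to the tool, for example with `xsd.exe Files.xsd /dataset /language:VB`, to generate the typed DataSet source.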

    Perhaps the Group # should be a field of the XML, or an attribute (more complicated), rather than a container. Meh, maybe not. You do know that VB has the Do Until keyword? (Do While Not is so weird to look at.)

    I do hope some of this wonderful information is useful to you.



  • xyzt

    If you do still need the autonumbering, the DataTable can generate that for you.

    Parse your data into the typed DataSet/DataTable with auto-increment on, then save it off as XML.

    DJ
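
    A minimal sketch of the auto-increment idea, using an untyped DataTable for brevity; the table and column names are invented for the example, and a typed DataSet would set the same DataColumn properties through its schema.

    ```vb
    Imports System.Data

    Module AutoNumberExample
        Sub Main()
            Dim ds As New DataSet("Files")
            Dim table As DataTable = ds.Tables.Add("Record")

            ' Let the DataTable hand out the key values itself
            Dim id As DataColumn = table.Columns.Add("Id", GetType(Integer))
            id.AutoIncrement = True
            id.AutoIncrementSeed = 1
            id.AutoIncrementStep = 1
            table.Columns.Add("Data", GetType(String))

            ' Only the Data column is set; Id is filled in automatically
            For Each text As String In New String() {"first", "second"}
                Dim row As DataRow = table.NewRow()
                row("Data") = text
                table.Rows.Add(row)
            Next

            ' Save the numbered rows off as XML (hypothetical file name)
            ds.WriteXml("records.xml", XmlWriteMode.WriteSchema)
        End Sub
    End Module
    ```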



  • forrestcupp

    No benefit I can think of and in all honesty I'm not precisely sure how to do it.

  • GM55

    I wouldn't mind using SQL Server, but this app is for use at my work and it's not exactly "supported" by the company; it's just a side project I'm doing at home to make my job easier.


    So I started trying to parse this file into XML. I created a few classes, each with a few variables, let's say:

    Public Class Files
        Public g1 As Group1
        Public g2 As Group2
    End Class

    Public Class Group1
        Public Data As String
    End Class

    Public Class Group2
        Public Data As String
    End Class

    Now I have a loop that runs through the file and parses it to different parts (pseudocode to save room):

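    Dim stream As StreamReader
    Dim file As StreamWriter
    Dim writer As XmlSerializer
    Dim myFile As Files
    Dim line As String

    Do While Not stream.EndOfStream
        line = stream.ReadLine()
        ' The first six characters decide which group the line belongs to
        If line.Substring(0, 6) = "Group1" Then
            myFile.g1.Data = line.Substring(6, 10)
        Else
            myFile.g2.Data = line.Substring(6, 10)
        End If

        ' Serialize is called on every pass through the loop
        writer.Serialize(file, myFile)
    Loop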


    Something like that...

    So what this does is create an XML file with multiple roots, which of course gives an error when I try to read it, because it's invalid XML.

    It does this...

    <Files>
      <Group1>
        <Data>Blah</Data>
      </Group1>
    </Files>
    <Files>
      <Group2>
        <Data>Blah</Data>
      </Group2>
    </Files>


    I want it to do this...

    <Files>
      <Group1>
        <Data>Blah</Data>
      </Group1>
      <Group2>
        <Data>Blah</Data>
      </Group2>
    </Files>
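
    One way to end up with a single root like this is to fill the object inside the loop and serialize it once, after the loop has finished. The following is a sketch under assumptions, not a drop-in fix: the file names "input.txt" and "output.xml" are made up, the XmlElement attributes are added here only so the element names match the output above, and, like the pseudocode, it keeps just the last line seen for each group.

    ```vb
    Imports System.IO
    Imports System.Xml.Serialization

    Public Class Group1
        Public Data As String
    End Class

    Public Class Group2
        Public Data As String
    End Class

    Public Class Files
        ' Attributes map the field names to the element names shown above
        <XmlElement("Group1")> Public g1 As New Group1()
        <XmlElement("Group2")> Public g2 As New Group2()
    End Class

    Module SerializeOnce
        Sub Main()
            Dim myFile As New Files()

            Using reader As New StreamReader("input.txt")
                Do Until reader.EndOfStream
                    Dim line As String = reader.ReadLine()
                    ' Skip lines too short to hold a 6-character tag plus 10 characters of data
                    If line.Length < 16 Then Continue Do

                    If line.Substring(0, 6) = "Group1" Then
                        myFile.g1.Data = line.Substring(6, 10)
                    Else
                        myFile.g2.Data = line.Substring(6, 10)
                    End If
                Loop
            End Using

            ' Serialize a single time, so the file gets exactly one root element
            Dim serializer As New XmlSerializer(GetType(Files))
            Using output As New StreamWriter("output.xml")
                serializer.Serialize(output, myFile)
            End Using
        End Sub
    End Module
    ```

    If a file can contain many lines per group, the fields would need to become collections, or the rows could go into a DataSet and be written with WriteXml as suggested elsewhere in this thread.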

  • Okugops

    Cool, thanks for the added information. In this case, it seems like XML is a good choice for you, since you are interacting mainly with the typed DataSet anyway, and the XML file is really just an offline cache of your data.

  • malc_s

    You're right, I don't know what I was thinking. I even explained the process I was going to follow in the other post, and then I started trying to parse directly from the file to XML anyway.

    Would there be an advantage to making the Group # an attribute, rather than an element?

  • wraithzshadow

    What I would recommend (I assume you're using Visual Studio) is to open that XML file in Visual Studio, right-click on it, and choose Generate Schema. It will take your sample and spit out a schema for you. You'll need to change the types, since it usually puts System.String for everything, and from there you'll be able to validate the XML produced from your text file.

    Also, parsing XML by hand using string manipulation is painful. Use an XmlDocument instead; it will be much faster and you can navigate your file with ease.
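
    As a small illustration of navigating with XmlDocument, here is a sketch that walks the sample structure from earlier in the thread. The XML is inlined as a string only to keep the example self-contained; in practice doc.Load with a file path would be used.

    ```vb
    Imports System.Xml

    Module XmlDocumentExample
        Sub Main()
            Dim doc As New XmlDocument()
            doc.LoadXml("<Files>" & _
                        "<Group1><Data>Blah</Data></Group1>" & _
                        "<Group2><Data>Blah</Data></Group2>" & _
                        "</Files>")

            ' Select every Data element that sits one level below the root
            For Each node As XmlNode In doc.SelectNodes("/Files/*/Data")
                Console.WriteLine(node.ParentNode.Name & ": " & node.InnerText)
            Next
        End Sub
    End Module
    ```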

    Let me know if you need any more help.

