I'm creating a program that parses text files and stores them in a database for easier manipulation (so I can run queries on a DB instead of parsing and comparing raw text on the fly).
Currently I am using an Access database to store everything, but now that I realize .NET has such great XML support, I figure this may be a better option.
I'm thinking of storing each file into its own XML file then loading the selected file into a dataset and pulling info from it. This is basically what I'm doing now, but instead of XML, it's the Access db.
Does anyone have any advice on which method is better?
EDIT:
Wow, I forgot to mention something important. What I wanted to know is, how do I handle "auto-numbering" if I don't have Access to take care of that for me.

XML or a database?
Hirén
Personally, I understand what you're getting at; I just think you're thinking of XML in the wrong context. XML is a great way of handling structured data, but you're using text files, so you'll need to transform them into the structured format first. You can create a schema for the XML document and use it to validate the document once you have parsed the text file. That might be a little bit of work, but it will pay off in the end.
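To make the validation step concrete, here is a minimal sketch, assuming a made-up schema with a Files root that holds any number of Line elements. The schema, the element names and the Validate helper are illustrative only, not anything from the original program.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Module SchemaValidationDemo
    ' Hypothetical schema: a <Files> root containing zero or more <Line> elements.
    Const SampleXsd As String =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" &
        "<xs:element name='Files'><xs:complexType><xs:sequence>" &
        "<xs:element name='Line' type='xs:string' minOccurs='0' maxOccurs='unbounded'/>" &
        "</xs:sequence></xs:complexType></xs:element></xs:schema>"

    ' Returns the validation messages; an empty list means the document passed.
    Function Validate(xmlText As String, xsdText As String) As List(Of String)
        Dim problems As New List(Of String)()
        Dim settings As New XmlReaderSettings()
        settings.ValidationType = ValidationType.Schema
        settings.Schemas.Add(Nothing, XmlReader.Create(New StringReader(xsdText)))
        AddHandler settings.ValidationEventHandler, Sub(sender As Object, e As ValidationEventArgs) problems.Add(e.Message)

        ' Reading to the end of the document is what triggers the validation.
        Using reader As XmlReader = XmlReader.Create(New StringReader(xmlText), settings)
            While reader.Read()
            End While
        End Using
        Return problems
    End Function

    Sub Main()
        Console.WriteLine(Validate("<Files><Line>a</Line></Files>", SampleXsd).Count)
        Console.WriteLine(Validate("<Files><Oops/></Files>", SampleXsd).Count)
    End Sub
End Module
```

The first call should report no problems and the second at least one, since Oops is not declared in the schema.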
As for storing the information, using Access isn't a bad idea if you have a column for each bit of important data. If you're storing the text file as one chunk, that isn't such a good idea. I'm assuming you're not.
In regards to your dataset idea: yes, a DataSet works with XML as well as with a database. It holds the data in memory, can be serialized to your hard disk in XML format for persistence, and can later be loaded back and its contents saved to the DB.
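A rough sketch of that round trip, assuming a single made-up table called Record with Kind and Data columns (the names and the temp-file path are placeholders):

```vbnet
Imports System
Imports System.Data
Imports System.IO

Module DataSetXmlRoundTrip
    ' Builds a tiny DataSet; the table and column names are invented for illustration.
    Function BuildSample() As DataSet
        Dim ds As New DataSet("ParsedFile")
        Dim records As DataTable = ds.Tables.Add("Record")
        records.Columns.Add("Kind", GetType(String))
        records.Columns.Add("Data", GetType(String))
        records.Rows.Add("Group1", "Blah")
        Return ds
    End Function

    ' Writes the rows together with an inline schema so column types survive the round trip.
    Sub SaveToXml(ds As DataSet, filePath As String)
        ds.WriteXml(filePath, XmlWriteMode.WriteSchema)
    End Sub

    ' Reads the file back into a fresh DataSet, using the inline schema.
    Function LoadFromXml(filePath As String) As DataSet
        Dim ds As New DataSet()
        ds.ReadXml(filePath, XmlReadMode.ReadSchema)
        Return ds
    End Function

    Sub Main()
        Dim filePath As String = Path.Combine(Path.GetTempPath(), "parsedfile.xml")
        SaveToXml(BuildSample(), filePath)
        Dim loaded As DataSet = LoadFromXml(filePath)
        Console.WriteLine(loaded.Tables("Record").Rows(0)("Data"))
    End Sub
End Module
```

Writing the schema alongside the data is a choice, not a requirement; without it ReadXml has to infer the structure from the data.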
If you need me to clarify, let me know.
Hope that helps.
Skapol
Chris Honcoop
In terms of auto-numbering, for XML, I think you need to handle that yourself, since XML doesn't have a notion of an auto-increment key field.
iccle
I was a bit vague on what the text files actually are. These are fixed-length files that have to adhere to certain standards. The table a line is inserted into depends on the first few characters of that line.
Once the user selects a file, the program parses it into a typed dataset. It's then put into a listview (not datagrid) so the user can confirm it parsed correctly. After they press the confirm button, it will write it to an XML file so they can load it whenever they want (into the dataset) and look it over/manipulate it/etc.
As for the auto-numbering, I may not even need it anymore. I was using a bunch of related tables to not only hold the contents of the file, but also to store what company it belongs to and a little info about the file. I may just be able to keep my files and companies table, and store the actual data in XML.
The reason I'm steering away from an Access database is that this program needs to constantly query a separate database for a different reason.
None of this is shared, so I don't have to worry about multiple users accessing the same resources.
----------
Edited for spelling
Nikola Atanasov
Start with the schema and let everything else flow from it. Define a schema and you can generate a typed DataSet. Once you have the typed DataSet's source, you can populate the DataSet directly from the file-parsing process. Then you can use the DataSet to save its contents out to a file in a necessarily valid format, which the DataSet will also be able to read back in. With that method you don't have to bother writing any XmlWriter code.
As a side note, you can generate schemas from .NET classes and vice versa using xsd.exe, the "XML Schema Definition Tool" that's included as part of .NET. It may prove useful. Read about it here:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cptools/html/cpconXMLSchemaDefinitionToolXsdexe.asp
Visual Studio has a schema designer tool. However, if you aren't comfortable with writing out a schema by hand, I would recommend learning how to construct an untyped DataSet at runtime. The DataSet has methods for creating tables, rows, columns and relations, for associating them with each other, and for populating this ad-hoc DataSet with its ad-hoc data. Once you have created this in-memory construct, you can call the untyped DataSet's WriteXmlSchema method to output a properly formatted schema, which you can then use to generate a typed DataSet.
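Roughly, building the untyped DataSet and dumping its schema could look like the sketch below. The File and Line tables and their columns are assumptions made up for the example, not the poster's actual layout.

```vbnet
Imports System
Imports System.Data
Imports System.IO

Module AdHocSchema
    ' Builds an untyped DataSet with two related tables (hypothetical names).
    Function BuildDataSet() As DataSet
        Dim ds As New DataSet("Files")

        Dim fileTable As DataTable = ds.Tables.Add("File")
        fileTable.Columns.Add("FileId", GetType(Integer))
        fileTable.Columns.Add("Company", GetType(String))

        Dim lineTable As DataTable = ds.Tables.Add("Line")
        lineTable.Columns.Add("FileId", GetType(Integer))
        lineTable.Columns.Add("Data", GetType(String))

        ' Relate each line to its parent file.
        ds.Relations.Add("File_Line", fileTable.Columns("FileId"), lineTable.Columns("FileId"))
        Return ds
    End Function

    ' Returns the XSD text that describes the DataSet's structure.
    Function SchemaText(ds As DataSet) As String
        Using sw As New StringWriter()
            ds.WriteXmlSchema(sw)
            Return sw.ToString()
        End Using
    End Function

    Sub Main()
        Console.WriteLine(SchemaText(BuildDataSet()))
    End Sub
End Module
```

Saved to an .xsd file, that output can be fed to xsd.exe with its /dataset switch to produce the typed DataSet source.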
Perhaps the Group # should be a field of the XML or an attribute (more complicated) rather than a container. Meh, maybe not. Also, you do know that VB has the Do Until keyword? (Do While Not is so weird to look at.)
I do hope some of this wonderful information is useful to you.
xyzt
If you do still need the autonumbering, the DataTable can generate that for you.
Parse your data into the typed DataSet/DataTable with auto-increment on, then save it off as XML.
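For example, something along these lines, shown with a plain untyped DataTable and made-up column names (a typed DataSet exposes the same DataColumn properties):

```vbnet
Imports System
Imports System.Data

Module AutoNumberDemo
    ' Builds a table whose Id column numbers itself, starting at 1.
    Function BuildTable() As DataTable
        Dim table As New DataTable("Line")
        Dim idColumn As DataColumn = table.Columns.Add("Id", GetType(Integer))
        idColumn.AutoIncrement = True
        idColumn.AutoIncrementSeed = 1
        idColumn.AutoIncrementStep = 1
        table.Columns.Add("Data", GetType(String))
        Return table
    End Function

    ' Adds a row without touching Id; the DataTable fills it in.
    Function AddLine(table As DataTable, data As String) As Integer
        Dim newRow As DataRow = table.NewRow()
        newRow("Data") = data
        table.Rows.Add(newRow)
        Return CInt(newRow("Id"))
    End Function

    Sub Main()
        Dim table As DataTable = BuildTable()
        Console.WriteLine(AddLine(table, "first"))
        Console.WriteLine(AddLine(table, "second"))
    End Sub
End Module
```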
DJ
forrestcupp
GM55
So I started trying to parse this file into XML...I created a few classes, each with a few variables, let's say:
Public Class Files
    Public g1 As Group1
    Public g2 As Group2
End Class
Public Class Group1
    Public Data As String
End Class
Public Class Group2
    Public Data As String
End Class
Now I have a loop that runs through the file and parses it to different parts (pseudocode to save room):
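Dim stream As StreamReader
Dim file As StreamWriter
Dim writer As XmlSerializer
Dim myFile As Files
Do While Not stream.EndOfStream
    Dim line As String = stream.ReadLine()
    myFile = New Files()
    If line.Substring(0, 6) = "Group1" Then
        myFile.g1 = New Group1()
        myFile.g1.Data = line.Substring(6, 10)
    Else
        myFile.g2 = New Group2()
        myFile.g2.Data = line.Substring(6, 10)
    End If
    ' Called once per line read
    writer.Serialize(file, myFile)
Loop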
Something like that...
So what this does is create an XML file with multiple roots, which of course gives an error when I try to read it because it's invalid XML.
It does this...
<Files>
    <Group1>
        <Data>Blah</Data>
    </Group1>
</Files>
<Files>
    <Group2>
        <Data>Blah</Data>
    </Group2>
</Files>
I want it to do this...
<Files>
    <Group1>
        <Data>Blah</Data>
    </Group1>
    <Group2>
        <Data>Blah</Data>
    </Group2>
</Files>
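One possible way to end up with a single root is to collect every parsed line in one Files object and call Serialize only once, after the loop. The sketch below assumes that approach; the GroupRecord class, the list fields and the XmlElement attributes are my own additions for illustration, and it takes everything after the six-character prefix as the data rather than a fixed ten characters.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml.Serialization

' Hypothetical replacement for the separate Group1/Group2 classes.
Public Class GroupRecord
    Public Data As String
End Class

Public Class Files
    ' XmlElement on a list writes one <Group1> / <Group2> element per entry, with no wrapper.
    <XmlElement("Group1")> Public Group1 As New List(Of GroupRecord)()
    <XmlElement("Group2")> Public Group2 As New List(Of GroupRecord)()
End Class

Module SingleRootDemo
    ' Sorts every line into the matching list of a single Files object.
    Function ParseLines(lines As IEnumerable(Of String)) As Files
        Dim result As New Files()
        For Each line As String In lines
            Dim entry As New GroupRecord()
            entry.Data = line.Substring(6).Trim()
            If line.StartsWith("Group1", StringComparison.Ordinal) Then
                result.Group1.Add(entry)
            Else
                result.Group2.Add(entry)
            End If
        Next
        Return result
    End Function

    ' Serializes once, so the output has exactly one <Files> root.
    Function ToXml(data As Files) As String
        Dim serializer As New XmlSerializer(GetType(Files))
        Using sw As New StringWriter()
            serializer.Serialize(sw, data)
            Return sw.ToString()
        End Using
    End Function

    Sub Main()
        Dim sample As String() = New String() {"Group1Blah", "Group2Blah"}
        Console.WriteLine(ToXml(ParseLines(sample)))
    End Sub
End Module
```

Note that this writes all Group1 elements before all Group2 elements, so the original line order is not preserved across groups.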
Okugops
malc_s
Would there be an advantage to making the Group # an attribute, rather than an element?
wraithzshadow
What I would recommend doing (I assume you're using Visual Studio) is to open that XML file in the studio, right-click on it, and choose Generate Schema. The studio will take your sample and spit out a schema for you. You'll need to change the types, since it usually puts System.String for everything, and from there you'll be able to validate your text file.
Also, parsing XML by hand using string manipulation is painful. Use an XmlDocument; it will be much faster and you can navigate your file with ease.
Let me know if you need any more help.
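As a small illustration of navigating with XmlDocument, assuming a document shaped like the sample in this thread (the ReadGroupData helper is made up for the example):

```vbnet
Imports System
Imports System.Xml

Module XmlDocumentDemo
    ' Returns the text of /Files/<groupName>/Data, or Nothing if there is no such node.
    Function ReadGroupData(xmlText As String, groupName As String) As String
        Dim doc As New XmlDocument()
        doc.LoadXml(xmlText)
        Dim node As XmlNode = doc.SelectSingleNode("/Files/" & groupName & "/Data")
        If node Is Nothing Then
            Return Nothing
        End If
        Return node.InnerText
    End Function

    Sub Main()
        Dim xmlText As String = "<Files><Group1><Data>Blah</Data></Group1><Group2><Data>Blah2</Data></Group2></Files>"
        Console.WriteLine(ReadGroupData(xmlText, "Group2"))
    End Sub
End Module
```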