From Chapter 5 of Processing XML with Java, "Reading XML":

The main point is this: most programs you write are going to read documents written in a specific XML vocabulary. They are not going to be designed to handle absolutely any well-formed document that comes down the pipe. Your programs will make assumptions about the content and structure of those documents, just as they now make assumptions about the content and structure of external objects.


That's an intelligent way to come at reading XML, and I'd argue it's the way it should be done in about 75% of the real-world cases where XML is used. That Harold does it, even tongue in cheek, carries far more weight than my recommendation, of course. His books on Java are top-rate stuff, imo. The bottom line is that you know darn well what the XML file you want to read is going to look like. Why pull in parsing libraries you don't absolutely need? KISS.

Here's some of his code for reading the value out of the XML in his specific example.

private static BigInteger readFibonacciXMLRPCResponse(
    InputStream in) throws IOException, NumberFormatException,
    StringIndexOutOfBoundsException {

    StringBuffer sb = new StringBuffer();
    Reader reader = new InputStreamReader(in, "UTF-8");
    int c;
    // Read through the Reader so the bytes are decoded as UTF-8
    while ((c = reader.read()) != -1) sb.append((char) c);

    String document = sb.toString();
    // The markup that surrounds the value in the XML-RPC response
    String startTag = "<value><double>";
    String endTag = "</double></value>";
    int start = document.indexOf(startTag) + startTag.length();
    int end = document.indexOf(endTag);
    String result = document.substring(start, end);
    return new BigInteger(result);

}


I think you can see what's going on. Honestly, that's the right way to do it in what is a much more common, real-world situation than one might suspect. Now granted, Harold adds...

Straight text parsing is not the appropriate tool with which to navigate an XML document. The structure and semantics of an XML document is encoded in the document's markup, its tags and its attributes; and you need a tool that is designed to recognize and understand this structure as well as reporting any possible errors in this structure. This tool is called an XML parser.
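
For comparison, here's roughly what the parser route looks like for the same job. This is my own minimal sketch, not Harold's code: it uses the DOM parser that ships with the JDK (javax.xml.parsers), and the class name ParsedFibonacciReader and the sample response are made up for illustration. It assumes the number sits in a double element, as in the string-search version above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical class name; a sketch of the parser-based alternative.
class ParsedFibonacciReader {

    // Parses the whole response, then returns the text of the first <double> element.
    static BigInteger readResponse(InputStream in)
            throws IOException, SAXException, ParserConfigurationException {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // parse() throws SAXException if the document is not well-formed
        Document doc = builder.parse(in);
        NodeList doubles = doc.getElementsByTagName("double");
        if (doubles.getLength() == 0) {
            throw new IOException("No <double> element in response");
        }
        return new BigInteger(doubles.item(0).getTextContent().trim());
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample response for illustration
        String sample = "<?xml version=\"1.0\"?>"
            + "<methodResponse><params><param>"
            + "<value><double>28657</double></value>"
            + "</param></params></methodResponse>";
        InputStream in =
            new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(readResponse(in));
    }
}
```

What the parser buys you is error reporting on malformed input and indifference to whitespace or attributes inside the markup. What it costs you is a dozen imports and three checked exceptions for a one-number payload.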


But think of how many times you've seen XML used essentially as a text file. There's absolutely no reason to learn all the new APIs necessary to parse XML in these cases. You could just as easily have created your own file format, but I understand why one might use XML instead: it hands you structure that would take a typical programming team weeks of meetings to agree on. Why not lean on an over-engineered structure for your flat files? And if you ever do step past known, easy-to-consume formats, your files are already XML, so you don't have to re-engineer them to use XML APIs.
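
To make the flat-file case concrete, here's a small sketch of the same string-search idea as a reusable helper. Everything in it is hypothetical: the FlatXmlReader class, the valueOf method, and the settings layout are invented for illustration. It only suits files whose shape you control; it knows nothing about attributes, namespaces, entities, comments, or CDATA.

```java
import java.util.Optional;

// Hypothetical helper for XML that is really just a flat settings file.
class FlatXmlReader {

    // Returns the trimmed text between the first <tag> and the next </tag>,
    // or an empty Optional if either marker is missing.
    static Optional<String> valueOf(String document, String tag) {
        String startTag = "<" + tag + ">";
        String endTag = "</" + tag + ">";
        int open = document.indexOf(startTag);
        if (open < 0) return Optional.empty();
        int start = open + startTag.length();
        // Search for the end tag only after the start tag
        int end = document.indexOf(endTag, start);
        if (end < 0) return Optional.empty();
        return Optional.of(document.substring(start, end).trim());
    }

    public static void main(String[] args) {
        // Made-up settings file for illustration
        String settings =
            "<settings><host>example.org</host><port>8080</port></settings>";
        int port = valueOf(settings, "port").map(Integer::parseInt).orElse(80);
        String host = valueOf(settings, "host").orElse("localhost");
        System.out.println(host + ":" + port);
    }
}
```

Unlike the excerpt from the book, this version checks for a missing tag rather than letting substring throw, which seems like the least you'd want before pointing it at a file someone else can edit.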

XML's strength is to compose very large batches of structured data for anonymous consumption, but that doesn't mean that's the way it's most commonly used, nor the only way that XML's structure can be used. Let your approach, both for the file and the parsing, match your needs.
