Get position in String from line number and line position


I am using the XmlReader to parse XML data received over a socket. The data is sometimes broken into separate packets and when that happens, XmlReader will throw an XmlException for the incomplete XML. When this happens, I want to keep the partial XML message and concatenate it with the next data I receive which will contain the remaining XML. The problem I am having is keeping only the incomplete XML.

Here's sample XML (from Sample XML File) that illustrates the problem:

<book id="bk101">
  <author>Gambardella, Matthew</author>
  <title>XML Developer's Guide</title>
  <description>An in-depth look at creating applications 
  with XML.</description>
</book>
<book id="bk102">
  <author>Ralls, Kim</author>
  <title>Midnight Rain</title>
  <description>A former architect battles corporate zombies, 
  an evil sorceress, and her own childhood to become queen 
  of the world.</description>
</book>
<book id="bk103">
  <author>Corets, Eva</author>
  <title>Maeve Ascendant</title>
  <description>After the collapse of a nanotechnology 
  society in England, the young survivors lay the 
  foundation for a new society.</description>
</book>

Imagine 100 books were sold in one transaction and that the XML was divided in two packets. The 1st packet could end like this:

<book id="bk103">
  <author>Corets, Eva</author>
  <title>Maeve Ascendant</title>
  <description>After the collapse of a nano

and the following packet would contain the rest of the XML:

technology 
  society in England, the young survivors lay the 
  foundation for a new society.</description>
</book>

XmlReader will parse the first two books properly and I can get the XML using ReadOuterXml. It will then throw an exception on the 3rd book so I want to keep, from the original XML, only the substring starting at <book id="bk103">. XmlException gives me the LineNumber and LinePosition of the error but this is not the position I want. Instead, after each XML element XmlReader has read, I can cast my XmlReader as IXmlLineInfo and get the position of the reader after ReadOuterXml is called. This gives me a LineNumber and LinePosition again.

My question is: how can I convert LineNumber and LinePosition to an index within my original XML string? In the XML example I used, the XmlException would be at Line 17, Position 44. The position after the last valid element would be Line 13, Position 8. The index I am looking for is 440.

TL;DR

How do I get 440 from Line 13, Position 8? I want to use Substring on the original XML.


There are 2 answers

Étienne Laneville

One simple way to do it is to loop to find the position of the (LineNumber - 1)th instance of System.Environment.NewLine (the 12th instance for Line 13), then add the value of LinePosition to that position:

string previousFragment = "";

while (!_cancellationToken.IsCancellationRequested)
{

    // Read from buffer
    string bufferData = _buffer.Take();
    string localBuffer = previousFragment + bufferData;

    // Clear out previous fragment
    previousFragment = "";

    using (TextReader textReader = new StringReader(localBuffer))
    {
        XmlReader xmlReader;
        xmlReader = XmlReader.Create(textReader, _xmlReaderSettings);

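        // Last known-good reader position; both values are 1-based.
        // Initialized so that the whole buffer is kept if nothing was read.
        int lastLine = 1;
        int lastPosition = 1;

        try
        {
            while (xmlReader.Read())
            {
                if (xmlReader.NodeType == XmlNodeType.Element)
                {
                    string validXml = xmlReader.ReadOuterXml();
                }

                // XmlReader does not declare IXmlLineInfo, so cast explicitly.
                IXmlLineInfo lineInfo = (IXmlLineInfo)xmlReader;
                lastLine = lineInfo.LineNumber;
                lastPosition = lineInfo.LinePosition;
            }
        }
        catch (XmlException)
        {
            // Normalize line endings so every break is Environment.NewLine.
            localBuffer = localBuffer.ReplaceLineEndings();

            // Find the index at which line number lastLine starts.
            int lineStart = 0;
            for (int line = 1; line < lastLine; line++)
                lineStart = localBuffer.IndexOf(System.Environment.NewLine, lineStart)
                            + System.Environment.NewLine.Length;

            // LinePosition is 1-based, so subtract 1 to get a string index.
            int position = System.Math.Min(lineStart + lastPosition - 1, localBuffer.Length);

            previousFragment = localBuffer.Substring(position);
        }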
        }
    }
}

In this code:

  1. _buffer is a BlockingCollection of String data received from the socket.
  2. _xmlReaderSettings is an XmlReaderSettings object with ValidationType set to ValidationType.None and ConformanceLevel set to ConformanceLevel.Fragment.
  3. This code runs as a Task. When new XML arrives, it is added to _buffer.
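
The line-and-position arithmetic can also be pulled out into a small helper, which makes it easier to test on its own. The following is only a sketch: the names LineInfoIndex and ToIndex are made up for illustration, and it assumes the reader counts "\r\n", "\n" and a lone "\r" as one line break each, and that both LineNumber and LinePosition are 1-based. It scans the original string directly, so the buffer does not need to be normalized first.

```csharp
using System;

static class LineInfoIndex
{
    // Hypothetical helper: converts a 1-based (line, position) pair, such as
    // the values reported by IXmlLineInfo, into a 0-based index into text.
    // Assumes "\r\n", "\n" and a lone "\r" each count as one line break.
    public static int ToIndex(string text, int lineNumber, int linePosition)
    {
        if (lineNumber < 1) throw new ArgumentOutOfRangeException(nameof(lineNumber));
        if (linePosition < 1) throw new ArgumentOutOfRangeException(nameof(linePosition));

        char[] lineBreaks = { '\r', '\n' };
        int index = 0; // start of the current line

        // Skip over (lineNumber - 1) line breaks to reach the requested line.
        for (int line = 1; line < lineNumber; line++)
        {
            int breakAt = text.IndexOfAny(lineBreaks, index);
            if (breakAt < 0)
                throw new ArgumentOutOfRangeException(nameof(lineNumber));

            index = breakAt + 1;

            // Treat "\r\n" as a single line break.
            if (text[breakAt] == '\r' && index < text.Length && text[index] == '\n')
                index++;
        }

        // Clamp so the result is always a valid argument for Substring.
        return Math.Min(index + linePosition - 1, text.Length);
    }

    static void Main()
    {
        string text = "<a>1</a>\r\n<b>2</b>\n<c>3";

        // Line 3, position 1 is the start of the incomplete element.
        Console.WriteLine(text.Substring(ToIndex(text, 3, 1))); // prints "<c>3"
    }
}
```

With a helper like this, the catch block would reduce to previousFragment = localBuffer.Substring(LineInfoIndex.ToIndex(localBuffer, lastLine, lastPosition)).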

Is there a better way to do this?

jdweng

I use the following code for reading huge XML files. I added exception handlers, which I have never needed before; the code that handles each exception still needs to be filled in. See if this helps.

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;


namespace ConsoleApplication2
{
    class Program
    {
        const string URL = "URL";
        static void Main(string[] args)
        {
            XmlReader reader = XmlReader.Create(URL);

            while(!reader.EOF)
            {
                try
                {
                    if(reader.Name != "book")
                    {
                        reader.ReadToFollowing("book");
                    }
                }
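                catch(Exception ex)
                {
                    // Add exception handling here. Break so the loop cannot
                    // spin forever on a reader that is in an error state.
                    break;
                }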
                try
                {
                    if(!reader.EOF)
                    {
                        XElement book = (XElement)XElement.ReadFrom(reader);
                    }
                }
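                catch(Exception ex)
                {
                    // Add exception handling here. Break so the loop cannot
                    // spin forever on a reader that is in an error state.
                    break;
                }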
            }
        }
    }
 
}