
Text node split into multiple BytesText events when entities are present #968

@ImJeremyHe


I noticed that when parsing XML with quick-xml, a text node can be split into multiple BytesText events if it contains XML entities.

For example:

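    use quick_xml::events::Event;
    use quick_xml::Reader;

    #[test]
    fn test_text_node() {
        let xml = r#"<f>_xlfn.DISPIMG(&quot;ID_72CA26DEC13E452487646D77B0F1058F&quot;,1)</f>"#;
        let mut reader = Reader::from_reader(xml.as_bytes());
        let mut buf = Vec::<u8>::new();
        loop {
            match reader.read_event_into(&mut buf) {
                Ok(Event::Eof) => break,
                // Print every event to see how the text node is split
                Ok(e) => println!("{:?}", e),
                // Surface parse errors instead of silently ignoring them
                Err(e) => panic!("error at position {}: {:?}", reader.buffer_position(), e),
            }
            // Reuse the buffer for the next event
            buf.clear();
        }
    }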

The output is:

    Start(BytesStart { buf: Borrowed("f"), name_len: 1 })
    Text(BytesText { content: Borrowed("_xlfn.DISPIMG(") })
    GeneralRef(BytesRef { content: Borrowed("quot") })
    Text(BytesText { content: Borrowed("ID_72CA26DEC13E452487646D77B0F1058F") })
    GeneralRef(BytesRef { content: Borrowed("quot") })
    Text(BytesText { content: Borrowed(",1)") })
    End(BytesEnd { name: Borrowed("f") })

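The single text node inside <f> is reported as three BytesText events, with each &quot; emitted as a separate GeneralRef event in between, instead of as one BytesText event containing the entire text.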
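To reason about the event sequence, here is a small self-contained model of the split. It assumes (this is not quick-xml's actual implementation) that text is cut at every `&`, with the name between `&` and `;` reported as a separate reference. The `Segment` enum and `split_text` function are hypothetical stand-ins for Event::Text and Event::GeneralRef.

```rust
/// Hypothetical stand-in for the two event kinds seen above:
/// `Text` ~ Event::Text, `Ref` ~ Event::GeneralRef.
#[derive(Debug, PartialEq)]
enum Segment<'a> {
    Text(&'a str),
    Ref(&'a str),
}

/// Split raw (still escaped) text content at `&name;` references.
/// Empty text runs are skipped, matching the output shown above.
fn split_text(raw: &str) -> Vec<Segment<'_>> {
    let mut out = Vec::new();
    let mut rest = raw;
    while let Some(amp) = rest.find('&') {
        if amp > 0 {
            out.push(Segment::Text(&rest[..amp]));
        }
        let after = &rest[amp + 1..];
        match after.find(';') {
            Some(semi) => {
                // Everything between `&` and `;` is the reference name
                out.push(Segment::Ref(&after[..semi]));
                rest = &after[semi + 1..];
            }
            None => {
                // No terminating `;`: this sketch keeps the remainder as text
                out.push(Segment::Text(&rest[amp..]));
                rest = "";
            }
        }
    }
    if !rest.is_empty() {
        out.push(Segment::Text(rest));
    }
    out
}

fn main() {
    let raw = "_xlfn.DISPIMG(&quot;ID_72CA26DEC13E452487646D77B0F1058F&quot;,1)";
    for seg in split_text(raw) {
        println!("{:?}", seg);
    }
}
```

Run on the escaped content of <f>, this yields the same Text/Ref/Text/Ref/Text pattern as the real reader output.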

Is this intentional? If so, is there a recommended way to reconstruct the complete text node from the emitted events? I was using v0.39, which didn't have this issue.
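One possible workaround, sketched under assumptions: buffer everything between Start and End, appending text as-is and replacing each reference with the character it stands for. The `Piece` enum, `resolve_ref` and `join_text` below are hypothetical and do not depend on quick-xml so the sketch runs on its own. They resolve only the five predefined XML entities and numeric character references; in real code you would likely prefer the entity helpers quick-xml provides, checking the documentation for your version.

```rust
/// Hypothetical simplified stand-in for the events between Start and End:
/// `Text` holds already-decoded text, `Ref` holds the name inside `&...;`.
enum Piece<'a> {
    Text(&'a str),
    Ref(&'a str),
}

/// Resolve one reference name: the five predefined XML entities,
/// plus decimal (`#34`) and hexadecimal (`#x22`) character references.
fn resolve_ref(name: &str) -> Option<char> {
    match name {
        "lt" => Some('<'),
        "gt" => Some('>'),
        "amp" => Some('&'),
        "apos" => Some('\''),
        "quot" => Some('"'),
        _ => {
            let num = name.strip_prefix('#')?;
            let code = match num.strip_prefix('x') {
                Some(hex) => u32::from_str_radix(hex, 16).ok()?,
                None => num.parse::<u32>().ok()?,
            };
            char::from_u32(code)
        }
    }
}

/// Concatenate text pieces and resolved references into one string.
/// Returns an error naming the first reference that cannot be resolved.
fn join_text(pieces: &[Piece<'_>]) -> Result<String, String> {
    let mut out = String::new();
    for piece in pieces {
        match piece {
            Piece::Text(t) => out.push_str(t),
            Piece::Ref(name) => match resolve_ref(name) {
                Some(c) => out.push(c),
                None => return Err(format!("unknown entity: &{};", name)),
            },
        }
    }
    Ok(out)
}

fn main() {
    // The five pieces from the output above
    let pieces = [
        Piece::Text("_xlfn.DISPIMG("),
        Piece::Ref("quot"),
        Piece::Text("ID_72CA26DEC13E452487646D77B0F1058F"),
        Piece::Ref("quot"),
        Piece::Text(",1)"),
    ];
    // prints: _xlfn.DISPIMG("ID_72CA26DEC13E452487646D77B0F1058F",1)
    println!("{}", join_text(&pieces).unwrap());
}
```

With real quick-xml events, the same idea would mean pushing the content of each Event::Text and the resolved value of each Event::GeneralRef into one String, then reading it out at the matching Event::End.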
