How do I parse the provided XML using javax.xml.xpath?

Thruster

I am trying to parse this XML:

<?xml version="1.0" encoding="UTF-8"?>
<veranstaltungen>
  <veranstaltung id="201611211500#25045271">
    <titel>Mal- und Zeichen-Treff</titel>
    <start>2016-11-21 15:00:00</start>
    <veranstaltungsort id="20011507">
      <name>Freizeitclub - ganz unbehindert </name>
      <anschrift>Macht los e.V.
Lipezker Straße 48
03048 Cottbus
</anschrift>
      <telefon>xxxx xxxx </telefon>
      <fax>0355 xxxx</fax>
[...]
</veranstaltungen>

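As you can see, some of the text contains spaces and even line breaks. I am having trouble with the text of the anschrift node, because I need it to look up the matching location data in a database. The problem is that the returned string is: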

Macht los e.V.Lipezker Straße 4803048 Cottbus

instead of:

Macht los e.V. Lipezker Straße 48 03048 Cottbus

I know the right way to handle this should be normalize-space(), but I can't quite work out how to use it. I tried this:

// Does not work; afaik because XPath 1.0 normalizes just the first node
xPath.compile("normalize-space(veranstaltungen/veranstaltung[position()=1]/veranstaltungsort/anschrift/text())");

// Does not work
xPath.compile("veranstaltungen/veranstaltung[position()=1]/veranstaltungsort[normalize-space(anschrift/text())]");

I also tried the solution given here: XPath normalize-space() to return a sequence of normalized strings

xPathExpression = xPath.compile("veranstaltungen/veranstaltung[position()=1]/veranstaltungsort");
NodeList result = (NodeList) xPathExpression.evaluate(doc, XPathConstants.NODESET);

String normalize = "normalize-space(.)";
xPathExpression = xPath.compile(normalize);

int length = result.getLength();
for (int i = 0; i < length; i++) {
    System.out.println(xPathExpression.evaluate(result.item(i), XPathConstants.STRING));
}

System.out prints:

Macht los e.V.Lipezker Straße 4803048 Cottbus

What am I doing wrong?

Update

I already have a workaround, but it can't be the real solution. The following lines of code show how the String is put together from the HTTP response:

try (BufferedReader reader = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), Charset.forName(charset)))) {
  final StringBuilder stringBuilder = new StringBuilder();
  String              line;

  while ((line = reader.readLine()) != null) {
    // stringBuilder.append(line);
    // WORKAROUND: Add a space after each line
    stringBuilder.append(line).append(" ");
  }

  // Work with the read lines
}

I would rather have a solid solution.

Thruster

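Thanks to Markus's help, I was able to solve the problem. The cause was that BufferedReader's readLine() method discards the line breaks, so the lines were glued together before the XML was ever parsed. The following code snippet works for me (it can probably be improved):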

public Document getDocument() throws IOException, ParserConfigurationException, SAXException {

  final HttpResponse response = getResponse(); // returns an HttpResponse
  final HttpEntity   entity   = response.getEntity();
  final Charset      charset  = ContentType.getOrDefault(entity).getCharset();  

  // Not 100% sure if I have to close the InputStreamReader. But I guess so.
  try (InputStreamReader isr = new InputStreamReader(entity.getContent(), charset == null ? Charset.forName("UTF-8") : charset)) {
    return documentBuilderFactory.newDocumentBuilder().parse(new InputSource(isr));
  }
}
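
To illustrate the cause, here is a minimal, self-contained sketch. The class name AnschriftDemo, its helper methods, and the shortened XML are made up for this illustration and are not part of my real code. It parses the same document twice: once after joining the lines with readLine() as in my original loop, and once with the line breaks left intact. The XPath expression is the same in both cases.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AnschriftDemo {

    // Shortened copy of the XML from the question; "\u00df" is the letter ß.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<veranstaltungen>\n"
        + "  <veranstaltung id=\"201611211500#25045271\">\n"
        + "    <veranstaltungsort id=\"20011507\">\n"
        + "      <anschrift>Macht los e.V.\n"
        + "Lipezker Stra\u00dfe 48\n"
        + "03048 Cottbus\n"
        + "</anschrift>\n"
        + "    </veranstaltungsort>\n"
        + "  </veranstaltung>\n"
        + "</veranstaltungen>\n";

    // normalize-space() applied to the string value of the first anschrift element.
    static final String EXPRESSION =
        "normalize-space(veranstaltungen/veranstaltung[position()=1]/veranstaltungsort/anschrift)";

    // Mimics the original reading loop: readLine() returns each line
    // without its line terminator, so appending the lines glues them together.
    static String joinLinesWithReadLine(String text) throws IOException {
        StringBuilder joined = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                joined.append(line);
            }
        }
        return joined.toString();
    }

    // Parses the XML text and evaluates EXPRESSION against the document node.
    static String normalizedAnschrift(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        return xPath.evaluate(EXPRESSION, doc);
    }

    public static void main(String[] args) throws Exception {
        // Line breaks were dropped before parsing: the address lines run together.
        System.out.println(normalizedAnschrift(joinLinesWithReadLine(XML)));
        // Line breaks reach the parser: normalize-space() turns each one into a single space.
        System.out.println(normalizedAnschrift(XML));
    }
}
```

In this sketch the XPath expression is not the culprit: normalize-space() can only collapse whitespace that is still present in the parsed document, which is why handing the stream to the parser directly (as in getDocument() above) fixed the output.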

Edited on 2021-05-19
