How to parse an XML dictionary using XML::LibXML

Polo

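At least I think it's called a dictionary. I haven't found a good resource that describes either of the following:

  1. This XML format
  2. XML::LibXML. The perldoc doesn't even mention findvalue or textContent, which I found on StackOverflow.

I have this XML: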

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Clippings</key>
    <array>
        <dict>
            <key>Abbreviation</key>
            <string>;adeb</string>
            <key>Creation Date</key>
            <string>2017-04-22T22:02:32Z</string>
            <key>DateLastUsed</key>
            <string>2021-05-21T13:53:20Z</string>
            <key>Label</key>
            <string></string>
            <key>Modification Date</key>
            <string>2018-04-30T18:16:36Z</string>
            <key>Option</key>
            <string></string>
            <key>Plain Text</key>
            <string>print STDERR "(debug ⌘) $_\n" for @⌘;</string>
            <key>SortOrder</key>
            <string>1</string>
            <key>TotUsed</key>
            <integer>579</integer>
            <key>uuidString</key>
            <string>96707AF9-E9C4-4770-A930-B6889C354243</string>
        </dict>
        <dict>
            <key>Abbreviation</key>
            <string>;ahtml</string>
            <key>DateLastUsed</key>
            <string>2016-03-09T15:50:43Z</string>
            <key>Modification Date</key>
            <string>2016-05-18T09:35:49Z</string>
            <key>Option</key>
            <string></string>
            <key>Plain Text</key>
            <string>print "(debug) Enter to continue:\n";
my $debug = &lt;STDIN&gt;;</string>
            <key>Service</key>
            <string></string>
            <key>SortOrder</key>
            <string>130</string>
            <key>TotUsed</key>
            <integer>1</integer>
            <key>uuidString</key>
            <string>E74E6ADC-7BE3-493B-AD1E-32729CAB2B77</string>
        </dict>
    </array>
    <key>sortOrderForThisSet</key>
    <string>0</string>
    <key>theAbbsFilePath</key>
    <string>/Users/user/path/to/file</string>
    <key>unsavedChanges</key>
    <false/>
    <key>uuidCheckDone</key>
    <string>yes</string>
</dict>
</plist>

So far I've tried this Perl:

#!/usr/bin/perl
use strict;
use warnings;

my $filename = '/Users/user/path/to/file';

use XML::LibXML;

# Parse the XML
my $xml = XML::LibXML->load_xml(location => $filename);

for my $entry ($xml->findnodes('//dict')) {
    my $key = $entry->findvalue('@key');
    my $value = $entry->textContent;

    print "$key = $value";
}

The value prints all of the values run together, and the key prints nothing at all.

ikegami

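First, let's address your absolutely false claim that "the perldoc doesn't even mention findvalue or textContent".

Document nodes are documented in XML::LibXML::Document. The following is practically the first thing that page says:

It inherits all functions from XML::LibXML::Node as specified in the DOM specification.

Element nodes are documented in XML::LibXML::Element. The following is the first thing that page says:

The class inherits from XML::LibXML::Node. The documentation for inherited methods is not listed here.

Both findvalue and textContent are documented in XML::LibXML::Node.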
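
To make the behavior in the question concrete, here is a minimal sketch of those inherited methods called on an element. The inline fragment is a hypothetical cut-down version of the question's file, and the variable names are illustrative only. It suggests why `@key` comes back empty (it selects an attribute, and `<dict>` has none) and why `textContent` on `<dict>` returns everything at once.

```perl
use strict;
use warnings;
use XML::LibXML qw( );

# Hypothetical two-entry fragment in the same key/value layout as the question's file.
my $doc = XML::LibXML->load_xml( string => <<'XML' );
<plist version="1.0">
<dict>
    <key>Abbreviation</key>
    <string>;adeb</string>
    <key>TotUsed</key>
    <integer>579</integer>
</dict>
</plist>
XML

my ($dict) = $doc->findnodes('/plist/dict');

# '@key' selects an attribute named "key"; <dict> has none, so this is empty.
my $attr = $dict->findvalue('@key');

# textContent concatenates the text of every descendant, whitespace included.
my $all_text = $dict->textContent;

# Each <key> is a child element, and its value is the next sibling element.
my @pairs;
for my $key_node ( $dict->findnodes('key') ) {
   my ($val_node) = $key_node->findnodes('following-sibling::*[1]');
   push @pairs, [ $key_node->textContent, $val_node->textContent ];
}

print "$_->[0] = $_->[1]\n" for @pairs;
```

Pairing each `<key>` with its following sibling works for a flat dictionary like this one. Nested `<array>` and `<dict>` values need a recursive approach.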


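On to the actual question.

The person who designed this XML schema was told to use XML, but they clearly didn't understand XML. It's bad even as a schema for encoding arbitrary data structures, the way JSON does. It was designed by someone who only meant for the data to be converted into a different format before being used. The following does exactly that: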

use strict;
use warnings;
use feature qw( say state );

use Carp              qw( croak );
use Types::Serialiser qw( );
use XML::LibXML       qw( );

sub qname {
   my ($node) = @_;
   my $ns   = $node->namespaceURI();
   my $name = $node->nodeName();
   return defined($ns) ? "{$ns}$name" : $name;
}

sub deserialize_array {
   my ($array_node) = @_;
   return [ map { deserialize_value($_) } $array_node->findnodes("*") ];
}

sub deserialize_dict {
   my ($dict_node) = @_;

   my $dict = {};
   my @children = $dict_node->findnodes("*");
   while (@children) {
      my $key_node = shift(@children);
      qname($key_node) eq "key"
         or croak("Expected key");

      my $val_node = shift(@children)
         or croak("Expected value");

      my $key = $key_node->textContent();
      my $val = deserialize_value($val_node);
      $dict->{$key} = $val;
   }

   return $dict;
}

sub deserialize_value {
   my ($val_node) = @_;

   state $deserializers = {
      string  => sub { $_[0]->textContent() },
      integer => sub { 0 + $_[0]->textContent() },
      real    => sub { 0 + $_[0]->textContent() },
      true    => sub { $Types::Serialiser::true },
      false   => sub { $Types::Serialiser::false },
      data    => sub { croak("data values not currently supported"); },
      date    => sub { croak("date values not currently supported"); },
      array   => \&deserialize_array,
      dict    => \&deserialize_dict,
   };

   my $val_type = qname($val_node);
   my $deserializer = $deserializers->{$val_type}
      or croak("Unrecognized value type \"$val_type\"");

   return $deserializer->($val_node);
}

sub deserialize_doc {
   my ($doc) = @_;
   my @children = $doc->documentElement->findnodes("*");
   croak("Root element has too few children") if @children == 0;
   croak("Root element has too many children") if @children > 1;
   return deserialize_value($children[0]);
}

{
   my $doc = XML::LibXML->load_xml( location => $ARGV[0] );
   my $prop_list = deserialize_doc($doc);
   ...
}

If you dump $prop_list, it looks like this:

$prop_list = {
  "Clippings" => [
    {
      "Creation Date" => "2017-04-22T22:02:32Z",
      "TotUsed" => 579,
      "DateLastUsed" => "2021-05-21T13:53:20Z",
      "Abbreviation" => ";adeb",
      "Plain Text" => "print STDERR \"(debug \x{2318}) \$_\\n\" for \@\x{2318};",
      "uuidString" => "96707AF9-E9C4-4770-A930-B6889C354243",
      "Modification Date" => "2018-04-30T18:16:36Z",
      "SortOrder" => 1,
      "Label" => "",
      "Option" => ""
    },
    {
      "Option" => "",
      "Modification Date" => "2016-05-18T09:35:49Z",
      "SortOrder" => 130,
      "uuidString" => "E74E6ADC-7BE3-493B-AD1E-32729CAB2B77",
      "Plain Text" => "print \"(debug) Enter to continue:\\n\";\nmy \$debug = <STDIN>;",
      "Abbreviation" => ";ahtml",
      "Service" => "",
      "TotUsed" => 1,
      "DateLastUsed" => "2016-03-09T15:50:43Z"
    }
  ],
  "unsavedChanges" => $false,
  "sortOrderForThisSet" => 0,
  "uuidCheckDone" => "yes",
  "theAbbsFilePath" => "/Users/user/path/to/file"
};

The above was obtained using:

use Data::Dumper;
local $Data::Dumper::Indent = 1;
local $Data::Dumper::Useqq = 1;
print(Data::Dumper->Dump(
   [ $Types::Serialiser::true, $Types::Serialiser::false, $prop_list ],
   [qw( true false prop_list )],
));
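
Once the property list has been converted, it can be walked like any other Perl data structure. The sketch below assumes a hand-written stand-in for part of the `$prop_list` shown above, so that it runs without the XML file. In real use the structure would come from `deserialize_doc`, and the names `@abbreviations` and `$total_used` are illustrative.

```perl
use strict;
use warnings;

# Hand-written stand-in for part of the $prop_list dumped above;
# in real use it would come from deserialize_doc().
my $prop_list = {
   "Clippings" => [
      { "Abbreviation" => ";adeb",  "TotUsed" => 579 },
      { "Abbreviation" => ";ahtml", "TotUsed" => 1 },
   ],
   "uuidCheckDone" => "yes",
};

# The top-level <dict> became a hash reference, and <array> an array reference.
my @abbreviations = map { $_->{"Abbreviation"} } @{ $prop_list->{"Clippings"} };

# Sum a numeric field across all clippings.
my $total_used = 0;
$total_used += $_->{"TotUsed"} for @{ $prop_list->{"Clippings"} };

print "$_\n" for @abbreviations;
print "Total uses: $total_used\n";
```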
