有什么方法可以将Weka j48决策树输出映射为RDF格式？

Mohamed EL Tair 发表于 Dev

穆罕默德·泰尔（Mohamed EL Tair）

我想基于Weka j48决策树输出，使用耶拿创建一个本体。但是，在将该输出输入到Jena之前，需要将其映射为RDF格式。有什么办法做这种映射吗？

编辑1：

映射前j48决策树输出的样本部分：

与决策树输出相对应的RDF的样本部分：

这两个屏幕来自本研究论文（幻灯片4）：

使用自适应本体进行有效的垃圾邮件过滤

马可13

可能没有内置的方法可以做到这一点。

免责声明：我以前从未与Jena和RDF合作。因此，此答案可能不完整或错过了预期的转换要点。

But nevertheless, and first of all, a short rant:

<rant>

_{The snippets that are published in the paper (namely, the output of the Weka classifier and the RDF) are incomplete and obviously inconsistent. The process of the conversion is not described at all. Instead, they only mentioned:}

The challenge we faced was mainly to make J48 classification outputs to RDF and gave it to Jena

(sic!)

_{Now, they solved it somehow. They could have provided their conversion code on a public, open-source repository. This would allow others to provide improvements, it would increase the visibility and verifiability of their approach. But instead, they wasted their time and the time of the readers with screenshots of various websites that serve as page-fillers in a pitiful attempt to squeeze yet another publication out of their approach.}

</rant>

The following is my best-effort approach to provide some of the building blocks that may be necessary for the conversion. It has to be taken with a grain of salt, because I'm not familiar with the underlying methods and libraries. But I hope that it can be considered as "helpful" nevertheless.

The Weka Classifier implementations usually do not provide the structures that they are using for their internal workings. So it is not possible to access the internal tree structure directly. However, there is a method prefix() that returns a string representation of the tree.

The code below contains a very pragmatic (and thus, somewhat brittle...) method that parses this string and builds a tree structure that contains the relevant information. This structure consists of TreeNode objects:

static class TreeNode
{
    String label;
    String attribute;
    String relation;
    String value;
    ...
}

The label is the class label that was used for the classifier. This is only non-null for leaf nodes. For the example from the paper, this would be "0" or "1", indicating whether an email is spam or not.
The attribute is the attribute that a decision is based on. For the example from the paper, such an attribute may be word_freq_remove
The relation and value are the strings representing the decision criteria. These may be "<=" and "0.08" for example.

After such a tree structure has been created, it can be converted into an Apache Jena Model instance. The code contains such a conversion method, but due to my lack of familiarity with RDF, I'm not sure whether it makes sense conceptually. Adjustments may be necessary in order to create the "desired" RDF structure out of this tree structure. But naively, the output looks like it could make sense.

import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.Statement;

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ArffLoader;

public class WekaClassifierToRdf
{
    public static void main(String[] args) throws Exception
    {
        String fileName = "./data/iris.arff";
        ArffLoader arffLoader = new ArffLoader();
        arffLoader.setSource(new FileInputStream(fileName));
        Instances instances = arffLoader.getDataSet();
        instances.setClassIndex(4);
        //System.out.println(instances);

        J48 classifier = new J48();
        classifier.buildClassifier(instances);

        System.out.println(classifier);

        String prefixTreeString = classifier.prefix();
        TreeNode node = processPrefixTreeString(prefixTreeString);

        System.out.println("Tree:");
        System.out.println(node.createString());

        Model model = createModel(node);

        System.out.println("Model:");
        model.write(System.out, "RDF/XML-ABBREV");
    }

    private static TreeNode processPrefixTreeString(String inputString)
    {
        String string = inputString.replaceAll("\\n", "");

        //System.out.println("Input is " + string);

        int open = string.indexOf("[");
        int close = string.lastIndexOf("]");
        String part = string.substring(open + 1, close);

        //System.out.println("Part " + part);

        int colon = part.indexOf(":");
        if (colon == -1)
        {
            TreeNode node = new TreeNode();

            int openAfterLabel = part.lastIndexOf("(");
            String label = part.substring(0, openAfterLabel).trim();
            node.label = label;
            return node;
        }

        String attributeName = part.substring(0, colon);

        //System.out.println("attributeName " + attributeName);

        int comma = part.indexOf(",", colon);

        int leftOpen = part.indexOf("[", comma);

        String leftCondition = part.substring(colon + 1, comma).trim();
        String rightCondition = part.substring(comma + 1, leftOpen).trim();

        int leftSpace = leftCondition.indexOf(" ");
        String leftRelation = leftCondition.substring(0, leftSpace).trim();
        String leftValue = leftCondition.substring(leftSpace + 1).trim();

        int rightSpace = rightCondition.indexOf(" ");
        String rightRelation = rightCondition.substring(0, rightSpace).trim();
        String rightValue = rightCondition.substring(rightSpace + 1).trim();

        //System.out.println("leftCondition " + leftCondition);
        //System.out.println("rightCondition " + rightCondition);

        int leftClose = findClosing(part, leftOpen + 1);
        String left = part.substring(leftOpen, leftClose + 1);

        //System.out.println("left " + left);

        int rightOpen = part.indexOf("[", leftClose);
        int rightClose = findClosing(part, rightOpen + 1);
        String right = part.substring(rightOpen, rightClose + 1);

        //System.out.println("right " + right);

        TreeNode leftNode = processPrefixTreeString(left);
        leftNode.relation = leftRelation;
        leftNode.value = leftValue;

        TreeNode rightNode = processPrefixTreeString(right);
        rightNode.relation = rightRelation;
        rightNode.value = rightValue;

        TreeNode result = new TreeNode();
        result.attribute = attributeName;
        result.children.add(leftNode);
        result.children.add(rightNode);
        return result;

    }

    private static int findClosing(String string, int startIndex)
    {
        int stack = 0;
        for (int i=startIndex; i<string.length(); i++)
        {
            char c = string.charAt(i);
            if (c == '[')
            {
                stack++;
            }
            if (c == ']')
            {
                if (stack == 0)
                {
                    return i;
                }
                stack--;
            }
        }
        return -1;
    }

    static class TreeNode
    {
        String label;
        String attribute;
        String relation;
        String value;
        List<TreeNode> children = new ArrayList<TreeNode>();

        String createString()
        {
            StringBuilder sb = new StringBuilder();
            createString("", sb);
            return sb.toString();
        }

        private void createString(String indent, StringBuilder sb)
        {
            if (children.isEmpty())
            {
                sb.append(indent + label);
            }
            sb.append("\n");
            for (TreeNode child : children)
            {
                sb.append(indent + "if " + attribute + " " + child.relation
                    + " " + child.value + ": ");
                child.createString(indent + "  ", sb);
            }
        }

        @Override
        public String toString()
        {
            return "TreeNode [label=" + label + ", attribute=" + attribute
                + ", relation=" + relation + ", value=" + value + "]";
        }
    }    

    private static String createPropertyString(TreeNode node)
    {
        if ("<".equals(node.relation))
        {
            return "lt_" + node.value;
        }
        if ("<=".equals(node.relation))
        {
            return "lte_" + node.value;
        }
        if (">".equals(node.relation))
        {
            return "gt_" + node.value;
        }
        if (">=".equals(node.relation))
        {
            return "gte_" + node.value;
        }
        System.err.println("Unknown relation: " + node.relation);
        return "UNKNOWN";
    }    

    static Model createModel(TreeNode node)
    {
        Model model = ModelFactory.createDefaultModel();

        String baseUri = "http://www.example.com/example#";
        model.createResource(baseUri);
        model.setNsPrefix("base", baseUri);
        populateModel(model, baseUri, node, node.attribute);
        return model;
    }

    private static void populateModel(Model model, String baseUri,
        TreeNode node, String resourceName)
    {
        //System.out.println("Populate with " + resourceName);

        for (TreeNode child : node.children)
        {
            if (child.label != null)
            {
                Resource resource =
                    model.createResource(baseUri + resourceName);
                String propertyString = createPropertyString(child);
                Property property =
                    model.createProperty(baseUri, propertyString);
                Statement statement = model.createLiteralStatement(resource,
                    property, child.label);
                model.add(statement);
            }
            else
            {
                Resource resource =
                    model.createResource(baseUri + resourceName);
                String propertyString = createPropertyString(child);
                Property property =
                    model.createProperty(baseUri, propertyString);

                String nextResourceName = resourceName + "_" + child.attribute;
                Resource childResource =
                    model.createResource(baseUri + nextResourceName);
                Statement statement =
                    model.createStatement(resource, property, childResource);
                model.add(statement);
            }
        }
        for (TreeNode child : node.children)
        {
            String nextResourceName = resourceName + "_" + child.attribute;
            populateModel(model, baseUri, child, nextResourceName);
        }
    }

}

该程序从ARFF文件中解析著名的Iris数据集，运行J48分类器，构建树结构，并生成并打印RDF模型。输出显示在这里：

由Weka打印的分类器：

J48 pruned tree
------------------

petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
|   petalwidth <= 1.7
|   |   petallength <= 4.9: Iris-versicolor (48.0/1.0)
|   |   petallength > 4.9
|   |   |   petalwidth <= 1.5: Iris-virginica (3.0)
|   |   |   petalwidth > 1.5: Iris-versicolor (3.0/1.0)
|   petalwidth > 1.7: Iris-virginica (46.0/1.0)

Number of Leaves  :     5

Size of the tree :     9

内部构建的树结构的字符串表示形式：

Tree:

if petalwidth <= 0.6:   Iris-setosa
if petalwidth > 0.6: 
  if petalwidth <= 1.7: 
    if petallength <= 4.9:       Iris-versicolor
    if petallength > 4.9: 
      if petalwidth <= 1.5:         Iris-virginica
      if petalwidth > 1.5:         Iris-versicolor
  if petalwidth > 1.7:     Iris-virginica

生成的RDF模型：

Model:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:base="http://www.example.com/example#">
  <rdf:Description rdf:about="http://www.example.com/example#petalwidth">
    <base:gt_0.6>
      <rdf:Description rdf:about="http://www.example.com/example#petalwidth_petalwidth">
        <base:gt_1.7>Iris-virginica</base:gt_1.7>
        <base:lte_1.7>
          <rdf:Description rdf:about="http://www.example.com/example#petalwidth_petalwidth_petallength">
            <base:gt_4.9>
              <rdf:Description rdf:about="http://www.example.com/example#petalwidth_petalwidth_petallength_petalwidth">
                <base:gt_1.5>Iris-versicolor</base:gt_1.5>
                <base:lte_1.5>Iris-virginica</base:lte_1.5>
              </rdf:Description>
            </base:gt_4.9>
            <base:lte_4.9>Iris-versicolor</base:lte_4.9>
          </rdf:Description>
        </base:lte_1.7>
      </rdf:Description>
    </base:gt_0.6>
    <base:lte_0.6>Iris-setosa</base:lte_0.6>
  </rdf:Description>
</rdf:RDF>

本文收集自互联网，转载请注明来源。

如有侵权，请联系 [email protected] 删除。

编辑于 2020-11-23

我来说两句

0 条评论

登录后参与评论

上一篇：在Woocommerce管理员电子邮件通知中显示产品自定义字段值

TOP 榜单

文章

有什么方法可以将Weka j48决策树输出映射为RDF格式？

有什么方法可以将Weka j48决策树输出映射为RDF格式？

UITableView的项目向下滚动后更改颜色，然后快速备份

Linux的官方Adobe Flash存储库是否已过时？

用日期数据透视表和日期顺序查询

应用发明者仅从列表中选择一个随机项一次

Mac OS X更新后的GRUB 2问题

验证REST API参数

Java Eclipse中的错误13，如何解决？

带有错误“ where”条件的查询如何返回结果？

ggplot：对齐多个分面图-所有大小不同的分面

尝试反复更改屏幕上按钮的位置 - kotlin android studio

如何从视图一次更新多行（ASP.NET - Core）

计算数据帧中每行的NA

蓝屏死机没有修复解决方案

在 Python 2.7 中。如何从文件中读取特定文本并分配给变量

离子动态工具栏背景色

VB.net将2条特定行导出到DataGridView

通过 Git 在运行 Jenkins 作业时获取 ClassNotFoundException

在Windows 7中无法删除文件（2）

python中的boto3文件上传

当我尝试下载 StanfordNLP en 模型时，出现错误

Node.js中未捕获的异常错误，发生调用