Is there a way to map Weka J48 decision tree output to RDF format?

Mohamed EL Tair

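I want to create an ontology with Jena, based on the output of a Weka J48 decision tree. Before that output can be fed into Jena, however, it has to be mapped to RDF. Is there a way to do this mapping?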

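Edit 1:

Sample part of the J48 decision tree output before the mapping (screenshot: Before)

Sample part of the RDF corresponding to the decision tree output (screenshot: After)

Both screenshots are from this research paper (slide 4):

Efficient Spam Email Filtering using Adaptive Ontology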

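Marco13

There is probably no built-in way to do this.

Disclaimer: I have never worked with Jena and RDF before. This answer may therefore be incomplete or may miss the point of the intended conversion.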

But nevertheless, and first of all, a short rant:


<rant>

The snippets that are published in the paper (namely, the output of the Weka classifier and the RDF) are incomplete and obviously inconsistent. The process of the conversion is not described at all. Instead, they only mentioned:

The challenge we faced was mainly to make J48 classification outputs to RDF and gave it to Jena

(sic!)

Now, they solved it somehow. They could have provided their conversion code in a public, open-source repository. This would allow others to provide improvements, and it would increase the visibility and verifiability of their approach. But instead, they wasted their time and the time of the readers with screenshots of various websites that serve as page-fillers in a pitiful attempt to squeeze yet another publication out of their approach.

</rant>


The following is my best-effort approach to provide some of the building blocks that may be necessary for the conversion. It has to be taken with a grain of salt, because I'm not familiar with the underlying methods and libraries. But I hope that it can be considered as "helpful" nevertheless.

The Weka Classifier implementations usually do not provide the structures that they are using for their internal workings. So it is not possible to access the internal tree structure directly. However, there is a method prefix() that returns a string representation of the tree.
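To make the shape of that string more tangible, here is a small sketch that does not need Weka on the classpath. The SAMPLE string is hand-written and only assumed to resemble what prefix() returns for a two-level tree, and the class and method names (PrefixStringSketch, topLevelChildren) are made up for this illustration. It shows how counting brackets is enough to isolate the child blocks of the root node:

```java
import java.util.ArrayList;
import java.util.List;

class PrefixStringSketch
{
    // Hand-written sample, ASSUMED to resemble the output of J48#prefix()
    // (after removing line breaks). It is not copied from a real Weka run.
    static final String SAMPLE =
        "[petalwidth: <= 0.6, > 0.6[Iris-setosa (50.0)]"
        + "[petalwidth: <= 1.7, > 1.7[Iris-versicolor (48.0/1.0)]"
        + "[Iris-virginica (46.0/1.0)]]]";

    // Returns the bracketed blocks that sit directly inside the outermost
    // "[...]", i.e. the string representations of the root's children.
    static List<String> topLevelChildren(String s)
    {
        List<String> result = new ArrayList<String>();
        int depth = 0;
        int start = -1;
        for (int i = 0; i < s.length(); i++)
        {
            char c = s.charAt(i);
            if (c == '[')
            {
                depth++;
                if (depth == 2)
                {
                    // A child block of the root starts here
                    start = i;
                }
            }
            else if (c == ']')
            {
                if (depth == 2)
                {
                    // The child block that started at 'start' ends here
                    result.add(s.substring(start, i + 1));
                }
                depth--;
            }
        }
        return result;
    }

    public static void main(String[] args)
    {
        for (String child : topLevelChildren(SAMPLE))
        {
            System.out.println(child);
        }
    }
}
```

The text before the first nested bracket ("petalwidth: <= 0.6, > 0.6") carries the attribute name and the two conditions, and each child block can be processed recursively in the same way.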

The code below contains a very pragmatic (and thus, somewhat brittle...) method that parses this string and builds a tree structure that contains the relevant information. This structure consists of TreeNode objects:

static class TreeNode
{
    String label;
    String attribute;
    String relation;
    String value;
    ...
}

  • The label is the class label that the classifier assigns. It is only non-null for leaf nodes. For the example from the paper, this would be "0" or "1", indicating whether an email is spam or not.

  • The attribute is the attribute that a decision is based on. For the example from the paper, such an attribute may be word_freq_remove.

  • The relation and value are the strings representing the decision criterion on the branch that leads from the parent to this node. These may be "<=" and "0.08", for example.
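As a rough illustration of how these fields fit together, the following sketch builds a single split by hand. The attribute name and the threshold are the ones mentioned above, but which branch ends in "0" and which in "1" is made up here, and the helper names (leaf, buildExample, describe) are hypothetical and not part of the program below:

```java
import java.util.ArrayList;
import java.util.List;

class TreeNodeSketch
{
    static class TreeNode
    {
        String label;      // class label, non-null only for leaf nodes
        String attribute;  // attribute tested at this node, null for leaves
        String relation;   // e.g. "<=", refers to the branch from the parent
        String value;      // e.g. "0.08", refers to the branch from the parent
        List<TreeNode> children = new ArrayList<TreeNode>();
    }

    // Creates a leaf that is reached when "<parent attribute> relation value"
    static TreeNode leaf(String relation, String value, String label)
    {
        TreeNode node = new TreeNode();
        node.relation = relation;
        node.value = value;
        node.label = label;
        return node;
    }

    // One split on word_freq_remove. The assignment of the labels to the
    // two branches is an ASSUMPTION for illustration only.
    static TreeNode buildExample()
    {
        TreeNode root = new TreeNode();
        root.attribute = "word_freq_remove";
        root.children.add(leaf("<=", "0.08", "0"));
        root.children.add(leaf(">", "0.08", "1"));
        return root;
    }

    // The condition is stored on the child, the attribute on the parent
    static String describe(TreeNode parent, TreeNode child)
    {
        return "if " + parent.attribute + " " + child.relation + " "
            + child.value + ": " + child.label;
    }

    public static void main(String[] args)
    {
        TreeNode root = buildExample();
        for (TreeNode child : root.children)
        {
            System.out.println(describe(root, child));
        }
    }
}
```

Note that the root node itself has neither a relation nor a value, because no branch leads to it.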

After such a tree structure has been created, it can be converted into an Apache Jena Model instance. The code contains such a conversion method, but due to my lack of familiarity with RDF, I'm not sure whether it makes sense conceptually. Adjustments may be necessary in order to create the "desired" RDF structure out of this tree structure. But naively, the output looks like it could make sense.

import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.Statement;

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ArffLoader;

public class WekaClassifierToRdf
{
    public static void main(String[] args) throws Exception
    {
        String fileName = "./data/iris.arff";
        ArffLoader arffLoader = new ArffLoader();
        arffLoader.setSource(new FileInputStream(fileName));
        Instances instances = arffLoader.getDataSet();
        // The class attribute is the fifth (last) column of the Iris data set
        instances.setClassIndex(4);
        //System.out.println(instances);

        J48 classifier = new J48();
        classifier.buildClassifier(instances);

        System.out.println(classifier);

        String prefixTreeString = classifier.prefix();
        TreeNode node = processPrefixTreeString(prefixTreeString);

        System.out.println("Tree:");
        System.out.println(node.createString());

        Model model = createModel(node);

        System.out.println("Model:");
        model.write(System.out, "RDF/XML-ABBREV");
    }

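    // Parses the bracketed string returned by J48#prefix(). Inner nodes are
    // expected to look like "[attribute: cond1, cond2[child1][child2]]" and
    // leaves like "[label (counts)]". Only binary splits are handled.
    private static TreeNode processPrefixTreeString(String inputString)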
    {
        String string = inputString.replaceAll("\\n", "");

        //System.out.println("Input is " + string);

        int open = string.indexOf("[");
        int close = string.lastIndexOf("]");
        String part = string.substring(open + 1, close);

        //System.out.println("Part " + part);

        int colon = part.indexOf(":");
        if (colon == -1)
        {
            TreeNode node = new TreeNode();

            int openAfterLabel = part.lastIndexOf("(");
            String label = part.substring(0, openAfterLabel).trim();
            node.label = label;
            return node;
        }

        String attributeName = part.substring(0, colon);

        //System.out.println("attributeName " + attributeName);

        int comma = part.indexOf(",", colon);

        int leftOpen = part.indexOf("[", comma);

        String leftCondition = part.substring(colon + 1, comma).trim();
        String rightCondition = part.substring(comma + 1, leftOpen).trim();

        int leftSpace = leftCondition.indexOf(" ");
        String leftRelation = leftCondition.substring(0, leftSpace).trim();
        String leftValue = leftCondition.substring(leftSpace + 1).trim();

        int rightSpace = rightCondition.indexOf(" ");
        String rightRelation = rightCondition.substring(0, rightSpace).trim();
        String rightValue = rightCondition.substring(rightSpace + 1).trim();

        //System.out.println("leftCondition " + leftCondition);
        //System.out.println("rightCondition " + rightCondition);

        int leftClose = findClosing(part, leftOpen + 1);
        String left = part.substring(leftOpen, leftClose + 1);

        //System.out.println("left " + left);

        int rightOpen = part.indexOf("[", leftClose);
        int rightClose = findClosing(part, rightOpen + 1);
        String right = part.substring(rightOpen, rightClose + 1);

        //System.out.println("right " + right);

        TreeNode leftNode = processPrefixTreeString(left);
        leftNode.relation = leftRelation;
        leftNode.value = leftValue;

        TreeNode rightNode = processPrefixTreeString(right);
        rightNode.relation = rightRelation;
        rightNode.value = rightValue;

        TreeNode result = new TreeNode();
        result.attribute = attributeName;
        result.children.add(leftNode);
        result.children.add(rightNode);
        return result;

    }

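    // Returns the index of the ']' that closes the bracket opened just before
    // startIndex, skipping over nested "[...]" pairs, or -1 if there is none.
    private static int findClosing(String string, int startIndex)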
    {
        int stack = 0;
        for (int i=startIndex; i<string.length(); i++)
        {
            char c = string.charAt(i);
            if (c == '[')
            {
                stack++;
            }
            if (c == ']')
            {
                if (stack == 0)
                {
                    return i;
                }
                stack--;
            }
        }
        return -1;
    }

    static class TreeNode
    {
        String label;
        String attribute;
        String relation;
        String value;
        List<TreeNode> children = new ArrayList<TreeNode>();

        String createString()
        {
            StringBuilder sb = new StringBuilder();
            createString("", sb);
            return sb.toString();
        }

        private void createString(String indent, StringBuilder sb)
        {
            if (children.isEmpty())
            {
                sb.append(indent + label);
            }
            sb.append("\n");
            for (TreeNode child : children)
            {
                sb.append(indent + "if " + attribute + " " + child.relation
                    + " " + child.value + ": ");
                child.createString(indent + "  ", sb);
            }
        }

        @Override
        public String toString()
        {
            return "TreeNode [label=" + label + ", attribute=" + attribute
                + ", relation=" + relation + ", value=" + value + "]";
        }
    }    

    private static String createPropertyString(TreeNode node)
    {
        if ("<".equals(node.relation))
        {
            return "lt_" + node.value;
        }
        if ("<=".equals(node.relation))
        {
            return "lte_" + node.value;
        }
        if (">".equals(node.relation))
        {
            return "gt_" + node.value;
        }
        if (">=".equals(node.relation))
        {
            return "gte_" + node.value;
        }
        System.err.println("Unknown relation: " + node.relation);
        return "UNKNOWN";
    }    

    static Model createModel(TreeNode node)
    {
        Model model = ModelFactory.createDefaultModel();

        String baseUri = "http://www.example.com/example#";
        model.createResource(baseUri);
        model.setNsPrefix("base", baseUri);
        populateModel(model, baseUri, node, node.attribute);
        return model;
    }

    private static void populateModel(Model model, String baseUri,
        TreeNode node, String resourceName)
    {
        //System.out.println("Populate with " + resourceName);

        for (TreeNode child : node.children)
        {
            if (child.label != null)
            {
                Resource resource =
                    model.createResource(baseUri + resourceName);
                String propertyString = createPropertyString(child);
                Property property =
                    model.createProperty(baseUri, propertyString);
                Statement statement = model.createLiteralStatement(resource,
                    property, child.label);
                model.add(statement);
            }
            else
            {
                Resource resource =
                    model.createResource(baseUri + resourceName);
                String propertyString = createPropertyString(child);
                Property property =
                    model.createProperty(baseUri, propertyString);

                String nextResourceName = resourceName + "_" + child.attribute;
                Resource childResource =
                    model.createResource(baseUri + nextResourceName);
                Statement statement =
                    model.createStatement(resource, property, childResource);
                model.add(statement);
            }
        }
        for (TreeNode child : node.children)
        {
            String nextResourceName = resourceName + "_" + child.attribute;
            populateModel(model, baseUri, child, nextResourceName);
        }
    }

}

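This program reads the well-known Iris data set from an ARFF file, runs the J48 classifier, builds the tree structure, and then generates and prints the RDF model. The output is shown here:

The classifier as printed by Weka: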

J48 pruned tree
------------------

petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
|   petalwidth <= 1.7
|   |   petallength <= 4.9: Iris-versicolor (48.0/1.0)
|   |   petallength > 4.9
|   |   |   petalwidth <= 1.5: Iris-virginica (3.0)
|   |   |   petalwidth > 1.5: Iris-versicolor (3.0/1.0)
|   petalwidth > 1.7: Iris-virginica (46.0/1.0)

Number of Leaves  :     5

Size of the tree :     9

The string representation of the tree structure that is built internally:

Tree:

if petalwidth <= 0.6:   Iris-setosa
if petalwidth > 0.6: 
  if petalwidth <= 1.7: 
    if petallength <= 4.9:       Iris-versicolor
    if petallength > 4.9: 
      if petalwidth <= 1.5:         Iris-virginica
      if petalwidth > 1.5:         Iris-versicolor
  if petalwidth > 1.7:     Iris-virginica

The generated RDF model:

Model:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:base="http://www.example.com/example#">
  <rdf:Description rdf:about="http://www.example.com/example#petalwidth">
    <base:gt_0.6>
      <rdf:Description rdf:about="http://www.example.com/example#petalwidth_petalwidth">
        <base:gt_1.7>Iris-virginica</base:gt_1.7>
        <base:lte_1.7>
          <rdf:Description rdf:about="http://www.example.com/example#petalwidth_petalwidth_petallength">
            <base:gt_4.9>
              <rdf:Description rdf:about="http://www.example.com/example#petalwidth_petalwidth_petallength_petalwidth">
                <base:gt_1.5>Iris-versicolor</base:gt_1.5>
                <base:lte_1.5>Iris-virginica</base:lte_1.5>
              </rdf:Description>
            </base:gt_4.9>
            <base:lte_4.9>Iris-versicolor</base:lte_4.9>
          </rdf:Description>
        </base:lte_1.7>
      </rdf:Description>
    </base:gt_0.6>
    <base:lte_0.6>Iris-setosa</base:lte_0.6>
  </rdf:Description>
</rdf:RDF>
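The nested RDF/XML can be hard to read, so here is a Jena-free sketch that writes the same kind of statements as flat, N-Triples-style lines using plain string concatenation. It is not how Jena serializes a model. It only follows the naming scheme used above (resource names are the attribute names along the path joined with "_", property names are lte_ or gt_ plus the threshold). The Iris tree is rebuilt by hand from the classifier output, and all class and method names (TriplesSketch, Node, leaf, split, irisTree, collect) are made up for this illustration:

```java
import java.util.ArrayList;
import java.util.List;

class TriplesSketch
{
    static final String BASE = "http://www.example.com/example#";

    static class Node
    {
        String label, attribute, relation, value;
        List<Node> children = new ArrayList<Node>();
    }

    static Node leaf(String relation, String value, String label)
    {
        Node n = new Node();
        n.relation = relation;
        n.value = value;
        n.label = label;
        return n;
    }

    static Node split(String relation, String value, String attribute,
        Node... children)
    {
        Node n = new Node();
        n.relation = relation;
        n.value = value;
        n.attribute = attribute;
        for (Node child : children)
        {
            n.children.add(child);
        }
        return n;
    }

    // The Iris tree from the classifier output, rebuilt by hand
    static Node irisTree()
    {
        return split(null, null, "petalwidth",
            leaf("<=", "0.6", "Iris-setosa"),
            split(">", "0.6", "petalwidth",
                split("<=", "1.7", "petallength",
                    leaf("<=", "4.9", "Iris-versicolor"),
                    split(">", "4.9", "petalwidth",
                        leaf("<=", "1.5", "Iris-virginica"),
                        leaf(">", "1.5", "Iris-versicolor"))),
                leaf(">", "1.7", "Iris-virginica")));
    }

    // Only "<=" and ">" occur in this tree, so only these are covered here
    static String predicate(Node child)
    {
        String prefix = "<=".equals(child.relation) ? "lte_" : "gt_";
        return prefix + child.value;
    }

    // Adds one line per branch: subject is the parent resource, predicate
    // encodes the condition, object is a literal label or a child resource
    static void collect(Node node, String name, List<String> out)
    {
        for (Node child : node.children)
        {
            String s = "<" + BASE + name + ">";
            String p = "<" + BASE + predicate(child) + ">";
            if (child.label != null)
            {
                out.add(s + " " + p + " \"" + child.label + "\" .");
            }
            else
            {
                String childName = name + "_" + child.attribute;
                out.add(s + " " + p + " <" + BASE + childName + "> .");
                collect(child, childName, out);
            }
        }
    }

    public static void main(String[] args)
    {
        Node root = irisTree();
        List<String> lines = new ArrayList<String>();
        collect(root, root.attribute, lines);
        for (String line : lines)
        {
            System.out.println(line);
        }
    }
}
```

Each branch of the tree becomes exactly one statement, so the nine-node Iris tree yields eight lines. Whether encoding the threshold into the property name is the "desired" RDF structure is a separate question, as mentioned above.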
