Hello World Saxon with Java

Thufir

Using the Saxon-HE and TagSoup JAR files installed through apt, parsing HTML is a one-liner:

thufir@dur:~/saxon$ java -cp /usr/share/java/Saxon-HE-9.8.0.14.jar:/usr/share/java/tagsoup-1.2.1.jar net.sf.saxon.Query -x:org.ccil.cowan.tagsoup.Parser -qs:doc\(\'http://books.toscrape.com/\'\) 
<?xml version="1.0" encoding="UTF-8"?><!--[if lt IE 7]>      <html lang="en-us" class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]--><!--[if IE 7]>         <html lang="en-us" class="no-js lt-ie9 lt-ie8"> <![endif]--><!--[if IE 8]>         <html lang="en-us" class="no-js lt-ie9"> <![endif]--><!--[if gt IE 8]><!--><html xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml" class="no-js" lang="en-us"><!--<![endif]--><head><title>
    All products | Books to Scrape - Sandbox
..        
        <!-- Version: N/A -->

thufir@dur:~/saxon$ 

How would I do that from Java? In particular, what imports are required from Saxon for this execution? Perhaps using Saxon and the JAXP interface?

See also:

http://codingwithpassion.blogspot.com/2011/03/saxon-xslt-java-example.html

Michael Kay

You will find many simple examples of invoking transformations using Saxon from Java in the saxon-resources download available on both the saxonica.com and sourceforge.net web sites.

It's difficult to know exactly what you want here, because your command line example isn't using Saxon to do anything useful other than invoking the TagSoup parser and serializing the result. The simplest way to do that from Java is with a JAXP identity transformation, which runs just as well with the built-in XSLT transformer in the JDK as with Saxon:

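// Obtain whichever XSLT processor JAXP finds: the JDK's built-in one, or another such as Saxon if it is registered on the classpath
TransformerFactory factory = TransformerFactory.newInstance();
// Use TagSoup as the SAX parser so that ordinary HTML is delivered as well-formed XML events
XMLReader xmlReader = XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");
Source input = new SAXSource(xmlReader, new InputSource("http://books.toscrape.com/"));
Result output = new StreamResult(System.out);
// newTransformer() with no stylesheet argument performs an identity transformation (input copied to output)
factory.newTransformer().transform(input, output);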
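
On the question of imports: the snippet above uses only JAXP and SAX classes that ship with the JDK, so nothing has to be imported from Saxon itself. A minimal self-contained sketch follows. The class name IdentityCopy and the helpers newReader and copy are illustrative names, not part of the answer. The fallback to the JDK's own parser and the in-memory sample document are assumptions added so that the sketch also runs when TagSoup is not on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class IdentityCopy {

    // Instantiate a SAX parser by class name (e.g. org.ccil.cowan.tagsoup.Parser).
    // Assumption for this sketch: a null name falls back to the JDK's XML parser.
    public static XMLReader newReader(String parserClass) throws Exception {
        if (parserClass == null) {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            return spf.newSAXParser().getXMLReader();
        }
        return (XMLReader) Class.forName(parserClass)
                .getDeclaredConstructor().newInstance();
    }

    // Parse the input with the given reader and serialize it unchanged.
    public static String copy(XMLReader reader, InputSource input) throws Exception {
        // No stylesheet argument: JAXP returns an identity transformer.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        identity.transform(new SAXSource(reader, input), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 2) {
            // args[0] = parser class name, args[1] = URL or file to read
            System.out.println(copy(newReader(args[0]), new InputSource(args[1])));
        } else {
            // Hypothetical sample input, used only when no arguments are given
            InputSource sample = new InputSource(
                    new StringReader("<doc><title>Hello</title></doc>"));
            System.out.println(copy(newReader(null), sample));
        }
    }
}
```

Assuming the JAR locations from the question, it could be run against the live page with something like: java -cp /usr/share/java/tagsoup-1.2.1.jar:. IdentityCopy org.ccil.cowan.tagsoup.Parser http://books.toscrape.com/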

If you want to add some XSLT or XQuery processing then of course that's perfectly possible (I would always use the s9api API for Saxon, but you can also use JAXP or XQJ), but the details depend on exactly what you want to do.
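
As one illustration of adding XSLT processing through JAXP, the sketch below applies a small stylesheet in place of the identity transformation. The stylesheet, the class name TitleExtractor, and the choice to print the page title are all assumptions made for the example. The h: prefix is bound to the XHTML namespace because the TagSoup output shown in the question declares xmlns="http://www.w3.org/1999/xhtml". The stylesheet is kept to XSLT 1.0 so that it works with the JDK's built-in transformer and should also run under Saxon.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class TitleExtractor {

    // Illustrative XSLT 1.0 stylesheet: emit the whitespace-normalized <title> as text.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:h='http://www.w3.org/1999/xhtml'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='normalize-space(/h:html/h:head/h:title)'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Parse the input with the given reader and run the stylesheet over it.
    public static String title(XMLReader reader, InputSource input) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new SAXSource(reader, input), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader;
        InputSource input;
        if (args.length == 2) {
            // args[0] = parser class (e.g. org.ccil.cowan.tagsoup.Parser), args[1] = URL
            reader = (XMLReader) Class.forName(args[0])
                    .getDeclaredConstructor().newInstance();
            input = new InputSource(args[1]);
        } else {
            // Hypothetical well-formed sample, parsed with the JDK's own XML parser
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            reader = spf.newSAXParser().getXMLReader();
            input = new InputSource(new StringReader(
                  "<html xmlns='http://www.w3.org/1999/xhtml'>"
                + "<head><title> Demo page </title></head><body/></html>"));
        }
        System.out.println(title(reader, input));
    }
}
```

The same shape carries over to s9api or XQJ; only the classes that compile and run the stylesheet or query change, while the TagSoup-backed SAXSource stays the same.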
