Java: how do I get file paths from MySQL and then fetch the corresponding files from the directory?

夜魔侠：

I have a Java method that needs to scan a MySQL table to look up file paths.

Here is a sample of the table, filequeue:

 UniqueID   FilePath                 Status     
 1          C:\Folder1\abc.pdf       Active
 2          C:\Folder1\def.pdf       Active
 3          C:\Folder1\efg.pdf       Error

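I want to go through the table and find the files with Status = Active. Then I will take each file path, locate the actual file at that location, and start doing some processing on those files (extracting text).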

I'm new to Java, and this is what I have so far:

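import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public void doScan_DB() throws Exception {

    // Load the JDBC settings from db.properties on the classpath
    Properties props = new Properties();
    try (InputStream in = getClass().getResourceAsStream("/db.properties")) {
        props.load(in);
    }

    // Register the driver explicitly if one is configured
    String driver = props.getProperty("jdbc.driver");
    if (driver != null) {
        Class.forName(driver);
    }

    String url = props.getProperty("jdbc.url");
    String username = props.getProperty("jdbc.username");
    String password = props.getProperty("jdbc.password");

    // try-with-resources closes the connection, statement and result set
    try (Connection con = DriverManager.getConnection(url, username, password);
         Statement statement = con.createStatement();
         ResultSet rs = statement.executeQuery(
                 "select * from filequeue where Status='Active'")) {

        while (rs.next()) {
            // grab those files and call index()
        }
    }
}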

From here, how do I go on to pick up the files and then call the index function to extract their text?

Also, please tell me if my approach is wrong.

Edit: including my other function, which extracts the PDF text:

// Imports: java.io.File, java.io.IOException, java.util.ArrayList,
// org.apache.pdfbox.pdmodel.PDDocument, org.apache.pdfbox.text.PDFTextStripper
// (PDFBox 2.x package names)
public void doScan() throws Exception {

    File folder = new File("D:\\PDF1");
    File[] listOfFiles = folder.listFiles();
    if (listOfFiles == null) {
        // listFiles() returns null when the folder is missing or unreadable
        return;
    }

    for (File file : listOfFiles) {
        if (file.isFile()) {
            ArrayList<String> list = new ArrayList<String>();
            String path = file.getPath();
            try (PDDocument document = PDDocument.load(file)) {

                if (!document.isEncrypted()) {
                    PDFTextStripper tStripper = new PDFTextStripper();
                    String pdfFileInText = tStripper.getText(document);
                    String[] lines = pdfFileInText.split("\\r?\\n");
                    for (String line : lines) {
                        String[] words = line.split(" ");
                        for (String word : words) {
                            // strip special characters from the start and end of each word
                            list.add(word.replaceAll("([\\W]+$)|(^[\\W]+)", ""));
                        }
                    }
                }
            } catch (IOException e) {
                System.err.println("Exception while trying to read pdf document - " + e);
            }

            String[] words1 = list.toArray(new String[list.size()]);

            index(words1, path);

            System.out.println("Completed");
        }
    }
}
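As a side note on the word-cleaning loop above: the replaceAll call leaves an empty string in the list whenever a token is all punctuation, and split(" ") produces empty tokens for consecutive spaces. The following is only a sketch of one way to factor that step out; the class name WordCleaner and the method cleanWords are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class WordCleaner {

    // Same idea as the regex in doScan(): non-word characters at either edge of a token
    private static final Pattern EDGE_NON_WORD = Pattern.compile("^\\W+|\\W+$");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Splits text on whitespace, trims edge punctuation, and drops empty results. */
    public static List<String> cleanWords(String text) {
        List<String> words = new ArrayList<>();
        for (String token : WHITESPACE.split(text.trim())) {
            String cleaned = EDGE_NON_WORD.matcher(token).replaceAll("");
            if (!cleaned.isEmpty()) {
                words.add(cleaned);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [Hello, world, PDF, text]
        System.out.println(cleanWords("Hello, world!  (PDF) -- text."));
    }
}
```

Note that \W in Java is ASCII-based by default, so accented or non-Latin letters at the edge of a word would also be stripped. If the PDFs contain such text, a different pattern may be needed.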

阿育：

You can get the path and the file like this:

    while (rs.next()) {
        // Read the FilePath column (the second column of filequeue)
        String path = rs.getString("FilePath");
        // Create a PdfDocument instance
        PdfDocument doc = new PdfDocument();
        try {
            // Load the document stored at that path
            doc.load(path);
            // Get page count and display it on console output
            System.out.println(
                "Number of pages in " + path + " is " +
                doc.getPageCount());
            // Close document
            doc.close();
        } catch (IOException | PdfException e) {
            e.printStackTrace();
        }
    }

You will need additional JARs, which give you ready-made methods for working with PDFs.
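Since the paths in the table may point at files that have been moved or deleted, it can help to check each path before handing it to a PDF library. Below is a minimal sketch of that check using only the standard library. The class name QueuedFileResolver and the method existingFiles are hypothetical, and the sketch assumes the FilePath values have already been read from the ResultSet into a list of strings.

```java
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class QueuedFileResolver {

    /** Returns only the paths that point at an existing, readable regular file. */
    public static List<Path> existingFiles(List<String> rawPaths) {
        List<Path> result = new ArrayList<>();
        for (String raw : rawPaths) {
            if (raw == null || raw.trim().isEmpty()) {
                continue; // nothing stored in this row
            }
            try {
                Path p = Paths.get(raw.trim());
                if (Files.isRegularFile(p) && Files.isReadable(p)) {
                    result.add(p);
                } else {
                    System.err.println("Skipping missing or unreadable file: " + raw);
                }
            } catch (InvalidPathException e) {
                // The stored string is not a valid path on this platform
                System.err.println("Skipping invalid path: " + raw);
            }
        }
        return result;
    }
}
```

Inside the while (rs.next()) loop you could collect rs.getString("FilePath") into a list, pass it to existingFiles, and then run the text extraction and index() only on the paths that come back. Rows whose files are skipped could, for example, be marked with Status = Error, though that is a design choice for your application.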


Edited on 2020-09-14
