How to extract specific text from a PDF using PHP

unlucky_guy

I need to store applicants' names and their IDs in a MySQL table. I have already extracted the text using pdfparser:

<?php

// Include Composer autoloader if not already done.
include 'vendor\autoload.php';

// Parse pdf file and build necessary objects.
$parser = new  \Smalot\PdfParser\Parser();
$pdf    = $parser->parseFile('C:\Desktop\Data\ApplicationForm.pdf');

$text = $pdf->getText();
echo $text;

?>

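Right now it only displays the extracted text. I need to pull the name and ID out of the page that is filled with the extracted text (the page that appears when the program above runs). Clicking View Page Source, I found that the ID I need

appears at:

tr 1115 * 15 td.line-number 31 * 15 and td.line-content: 1084 * 15, line number value = 12

The name is at:

tr 1115 * 15 td.line-number 31 * 15 and td.line-content: 1084 * 15, line number value = 13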

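I don't know how to get at this information, so I'm lost. Please help me.

I have multiple PDFs, and everything I need is in the same location in each of them (by the same location I mean line number value = 13, tr 1115 * 15 td.line-number 31 * 15 and td.line-content: 1084 * 15). I just want to find a way to solve this.

If you have any questions I will clarify, and if the question is unclear I will improve it.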

unlucky_guy

I needed to extract the candidates' names and IDs from the PDFs, so after extracting the text with pdfparser I used PHP to download the resulting HTML page:

<?php
// Send the script output to the browser as a downloadable plain-text file.
$filename = 'filename.txt';
header('Content-Disposition: attachment; filename="' . $filename . '"');
header('Content-Type: text/plain');
// ... the rest of your file

// Include Composer autoloader if not already done.
include 'C:\Users\Downloads\pdfparser-master (1)\pdfparser-master\vendor\autoload.php';

// Parse pdf file and build necessary objects.
$parser = new  \Smalot\PdfParser\Parser();
$pdf    = $parser->parseFile('C:\Users\Desktop\Data\ApplicationForm (3).pdf');

$text = $pdf->getText();
echo $text;


?>

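I did this because the information I need sits on lines 12 and 13 of the View Source page, and that is true for all of my PDFs. After downloading the HTML page as a text file, I used the code below to extract the text I need from the downloaded file and store it in the database: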

<?php

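// Read the downloaded file into an array of lines (array indices start at 0).
$source = file("filename.txt", FILE_IGNORE_NEW_LINES);

// Trim stray whitespace so it does not end up in the database or the links.
$number = trim($source[12]);
$name = trim($source[13]);
// URL-encode the name so spaces and special characters survive in the search links.
$gslink = "https://www.google.co.in/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=google+scholar+" . urlencode($name);
$dblplink = "https://www.google.co.in/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=dblp+" . urlencode($name);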
$servername = "127.0.0.1";
$username = "root";
$password = "";
$dbname = "mydb";
// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);
// Check connection
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
} 
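// A prepared statement keeps quotes in the extracted text from breaking or injecting into the SQL.
$stmt = $conn->prepare("INSERT INTO faculty (candidate_no, candidate_name, gs_link, dblp_link) VALUES (?, ?, ?, ?)");
$stmt->bind_param("ssss", $number, $name, $gslink, $dblplink);
if ($stmt->execute()) {
    echo "New record created successfully";
} else {
    echo "Error: " . $stmt->error;
}

$stmt->close();
$conn->close();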
?>
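
The download step may not be strictly necessary: `getText()` already returns a string, so the same two lines could probably be picked out of it directly. The following is only a minimal sketch under that assumption. `extractCandidate` is a hypothetical helper (it is not part of pdfparser), the sample text is made up, and the line numbers are 1-based guesses that may need adjusting, because the text returned by the parser is not guaranteed to line up exactly with what View Source shows.

```php
<?php
// Hypothetical helper, not part of pdfparser: returns the ID and name found
// at the given 1-based line numbers of the extracted text, or null if the
// text has fewer lines than expected.
function extractCandidate(string $text, int $idLine = 12, int $nameLine = 13): ?array
{
    // Split on any newline style so Windows and Unix line endings both work.
    $lines = preg_split('/\r\n|\r|\n/', $text);

    // Convert the 1-based line numbers to 0-based array indices and check them.
    if (!isset($lines[$idLine - 1], $lines[$nameLine - 1])) {
        return null;
    }

    return [
        'id'   => trim($lines[$idLine - 1]),
        'name' => trim($lines[$nameLine - 1]),
    ];
}

// Stand-in for the string returned by $pdf->getText(); real PDFs will differ.
$sampleText = "line 1\nline 2\nA-1001\nJane Doe\n";
$candidate = extractCandidate($sampleText, 3, 4);
echo $candidate['id'], ' / ', $candidate['name'], PHP_EOL;
```

For several PDFs, one option would be to loop over the files (for example with `glob()`), call `$parser->parseFile()` on each, pass the result of `getText()` to the helper, and insert each returned pair into the `faculty` table, skipping any file for which the helper returns null.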


Edited on 2021-04-30
