How to extract or read images from a Google Doc using Python

企鹅

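I am trying to read data from a Google Doc. I am using Python and have set up the Google Docs API with it. I simply copied the code Google provides and made a few modifications, and I can successfully read the data line by line, but only the text! Now I am trying something new and have inserted an image. Here is what it looks like.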

Image of my Google Doc content

Link to the Google Doc

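Pretty simple, right? It has one bullet point and sub-bullet points containing an image and the text "Hello". Now, when I read the data (line by line), I print out what the API returns, and again it returns a dictionary containing nested dictionaries. Here is what it looks like.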

{'startIndex': 1, 'endIndex': 41, 'paragraph': {'elements': [{'startIndex': 1, 'endIndex': 41, 'textRun': {'content': 'This is the Python Programming Language\n', 'textStyle': {}}}], 'paragraphStyle': {'namedStyleType': 'NORMAL_TEXT', 'direction': 'LEFT_TO_RIGHT', 'indentFirstLine': {'magnitude': 18, 'unit': 'PT'}, 'indentStart': {'magnitude': 36, 'unit': 'PT'}}, 'bullet': {'listId': 'kix.y7w314ij0ywy', 'textStyle': {'underline': False}}}}


{'startIndex': 41, 'endIndex': 43, 'paragraph': {'elements': [{'startIndex': 41, 'endIndex': 42, 'inlineObjectElement': {'inlineObjectId': 'kix.o4cuh6wash2n', 'textStyle': {}}}, {'startIndex': 42, 'endIndex': 43, 'textRun': {'content': '\n', 'textStyle': {}}}], 'paragraphStyle': {'namedStyleType': 'NORMAL_TEXT', 'direction': 'LEFT_TO_RIGHT', 'indentFirstLine': {'magnitude': 54, 'unit': 'PT'}, 'indentStart': {'magnitude': 72, 'unit': 'PT'}}, 'bullet': {'listId': 'kix.y7w314ij0ywy', 'nestingLevel': 1, 'textStyle': {'underline': False}}}}


{'startIndex': 43, 'endIndex': 49, 'paragraph': {'elements': [{'startIndex': 43, 'endIndex': 49, 'textRun': {'content': 'Hello\n', 'textStyle': {}}}], 'paragraphStyle': {'namedStyleType': 'NORMAL_TEXT', 'direction': 'LEFT_TO_RIGHT', 'indentFirstLine': {'magnitude': 54, 'unit': 'PT'}, 'indentStart': {'magnitude': 72, 'unit': 'PT'}}, 'bullet': {'listId': 'kix.y7w314ij0ywy', 'nestingLevel': 1, 'textStyle': {'underline': False}}}}

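As you can see, there are 3 dictionaries, each containing key-value pairs. Note that there are three because there is one for each line in the document. You can also see the key content, whose value is the text in the document.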

If you look at the nested dictionaries, these are the ones that hold the text:

{'content': 'This is the Python Programming Language\n', 'textStyle': {}}
{'content': '\n', 'textStyle': {}}
{'content': 'Hello\n', 'textStyle': {}}
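
The nesting described above can be walked with a few lines of Python. The following is a minimal sketch, assuming the element shape shown in the printed output; `extract_text` is a hypothetical helper name, and the sample data is trimmed from the dictionaries above (style and index fields omitted).

```python
def extract_text(elements):
    """Collect the 'content' string of every textRun in the given structural elements."""
    texts = []
    for element in elements:
        paragraph = element.get('paragraph')
        if paragraph is None:
            continue  # non-paragraph elements are ignored in this sketch
        for part in paragraph.get('elements', []):
            text_run = part.get('textRun')
            if text_run is not None:
                texts.append(text_run.get('content', ''))
    return texts


# Sample data trimmed from the output above.
sample_content = [
    {'paragraph': {'elements': [
        {'textRun': {'content': 'This is the Python Programming Language\n', 'textStyle': {}}}]}},
    {'paragraph': {'elements': [
        {'inlineObjectElement': {'inlineObjectId': 'kix.o4cuh6wash2n', 'textStyle': {}}},
        {'textRun': {'content': '\n', 'textStyle': {}}}]}},
    {'paragraph': {'elements': [
        {'textRun': {'content': 'Hello\n', 'textStyle': {}}}]}},
]

print(extract_text(sample_content))
```

Note that the second paragraph contributes only its trailing newline here, because its first element is an `inlineObjectElement` rather than a `textRun`.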

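Now, I noticed that it returned a \n for the line that contains the image. I have also been looking for at least a key whose value would be a temporary URL of the image, but there does not seem to be one. So my question is: is there a way to read this image (and also EXTRACT it) using the API I am already using? Maybe I am just missing something... Can someone help me? Any alternative solution would be highly appreciated. Thank you!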

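By the way, here is the source code provided by Google. I modified the read_strucutural_elements function to read the data for my own purposes, but you can see there how the API returns a dictionary for each line of data. I also noticed that the API does read the document line by line, returning one dictionary per line.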

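import os.path
import pickle

from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# Read-only scope for the Docs API.
SCOPES = ['https://www.googleapis.com/auth/documents.readonly']
# Placeholder (not given in the original post): the ID of the document to read.
DOCUMENT_ID = 'YOUR_DOCUMENT_ID'

def main():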
    """Shows basic usage of the Docs API.
    Prints the title of a sample document.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('docs', 'v1', credentials=creds)

    # Retrieve the documents contents from the Docs service.
    document = service.documents().get(documentId=DOCUMENT_ID).execute()

    #print('The title of the document is: {}'.format(document.get('title')))
    data = read_strucutural_elements(document.get("body").get("content"))

This is the read_strucutural_elements function. I simply print out the items in the elements parameter, which holds that data line by line.

def read_strucutural_elements(elements):

    for value in elements:
        print(value) #the value of the value variable is the nested dictionaries I've shown above
        print()
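
The printed dictionaries show that the image line is not empty: its first paragraph element is an `inlineObjectElement` rather than a `textRun`. As a minimal sketch, assuming the same element shape as the output above, a variant of the reader could collect those IDs instead of only printing. `find_inline_object_ids` is a hypothetical helper name, not part of Google's sample.

```python
def find_inline_object_ids(elements):
    """Return the inlineObjectId of every inline object found in the paragraphs."""
    ids = []
    for element in elements:
        # Elements without a 'paragraph' key simply yield no parts.
        for part in element.get('paragraph', {}).get('elements', []):
            inline = part.get('inlineObjectElement')
            if inline is not None:
                ids.append(inline['inlineObjectId'])
    return ids


# Sample data trimmed from the image paragraph printed above.
image_paragraph = [
    {'paragraph': {'elements': [
        {'inlineObjectElement': {'inlineObjectId': 'kix.o4cuh6wash2n', 'textStyle': {}}},
        {'textRun': {'content': '\n', 'textStyle': {}}}]}},
]

print(find_inline_object_ids(image_paragraph))
```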

Thank you very much!

彼得罗彼得

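Looking at the dictionary output, the image is an inlineObject with a specific ID. You should be able to retrieve the image using its URL. To get that URL, see the related question: How to get the URL of a Google Docs image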
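
As a rough illustration of that lookup: in the Docs API response, the top-level `inlineObjects` map is keyed by the same IDs that appear in `inlineObjectElement`, and for images the embedded object carries a short-lived `contentUri`. The sketch below assumes that shape; `get_image_uris` and `download_image` are hypothetical helper names, and the sample document and its URI are placeholders rather than real data.

```python
import urllib.request


def get_image_uris(document):
    """Map each inline object ID to its image contentUri, if it has one."""
    uris = {}
    for object_id, inline_object in document.get('inlineObjects', {}).items():
        embedded = inline_object.get('inlineObjectProperties', {}).get('embeddedObject', {})
        uri = embedded.get('imageProperties', {}).get('contentUri')
        if uri:
            uris[object_id] = uri
    return uris


def download_image(uri, path):
    """Save the image behind `uri` to `path` (needs network access; not called here)."""
    urllib.request.urlretrieve(uri, path)


# Hypothetical, trimmed stand-in for the response of documents().get();
# the URI is a placeholder, not a real image link.
sample_document = {
    'inlineObjects': {
        'kix.o4cuh6wash2n': {
            'inlineObjectProperties': {
                'embeddedObject': {
                    'imageProperties': {'contentUri': 'https://example.com/placeholder-image'}
                }
            }
        }
    }
}

print(get_image_uris(sample_document))
```

With a real response, `get_image_uris(document)` would be called on the `document` object fetched in `main()`, and each resulting URI could be passed to `download_image` while it is still valid.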


Edited on 2021-01-26
