如何使用python从Google Spreadsheet获取工作表ID?

克里斯

我想为Google Spreadsheet Workbook中每个工作表确定一种在URL中获取工作表ID的方法。例如,此工作簿的“ sheet2”的工作表ID为“ 1244369280”,因为其网址为https://docs.google.com/spreadsheets/d/1yd8qTYjRns4_OT8PbsZzH0zajvzguKSKS79dq6j--hnTs/edit#gid=1244369280

我发现的一种方法是提取Google Spreadsheet的XML,因为根据此问题,获取工作表ID的唯一方法是流式化工作表的XML,但是示例在Javascript中,我需要用Python做到这一点

这是我想在Python中执行的Javascript代码:

  Dim worksheetFeed As WorksheetFeed
  Dim query As WorksheetQuery
  Dim worksheet As WorksheetEntry
  Dim output As New MemoryStream
  Dim xml As String
  Dim gid As String = String.Empty

  Try
    _service = New Spreadsheets.SpreadsheetsService("ServiceName")
    _service.setUserCredentials(UserId, Password)
    query = New WorksheetQuery(feedUrl)
    worksheetFeed = _service.Query(query)
    worksheet = worksheetFeed.Entries(0)

    ' Save worksheet feed to memory stream so we can 
    ' get the xml returned from the feed url and look for
    ' the gid.  Gid allows us to download the specific worksheet tab
    Using output
      worksheet.SaveToXml(output)
    End Using

    xml = Encoding.ASCII.GetString(output.ToArray())

从Google Spreadsheet获取XML的最好方法似乎是使用Gdata,因此我下载了GData并尝试使用自己的凭据运行Google Spreadsheet示例

见下文

#!/usr/bin/python
#
# Copyright (C) 2007 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


__author__ = '[email protected] (Laura Beth Lincoln)'


try:
  from xml.etree import ElementTree
except ImportError:
  from elementtree import ElementTree
import gdata.spreadsheet.service
import gdata.service
import atom.service
import gdata.spreadsheet
import atom
import getopt
import sys
import string


class SimpleCRUD:

  def __init__(self, email, password):
    self.gd_client = gdata.spreadsheet.service.SpreadsheetsService()
    self.gd_client.email = '[email protected]'
    self.gd_client.password = 'jkjkdioerzumawya'
    self.gd_client.source = 'Spreadsheets GData Sample'
    self.gd_client.ProgrammaticLogin()
    self.curr_key = ''
    self.curr_wksht_id = ''
    self.list_feed = None

  def _PromptForSpreadsheet(self):
    # Get the list of spreadsheets
    feed = self.gd_client.GetSpreadsheetsFeed()
    self._PrintFeed(feed)
    input = raw_input('\nSelection: ')
    id_parts = feed.entry[string.atoi(input)].id.text.split('/')
    self.curr_key = id_parts[len(id_parts) - 1]

  def _PromptForWorksheet(self):
    # Get the list of worksheets
    feed = self.gd_client.GetWorksheetsFeed(self.curr_key)
    self._PrintFeed(feed)
    input = raw_input('\nSelection: ')
    id_parts = feed.entry[string.atoi(input)].id.text.split('/')
    self.curr_wksht_id = id_parts[len(id_parts) - 1]

  def _PromptForCellsAction(self):
    print ('dump\n'
           'update {row} {col} {input_value}\n'
           '\n')
    input = raw_input('Command: ')
    command = input.split(' ', 1)
    if command[0] == 'dump':
      self._CellsGetAction()
    elif command[0] == 'update':
      parsed = command[1].split(' ', 2)
      if len(parsed) == 3:
        self._CellsUpdateAction(parsed[0], parsed[1], parsed[2])
      else:
        self._CellsUpdateAction(parsed[0], parsed[1], '')
    else:
      self._InvalidCommandError(input)

  def _PromptForListAction(self):
    print ('dump\n'
           'insert {row_data} (example: insert label=content)\n'
           'update {row_index} {row_data}\n'
           'delete {row_index}\n'
           'Note: No uppercase letters in column names!\n'
           '\n')
    input = raw_input('Command: ')
    command = input.split(' ' , 1)
    if command[0] == 'dump':
      self._ListGetAction()
    elif command[0] == 'insert':
      self._ListInsertAction(command[1])
    elif command[0] == 'update':
      parsed = command[1].split(' ', 1)
      self._ListUpdateAction(parsed[0], parsed[1])
    elif command[0] == 'delete':
      self._ListDeleteAction(command[1])
    else:
      self._InvalidCommandError(input)

  def _CellsGetAction(self):
    # Get the feed of cells
    feed = self.gd_client.GetCellsFeed(self.curr_key, self.curr_wksht_id)
    self._PrintFeed(feed)

  def _CellsUpdateAction(self, row, col, inputValue):
    entry = self.gd_client.UpdateCell(row=row, col=col, inputValue=inputValue, 
        key=self.curr_key, wksht_id=self.curr_wksht_id)
    if isinstance(entry, gdata.spreadsheet.SpreadsheetsCell):
      print 'Updated!'

  def _ListGetAction(self):
    # Get the list feed
    self.list_feed = self.gd_client.GetListFeed(self.curr_key, self.curr_wksht_id)
    self._PrintFeed(self.list_feed)

  def _ListInsertAction(self, row_data):
    entry = self.gd_client.InsertRow(self._StringToDictionary(row_data), 
        self.curr_key, self.curr_wksht_id)
    if isinstance(entry, gdata.spreadsheet.SpreadsheetsList):
      print 'Inserted!'

  def _ListUpdateAction(self, index, row_data):
    self.list_feed = self.gd_client.GetListFeed(self.curr_key, self.curr_wksht_id)
    entry = self.gd_client.UpdateRow(
        self.list_feed.entry[string.atoi(index)], 
        self._StringToDictionary(row_data))
    if isinstance(entry, gdata.spreadsheet.SpreadsheetsList):
      print 'Updated!'

  def _ListDeleteAction(self, index):
    self.list_feed = self.gd_client.GetListFeed(self.curr_key, self.curr_wksht_id)
    self.gd_client.DeleteRow(self.list_feed.entry[string.atoi(index)])
    print 'Deleted!'

  def _StringToDictionary(self, row_data):
    dict = {}
    for param in row_data.split():
      temp = param.split('=')
      dict[temp[0]] = temp[1]
    return dict

  def _PrintFeed(self, feed):
    for i, entry in enumerate(feed.entry):
      if isinstance(feed, gdata.spreadsheet.SpreadsheetsCellsFeed):
        print '%s %s\n' % (entry.title.text, entry.content.text)
      elif isinstance(feed, gdata.spreadsheet.SpreadsheetsListFeed):
        print '%s %s %s' % (i, entry.title.text, entry.content.text)
        # Print this row's value for each column (the custom dictionary is
        # built using the gsx: elements in the entry.)
        print 'Contents:'
        for key in entry.custom:  
          print '  %s: %s' % (key, entry.custom[key].text) 
        print '\n',
      else:
        print '%s %s\n' % (i, entry.title.text)

  def _InvalidCommandError(self, input):
    print 'Invalid input: %s\n' % (input)

  def Run(self):
    self._PromptForSpreadsheet()
    self._PromptForWorksheet()
    input = raw_input('cells or list? ')
    if input == 'cells':
      while True:
        self._PromptForCellsAction()
    elif input == 'list':
      while True:
        self._PromptForListAction()


def main():
  # parse command line options
  try:
    opts, args = getopt.getopt(sys.argv[1:], "", ["user=", "pw="])
  except getopt.error, msg:
    print 'python spreadsheetExample.py --user [username] --pw [password] '
    sys.exit(2)

  user = '[email protected]'
  pw = 'fakepassword'
  key = ''
  # Process options
  for o, a in opts:
    if o == "--user":
      user = a
    elif o == "--pw":
      pw = a

  if user == '' or pw == '':
    print 'python spreadsheetExample.py --user [username] --pw [password] '
    sys.exit(2)

  sample = SimpleCRUD(user, pw)
  sample.Run()


if __name__ == '__main__':
  main()

但是,这将返回以下错误:

Traceback (most recent call last):
  File "/Users/Chris/Desktop/gdata_test.py", line 200, in <module>
    main()
  File "/Users/Chris/Desktop/gdata_test.py", line 196, in main
    sample.Run()
  File "/Users/Chris/Desktop/gdata_test.py", line 162, in Run
    self._PromptForSpreadsheet()
  File "/Users/Chris/Desktop/gdata_test.py", line 49, in _PromptForSpreadsheet
    feed = self.gd_client.GetSpreadsheetsFeed()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gdata/spreadsheet/service.py", line 99, in GetSpreadsheetsFeed
    converter=gdata.spreadsheet.SpreadsheetsSpreadsheetsFeedFromString)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gdata/service.py", line 1074, in Get
    return converter(result_body)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gdata/spreadsheet/__init__.py", line 395, in SpreadsheetsSpreadsheetsFeedFromString
    xml_string)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/atom/__init__.py", line 93, in optional_warn_function
    return f(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/atom/__init__.py", line 127, in CreateClassFromXMLString
    tree = ElementTree.fromstring(xml_string)
  File "<string>", line 125, in XML
cElementTree.ParseError: no element found: line 1, column 0
[Finished in 0.3s with exit code 1]
[shell_cmd: python -u "/Users/Chris/Desktop/gdata_test.py"]
[dir: /Users/Chris/Desktop]
[path: /usr/bin:/bin:/usr/sbin:/sbin]

我还应该提到,我一直在使用Gspread作为与Google Spreadsheets进行交互的方法,但是当我运行以下代码时,我得到了帮助,但是我需要具有工作表ID。

gc = gspread.authorize(credentials)
sh = gc.open_by_url('google_spreadsheet_url')
sh.get_id_fields() 
>> {'spreadsheet_id': '1BgCEn-3Nor7UxOEPwD-qv8qXe7CaveJBrn9_Lcpo4W4','worksheet_id': 'oqitk0d'}
ec

请参阅self.gd_client.ProgrammaticLogin()电话-这是引起主要问题的原因,因为它使用的是“ ClientLogin”授权方法,该方法最初已弃用,然后在2015年4月20日删除

实际上,我将研究更新鲜,更积极开发的gspread模块。


这是一个有点疯狂的示例,该示例演示如何为给定的电子表格和工作表名称提取实际的“ gid”值。请注意,您首先需要使用OAuth凭据生成JSON文件(假设您已经完成了此操作)。

代码(添加了希望有助于理解的注释):

import urlparse
import xml.etree.ElementTree as ET

import gspread
from oauth2client.service_account import ServiceAccountCredentials

SPREADSHEET_NAME = 'My Test Spreadsheet'
WORKSHEET_NAME = "Sheet2"

PATH_TO_JSON_KEYFILE = '/path/to/json/key/file.json'
NAMESPACES = {'ns0': 'http://www.w3.org/2005/Atom'}
SCOPES = ['https://spreadsheets.google.com/feeds']

# log in
credentials = ServiceAccountCredentials.from_json_keyfile_name(PATH_TO_JSON_KEYFILE, SCOPES)
gss_client = gspread.authorize(credentials)

# open spreadsheet
gss = gss_client.open(SPREADSHEET_NAME)

# extract the full feed url
root = gss._feed_entry
full_feed_url = next(elm.attrib["href"] for elm in root.findall("ns0:link", namespaces=NAMESPACES) if "full" in elm.attrib["href"])

# get the feed and extract the gid value for a given sheet name
response = gss_client.session.get(full_feed_url)
root = ET.fromstring(response.content)
sheet_entry = next(elm for elm in root.findall("ns0:entry", namespaces=NAMESPACES)
                   if elm.find("ns0:title", namespaces=NAMESPACES).text == WORKSHEET_NAME)
link = next(elm.attrib["href"] for elm in sheet_entry.findall("ns0:link", namespaces=NAMESPACES)
            if "gid=" in elm.attrib["href"])

# extract "gid" from URL
gid = urlparse.parse_qs(urlparse.urlparse(link).query)["gid"][0]
print(gid)

看起来还有一种方法可以将工作表ID转换为gid值,请参见:

本文收集自互联网,转载请注明来源。

如有侵权,请联系 [email protected] 删除。

编辑于
0

我来说两句

0 条评论
登录 后参与评论

相关文章

使用Google Spreadsheet中的脚本从许多不同的工作表中获取数据

Google Spreadsheet从两个不同的工作表中获取唯一值

如何使用python获取文件夹中的工作表ID数组

如何使用Python在Google Spreadsheet中为特定工作表拆分所有合并的单元格?

Excel:如何从工作表 B 中获取价格,其中产品 ID 与工作表 A 中的产品 ID 相同

如何使用xlrd在Python中获取Excel工作表名称

使用ByteScout Spreadsheet在Excel文件中写入多个工作表

如何使用python 3迭代Google工作表中的所有工作表?

如何使用Python解析HTML并获取表ID

如何从另一个工作表中获取基于ID的值

如何获取从副本创建的新文件夹/工作表的ID?

需要使用 ID 和标题从两个不同的工作表中获取数据/

使用具有Excel工作表数据的熊猫获取基于用户ID的数据?

如何找到Google SpreadSheet ID?

如何获取链接/绑定到Google Spreadsheet的Google Apps脚本项目ID

如何使用 Google Sheets 从另一个具有可变工作表名称的工作表中获取值?

如何通过工作表名称获取工作表序列

使用工作表的名称获取工作表的代号

如何使用gspread获取Google Spreadsheet单元格链接

如何使用getElementById获取ID指定的表

如何使用Spreadsheet :: ParseExcel :: SaveParser删除现有xls文件中的工作表?

如何在Google Sheets API中使用工作表ID?

如何使用Apps Script在Google工作表中提取pubhtml ID?

如何打印工作表名称和Google ID

使用Google脚本在列表中列出工作表ID

如何获取工作表的数据范围

如何获取Smartsheet发布的工作表的网址?

Google Spreadsheet ImportHTML表

如何使用Google API for python在特定文件夹下创建工作表?