Converting an Atom or OData XML file to an OData JSON file using Python

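I have been trying to convert a PowerShell script to Python code that downloads list files from SharePoint. So far most of the code is written and runs fine. However, when I download the file from SharePoint to a local drive with a .json extension, the file content is not what I expect.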

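The SharePoint list content type is content-type: application/atom+xml;type=feed;charset=utf-8, so the format is XML. Since I could not save the content in .json format, I downloaded the file as .xml and converted it to .json with the xmltodict Python package, which works fine so far.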

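Here is my actual question: how can we download the XML content as .json, or convert the XML file to a JSON file, without attribute types, tags, namespaces and so on? We need to download the file in the output format produced by the PowerShell script shown below, with no tags, only key-value pairs.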

I am sharing only sample file content rather than the whole file, because it contains some sensitive data.

This is the content of the SharePoint web URL in Atom XML / OData XML format.
<?xml version="1.0" encoding="utf-8"?><feed xml:base="https://myorg.sharepoint.com/sites/pwaeng/_api/" xmlns="http://www.w3.org/2005/Atom" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"

<d:Created m:type="Edm.DateTime">2018-05-09T21:21:03Z</d:Created><d:AuthorId m:type="Edm.Int32">1344</d:AuthorId><d:EditorId m:type="Edm.Int32">1344</d:EditorId><d:OData__UIVersionString>1.0</d:OData__UIVersionString><d:Attachments m:type="Edm.Boolean">false</d:Attachments><d:GUID m:type="Edm.Guid">9ef38bd1-a098-4610-98a4-dbf7488a5a27</d:GUID></m:properties></content></entry></feed>
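
If the goal is only the key-value pairs inside each entry's m:properties element, one alternative to xmltodict is to read the feed with the standard library's xml.etree.ElementTree and drop the namespace from each tag. This is a minimal sketch, assuming the feed follows the entry/content/m:properties layout of the sample above. The function name entries_to_dicts is made up for illustration, and all values are kept as strings.

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from the feed sample; the prefixes are local to this sketch.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
}


def entries_to_dicts(xml_data):
    """Return one {name: text} dict per <entry> in an Atom/OData feed."""
    root = ET.fromstring(xml_data)
    rows = []
    for props in root.iterfind("atom:entry/atom:content/m:properties", NS):
        row = {}
        for child in props:
            # ElementTree tags look like "{namespace-uri}Name"; keep only "Name".
            name = child.tag.split("}", 1)[-1]
            row[name] = child.text
        rows.append(row)
    return rows
```

With the code in the question, this could be called as entries_to_dicts(response.content), and the result passed to json.dumps.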

This is the JSON data converted by Python.

{"feed": {"@xml:base": "https://myorg.sharepoint.com/sites/pwaeng/_api/", "@xmlns": "http://www.w3.org/2005/Atom", "@xmlns:d": "http://schemas.microsoft.com/ado/2007/08/dataservices",

"d:Created": {"@m:type": "Edm.DateTime", "#text": "2018-05-09T21:21:03Z"}, "d:AuthorId": {"@m:type": "Edm.Int32", "#text": "1344"}, "d:EditorId": {"@m:type": "Edm.Int32", "#text": "1344"}, "d:OData__UIVersionString": "1.0", "d:Attachments": {"@m:type": "Edm.Boolean", "#text": "false"}, "d:GUID": {"@m:type": "Edm.Guid", "#text": "9ef38bd1-a098-4610-98a4-dbf7488a5a27"}}}}}}

JSON file downloaded by PowerShell

{"odata.metadata":"https://myorg.sharepoint.com/sites/pwaeng/_api/$metadata#SP.ListData.Program_x0020_RisksListItems","value":[{"odata.type":"SP.Data.Program_x0020_RisksListItem","odata.id":"a878d166-c19d-4c16-82b4-e150e7e49626","odata.etag":"\"2\"","odata.editLink":"Web/Lists

"Created":"2018-05-09T21:21:03Z","AuthorId":1344,"EditorId":1344,"OData__UIVersionString":"1.0","Attachments":false,"GUID":"9ef38bd1-a098-4610-98a4-dbf7488a5a27"}]}

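One way to get from the xmltodict output to plain key-value pairs like the PowerShell output is to post-process the parsed m:properties dictionary: drop the "d:" prefix from each key, discard the "@..." attribute entries, and convert "#text" according to its "@m:type". This is a rough sketch under those assumptions. The helper names and the small set of Edm types handled are illustrative, not a complete mapping.

```python
# Edm types converted to Python numbers in this sketch; everything else stays a string.
INT_TYPES = {"Edm.Byte", "Edm.Int16", "Edm.Int32"}
FLOAT_TYPES = {"Edm.Single", "Edm.Double"}


def convert_value(node):
    """Turn one xmltodict value into a plain Python value."""
    if not isinstance(node, dict):
        return node  # element without attributes: xmltodict already gives the text
    if node.get("@m:null") == "true":
        return None
    text = node.get("#text")
    if text is None:
        return None  # typed element with no text content
    edm_type = node.get("@m:type")
    if edm_type in INT_TYPES:
        return int(text)
    if edm_type in FLOAT_TYPES:
        return float(text)
    if edm_type == "Edm.Boolean":
        return text == "true"
    return text


def flatten_properties(properties):
    """Strip namespace prefixes and attribute entries from an m:properties dict."""
    return {
        key.split(":", 1)[-1]: convert_value(value)
        for key, value in properties.items()
        if not key.startswith("@")
    }
```

In the xmltodict result the properties sit under data_dict["feed"]["entry"], which is a dict when the feed has a single entry and a list when it has several, so the caller needs to handle both cases before reaching ["content"]["m:properties"].
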
Below is part of the Python code. I have tried most of the options but could not get the desired output.

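   # Imports this excerpt relies on (office365 module paths assumed from
   # Office365-REST-Python-Client; they can differ between versions)
   import json
   import requests
   import xmltodict
   from office365.runtime.auth.authentication_context import AuthenticationContext
   from office365.runtime.http.request_options import RequestOptions

   # REST endpoint that returns the items of the SharePoint list named by List
   listURL = webAbsoluteURL + "/_api/web/lists/GetByTitle('" + List + "')/items"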

   #print(listURL)
   count = 0
   #print(type(str(count)))
   fileName = "file_" + ListFolder.strip() + "_" + str(count) + "_" + date
   #print(fileName)
   xml_output = Filepath + "/" + fileName + ".xml"  # Use a backslash on Windows
   json_output = Filepath + "/" + fileName + ".json"
   #print(output)
   #print(userName, Password)
   url = listURL
   #ctx = ClientContext(url).with_credentials(UserCredential(userName, Password))
   #web = ctx.web.get().execute_query()
   #print("Web title: {0}".format(web.properties['Title']))
   ctx_auth = AuthenticationContext(webAbsoluteURL)
   token = ctx_auth.acquire_token_for_user(userName, Password)
   #ctx = ClientContext(webAbsoluteURL, ctx_auth)
   #print(token)
   options = RequestOptions(webAbsoluteURL)
   ctx_auth.authenticate_request(options)
   #options.headers = {
   #'accept' : 'text/html,application/xhtml+xml,application/xml',
   #'content-type': 'application/atom+xml;type=feed;charset=utf-8',
   #'X-RequestForceAuthentication' : 'true'
   #}
   response = requests.get(url, headers=options.headers, allow_redirects=True, timeout=60000)
   #print(req.status_code)
   #headers = {
   #'accept' : 'application/json;odata=verbose',
   #'content-type' : 'application/json;odata=verbose',
   #'X-RequestForceAuthentication' : 'true'
   #}
   #response = requests.get(url, allow_redirects=True, headers=headers, timeout=60000)
   #print(response.status_code)
   # Save the raw Atom/OData XML response to disk.
   with open(xml_output, 'wb') as file_save:
      file_save.write(response.content)
   # Parse the saved XML; xmltodict keeps attributes ("@...") and namespace prefixes.
   with open(xml_output, 'r', encoding="UTF-8") as xml_file:
      data_dict = xmltodict.parse(xml_file.read()) # , attr_prefix='')
   #json_data = json.dumps(data_dict, separators=(',', ':'))
   #json_data = json.dumps(data_dict, indent=2)
   json_data = json.dumps(data_dict)
   #with open(json_output, 'w') as json_file:
   #   json.dump(data_dict, json_file)
   # Write the converted JSON as UTF-8 bytes; the with block closes the file.
   with open(json_output, 'wb') as json_file:
      json_file.write(json_data.encode("UTF-8"))

I found a solution: instead of using an XML-to-JSON parser (xmltodict.parse) and the like, the simple fix is to append "?&$format=json" to the end of the web URL.

XML_DATA_URL = https://myorg.sharepoint.com/sites/pwaeng/_api/projectdata/Tasks

JSON_FORMAT_URL = https://myorg.sharepoint.com/sites/pwaeng/_api/projectdata/Tasks?&$format=json
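
The "&" right after the "?" is not needed; "?$format=json" is enough. When the URL may already carry other query options, appending the option by hand gets error-prone, so here is a small sketch that does it with urllib.parse. The helper name with_json_format is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_json_format(url):
    """Return url with its $format query option set to json."""
    parts = urlsplit(url)
    # Keep existing query options except any earlier $format value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "$format"
    ]
    query.append(("$format", "json"))
    # safe="$" keeps the OData "$" prefix readable instead of encoding it as %24.
    return urlunsplit(parts._replace(query=urlencode(query, safe="$")))
```

For example, with_json_format(XML_DATA_URL) yields the Tasks URL followed by "?$format=json".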

However, this does not work for the following type of URL.

https://myorg.sharepoint.com/sites/pwaeng/_api/web/lists/GetByTitle('Program Risks')/items

https://myorg.sharepoint.com/sites/pwaeng/_api/web/lists/GetByTitle('Program Risks')/items?&$format=json
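
For the _api/web/lists endpoints, one thing that may be worth trying is asking for JSON through the Accept header rather than a query option. SharePoint's REST service accepts "application/json;odata=verbose", "odata=minimalmetadata" and "odata=nometadata". The "odata.metadata" key in the PowerShell output suggests minimalmetadata, but that is an assumption. The commented-out attempt in the code above passed a brand-new headers dict, which may have dropped the authentication headers set by authenticate_request. The sketch below merges the Accept header into the authenticated headers instead. Both function names are made up for illustration.

```python
def json_headers(auth_headers, odata="minimalmetadata"):
    """Copy the authenticated headers and set a JSON Accept header on the copy."""
    if odata not in ("verbose", "minimalmetadata", "nometadata"):
        raise ValueError("unsupported odata level: " + odata)
    # Drop any existing Accept header (any casing) so only one is sent.
    headers = {k: v for k, v in auth_headers.items() if k.lower() != "accept"}
    headers["Accept"] = "application/json;odata=" + odata
    return headers


def fetch_list_items(url, auth_headers, odata="minimalmetadata", timeout=60):
    """GET the list URL and return the decoded JSON body."""
    # Third-party package, imported here so json_headers can be used without it.
    import requests

    response = requests.get(url, headers=json_headers(auth_headers, odata), timeout=timeout)
    response.raise_for_status()
    return response.json()
```

With the variables from the question this would be called as fetch_list_items(listURL, options.headers). With minimalmetadata the items should come back under the "value" key, as in the PowerShell output.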

If anyone has any suggestions, please add your comments here.