Read XML file to Pandas DataFrame
Can someone please help convert the following XML file to a Pandas DataFrame:
<?xml version="1.0" encoding="UTF-8" ?>
<root>
<bathrooms type="dict">
<n35237 type="number">1.0</n35237>
<n32238 type="number">3.0</n32238>
<n44699 type="number">nan</n44699>
</bathrooms>
<price type="dict">
<n35237 type="number">7020000.0</n35237>
<n32238 type="number">10000000.0</n32238>
<n44699 type="number">4128000.0</n44699>
</price>
<property_id type="dict">
<n35237 type="number">35237.0</n35237>
<n32238 type="number">32238.0</n32238>
<n44699 type="number">44699.0</n44699>
</property_id>
</root>
It should look like this:
OUTPUT
This is the code I have written:
import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse('real_state.xml')
root = tree.getroot()
dfcols = ['property_id', 'price', 'bathrooms']
df_xml = pd.DataFrame(columns=dfcols)

for node in root:
    property_id = node.attrib.get('property_id')
    price = node.attrib.get('price')
    bathrooms = node.attrib.get('bathrooms')
    df_xml = df_xml.append(
        pd.Series([property_id, price, bathrooms], index=dfcols),
        ignore_index=True)

print(df_xml)
I am getting None everywhere instead of the actual values. Can someone please tell me how to fix it? Thanks!
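A likely reason for the None values is that `node.attrib` only holds XML attributes (here just `type`), while the numbers are stored as the text of the child elements. The following is a minimal sketch, assuming a shortened copy of the XML above inlined as a string (the `XML` constant and the variable names are illustrative, not from the original post):

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's XML, inlined so the sketch is self-contained.
XML = """<root>
  <bathrooms type="dict">
    <n35237 type="number">1.0</n35237>
    <n32238 type="number">3.0</n32238>
  </bathrooms>
  <price type="dict">
    <n35237 type="number">7020000.0</n35237>
    <n32238 type="number">10000000.0</n32238>
  </price>
</root>"""

root = ET.fromstring(XML)
first = root[0]  # the <bathrooms> element

# attrib contains only the XML attributes of <bathrooms>, i.e. 'type'.
attr_value = first.attrib.get('bathrooms')  # no such attribute, so None
type_value = first.attrib.get('type')       # the attribute that does exist

# The actual numbers are the text of the child elements.
child_values = {child.tag: child.text for child in first}
```

So the loop in the question asks each top-level element for attributes it does not have, and `get` falls back to None every time.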
If the data is simple, like this, then you can do something like:
import pandas as pd
from lxml import objectify

xml = objectify.parse('Document1.xml')
root = xml.getroot()

# Collect the text of every child under each top-level element
bathrooms = [child.text for child in root['bathrooms'].iterchildren()]
price = [child.text for child in root['price'].iterchildren()]
property_id = [child.text for child in root['property_id'].iterchildren()]

data = [bathrooms, price, property_id]
df = pd.DataFrame(data).T  # transpose so that each list becomes a column
df.columns = ['bathrooms', 'price', 'property_id']
bathrooms price property_id
0 1.0 7020000.0 35237.0
1 3.0 10000000.0 32238.0
2 nan 4128000.0 44699.0
If it is more complex, then a loop is better. You can do something like:
import pandas as pd
from lxml import objectify

xml = objectify.parse('Document1.xml')
root = xml.getroot()

data = []
# One list of child texts per top-level element (bathrooms, price, property_id)
for element in root.iterchildren():
    data.append([child.text for child in element.iterchildren()])

df = pd.DataFrame(data).T
df.columns = ['bathrooms', 'price', 'property_id']
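I have had success using this function from the xmltodict package:
import pandas as pd
import xmltodict

# xmlData holds the XML document as a string
xmlDict = xmltodict.parse(xmlData)
df = pd.DataFrame.from_dict(xmlDict)
What I like about this is that I can easily do some dictionary manipulation between parsing the XML and building my DataFrame. Also, if the structure is awkward, it helps to explore the data as a dictionary.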
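As an illustration of the kind of dictionary manipulation meant here, the sketch below flattens the nested dictionary before it is turned into a DataFrame. It assumes xmltodict's default conventions (attributes stored under keys prefixed with `@`, element text under `#text`). The `parsed` literal is written out by hand to mimic that shape for part of the question's XML so the snippet runs without xmltodict, and `flatten` is a hypothetical helper, not part of any library:

```python
# Hand-written stand-in for xmltodict.parse(...) output (assumed shape,
# using the default '@' attribute prefix and '#text' key).
parsed = {
    'root': {
        'bathrooms': {
            '@type': 'dict',
            'n35237': {'@type': 'number', '#text': '1.0'},
            'n32238': {'@type': 'number', '#text': '3.0'},
        },
        'price': {
            '@type': 'dict',
            'n35237': {'@type': 'number', '#text': '7020000.0'},
            'n32238': {'@type': 'number', '#text': '10000000.0'},
        },
    }
}


def flatten(parsed_root):
    """Return {column: {row_key: float}}, dropping the '@...' attribute keys."""
    columns = {}
    for column, cells in parsed_root.items():
        columns[column] = {
            key: float(cell['#text'])
            for key, cell in cells.items()
            if not key.startswith('@')
        }
    return columns


columns = flatten(parsed['root'])
```

A dict of dicts like `columns` can then be passed to `pd.DataFrame(columns)`, where the outer keys become the columns and the inner keys become the index.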
Hi all, I found another really simple way to solve this kind of problem.
Reference: https://www.youtube.com/watch?v=WVrg5-cjr5k
import xml.etree.ElementTree as ET
import pandas as pd
import codecs

# Save your XML file as text.xml first
with codecs.open('text.xml', 'r', encoding='utf8') as f:
    tt = f.read()

def xml2df(xml_data):
    root = ET.XML(xml_data)
    all_records = []
    # One record per top-level element; its children become the columns
    for child in root:
        record = {}
        for sub_child in child:
            record[sub_child.tag] = sub_child.text
        all_records.append(record)
    return pd.DataFrame(all_records)

df_xml1 = xml2df(tt)
print(df_xml1)
To better understand ET, you can use the code below to look at the contents of the XML:
import xml.etree.ElementTree as ET
import pandas as pd
import codecs

with codecs.open('text.xml', 'r', encoding='utf8') as f:
    tt = f.read()

root = ET.XML(tt)
print(type(root))
print(root[0])
for ele in root[0]:
    print(ele.tag + '////' + ele.text)
print(root[0][0].tag)
After running the program, you can see the output below:
C:\Users\username\Documents\pycode\Scripts\python.exe C:/Users/username/PycharmProjects/DestinationLight/try.py
n35237 n32238 n44699
0 1.0 3.0 nan
1 7020000.0 10000000.0 4128000.0
2 35237.0 32238.0 44699.0
<class 'xml.etree.ElementTree.Element'>
<Element 'bathrooms' at 0x00000285006B6180>
n35237////1.0
n32238////3.0
n44699////nan
n35237
Process finished with exit code 0
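Note that the frame printed above has one row per top-level element and one column per `n…` tag, which is the transpose of the layout shown earlier in the thread. One way to get one row per property is to regroup the values while parsing. This is a minimal sketch using only the standard library; `xml_to_rows` and the inlined `XML` string are illustrative names, not from the original answers:

```python
import xml.etree.ElementTree as ET

# The question's XML, inlined (without the XML declaration) for a self-contained sketch.
XML = """<root>
  <bathrooms type="dict">
    <n35237 type="number">1.0</n35237>
    <n32238 type="number">3.0</n32238>
    <n44699 type="number">nan</n44699>
  </bathrooms>
  <price type="dict">
    <n35237 type="number">7020000.0</n35237>
    <n32238 type="number">10000000.0</n32238>
    <n44699 type="number">4128000.0</n44699>
  </price>
  <property_id type="dict">
    <n35237 type="number">35237.0</n35237>
    <n32238 type="number">32238.0</n32238>
    <n44699 type="number">44699.0</n44699>
  </property_id>
</root>"""


def xml_to_rows(xml_text):
    """Group values by the inner tag (n35237, ...) so each property is one row."""
    root = ET.fromstring(xml_text)
    rows = {}
    for column in root:        # bathrooms, price, property_id
        for cell in column:    # n35237, n32238, n44699
            rows.setdefault(cell.tag, {})[column.tag] = float(cell.text)
    return list(rows.values())


rows = xml_to_rows(XML)
```

Passing `rows` to `pd.DataFrame(rows)` should then give columns `bathrooms`, `price` and `property_id` with one row per property. Calling `.T` on the frame returned by `xml2df` is a shorter alternative, although the values stay as strings in that case.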