xarray writing to netCDF from Pandas - dimension issue
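I am learning how to use xarray to generate a netCDF file from a Pandas DataFrame. I have followed several tutorials and SO questions but am still having trouble, as I cannot get Date_Time, latitude and longitude as dimensions. When I do an nc dump, they are not right.
My initial method to import the txt file into a pandas df and then go through xarray to netCDF: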
import pandas as pd
import xarray
# import data from the text file
colnames1 = ['Date','Time','latitude','longitude','Status','depth']
df2 = pd.read_csv('test.txt',header=0,error_bad_lines=False, names = colnames1,delim_whitespace=True)
# create xarray Dataset from Pandas DataFrame
xr = xarray.Dataset.from_dataframe(df2)
# add variable attribute metadata
xr['latitude'].attrs={'units':'degrees', 'long_name':'Latitude'}
xr['longitude'].attrs={'units':'degrees', 'long_name':'Longitude'}
xr['depth'].attrs={'units':'m', 'long_name':'depth'}
# add global attribute metadata
xr.attrs={'Conventions':'CF-1.6', 'title':'Data', 'summary':'Data generated'}
# print the Dataset summary
print(xr)
# save to netCDF
xr.to_netcdf('test.nc')
where df2 =
Date Time grid_latitude grid_longitude Status depth
2017-09-05 13:01:59 -29.034083 31.068567 2.0 0.0
2017-09-05 13:01:59 -29.039367 31.059150 2.0 0.0
2017-09-05 13:01:59 -29.036650 31.059200 3.0 0.0
2017-09-05 13:01:59 -29.036750 31.065417 7.0 100.0
2017-09-05 13:01:59 -29.039317 31.056050 7.0 100.0
2017-09-05 13:01:59 -29.034000 31.062367 3.0 0.0
2017-09-05 13:01:59 -29.036517 31.049900 3.0 0.0
2017-09-05 13:01:59 -29.031100 31.050000 3.0 0.0
This works, but the dimensions are not right (see below):
<xarray.Dataset>
Dimensions: (index: 8)
Coordinates:
* index (index) int64 0 1 2 3 4 5 6 7
Data variables:
Date (index) object '2017-09-05' '2017-09-05' '2017-09-05' ...
Time (index) object '13:01:59' '13:01:59' '13:01:59' '13:01:59' ...
latitude (index) float64 -29.03 -29.04 -29.04 -29.04 -29.04 -29.03 ...
longitude (index) float64 31.07 31.06 31.06 31.07 31.06 31.06 31.05 31.05
Status (index) float64 2.0 2.0 3.0 7.0 7.0 3.0 3.0 3.0
depth (index) float64 0.0 0.0 0.0 100.0 100.0 0.0 0.0 0.0
Attributes:
title: Data
summary: Data generated
Conventions: CF-1.6
If I set the Date, or a combined Date_Time, as the DF index, then Date/Time is handled correctly and is treated as a dimension:
<xarray.Dataset>
Dimensions: (Date: 8)
Coordinates:
* Date (Date) object '2017-09-05' '2017-09-05' '2017-09-05' ...
Data variables:
Time (Date) object '13:01:59' '13:01:59' '13:01:59' '13:01:59' ...
latitude (Date) float64 -29.03 -29.04 -29.04 -29.04 -29.04 -29.03 ...
longitude (Date) float64 31.07 31.06 31.06 31.07 31.06 31.06 31.05 31.05
Status (Date) float64 2.0 2.0 3.0 7.0 7.0 3.0 3.0 3.0
depth (Date) float64 0.0 0.0 0.0 100.0 100.0 0.0 0.0 0.0
Attributes:
title: Data
summary: Data generated
Conventions: CF-1.6
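But if I set df.index on Date_Time, latitude and longitude, it reverts to the blank (index).
I would appreciate pointers on getting the dimensions set up. With the netCDF module, one can create a dimension with the syntax: lat = dataset.createDimension('lat', 73). The SO example did not help either. Perhaps I am missing something, or this is a limitation of what I have learned so far. I would like to get to the point where an nc dump produces something similar to this: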
NetCDF dimension information:
Name: lat
size: 73
type: dtype('float32')
units: u'degrees_north'
actual_range: array([ 90., -90.], dtype=float32)
long_name: u'Latitude'
standard_name: u'latitude'
axis: u'Y'
Name: lon
size: 144
type: dtype('float32')
units: u'degrees_east'
long_name: u'Longitude'
actual_range: array([ 0. , 357.5], dtype=float32)
standard_name: u'longitude'
axis: u'X'
Name: time
size: 366
type: dtype('float64')
units: u'hours since 1-1-1 00:00:0.0'
long_name: u'Time'
actual_range: array([ 17628096., 17636856.])
delta_t: u'0000-00-01 00:00:00'
standard_name: u'time'
axis: u'T'
avg_period: u'0000-00-01 00:00:00'
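Otherwise, could I convert the DF columns to np arrays and use the netCDF module? Thanks in advance.
I did venture into trying something like this, but I doubt it is on the right path: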
# add dimensions
#d = {}
#d['time'] = ('time',df2.Time)
#d['latitude'] = ('latitude',df2.latitude)
#d['longitude'] = ('longitude', df2.longitude)
#d['var'] = (['time','latitude','longitude','Depth'], xr)
#xr = xray.Dataset(d)
This is most easily achieved by using set_index() to combine Time, grid_latitude and grid_longitude into a pandas.MultiIndex on the DataFrame before converting it to an xarray Dataset.
For example:
# note that pandas.DataFrame's to_xarray() method is equivalent to
# xarray.Dataset.from_dataframe()
ds = df.set_index(['Time', 'grid_latitude', 'grid_longitude']).to_xarray()
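As a fuller illustration, here is a minimal sketch of the same idea using the first four rows from the question. The combined column name Date_Time and the attribute values are assumptions made for this example, not something the question's file defines. Note that xarray needs the MultiIndex to be unique, and that the result is a dense time x latitude x longitude grid in which combinations that do not occur in the DataFrame are filled with NaN.

```python
import pandas as pd

# First four sample rows copied from the question.
df = pd.DataFrame({
    "Date": ["2017-09-05"] * 4,
    "Time": ["13:01:59"] * 4,
    "grid_latitude": [-29.034083, -29.039367, -29.036650, -29.036750],
    "grid_longitude": [31.068567, 31.059150, 31.059200, 31.065417],
    "Status": [2.0, 2.0, 3.0, 7.0],
    "depth": [0.0, 0.0, 0.0, 100.0],
})

# Merge Date and Time into one datetime column.
# "Date_Time" is a name chosen for this sketch.
df["Date_Time"] = pd.to_datetime(df["Date"] + " " + df["Time"])
df = df.drop(columns=["Date", "Time"])

# Each level of the MultiIndex becomes a dimension of the Dataset.
# The index must be unique, otherwise the conversion raises an error.
ds = df.set_index(["Date_Time", "grid_latitude", "grid_longitude"]).to_xarray()

# Attach metadata to the coordinates and variables (illustrative values).
ds["grid_latitude"].attrs.update({"units": "degrees_north", "long_name": "Latitude"})
ds["grid_longitude"].attrs.update({"units": "degrees_east", "long_name": "Longitude"})
ds["depth"].attrs.update({"units": "m", "long_name": "depth"})

print(ds)
```

The printed summary should now list Date_Time, grid_latitude and grid_longitude under Dimensions instead of index, and calling to_netcdf() on ds as in the question would write those dimensions to the file. Because scattered points are expanded onto a full grid, most cells of Status and depth will be NaN, which may or may not be what you want for this kind of data.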