Pass arrays from DataFrame into function with arrays grouped and flattened
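I have a dataframe containing X-position data for hundreds of participants and three grouping variables (each participant's X data is 1000 points long). Dataframe preview: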
```
           X     Z  participantNum  obsScenario  startPos  targetPos
16000  -16.0  -5.0         6950203            2         2          3
16001  -16.0  -5.0         6950203            2         2          3
16002  -16.0  -5.0         6950203            2         2          3
16003  -16.0  -5.0         6950203            2         2          3
16004  -16.0  -5.0         6950203            2         2          3
16005  -16.0  -5.0         6950203            2         2          3
16006  -16.0  -5.0         6950203            2         2          3
16007  -16.0  -5.0         6950203            2         2          3
16008  -16.0  -5.0         6950203            2         2          3
16009  -16.0  -5.0         6950203            2         2          3
```
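I need to pass all of the X data to the functions, grouped by the 3 grouping variables, with each participant's X array in its own column. Right now they are all stacked on top of each other.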
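One way to get each participant's samples into its own column is to number the samples within each participant and then pivot. A minimal sketch on a tiny synthetic frame (3 samples per participant instead of 1000; the IDs and values are made up), assuming each participant's rows are already in time order. The `sample` column is a hypothetical helper added for the pivot:

```python
import pandas as pd

# Tiny synthetic long-format frame: two participants' X samples stacked vertically
df = pd.DataFrame({
    "X": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    "participantNum": [101, 101, 101, 202, 202, 202],
    "obsScenario": [2] * 6,
    "startPos": [2] * 6,
    "targetPos": [3] * 6,
})

keys = ["obsScenario", "startPos", "targetPos"]
# Hypothetical helper column: 0, 1, 2, ... within each participant (assumes time order)
df["sample"] = df.groupby(keys + ["participantNum"]).cumcount()

# For one grouping combination: one row per sample index, one column per participant
one_group = df[(df["obsScenario"] == 2) & (df["startPos"] == 2) & (df["targetPos"] == 3)]
wide = one_group.pivot(index="sample", columns="participantNum", values="X")
print(wide)
```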
These are the functions (the data goes through `calc_confidence_interval` first):
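```python
import numpy as np
import scipy.stats

def mean_confidence_interval(data, confidence=0.95):
    # t-based confidence interval for the mean of one 1-D set of values
    a = 1.0 * np.array(data)
    n = len(a)
    m, se = np.mean(a), scipy.stats.sem(a)
    h = se * scipy.stats.t.ppf((1 + confidence) / 2., n - 1)
    return m, m + h, m - h

def calc_confidence_interval(data):
    # Expects a 2-D array: iterating data.T visits one column of data at a time
    mean_ci = []
    top_ci = []
    bottom_ci = []
    for column in data.T:
        m, t, b = mean_confidence_interval(column)
        mean_ci.append(m)
        top_ci.append(t)
        bottom_ci.append(b)
    return mean_ci, top_ci, bottom_ci
```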
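For reference, `mean_confidence_interval` computes the standard t-based interval for the mean of the $n$ values it receives. With sample mean $\bar{x}$, sample standard deviation $s$ (so $s/\sqrt{n}$ is what `scipy.stats.sem` returns with its default `ddof=1`) and confidence level $c$:

$$
h = t_{(1+c)/2,\;n-1}\,\frac{s}{\sqrt{n}}, \qquad \text{returned values} = \left(\bar{x},\ \bar{x}+h,\ \bar{x}-h\right)
$$

With the default `confidence=0.95` the quantile is $t_{0.975,\,n-1}$. Inside `calc_confidence_interval`, $n$ is the length of each column being iterated over, so for a participants-by-samples array $n$ is the number of participants.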
I'm trying to do something like this:
```python
calc_CI = df.groupby(['obsScenario', 'startPos', 'targetPos'])['X'].apply(calc_confidence_interval)
# Attach each group's result to the matching rows of df
df = df.join(calc_CI.rename('calc_CI'), on=['obsScenario', 'startPos', 'targetPos'])
```
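But I get the error `TypeError: object of type 'numpy.float64' has no len()`, because the X data is currently passed as a single array rather than as a separate column for each participant within each combination of the three grouping variables.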
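The error can be reproduced without any grouping. Inside `groupby(...)['X'].apply(...)`, each group arrives as a 1-D Series; `.T` on a 1-D Series returns the same Series, so `for column in data.T` yields single values, and `len()` then fails on them. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

# One group of df['X'], as groupby(...)['X'].apply(...) passes it: a 1-D Series
group = pd.Series([-16.0, -16.0, -15.5])

# .T on a 1-D Series returns the same Series, so iteration yields single values
first = next(iter(group.T))
print(type(first))

# The same steps as mean_confidence_interval, applied to that single value
a = 1.0 * np.array(first)
error = None
try:
    len(a)
except TypeError as exc:
    error = str(exc)
print(error)
```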
## Traceback
```python
--------------------------------------------------------------------------
calc_CI = allDataF.groupby(['obsScenario', 'startPos', 'targetPos'])['X'].apply(calc_confidence_interval)
File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 226, in apply
return super().apply(func, *args, **kwargs)
File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 870, in apply
return self._python_apply_general(f, self._selected_obj)
File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 892, in _python_apply_general
keys, values, mutated = self.grouper.apply(f, data, self.axis)
File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 213, in apply
res = f(group)
File "/Users/lillyrigoli/Desktop/PhD/PhD_Projects/RouteSelection/Analysis_RS/load_filter_plot_CI_RS.py", line 221, in calc_confidence_interval
m, t,b=mean_confidence_interval(column)
File "/Users/lillyrigoli/Desktop/PhD/PhD_Projects/RouteSelection/Analysis_RS/load_filter_plot_CI_RS.py", line 210, in mean_confidence_interval
n = len(a)
TypeError: object of type 'numpy.float64' has no len()
```
The functions return the confidence interval (mean, top and bottom) as lists.
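If each group's data is already a 2-D array with one row per participant and one column per sample, the per-column loop can be swapped for NumPy/SciPy reductions along `axis=0`. The `vectorized_ci` function below is a hypothetical alternative, not the asker's code; it returns NumPy arrays rather than lists, and the random data is made up:

```python
import numpy as np
from scipy import stats

def vectorized_ci(data, confidence=0.95):
    """Mean and t-based CI for every column of a 2-D array (rows = participants)."""
    a = np.asarray(data, dtype=float)
    n = a.shape[0]                 # number of participants contributing to each column
    m = a.mean(axis=0)             # one mean per sample index
    se = stats.sem(a, axis=0)      # standard error per column (ddof=1 by default)
    h = se * stats.t.ppf((1 + confidence) / 2.0, n - 1)
    return m, m + h, m - h

# Example: 4 participants x 5 samples of made-up data
rng = np.random.default_rng(0)
data = rng.normal(size=(4, 5))
mean_ci, top_ci, bottom_ci = vectorized_ci(data)
print(mean_ci.shape)
```

It should give the same numbers as calling `mean_confidence_interval` on each column, just without the Python-level loop.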
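In the end, the output I should get looks like this, with one set of outputs (the `mean_ci`, `top_ci` and `bottom_ci` arrays) for each combination of the grouping variables: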
```
obsScenario  startPos  targetPos  mean_ci                 top_ci                  bottom_ci
          0         1          1  [array of length 1000]  [array of length 1000]  [array of length 1000]
          0         2          2  [array of length 1000]  [array of length 1000]  [array of length 1000]
          1         1          1  [array of length 1000]  [array of length 1000]  [array of length 1000]
          1         2          2  [array of length 1000]  [array of length 1000]  [array of length 1000]
```
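## Answer

I think you may have more success iterating over the groups explicitly rather than trying to use `apply`, which seems to add complexity to what you're trying to do.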
```python
results = []
groupby = df.groupby(['obsScenario', 'startPos', 'targetPos'])
# Iterating a GroupBy yields (group key, sub-DataFrame) pairs
for group_name, groupdf in groupby:
    # call your functions here
    # append results to results
    pass
```
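Filled in, that loop might look like the sketch below. It assumes each participant's samples are in time order and that every participant in a group has the same number of samples (1000 in the question, 4 here). The `sample` helper column and the synthetic data are illustrative, and it repeats a compact copy of the question's functions so it runs on its own:

```python
import numpy as np
import pandas as pd
import scipy.stats

def mean_confidence_interval(data, confidence=0.95):
    a = 1.0 * np.array(data)
    n = len(a)
    m, se = np.mean(a), scipy.stats.sem(a)
    h = se * scipy.stats.t.ppf((1 + confidence) / 2.0, n - 1)
    return m, m + h, m - h

def calc_confidence_interval(data):
    # data: 2-D array, rows = participants, columns = sample indices
    mean_ci, top_ci, bottom_ci = [], [], []
    for column in data.T:  # one sample index across all participants
        m, t, b = mean_confidence_interval(column)
        mean_ci.append(m)
        top_ci.append(t)
        bottom_ci.append(b)
    return mean_ci, top_ci, bottom_ci

# Synthetic long-format data: 2 groups x 3 participants x 4 samples (made up)
rng = np.random.default_rng(1)
rows = []
for scenario, start, target in [(0, 1, 1), (1, 2, 2)]:
    for pid in range(3):
        for x in rng.normal(size=4):
            rows.append({"X": x, "participantNum": pid, "obsScenario": scenario,
                         "startPos": start, "targetPos": target})
df = pd.DataFrame(rows)

keys = ["obsScenario", "startPos", "targetPos"]
# Hypothetical helper column: position of each sample within its participant's trace
df["sample"] = df.groupby(keys + ["participantNum"]).cumcount()

results = []
for group_name, groupdf in df.groupby(keys):
    # One row per participant, one column per sample index
    wide = groupdf.pivot(index="participantNum", columns="sample", values="X")
    mean_ci, top_ci, bottom_ci = calc_confidence_interval(wide.to_numpy())
    results.append((*group_name, mean_ci, top_ci, bottom_ci))

calc_CI = pd.DataFrame(results, columns=keys + ["mean_ci", "top_ci", "bottom_ci"])
print(calc_CI)
```

Pivoting with participants as rows matches how `calc_confidence_interval` walks over `data.T`, so each output list has one entry per sample index.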
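It may also be that you only need to pass extra arguments through `apply` for your function to work as intended. `GroupBy.apply` accepts extra positional and keyword arguments after the function (`apply(func, *args, **kwargs)`) and passes them on to the applied function in addition to each group's Series. (The `args=` tuple parameter belongs to `Series.apply` and `DataFrame.apply`; a groupby would forward `args=...` to your function as a keyword argument.)

```python
# arg1, arg2, ... are placeholders for whatever extra arguments your function accepts
calc_CI = df.groupby(['obsScenario', 'startPos', 'targetPos'])['X'].apply(calc_confidence_interval, arg1, arg2, ...)
```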
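As a small sketch of that mechanism: extra positional or keyword arguments given to `GroupBy.apply` after the function reach the function on every call. The function `group_mean_ci`, its `confidence` parameter and the data are hypothetical, made up for illustration:

```python
import pandas as pd
from scipy import stats

def group_mean_ci(x, confidence):
    # x: the 1-D Series for one group; returns (mean, top, bottom)
    m = x.mean()
    h = stats.sem(x) * stats.t.ppf((1 + confidence) / 2.0, len(x) - 1)
    return (m, m + h, m - h)

df = pd.DataFrame({"g": ["a", "a", "a", "b", "b", "b"],
                   "X": [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]})

# 0.95 is forwarded positionally, 0.99 as a keyword, to group_mean_ci
res95 = df.groupby("g")["X"].apply(group_mean_ci, 0.95)
res99 = df.groupby("g")["X"].apply(group_mean_ci, confidence=0.99)
print(res99)
```

Each result is a Series indexed by group whose values are `(mean, top, bottom)` tuples; the wider 99% interval shows the argument actually reached the function.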