How to fit a line to 2 arrays of (x, y) points in Python

Question

I have this code, which makes a scatter plot from two NumPy text files:

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import minimize
import matplotlib.patches as mpatches

# Plot the points
def plot_points(plt, points, style):
    pts=points.reshape(-1,2)
    plt.plot(pts[:,0],pts[:,1],style)

shapes1=np.genfromtxt("volume_6.txt")
shapes2=np.genfromtxt("volume_5.txt")

n_shapes1=int(shapes1.shape[0])
print("Number of shapes", n_shapes1)
n_shapes2=int(shapes2.shape[0])
print("Number of shapes", n_shapes2)

for i in range(n_shapes1):
    plot_points(plt,shapes1[i,:],"ro")
for i in range(n_shapes2):
    plot_points(plt,shapes2[i,:],"b+")

plt.title("Intra-rater variability ~ volume1 vs volume2")
plt.xlabel('number of cases')
plt.ylabel('ml')
plt.show()

My data looks like this (sorry, I could not find a better way to display it):

overlap  volume  non-overlap  volume
6        9.869   1            24.89
6        18.09   2            53.075
5        15.069  6            49.839
1        1.945   6            44.889
3        10.474  1            15.187
4        4.416   3            8.318
4        6.419   3            8.287

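What I would like to do is fit a line to the overlap and the non-overlap cases separately, each against volume, and estimate the correlation coefficient.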

I tried to compute a slope, but I got an error about dimensions. Can anyone help?

Answer 1

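I believe the error you are getting comes from sklearn, which expects x (if x ~ y) to be a 2-D array, so you need to reshape it to (-1, 1). First, here is what your data would look like in a pandas DataFrame. I can only recommend using pandas, but that is your decision.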

          type_ ml     MRS
0   non_overlap  1   24.89
1   non_overlap  2  53.075
2   non_overlap  6  49.839
3   non_overlap  6  44.889
4   non_overlap  1  15.187
5   non_overlap  3   8.318
6   non_overlap  3   8.287
7       overlap  6   9.869
8       overlap  6   18.09
9       overlap  5  15.069
10      overlap  1   1.945
11      overlap  3  10.474
12      overlap  4   4.416
13      overlap  4   6.419

To compute a linear regression, you need an x and a y.

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

# df is the pandas DataFrame shown above (columns: type_, ml, MRS)
x = np.array(df["MRS"])  # a pandas Series can be converted with np.array(...)
y = df["ml"].values      # ...or read through its .values attribute

# Used as-is, x is 1-D and fit() raises the error you had,
# so we need to reshape x into a single-column 2-D array
x = x.reshape((-1, 1))

# The reason is not explained, but the documentation specifies that
# x should have shape (n_samples, n_features)

# Then you can do your linear regression
model = LinearRegression()
model.fit(x, y)
print("coef : ", model.coef_, "intercept : ", model.intercept_, "score : ", model.score(x, y))
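Since the question asks for one line per group plus a correlation coefficient, here is a minimal NumPy-only sketch of that, as an alternative to the sklearn model above rather than a replacement for it. It assumes the table in the question is typed in by hand, and it follows the question's plot axes (number of cases on x, volume in ml on y), which is the opposite of the x/y choice in the sklearn snippet. The helper name `fit_line` is made up for illustration.

```python
import numpy as np

# Data copied by hand from the question's table: (number of cases, volume in ml).
overlap_cases = np.array([6, 6, 5, 1, 3, 4, 4], dtype=float)
overlap_volume = np.array([9.869, 18.09, 15.069, 1.945, 10.474, 4.416, 6.419])
non_overlap_cases = np.array([1, 2, 6, 6, 1, 3, 3], dtype=float)
non_overlap_volume = np.array([24.89, 53.075, 49.839, 44.889, 15.187, 8.318, 8.287])


def fit_line(x, y):
    """Return (slope, intercept, r) for the least-squares line y = slope * x + intercept.

    Hypothetical helper for this sketch, not part of any library.
    """
    # A degree-1 polynomial fit is an ordinary least-squares straight line.
    slope, intercept = np.polyfit(x, y, deg=1)
    # The off-diagonal entry of the 2x2 correlation matrix is Pearson's r.
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r


for label, x, y in [
    ("overlap", overlap_cases, overlap_volume),
    ("non-overlap", non_overlap_cases, non_overlap_volume),
]:
    slope, intercept, r = fit_line(x, y)
    print(f"{label}: slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

Note that `np.polyfit` takes plain 1-D arrays, so no reshape is needed here. The reshape to (-1, 1) is only required by sklearn's `fit`. To draw a fitted line on the existing scatter plot, something like `plt.plot(x, slope * x + intercept)` before `plt.show()` should work.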
