导入 CSV，重塑变量数组以进行逻辑回归

Question

我希望每个人都在 COVID-19 大流行中保持安全。我是 Python 的新手，有一个关于将数据从 CSV 导入 Python 以进行简单逻辑回归分析的快速问题，其中因变量是二元的，自变量是连续的。

我导入了一个CSV文件，然后想用一个变量（Active）作为自变量，另一个变量（Smoke）作为响应变量。我能够将 CSV 文件加载到 Python 但每次我尝试生成逻辑回归模型来预测运动中的烟雾时，我都会收到一个错误，即运动必须重新整形为一列（二维），如目前是一维的。

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
data = pd.read_csv('Pulse.csv') # Read the data from the CSV file
x = data['Active'] # Load the values from Exercise into the independent variable
x = np.array.reshape(-1,1)
y = data['Smoke'] # The dependent variable is set as Smoke

我不断收到以下错误消息：

ValueError: Expected 2D array, got 1D array instead: array=[ 97. 82. 88. 106. 78. 109. 66. 68. 100. 70. 98. 140. 105. 84. 134. 117. 100. 108. 76. 86. 110. 65. 85. 80. 87. 133. 125. 61. 117. 90. 110. 68. 102. 67. 112. 86. 85. 66. 73. 85. 110. 97. 93. 86. 80. 96. 74. 124. 78. 93. 80. 80. 92. 69. 82. 88. 74. 74. 75. 120. 105. 104. 99. 113. 67. 125. 133. 98. 80. 91. 76. 78. 94. 150. 92. 96. 68. 82. 102. 69. 65. 84. 86. 84. 116. 88. 65. 101. 89. 128. 68. 90. 80. 80. 98. 90. 82. 97. 90. 98. 88. 94. 92. 96. 80. 66. 110. 87. 88. 94. 96. 89. 74. 111. 81. 98. 99. 65. 95. 127. 76. 102. 88. 125. 72. 76. 112. 69. 101. 72. 112. 81. 90. 96. 66. 114. 71. 75. 102. 138. 85. 80. 107. 119. 98. 95. 95. 76. 96. 102. 82. 99. 80. 83. 102. 102. 106. 79. 80. 79. 110. 144. 80. 97. 60. 80. 108. 107. 51. 68. 80. 80. 60. 64. 87. 110. 110. 82. 154. 139. 86. 95. 112. 120. 79. 64. 84. 65. 60. 79. 79. 70. 75. 107. 78. 74. 80. 121. 120. 96. 75. 106. 88. 91. 98. 63. 95. 85. 83. 92. 81. 89. 103. 110. 78. 122. 122. 71. 65. 92. 93. 88. 90. 56. 95. 83. 97. 105. 82. 102. 87. 81.]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

以下是完整的、更新后的错误代码 (04/12/2020)： *我无法将错误日志输入到此文档中，因此我已将其复制并粘贴到此 public Google 文档中：https://docs.google.com/document/d/1vtrj6Znv54FJ4Zvv211TQvvCN6Ac5LDaOfvHicQn0nU/edit?usp=sharing

此外，这是 CSV 文件： https://drive.google.com/file/d/1g_-vPNklxRn_3nlNPsR-IOflLfXSzFb1/view?usp=sharing

scikit-learn
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
data = pd.read_csv('Pulse.csv')
x = data['Active']
y = data['Smoke']
lr = LogisticRegression().fit(x.values.reshape(-1,1), y)
p_pred = lr.predict_proba(x.values)
y_pred = lr.predict(x.values)
score_ = lr.score(x.values,y.values)
conf_m = confusion_matrix(y.values,y_pred.values)
report = classification_report(y.values,y_pred.values)
confusion_matrix(y, lr.predict(x))    
cm = confusion_matrix(y, lr.predict(x))
fig, ax = plt.subplots(figsize = (8,8))
ax.imshow(cm)
ax.grid(False)
ax.xaxis.set(ticks=(0,1), ticklabels = ('Predicted 0s', 'Predicted 1s'))
ax.yaxis.set(ticks=(0,1), ticklabels = ('Actual 0s', 'Actual 1s'))
ax.set_ylim(1.5, -0.5)
for i in range(2):
    for j in range(2):
        ax.text(j,i,cm[i,j],ha='center',va='center',color='red', size='45')
plt.show()
print(classification_report(y,model.predict(x)))

Answer 1

试试这个：

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

data = pd.read_csv('Pulse.csv') # Read the data from the CSV file
x = data['Active'] # Load the values from Exercise into the independent variable
y = data['Smoke'] # The dependent variable is set as Smoke

lr = LogisticRegression().fit(x.values.reshape(-1,1), y)

Answer 2

下面的代码应该可以工作：

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
data = pd.read_csv('Pulse.csv')
x = pd.DataFrame(data['Smoke'])
y = data['Smoke']
lr = LogisticRegression()
lr.fit(x,y)
p_pred = lr.predict_proba(x)
y_pred = lr.predict(x)
score_ = lr.score(x,y)
conf_m = confusion_matrix(y,y_pred)
report = classification_report(y,y_pred)

print(score_)
0.8836206896551724

print(conf_m)
[[204   2]
 [ 25   1]]

导入 CSV，重塑变量数组以进行逻辑回归

Importing a CSV, reshaping a variable's array for logistic regression

python

statistics

regression

numpy

reshape