Estimating price with Linear Regression
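I'm posting here because I couldn't find a solution to my problem anywhere else. Basically, we are learning linear regression with Python at school, and the professor wants us to estimate the price of each ingredient in a sandwich, as well as the fixed profit on each sandwich, based on a CSV table. So far we have only worked with one X variable and one Y variable, so I'm confused about what I should do here. Thank you. Here is the table: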
tomato,lettuce,cheese,pickles,palmetto,burger,corn,ham,price
0.05,1,0.05,0,0.05,0.2,0.05,0,18.4
0.05,0,0.05,0.05,0,0.2,0.05,0.05,16.15
0.05,1,0.05,0,0.05,0.4,0,0,22.15
0.05,1,0.05,0,0.05,0.2,0.05,0.05,19.4
0.05,1,0,0,0,0.2,0.05,0.05,18.4
0,0,0.05,0,0,0,0.05,0.05,11.75
0.05,1,0,0,0,0.2,0,0.05,18.15
0.05,1,0.05,0.05,0.05,0.2,0.05,0,18.65
0,0,0.05,0,0,0.2,0.05,0.05,15.75
0.05,1,0.05,0,0.05,0,0.05,0.05,15.4
0.05,1,0,0,0,0.2,0,0,17.15
0.05,1,0,0,0.05,0.2,0.05,0.05,18.9
0,1,0.05,0,0,0.2,0.05,0.05,18.75
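You have 9 independent regression variables (tomato...price), and 13 samples for each of them (the 13 rows).
So a first approach could be doing a regression on the data points of "tomato"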
0.05
0.05
0.05
0.05
0.05
0
0.05
0.05
0
0.05
0.05
0.05
0
and then doing another one for "lettuce" and the others, up to "price"
18.4
16.15
22.15
19.4
18.4
11.75
18.15
18.65
15.75
15.4
17.15
18.9
18.75
An online viewer for looking at the CSV data: http://www.convertcsv.com/csv-viewer-editor.htm, but Google Spreadsheet, Excel, etc. can display it nicely too.
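SciPy can (most probably) do the task for you on vectors too (so handling the 9 variables together), but the part about having 13 samples in the 13 rows stays the same.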
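EDIT: bad news, I was too tired and did not answer the complete question, sorry about that.
While it is true that you can take the first 8 columns (tomato...ham) as time series and do individual regressions on them (which is probably the first part of this assignment), the last column (price) is expected to be estimated from the first 8.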
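Using the notation from Wikipedia, https://en.wikipedia.org/wiki/Linear_regression#Introduction, your y vector is the last column (price), and the X matrix is the first 8 columns of your data (tomato...ham), extended with a column of 1-s somewhere. The coefficients that belong to the 8 ingredient columns are the per-unit ingredient prices, and the coefficient of the column of 1-s is the fixed profit per sandwich.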
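To make the notation concrete, here is a minimal sketch of building X and y from the table in the question. It assumes the CSV is available as a string (the name CSV_TEXT is just for illustration; reading from a file works the same way) and puts the column of 1-s last, which is an arbitrary choice.

```python
import io
import numpy as np

# The table from the question, pasted as text (CSV_TEXT is an illustrative name).
CSV_TEXT = """tomato,lettuce,cheese,pickles,palmetto,burger,corn,ham,price
0.05,1,0.05,0,0.05,0.2,0.05,0,18.4
0.05,0,0.05,0.05,0,0.2,0.05,0.05,16.15
0.05,1,0.05,0,0.05,0.4,0,0,22.15
0.05,1,0.05,0,0.05,0.2,0.05,0.05,19.4
0.05,1,0,0,0,0.2,0.05,0.05,18.4
0,0,0.05,0,0,0,0.05,0.05,11.75
0.05,1,0,0,0,0.2,0,0.05,18.15
0.05,1,0.05,0.05,0.05,0.2,0.05,0,18.65
0,0,0.05,0,0,0.2,0.05,0.05,15.75
0.05,1,0.05,0,0.05,0,0.05,0.05,15.4
0.05,1,0,0,0,0.2,0,0,17.15
0.05,1,0,0,0.05,0.2,0.05,0.05,18.9
0,1,0.05,0,0,0.2,0.05,0.05,18.75
"""

# Skip the header row and parse the 13 x 9 block of numbers.
data = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=",", skiprows=1)

# y: the last column (price).
y = data[:, -1]

# X: the 8 ingredient columns, extended with a column of 1-s for the fixed profit.
X = np.column_stack([data[:, :-1], np.ones(len(data))])

print(X.shape, y.shape)
```

Each row of X is one sandwich, so the model reads price = X @ beta, where beta holds the 8 ingredient prices followed by the fixed profit.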
Then pick an estimation method (some are listed on the same page too, https://en.wikipedia.org/wiki/Linear_regression#Estimation_methods, but you may want to pick one you have learned about in class). The actual math is there, and NumPy can do the matrix/vector calculations. If you go for "Ordinary least squares", numpy.linalg.lstsq does the same (https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.lstsq.html#numpy.linalg.lstsq - you may find the step of adding that column of 1-s familiar), so it can be used for verifying the results.
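As a rough sketch of that last step, the snippet below computes the ordinary least squares estimate by hand from the normal equations, (X^T X) beta = X^T y, and then checks it against numpy.linalg.lstsq. The variable names (CSV_TEXT, beta_normal, beta_lstsq) are only illustrative, and your class may have used a different estimation method, in which case only the lstsq call is relevant as a cross-check.

```python
import io
import numpy as np

# The table from the question, pasted as text (CSV_TEXT is an illustrative name).
CSV_TEXT = """tomato,lettuce,cheese,pickles,palmetto,burger,corn,ham,price
0.05,1,0.05,0,0.05,0.2,0.05,0,18.4
0.05,0,0.05,0.05,0,0.2,0.05,0.05,16.15
0.05,1,0.05,0,0.05,0.4,0,0,22.15
0.05,1,0.05,0,0.05,0.2,0.05,0.05,19.4
0.05,1,0,0,0,0.2,0.05,0.05,18.4
0,0,0.05,0,0,0,0.05,0.05,11.75
0.05,1,0,0,0,0.2,0,0.05,18.15
0.05,1,0.05,0.05,0.05,0.2,0.05,0,18.65
0,0,0.05,0,0,0.2,0.05,0.05,15.75
0.05,1,0.05,0,0.05,0,0.05,0.05,15.4
0.05,1,0,0,0,0.2,0,0,17.15
0.05,1,0,0,0.05,0.2,0.05,0.05,18.9
0,1,0.05,0,0,0.2,0.05,0.05,18.75
"""

data = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=",", skiprows=1)
names = CSV_TEXT.splitlines()[0].split(",")[:-1]  # the 8 ingredient names

y = data[:, -1]                                          # price
X = np.column_stack([data[:, :-1], np.ones(len(data))])  # ingredients + column of 1-s

# Ordinary least squares by hand: solve the normal equations (X^T X) beta = X^T y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# The same estimate from NumPy's least-squares solver, used here for verification.
beta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print("rank of X:", rank)
print("methods agree:", np.allclose(beta_normal, beta_lstsq))
for name, value in zip(names + ["fixed profit"], beta_lstsq):
    print(f"{name:>12}: {value:.2f}")
```

The first eight printed values are the estimated prices per unit of each ingredient, and the last one is the fixed profit. A rank of 9 means the 13 rows are enough to pin down all nine unknowns; with a lower rank the normal equations would be singular and np.linalg.solve would fail, while lstsq would still return a minimum-norm solution.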