Stock Prediction: Technical Analysis and Future Price Forecasting

Published: 2021-04-04

Creating the target variable

Multiple train/test splits

We are working with time-series data, observed at fixed time intervals, which has to be split into train and test sets. In each successive split, the test indices are higher than in the previous one. Put simply, we repeat the process of splitting the time series into train and test sets several times. This means training and evaluating several models at extra computational cost, but in return we get a more robust estimate of performance on unseen data.

Because our data set is small, we used 2 splits. In the i-th split the training set has size i * n_samples // (n_splits + 1) + n_samples % (n_splits + 1) and the test set has size n_samples // (n_splits + 1), where n_samples is the number of samples. More details can be found in the scikit-learn user guide.
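The split sizes described above can be checked with a small pure-Python sketch. It is not the article's code: the function name `time_series_splits` and the sample count of 10 are illustrative assumptions, and in practice scikit-learn's `TimeSeriesSplit` would normally be used instead.

```python
def time_series_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) for expanding-window splits.

    Hypothetical helper that mirrors the size formula quoted in the text.
    """
    test_size = n_samples // (n_splits + 1)
    remainder = n_samples % (n_splits + 1)
    for i in range(1, n_splits + 1):
        # Floor-divide first, then multiply by the split number i.
        train_end = i * test_size + remainder
        train = list(range(0, train_end))
        test = list(range(train_end, train_end + test_size))
        yield train, test


# Assumed toy setup: 10 observations in time order, 2 splits.
splits = list(time_series_splits(n_samples=10, n_splits=2))
for train, test in splits:
    print("train:", train, "test:", test)
```

Each test block starts where its training block ends, so the model is always evaluated on observations that come later in time than anything it was trained on.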

Linear Regression

Here we see that model accuracy dropped from 92.67% to 73.7% on the test data. Let us look at the residuals to check whether a linear model fits our data set.
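The text does not say how the accuracy figures were measured. One plausible reading, stated here as an assumption, is the coefficient of determination R², which is what the `score` method of scikit-learn's `LinearRegression` returns. A minimal sketch of that metric, with made-up numbers:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only, not the article's data.
y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2_score(y_true, [1.0, 2.0, 3.0, 4.0])    # predictions match exactly
mean_only = r2_score(y_true, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
print(perfect, mean_only)
```

Under this reading, a lower score on the test split than on the training split means the model explains less of the variance in data it has not seen.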

Residuals Plot

Residuals are the differences between the observed values of the target variable and the values predicted by the model. The residuals plot shows the residuals on the y-axis against the dependent variable on the x-axis. A common use of the residuals plot is to analyze the variance of the model's error.

If the points are randomly dispersed around the horizontal axis, a linear regression model is usually appropriate for the data; otherwise, a non-linear model is more appropriate. Here we see a fairly random, uniform distribution of the residuals against the target in two dimensions, which suggests that our linear model is performing well. The histogram also shows that the error is normally distributed around zero, which indicates a well-fitted model.
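To make the residual check concrete, here is a minimal sketch that fits a one-feature least-squares line and computes its residuals. The helper names and the toy data are assumptions for illustration; the article's own model and features are not shown in the text.

```python
def fit_line(x, y):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(x, y, slope, intercept):
    """Observed value minus predicted value for each sample."""
    return [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]


# Toy data, roughly linear with small noise (not the article's data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
res = residuals(x, y, slope, intercept)
print(res)
```

For a least-squares fit with an intercept, the residuals sum to zero up to rounding, so the informative part of a residuals plot is their spread and any visible pattern, not their average.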

To improve performance on the test data, a few things can be done:

add more training samples,

experiment with additional features or reduce the number of features,

experiment with a different set of features,

try other regressors such as RANSAC or Huber, which can deal with outliers.
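As a rough illustration of why a Huber-type regressor copes better with outliers, the sketch below compares the squared loss with a textbook form of the Huber loss. The threshold `delta=1.0` is an arbitrary assumption, and library implementations such as scikit-learn's `HuberRegressor` use their own parameterization.

```python
def squared_loss(r):
    """Half squared error for a single residual r."""
    return 0.5 * r * r


def huber_loss(r, delta=1.0):
    """Quadratic for small residuals, linear for large ones."""
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)


# A small residual is penalized identically by both losses;
# a large one (an outlier) is penalized far less by Huber.
small, large = 0.5, 10.0
print(squared_loss(small), huber_loss(small))
print(squared_loss(large), huber_loss(large))
```

Because the penalty grows only linearly beyond the threshold, a single extreme point pulls the fitted line much less than it does under the squared loss.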

Forecast

Below we use our separately kept data set (X_future_prediction) to predict prices 15 days into the future.

Future price visualization

Source: https://www.thyysj.com/info/402613.html
