How can I apply fable/forecast in R to this dataset?


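I am trying to forecast multiple time series in R using the fable package. This seems to be the most efficient approach, but I am new to R, so I am running into a lot of problems and would appreciate advice and ideas. I have already worked out how to do this with the forecast package alone, but it takes many extra steps.

My data is an Excel file with 5701 columns and 50 rows. Each column has the product name in the first row, followed by 49 numeric values representing sales from January 2017 to January 2021. First, how do I convert this table into a tsibble? I know I need to do that to use fable, but I am stuck on such a simple step.

Next, I would like to output a table with monthly forecasts for the next 3 semesters (April 2021 to September 2022), laid out as Product|Date|ARIMA model (value)|ARIMA error (value/value)|ETS model|ETS error|Naive model|Naive error|... and so on. My main goal is a table with Product|Best forecast for Apr 2021/Sep 2021|Best forecast for Oct 2021/Mar 2022|Best forecast for Apr 2022/Sep 2022|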

What I have done so far is use this code:

newdata <- read_excel("ALLINCOLUMNS.xlsx")
Fcast <- ts(newdata[,1:5701],start= c(1),end=c(49),frequency=12)
output <- lapply(Fcast,function(x) forecast(auto.arima(x)))
prediction <- as.data.frame(output)
write.table(prediction,file= "C:\\Users\\thega\\OneDrive\\Documentos\\finalprediction.csv",sep=",")

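By default this gives me output in the format |Product1.Point.Forecast|Product1.Lo.80|Product1.Hi.80|Product1.Lo.95|Product1.Hi.95|Product2.Point.Forecast|...|Product5701.Hi.95|. I do not need the 80 and 95 intervals, and they make the output harder to work with in Excel. How can I get the output in the format |Point forecast Product1|Point forecast Product2|...|Point forecast Product5701|, showing only the forecasts? I know I have to use level=NULL in the forecast function, but it did not work the way I tried it. I was planning to write code to delete those columns, but that is less elegant. Finally, is there a way to show the errors of every method in columns? I want to add the best method to my table, so I need to check which one has the smaller error.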

Solution

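The {fable} package works best when the data is in a tidy format. In your case, the products should be represented across rows rather than columns. You can read more about what tidy data is here: https://r4ds.had.co.nz/tidy-data.html Once you have done that, you can also read about tidy data for time series here: https://otexts.com/fpp3/tsibbles.html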

Without your dataset, I can only guess that your Fcast object (the ts() data) looks something like this:

Fcast <- cbind(mdeaths,fdeaths)
Fcast
#>          mdeaths fdeaths
#> Jan 1974    2134     901
#> Feb 1974    1863     689
#> Mar 1974    1877     827
#> Apr 1974    1877     677
#> May 1974    1492     522
#> Jun 1974    1249     406
#> Jul 1974    1280     441
#> and so on ...

That is, each of your products has its own column (and you have 5701 products rather than just the 2 I will use in this example).

If you already have the data in a ts object, you can convert it to a tidy time series dataset with as_tsibble(<ts>).

library(tsibble)
as_tsibble(Fcast,pivot_longer = TRUE)
#> # A tsibble: 144 x 3 [1M]
#> # Key:       key [2]
#>       index key     value
#>       <mth> <chr>   <dbl>
#>  1 1974 Jan fdeaths   901
#>  2 1974 Feb fdeaths   689
#>  3 1974 Mar fdeaths   827
#>  4 1974 Apr fdeaths   677
#>  5 1974 May fdeaths   522
#>  6 1974 Jan mdeaths  2134
#>  7 1974 Feb mdeaths  1863
#>  8 1974 Mar mdeaths  1877
#>  9 1974 Apr mdeaths  1877
#> 10 1974 May mdeaths  1492

^{Created on 2021-02-25 by the reprex package (v0.3.0)}

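Setting pivot_longer = TRUE gathers the columns into a long format, which is the format the {fable} package expects. We now have a key column that stores the series name (the product ID in your data), and the values are stored in the value column.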
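If the data comes straight from Excel as a wide data frame rather than a ts object, the same long format can be built by hand. The following is only a sketch under stated assumptions: the `wide` table is a made-up stand-in for the result of read_excel(), the column names `month` and `product` are my own, and the first data row is assumed to be January 2017 with no missing months.

```r
library(tsibble)
library(tidyr)
library(dplyr)

# Hypothetical stand-in for read_excel("ALLINCOLUMNS.xlsx"):
# one column per product, one row per month, no date column.
set.seed(1)
wide <- tibble(
  product1 = rpois(49, 100),
  product2 = rpois(49, 50)
)

sales <- wide %>%
  # Assumption: row 1 is January 2017 and the rows are consecutive months
  mutate(month = yearmonth("2017 Jan") + 0:(n() - 1)) %>%
  # Gather the product columns into product/value pairs (one row per product-month)
  pivot_longer(-month, names_to = "product", values_to = "value") %>%
  # Declare the time index and the key that identifies each series
  as_tsibble(key = product, index = month)

sales
```

The resulting tsibble has one series per product, so it can be passed to model() in the same way as the as_tsibble(Fcast, pivot_longer = TRUE) result.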

With the data in an appropriate format, we can now use automatic ARIMA() and forecast() to obtain forecasts:

library(fable)
#> Loading required package: fabletools
as_tsibble(Fcast,pivot_longer = TRUE) %>% 
  model(ARIMA(value)) %>% 
  forecast()
#> # A fable: 48 x 5 [1M]
#> # Key:     key,.model [2]
#>    key     .model          index        value .mean
#>    <chr>   <chr>           <mth>       <dist> <dbl>
#>  1 fdeaths ARIMA(value) 1980 Jan N(825,6184)  825.
#>  2 fdeaths ARIMA(value) 1980 Feb N(820,6184)  820.
#>  3 fdeaths ARIMA(value) 1980 Mar N(767,6184)  767.
#>  4 fdeaths ARIMA(value) 1980 Apr N(605,6184)  605.
#>  5 fdeaths ARIMA(value) 1980 May N(494,6184)  494.
#>  6 fdeaths ARIMA(value) 1980 Jun N(423,6184)  423.
#>  7 fdeaths ARIMA(value) 1980 Jul N(414,6184)  414.
#>  8 fdeaths ARIMA(value) 1980 Aug N(367,6184)  367.
#>  9 fdeaths ARIMA(value) 1980 Sep N(376,6184)  376.
#> 10 fdeaths ARIMA(value) 1980 Oct N(442,6184)  442.
#> # … with 38 more rows


You can also compute forecasts from other models by specifying several models in model().

Fcast <- cbind(mdeaths,fdeaths)
library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following objects are masked from 'package:base':
#> 
#>     intersect,setdiff,union
library(fable)
#> Loading required package: fabletools
as_tsibble(Fcast,pivot_longer = TRUE) %>% 
  model(arima = ARIMA(value),ets = ETS(value),snaive = SNAIVE(value)) %>% 
  forecast()
#> # A fable: 144 x 5 [1M]
#> # Key:     key,.model [6]
#>    key     .model    index        value .mean
#>    <chr>   <chr>     <mth>       <dist> <dbl>
#>  1 fdeaths arima  1980 Jan N(825,6184)  825.
#>  2 fdeaths arima  1980 Feb N(820,6184)  820.
#>  3 fdeaths arima  1980 Mar N(767,6184)  767.
#>  4 fdeaths arima  1980 Apr N(605,6184)  605.
#>  5 fdeaths arima  1980 May N(494,6184)  494.
#>  6 fdeaths arima  1980 Jun N(423,6184)  423.
#>  7 fdeaths arima  1980 Jul N(414,6184)  414.
#>  8 fdeaths arima  1980 Aug N(367,6184)  367.
#>  9 fdeaths arima  1980 Sep N(376,6184)  376.
#> 10 fdeaths arima  1980 Oct N(442,6184)  442.
#> # … with 134 more rows


The .model column now identifies the model used to produce each forecast; there are 3 models.

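If you would like to look at the point forecasts side by side, you can use tidyr::pivot_wider() to spread the forecast .mean values across multiple columns.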

library(tsibble)
library(fable)
library(tidyr)
Fcast <- cbind(mdeaths,fdeaths)
as_tsibble(Fcast,pivot_longer = TRUE) %>% 
  model(arima = ARIMA(value),ets = ETS(value),snaive = SNAIVE(value)) %>% 
  forecast() %>% 
  as_tibble() %>% 
  pivot_wider(id_cols = c("key","index"),names_from = ".model",values_from = ".mean")
#> # A tibble: 48 x 5
#>    key        index arima   ets snaive
#>    <chr>      <mth> <dbl> <dbl>  <dbl>
#>  1 fdeaths 1980 Jan  825.  789.    821
#>  2 fdeaths 1980 Feb  820.  812.    785
#>  3 fdeaths 1980 Mar  767.  746.    727
#>  4 fdeaths 1980 Apr  605.  592.    612
#>  5 fdeaths 1980 May  494.  479.    478
#>  6 fdeaths 1980 Jun  423.  413.    429
#>  7 fdeaths 1980 Jul  414.  394.    405
#>  8 fdeaths 1980 Aug  367.  355.    379
#>  9 fdeaths 1980 Sep  376.  365.    393
#> 10 fdeaths 1980 Oct  442.  443.    411
#> # … with 38 more rows

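The question asked for one column of point forecasts per product, with no interval columns. One way to get close to that layout is to pivot on key instead of .model. This is a sketch rather than a definitive recipe: it fits a single ARIMA model for brevity, and the 12-month horizon is an arbitrary choice for illustration.

```r
library(tsibble)
library(fable)
library(tidyr)

Fcast <- cbind(mdeaths, fdeaths)

point_fc <- as_tsibble(Fcast, pivot_longer = TRUE) %>%
  model(arima = ARIMA(value)) %>%
  # Illustrative horizon; adjust to the number of months you need
  forecast(h = "12 months") %>%
  as_tibble() %>%
  # One row per month, one column of point forecasts (.mean) per series
  pivot_wider(id_cols = "index", names_from = "key", values_from = ".mean")

point_fc
```

Because only .mean is passed to values_from, the distribution column is dropped and no interval columns appear in the result.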

You can learn how to evaluate the accuracy of these models/forecasts here: https://otexts.com/fpp3/accuracy.html
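As a rough illustration of how the errors could be compared to pick a best model per series, the sketch below holds out the last 12 months, forecasts them, and keeps the model with the lowest test-set RMSE for each key. The holdout length and the use of RMSE as the selection criterion are my assumptions, not something the question specifies.

```r
library(tsibble)
library(fable)
library(dplyr)

full <- as_tsibble(cbind(mdeaths, fdeaths), pivot_longer = TRUE)

# Assumption: keep the final 12 months (1979) aside as a test set
train <- full %>% filter(index <= yearmonth("1978 Dec"))

fit <- train %>%
  model(arima = ARIMA(value), ets = ETS(value), snaive = SNAIVE(value))

# Forecast the held-out year and score each model against the actual values
acc <- fit %>%
  forecast(h = "12 months") %>%
  accuracy(full)

# For each series, keep the model with the smallest test-set RMSE
best <- acc %>%
  group_by(key) %>%
  slice_min(RMSE, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(key, .model, RMSE, MAE, MAPE)

best
```

The acc table has one row per series and model, so it also answers the request to see the errors of every method side by side before choosing.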
