Backtest a strategy on different datasets
Suppose you wanted to test a strategy on more than one
dataset. The function btest in package PMwR
provides a convenient way to do this.
Start with the data. We create two datasets (but it
could be more than two): industry indices from Kenneth
French's Data Library. Both are multivariate zoo
objects, and we call them prices.17 and prices.48.
library("NMOF")
library("zoo")
prices.17 <- French("~/Downloads/French",
                    "17_Industry_Portfolios_daily_CSV.zip",
                    price.series = TRUE)
prices.17 <- window(zoo(prices.17, as.Date(row.names(prices.17))),
                    start = as.Date("2000-1-1"))
prices.48 <- French("~/Downloads/French",
                    "48_Industry_Portfolios_daily_CSV.zip",
                    price.series = TRUE, na.rm = TRUE)
prices.48 <- window(zoo(prices.48, as.Date(row.names(prices.48))),
                    start = as.Date("2000-1-1"))
Suppose that we wanted to see the performance of
equally weighted portfolios, rebalanced quarterly.
The signal function, which btest requires, could be
written as follows.
ew <- function() {
    k <- ncol(Close())
    rep(1/k, k)
}
In fact, the function could be simpler: we know the number of assets in the portfolios (17 and 48), so there would be no need to compute it from the data. Instead, we could pass it as an argument. But we want the code to be as simple as possible, and the speedup would be minuscule.
It is easy enough now to call btest twice.
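To see what ew returns without running a backtest, we can replace PMwR's Close() with a hypothetical stand-in that returns a matrix of made-up prices with 17 columns. The variant ew.k, which receives the number of assets as an argument, is equally hypothetical; this sketch does not show how btest would hand such an argument to the signal function.

```r
## Hypothetical stand-in for PMwR's Close(): here it simply returns a
## matrix of made-up prices with 5 rows (timestamps) and 17 columns (assets)
Close <- function() matrix(100, nrow = 5, ncol = 17)

## The signal function from the text: equal weights for all assets
ew <- function() {
    k <- ncol(Close())
    rep(1/k, k)
}

## Hypothetical variant that receives the number of assets as an argument
ew.k <- function(k)
    rep(1/k, k)

w <- ew()
length(w)               # 17
sum(w)                  # 1
all.equal(w, ew.k(17))  # TRUE
```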
library("PMwR")
bt.17 <- btest(list(coredata(prices.17)),
               signal = ew,
               initial.cash = 100,
               convert.weights = TRUE,
               timestamp = index(prices.17),
               do.signal = "lastofquarter")
bt.48 <- btest(list(coredata(prices.48)),
               signal = ew,
               initial.cash = 100,
               convert.weights = TRUE,
               timestamp = index(prices.48),
               do.signal = "lastofquarter")
bt.17
bt.48
plot(bt.48, col = "darkgreen")
lines(bt.17, col = "blue")
initial wealth 100 => final wealth 464.62
Total return 364.6%
initial wealth 100 => final wealth 536.67
Total return 436.7%
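The quarterly rebalancing comes from do.signal = "lastofquarter". Assuming that this setting selects the last available timestamp in each calendar quarter, the selection can be sketched in base R as below. The helper last.of.quarter is our own, hypothetical illustration and not PMwR's implementation; the dates are all weekdays of 2000, with holidays ignored.

```r
## Hypothetical helper: TRUE for the last timestamp in each calendar quarter
last.of.quarter <- function(timestamp) {
    key <- paste(format(timestamp, "%Y"), quarters(timestamp))
    !duplicated(key, fromLast = TRUE)
}

## All weekdays of 2000 (wday: 0 = Sunday, 6 = Saturday); holidays ignored
timestamp <- seq(as.Date("2000-01-01"), as.Date("2000-12-31"), by = "day")
timestamp <- timestamp[as.POSIXlt(timestamp)$wday %in% 1:5]

timestamp[last.of.quarter(timestamp)]
## "2000-03-31" "2000-06-30" "2000-09-29" "2000-12-29"
```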
It would be even easier to call btest only once,
particularly if we had more than two datasets.
prices <- list(list(coredata(prices.17)),
               list(coredata(prices.48)))
bt <- btest(signal = ew,
            initial.cash = 100,
            convert.weights = TRUE,
            do.signal = "lastofquarter",
            timestamp = index(prices.17),
            variations = list(prices = prices),
            variations.settings = list(labels = c("sec17", "sec48")))
bt
$sec17
initial wealth 100 => final wealth 464.62
Total return 364.6%
$sec48
initial wealth 100 => final wealth 536.67
Total return 436.7%
All we had to do was put both datasets into one list
(each dataset wrapped in a list of its own, just as in
the single calls above) and then place that list into
a named list, which we passed to argument variations.
str(list(prices = prices))
List of 1
 $ prices:List of 2
  ..$ :List of 1
  .. ..$ : num [1:4967, 1:17] 2920 2871 2894 2932 3031 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:4967] "2000-01-03" "2000-01-04" "2000-01-05" ...
  .. .. .. ..$ : chr [1:17] "Food" "Mines" "Oil" "Clths" ...
  ..$ :List of 1
  .. ..$ : num [1:4967, 1:48] 388 378 389 390 401 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:4967] "2000-01-03" "2000-01-04" "2000-01-05" ...
  .. .. .. ..$ : chr [1:48] "Agric" "Food" "Soda" "Beer" ...
This list-of-lists structure may seem complicated at
first, but it is actually simple: variations expects
a named list, whose names are matched against the
argument names of btest. In our case, variations has
a single element, named prices. btest then loops over
all elements of prices, in effect calling
btest(prices = variations$prices[[1]], ...)
btest(prices = variations$prices[[2]], ...)
and so on.
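This calling pattern can be mimicked in plain R. In the sketch below, fake.btest is a hypothetical stand-in for btest that only reports the dimensions of the price matrix it receives, and the price data are small dummy matrices nested exactly like prices above. It illustrates the loop described in the text, not PMwR's actual code.

```r
## Hypothetical stand-in for btest: reports the dimensions of the first
## price matrix it receives and ignores all other arguments
fake.btest <- function(prices, ...)
    dim(prices[[1]])

## Same nesting as in the text, with small dummy matrices
variations <- list(prices = list(list(matrix(100, nrow = 3, ncol = 17)),
                                 list(matrix(100, nrow = 3, ncol = 48))))

## The name of the variation ("prices") becomes the argument name;
## one call is made per element of variations$prices
arg <- names(variations)[1]
results <- lapply(variations[[arg]], function(value)
    do.call(fake.btest, setNames(list(value), arg)))
names(results) <- c("sec17", "sec48")

results$sec17  # 3 17
results$sec48  # 3 48
```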
Looping is the default way in which btest evaluates
variations. Alternatively, we could instruct the
function to run the backtests in parallel.
bt <- btest(signal = ew,
            initial.cash = 100,
            convert.weights = TRUE,
            timestamp = index(prices.17),
            do.signal = "lastofquarter",
            variations = list(prices = prices),
            variations.settings = list(labels = c("sec17", "sec48"),
                                       method = "multicore"))
bt
$sec17
initial wealth 100 => final wealth 464.62
Total return 364.6%
$sec48
initial wealth 100 => final wealth 536.67
Total return 436.7%
If you run the examples, you will find that they compute quickly, so running them in parallel offers little advantage. For more expensive models, however, parallel computation can save considerable time.
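The effect of parallel execution can be illustrated without PMwR. The sketch below runs a hypothetical slow.backtest function, which merely sleeps and returns a number, once with lapply and once with mclapply from R's parallel package. We make no claim that btest's "multicore" method is implemented this way; the example only shows why parallel runs pay off when each run is expensive. Since mclapply relies on forking, it allows only one core on Windows, so the code falls back to a single core there.

```r
library("parallel")

## Hypothetical, deliberately slow "backtest": waits, then returns a number
slow.backtest <- function(i) {
    Sys.sleep(0.25)
    i^2
}

## mclapply forks processes; on Windows only mc.cores = 1 is allowed
cores <- if (.Platform$OS.type == "windows") 1L else 2L

t.loop <- system.time(res.loop <- lapply(1:4, slow.backtest))
t.par  <- system.time(res.par  <- mclapply(1:4, slow.backtest,
                                           mc.cores = cores))

identical(res.loop, res.par)  # TRUE: same results, in the same order
t.loop["elapsed"]             # roughly 4 x 0.25 seconds
t.par["elapsed"]              # less, if more than one core is used
```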
When you look at the example closely, you will find
that we passed the same timestamp information for
both datasets. For these particular datasets, this is
OK because the timestamps are indeed identical.
all.equal(index(prices.17), index(prices.48))
[1] TRUE
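With more than two datasets, the same check can be applied to all timestamp vectors at once before reusing a single timestamp. The helper same.timestamps below is hypothetical and written only for this illustration; it could, for instance, be called as same.timestamps(list(index(prices.17), index(prices.48))).

```r
## Hypothetical helper: TRUE if every vector of timestamps is identical
## to the first one
same.timestamps <- function(timestamps)
    all(vapply(timestamps, identical, logical(1), timestamps[[1]]))

t1 <- as.Date(c("2000-01-03", "2000-01-04", "2000-01-05"))
t2 <- t1
t3 <- as.Date(c("2000-01-03", "2000-01-05", "2000-01-06"))

same.timestamps(list(t1, t2))      # TRUE
same.timestamps(list(t1, t2, t3))  # FALSE
```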
A longer but safer version would have been this:
btest(signal = ew,
initial.cash = 100,
convert.weights = TRUE,
do.signal = "lastofquarter",
variations = list(prices = prices,
timestamp = list(index(prices.17),
index(prices.48))),
variations.settings = list(labels = c("sec17", "sec48"),
expand.grid = FALSE))
$sec17
initial wealth 100 => final wealth 464.62
Total return 364.6%
$sec48
initial wealth 100 => final wealth 536.67
Total return 436.7%
This version would be equivalent to calling
btest(prices = variations$prices [[1]],
timestamp = variations$timestamp[[1]], ...)
btest(prices = variations$prices [[2]],
timestamp = variations$timestamp[[2]], ...)
But note that we have set
variations.settings$expand.grid to FALSE. If we
hadn't, btest would have computed all combinations of
prices and timestamps.
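The difference between the two settings can be sketched in base R. Below, fake.btest is a hypothetical stand-in that only reports which price series and which timestamp vector it was called with (represented by strings). We assume that btest's expand.grid setting behaves analogously to pairing by position (FALSE) versus base R's expand.grid (TRUE); this mirrors the description in the text, not PMwR's internals.

```r
## Hypothetical stand-in for btest: reports which inputs it received
fake.btest <- function(prices, timestamp, ...)
    paste(prices, timestamp)

variations <- list(prices    = list("prices.17", "prices.48"),
                   timestamp = list("index.17", "index.48"))

## expand.grid = FALSE: elements are paired by position (2 runs)
paired <- Map(fake.btest, variations$prices, variations$timestamp)
unlist(paired)
## "prices.17 index.17" "prices.48 index.48"

## expand.grid = TRUE: every combination of the values (2 x 2 = 4 runs),
## including mismatched pairs such as prices.48 with index.17
grid <- expand.grid(prices    = seq_along(variations$prices),
                    timestamp = seq_along(variations$timestamp))
all.runs <- mapply(function(i, j)
                       fake.btest(variations$prices[[i]],
                                  variations$timestamp[[j]]),
                   grid$prices, grid$timestamp)
all.runs
## "prices.17 index.17" "prices.48 index.17"
## "prices.17 index.48" "prices.48 index.48"
```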