System Testing and Optimization: Friend or Foe?
The verdict of the world is final.
ST. AUGUSTINE
The days of untested systems are gone
forever. In fact, the pendulum is now swinging in the other direction. While
unscrupulous operators once sold systems and methods for which they claimed
fantastic results, today's unethical operators use statistics as a tool of deception.
These individuals, who paradoxically will benefit from the trend toward the
statistical validation of systems, can easily dupe the public. Manipulating
statistics is not difficult. Just as Archimedes once said, "Give me a
place to stand on and I can move the earth," the modern systems
promoter would likely say, "Give me enough statistics and I can prove
anything."
This sermonette on system validation makes
the point that merely testing a system and generating highly favorable hypothetical
results does not guarantee success with that system. Nor should such statistics
be used as a security blanket or crutch by traders. Statistics can easily be
manipulated, systems can be (and are) curve-fitted, and results, unless
realistic, will not reflect actual performance when the system is implemented.
While many systems are developed to show
optimum performance, it is imperative that systems be tested to show the
worst-case performance.
Why Test Trading Systems?
Traders test systems for various reasons.
Some test a system merely to say they've done so, only to disregard the outcome
or to accept mediocre results, rationalizing the negative aspects of their
system. Other traders test systems in order to sell them to the public—their
goal is to optimize systems in order to show maximum performance. Then there's
the serious futures trader who tests systems to achieve several goals, including
but not limited to the following:
- To determine whether a theory or hypothetical
construct is valid in historical testing
- To summarize the overall hypothetical performance
of a system and to analyse its various aspects in order to isolate its strong
and weak points
- To determine how different timing indicators
interact with one another to produce an effective trading system
- To explore the interaction of risk and reward
variables (i.e., stop loss, trailing stop loss, position size, etc.) that would
have returned the best overall performance with the smallest draw-down
Test Your Trading System
While it may seem that the last item listed
above refers to optimization, you will see from the discussion of optimization
later in this chapter that it is not optimization according to my definition of
the term. The purpose of testing systems is simply to find what will work best
for you based on what appears to have worked best in the past. In so doing, we
must remember that what worked in the past in hypothetical testing may not
necessarily work in the future.
A thorough test of your trading system should
include at least the following information:
Number of Years Analysed.
Although it is desirable to test as much data as possible, many trading systems
and indicators do not withstand the test of time. The further back you test,
the less effective most systems will be. Many system developers test only 10
years of historical data, since that best shows their systems. You must make
your own decision regarding the length of your test.
Number of Trades Analysed. More
important than the number of years analysed is the number of trades. You need
not analyse many years of data if you have a large sample size of trades. I
recommend at least 100 trades, provided your system will generate this number
of trades in back-testing. If you are truly interested in determining the
effectiveness of your system, the more trades you test, the better. Remember
that there will always be a tendency to test fewer trades when you realize that
the system is not holding up under back-testing. Some traders argue that the
factors underlying futures market trends 25 years ago were distinctly different
from those during the past 10 years. They feel that testing 25 years of data
distorts the picture. If they were correct, how would we know when the current
market forces change and that we must therefore change our trading systems? We
are much better off finding systems that work in all types of markets.
Maximum Drawdown. This is one of the
most important aspects of a trading system. A very large drawdown is a negative
factor, since it eliminates most traders from the game well before the system
would have turned in its positive performance. Because most traders are not
well capitalized, they cannot withstand a large drawdown. However, drawdown is
a function of account size. Obviously, a $15,000 drawdown in a $100,000 account
is not unusual; however, the same drawdown in a $35,000 account is serious. You
may decide to risk large drawdown in order to achieve outstanding performance,
but this is your decision.
Consider also the source of the drawdown by
examining the largest losing trade. If the majority of the drawdown occurred on
only one trade, you will be better off than if the drawdown was spread out over
numerous successive losses.
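As a rough illustration of the drawdown figures discussed above, the following Python sketch computes maximum drawdown from a list of closed-trade results and expresses it as a percentage of account size. The function names and the sample trade list are hypothetical, and the sketch assumes drawdown is measured on closed trades only, ignoring swings in open positions.

```python
from typing import Sequence


def max_drawdown(trade_pnls: Sequence[float]) -> float:
    """Largest peak-to-valley decline in cumulative closed-trade profit."""
    equity = 0.0   # running total of trade results
    peak = 0.0     # highest running total seen so far
    worst = 0.0    # largest decline from any peak
    for pnl in trade_pnls:
        equity += pnl
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def drawdown_pct(drawdown: float, account_size: float) -> float:
    """Drawdown expressed as a percentage of the starting account."""
    return 100.0 * drawdown / account_size


# Hypothetical trade results in dollars (illustrative only).
trades = [500, -1200, -800, 2500, -3000, -1500, 4000]
print(max_drawdown(trades))

# The same $15,000 drawdown measured against two account sizes.
print(drawdown_pct(15_000, 100_000))
print(drawdown_pct(15_000, 35_000))
```

With the figures from the text, a $15,000 drawdown is 15 percent of a $100,000 account but roughly 43 percent of a $35,000 account, which is why the same dollar figure can be tolerable for one trader and serious for another.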
Maximum Consecutive Losses. This
performance variable is more psychological than anything else. An otherwise
excellent trading system may have lost money on many trades in succession. Few
traders can maintain their discipline through four or more successive losing
trades. Even after the third loss, many traders are ready to either abandon
their system or to find ways of changing it. However, at times it is necessary
to weather the storm of 10 or more successive losses. If you know ahead of time
what the worst-case scenario has been, you will be prepared. That's why it's
important for your system test to give you this information.
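A minimal sketch of how a system test might report this figure, assuming the test produces a list of per-trade results in dollars. The function name and sample data are hypothetical, and break-even trades are assumed to end a losing streak.

```python
from typing import Sequence


def max_consecutive_losses(trade_pnls: Sequence[float]) -> int:
    """Length of the longest unbroken run of losing trades."""
    longest = 0
    current = 0
    for pnl in trade_pnls:
        if pnl < 0:
            current += 1
            longest = max(longest, current)
        else:
            # A winner or break-even trade ends the streak (an assumption).
            current = 0
    return longest


# Hypothetical trade results in dollars (illustrative only).
trades = [100, -50, -20, -30, 200, -10, -10]
print(max_consecutive_losses(trades))
```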
Largest Single Losing Trade. This
important piece of information indicates how much of the maximum drawdown is
the result of a single losing trade. And this allows you to adjust the initial
stop loss in retesting the system so as to see how large the average losing
trade has been. If the average losing trade, for example, was $1055 and the
largest single loser was $8466, you can readily see that a good portion of the
average losing trade was a function of the largest loser. This shows that if
you had a better way of managing the large loser (in hindsight, of course),
your overall system performance would have been considerably better.
I strongly recommend close examination of the
trade that resulted in the single largest loss if this loss is clearly much
higher than the average losing trade. Another question to ask is "Why
was the largest single losing trade so much larger than the stop loss
selected?" A single largest losing trade that is several times larger
than your selected stop loss points to a potential problem, perhaps with the
system test. You must investigate further in such cases.
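The comparison described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the dictionary keys, the sample trades, and the $1,000 stop loss are all hypothetical. It reports the average loss with and without the largest loser, and how many times larger the largest loser is than the selected stop loss.

```python
from typing import Dict, Sequence


def loss_profile(trade_pnls: Sequence[float], stop_loss: float) -> Dict[str, float]:
    """Summarize losing trades relative to the largest single loser."""
    losses = [-p for p in trade_pnls if p < 0]  # losses as positive dollars
    if not losses:
        raise ValueError("no losing trades in the sample")
    largest = max(losses)
    others = len(losses) - 1
    return {
        "average_loss": sum(losses) / len(losses),
        "largest_loss": largest,
        # Average loss once the single largest loser is set aside.
        "average_loss_without_largest": (
            (sum(losses) - largest) / others if others else 0.0
        ),
        # A value well above 1 suggests a gap, a limit move, or a test problem.
        "largest_loss_vs_stop": largest / stop_loss,
    }


# Hypothetical trade results in dollars (illustrative only).
trades = [1200, -400, -600, 3000, -500, -4500, 800]
profile = loss_profile(trades, stop_loss=1000)
for name, value in profile.items():
    print(name, value)
```

In this made-up sample the one large loser triples the average losing trade, and it is several times the stop loss, which is the kind of result that calls for the closer look recommended above.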
Largest Single Winning Trade. Perhaps more
important than the largest single losing trade is the largest single winning
trade. If, for example, your hypothetical profits total $96,780, and $33,810 of
this is attributed to only one trade, you have a distorted average trade
figure. It's often a good idea to remove this one trade from the overall
results and recompute them in order to show the performance without this
extraordinary winner. You may find that the system you have tested is mediocre,
perhaps even a loser, when the single largest trade has been eliminated from
the performance summary. If you can wait 10 years for the one big trade, then
use the system—but do so against my advice. What you're looking for in any
system with regard to average winning and losing trades is consistency—far more
important than one or two extremely large winning trades that give a distorted
performance picture.
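To make the recomputation concrete, here is a minimal Python sketch. Using the figures in the text, removing the $33,810 winner from $96,780 in total profits leaves $62,970, so that one trade supplied roughly 35 percent of the total. The function name and the short trade list are hypothetical; the list is built only so that its largest winner matches the example.

```python
from typing import Dict, Sequence


def without_largest_winner(trade_pnls: Sequence[float]) -> Dict[str, float]:
    """Net profit and average trade, with and without the largest winner."""
    if len(trade_pnls) < 2:
        raise ValueError("need at least two trades")
    largest = max(trade_pnls)
    remaining = list(trade_pnls)
    remaining.remove(largest)  # drops one occurrence only
    return {
        "net_profit": sum(trade_pnls),
        "average_trade": sum(trade_pnls) / len(trade_pnls),
        "largest_winner": largest,
        "net_without_largest": sum(remaining),
        "average_without_largest": sum(remaining) / len(remaining),
    }


# Arithmetic from the text.
total_profit, largest_winner = 96_780, 33_810
print(total_profit - largest_winner)
print(round(100 * largest_winner / total_profit, 1))

# Hypothetical trade list (illustrative only).
trades = [33_810, 1_500, -900, 2_200, -1_300, 700, -1_010]
summary = without_largest_winner(trades)
print(summary["average_trade"], round(summary["average_without_largest"], 2))
```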
On occasion, only a few trades may account
for a considerable portion of the net system profits. While some traders feel
that this somehow diminishes the value of the system, I disagree. As long as at
least one-half of the overall system performance is due to trades other than
the largest single winning long and short trade combined, the system is valid.
As far as numbers are concerned, I would not use any system that, after
deducting reasonable slippage and commission as well as the largest single
long and short winners, does not show at least $100 average profit per trade.
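The two screening rules above can be sketched in Python. The text leaves some details open, so this is one possible reading rather than a definitive rule: it assumes costs are deducted from every trade before anything else, that the "largest single winning long and short trade" means the best net long plus the best net short, and that the $100 average is taken over the trades that remain. The `Trade` type, the function name, and both sample lists are hypothetical.

```python
from typing import NamedTuple, Sequence


class Trade(NamedTuple):
    side: str    # "long" or "short"
    pnl: float   # gross profit or loss in dollars


def passes_screen(trades: Sequence[Trade],
                  cost_per_trade: float = 100.0,
                  min_average: float = 100.0) -> bool:
    """Apply the one-half rule and the minimum average trade rule."""
    net = [(t.side, t.pnl - cost_per_trade) for t in trades]
    total = sum(p for _, p in net)
    best_long = max((p for s, p in net if s == "long"), default=0.0)
    best_short = max((p for s, p in net if s == "short"), default=0.0)
    # Only winning trades count as "largest winners".
    removed = [p for p in (best_long, best_short) if p > 0]
    remaining_profit = total - sum(removed)
    remaining_count = len(net) - len(removed)
    if total <= 0 or remaining_count <= 0:
        return False
    half_rule = remaining_profit >= 0.5 * total
    average_rule = remaining_profit / remaining_count >= min_average
    return half_rule and average_rule


# Hypothetical results: profits spread fairly evenly across trades.
balanced = [Trade("long", 2100), Trade("short", 1600), Trade("long", 1100),
            Trade("short", 900), Trade("long", 600), Trade("short", 1100),
            Trade("long", -300), Trade("short", 700)]

# Hypothetical results: two large winners dominate the total.
lopsided = [Trade("long", 5100), Trade("short", 3100), Trade("long", 1100),
            Trade("short", 900), Trade("long", -400), Trade("short", 1600),
            Trade("long", 700), Trade("short", -300)]

print(passes_screen(balanced), passes_screen(lopsided))
```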
More importantly, because a large portion of
profits in many systems derives from a very small number of trades, it is
imperative that you follow each and every trade as closely to the rules as
possible. Trading systems are not money machines; they don't grind out one
profit after another. Trading systems make their money on the bottom line. There
are many losers and few winners. The losers are kept in check by using money
management stop losses that must, in most cases, be reasonably large.
And the winners, only a few of which are very
large, make the game worth the candle. The trader who can't stick with a
position, or let it ride, is the trader who will surely be disappointed with
the results, because the big winners will be cut short.
Later in this book I will make a case for
systematic market entry and less rigid market exit. Bear in mind, however, that
when this procedure is followed, you must stick with the original system as
closely as possible for market entry. Such adaptations are recommended for the
skilled trader only!
Percentage Winning Trades. This
statistic is not nearly as important as one might think. In actuality, few
systems have more than 65 percent winning trades, and the more trades in your
sample, the smaller this figure will be. Systems that are correct as little as
30 percent of the time can still be good systems, and systems that are accurate
as much as 80 percent of the time can be bad systems. It's easy to see that
even a high degree of accuracy with a large average losing trade and small
average winning trade does not make a good system.
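A short worked example shows why accuracy alone says little. The expected profit per trade is the win rate times the average winner, minus the loss rate times the average loser. The Python sketch below applies that formula to two hypothetical systems; the function name and all dollar figures are made up for illustration and costs are ignored.

```python
def expected_profit_per_trade(win_rate: float,
                              avg_win: float,
                              avg_loss: float) -> float:
    """Average result per trade; avg_loss is given as a positive amount."""
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win_rate must be between 0 and 1")
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


# Hypothetical system that is right only 30 percent of the time.
low_accuracy = expected_profit_per_trade(0.30, avg_win=2000, avg_loss=500)

# Hypothetical system that is right 80 percent of the time.
high_accuracy = expected_profit_per_trade(0.80, avg_win=200, avg_loss=1000)

print(round(low_accuracy, 2), round(high_accuracy, 2))
```

Under these assumed figures the 30 percent system earns roughly $250 per trade, while the 80 percent system loses roughly $40 per trade.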
Average Trade. This statistic
will tell you what the average hypothetical trade has been. You must make
certain that when you test your system, you deduct slippage and commission from
your average trade. Commissions add up, even discount commissions. And slippage
is an important factor when determining system performance. As a rule of thumb,
I recommend deducting between $75 and $100 per trade for slippage and
commission.
Once this has been done, the average trade figure
will often be significantly reduced. As I pointed out earlier, you
must also pay close attention to the largest winning trade and the largest
losing trade when evaluating the average trade. The average trade figure is
important, since it considers all profits, all losses, slippage, and
commission.
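The deduction is simple arithmetic, sketched here in Python with the $75 to $100 range from the rule of thumb above. The function name and the gross trade results are hypothetical.

```python
from typing import Sequence


def net_average_trade(gross_pnls: Sequence[float],
                      cost_per_trade: float = 100.0) -> float:
    """Average trade after deducting slippage and commission per trade."""
    if not gross_pnls:
        raise ValueError("no trades to average")
    gross_average = sum(gross_pnls) / len(gross_pnls)
    return gross_average - cost_per_trade


# Hypothetical gross trade results in dollars (illustrative only).
gross = [900, -400, 1500, -350, 250]

print(sum(gross) / len(gross))          # gross average trade
print(net_average_trade(gross, 75.0))   # low end of the cost range
print(net_average_trade(gross, 100.0))  # high end of the cost range
```

In this sample a $380 gross average trade falls to between $280 and $305 once costs are deducted, a reduction of a fifth or more.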
Optimization
There has been considerable controversy about
trading system optimization. What exactly is wrong with optimizing systems? Can
you go too far? Is there a happy medium?
The real issues in system optimization are complex,
and they've been exacerbated by the tendency of systems developers to optimize
their programs above and beyond any reasonable degree. To optimize a system is
to discover the parameters that provide the best results in hypothetical
back-testing. In other words, optimization is a way of discovering what would
have produced the best results using numerous if-then scenarios.
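In code, this definition amounts to an exhaustive search over parameter combinations. The Python sketch below is a minimal illustration, not a description of any particular testing program: `optimize`, `toy_backtest`, the parameter names, and the parameter ranges are all hypothetical, and the toy scoring function merely stands in for a real historical back-test.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple


def optimize(backtest: Callable[..., float],
             grid: Dict[str, Sequence]) -> Tuple[dict, float]:
    """Try every parameter combination and keep the best-scoring one."""
    names = list(grid)
    best_params: dict = {}
    best_score = float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = backtest(**params)  # one "if-then" scenario
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_backtest(entry_length: int, stop_loss: int) -> float:
    """Stand-in for a real back-test: hypothetical net profit in dollars."""
    return 10_000 - 50 * abs(entry_length - 20) - 2 * abs(stop_loss - 1500)


# Hypothetical parameter ranges to search.
grid = {
    "entry_length": range(5, 55, 5),
    "stop_loss": range(500, 3000, 500),
}

best_params, best_score = optimize(toy_backtest, grid)
print(best_params, best_score)
```

The search always returns a "best" combination, whether or not that combination reflects anything that will persist beyond the data it was fitted to.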
Before affordable computer hardware and
software were available, optimization was a long and laborious procedure. To
discover the best fit, the systems developer would need to repeatedly backtrack
and test several variables. If the system parameters were numerous, the process
was virtually impossible. Obviously, computers have made this a quick and
efficient task. Now any trader with several thousand dollars can develop
optimized systems.
Such ease of testing and optimizing is both
good and bad. On the one hand, it allows traders to develop, test, and refine
(i.e., optimize) systems much more rapidly. On the other hand, it has opened
the door to what is called curve-fitting. The simple fact is that the powerful
system-testing programs now available allow traders as well as systems vendors
to repeatedly test a host of timing variables, stop losses, and other risk
management schemes in order to determine which combinations would have produced
the best results. In effect, this procedure fits the best parameters on past
history to produce the best hypothetical results. However, the conclusions
reached by such methods are often specious.
The trader who tests and retests to find the
best fit will eventually reach his or her goal, but the goal itself may be
nothing more than a reflection of the curve-fitted results. Tests tell us what
has worked in the past but may not reveal anything worthwhile about the future.
Since the past is not a carbon copy of the future, it is doubtful that the
optimized parameters will work in the future. The more parameters in the
decision-making model, the less likely they are to work in the future.
Overly optimized results lead to false
conclusions. The result will likely mean losses. For those who develop and sell
futures trading systems as a business, optimization is an amazing tool that
allows the creation of outstanding hypothetical performance results that, in turn,
allow systems developers to make incredible claims. And claims sell systems.
Time will tell if I am wrong about overly
optimized systems. Vast personal experience, however, strongly validates my
conclusions. I recall recent developments regarding several popular trading
systems sold by a software developer. The advertised claims were fantastic.
Systems were sold for T-bond futures, S&P futures, and currency futures.
The outstanding performance claims provided a strong media campaign.
Naturally, all of the proper disclaimers were
made to comply with the then-current regulatory requirements. There were no disclaimers
regarding optimized results, however, nor was it disclosed that not all buyers
of the systems would be using the same system parameters. Because the systems
were continually optimized for best results, the hypothetical track records
were truly impressive. However, the results did not jibe with results experienced
by those who had old versions of the software—versions that did not reflect the
new optimized parameters. This is high-tech deception. Recognizing that there
might be legal liability, the systems developers eventually disclosed this fact
in small print. Few buyers understood the meaning of the disclosure and even
fewer cared, given the impressive hypothetical performance record. Naturally,
buyers of the software felt that they could match the hypothetical performance.
In many cases, these traders did well
initially. A customer in my brokerage firm purchased one of these programs and
began trading it strictly according to the rules. The results were impressive.
I began to watch intently every time a trade was made. It was uncanny how well
the system entered and exited trades. It was as if the system had internalized
a sixth sense about the market.
Then, after several months and excellent
results, the system began to unravel. Numerous large losses occurred and performance
deteriorated more rapidly than it had climbed. The dangers of an overly
optimized system became apparent once again.
A Rational Approach to System Development
I do not totally oppose optimizing trading
systems; however, I do favor a rational approach to this procedure. My rule of
thumb is simple: Your trading system should have no more than four to six
variables. You should search for the best combination of entry and exit
variables, as well as a reasonable combination of stop loss and trailing stop
loss amounts. But this is where the optimization should end. The more variables
you build into the system, the less likely the parameters are to perform in the
future as they did in testing.
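One way to see why the number of variables matters is to count how many combinations an optimizer gets to choose from. The Python sketch below assumes, purely for illustration, that each variable is tested at 10 candidate values; the function name is hypothetical. It is a counting exercise only and does not by itself measure curve-fitting.

```python
from math import prod
from typing import Sequence


def combinations_tested(values_per_variable: Sequence[int]) -> int:
    """Number of parameter sets an exhaustive optimization would try."""
    return prod(values_per_variable)


# Assumed: 10 candidate values for every variable.
for n_variables in (4, 6, 10):
    count = combinations_tested([10] * n_variables)
    print(n_variables, "variables:", count, "combinations")
```

Under that assumption, four variables give 10,000 combinations, six give 1,000,000, and ten give 10,000,000,000. The larger the pool, the easier it is for some combination to fit past data by chance.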
Another aspect of system development relates
to market personality, a topic that has received little attention from most
traders and market analysts. Rather than heavily optimizing a system, I recommend
tailoring your system to the personality characteristics of the individual
markets, provided that such characteristics exist and that they are
sufficiently stable.
Summary
The development and testing of trading
systems is perhaps one of the three most important issues in trading. System
results can be specious if the developer uses faulty rules, optimizes
excessively, or fails to understand the differences between reality and
fantasy. Armed with a computer, historical data, and a few ideas, a trader can
easily fall into the trench of highly optimized systems that look good on paper
but that fail to produce results commensurate with the back-test.
In addition, guidelines for effective system
development were presented with the proper caveats. I defined a number of terms
and gave you some ideas of how to distinguish systems that are likely to go
forward with results similar to their back-tests from systems that are unlikely
to perform as expected.