The Bitcoin market's financial analog is, of course, a stock market. To maximize financial reward, the field of stock market prediction has grown over the past decades, and has more recently exploded with the advent of high-frequency, low-latency trading hardware coupled with robust machine learning algorithms. Thus, it makes sense to ask whether similar techniques can be applied to the Bitcoin market. The purpose of this series of articles is to experiment with state-of-the-art deep reinforcement learning technologies to see if we can create profitable Bitcoin trading bots. It seems to be the status quo to quickly shut down any attempt to create reinforcement learning algorithms for trading, as it is supposedly "the wrong way to go about building a trading algorithm."
In a nutshell, Bayesian optimization is a technique for efficiently searching a hyperparameter space to find the set of parameters that maximizes a given objective function. In simpler terms, Bayesian optimization is an efficient method for improving any black-box model.
It works by modeling the objective function you want to optimize using a surrogate function, or a distribution of surrogate functions.
That distribution improves over time as the algorithm explores the hyperspace and zones in on the areas that produce the most value.
How does this apply to our Bitcoin trading bots? Essentially, we can use this technique to find the set of hyper-parameters that make our model the most profitable. We are searching for a needle in a haystack and Bayesian optimization is our magnet.
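To make the magnet metaphor concrete, here is a minimal from-scratch sketch of one Bayesian optimization loop (not the article's actual implementation): a Gaussian-process surrogate models a toy cost function, and an expected-improvement rule picks the next point to evaluate. The kernel length scale, the search grid, and the stand-in cost function are all illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def rbf_kernel(a, b, length=0.2):
    # Smoothness assumption of the surrogate: nearby inputs have similar cost.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_obs, y_obs, x_query, jitter=1e-6):
    # Standard Gaussian-process regression: posterior mean and std at x_query.
    K = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query)
    K_inv = np.linalg.inv(K)
    mu = K_s.T @ K_inv @ y_obs
    var = 1.0 - np.einsum("ij,jk,ki->i", K_s.T, K_inv, K_s)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best_y):
    # We are minimizing, so "improvement" means dropping below the best cost so far.
    z = (best_y - mu) / sigma
    return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def cost(x):
    # Hypothetical stand-in for "train a bot with hyper-parameter x and
    # return its negated test reward"; the minimum sits at x = 0.7.
    return (x - 0.7) ** 2

rng = np.random.default_rng(seed=0)
x_obs = rng.uniform(0.0, 1.0, size=4)   # a few random points to seed the surrogate
y_obs = cost(x_obs)
grid = np.linspace(0.0, 1.0, 201)

for _ in range(20):
    mu, sigma = gp_posterior(x_obs, y_obs, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y_obs.min()))]
    if np.any(np.isclose(x_obs, x_next)):  # re-proposing a known point: converged
        break
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, cost(x_next))

best_x = x_obs[np.argmin(y_obs)]
print(f"best x found: {best_x:.3f}")
```

The distribution of surrogate functions mentioned above is exactly the GP posterior: its mean is the current guess at the cost surface, and its standard deviation measures how unexplored each region still is, which is what expected improvement trades off.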
Optimizing hyper-parameters with Optuna is fairly simple. A trial contains a specific configuration of hyper-parameters and its resulting cost from the objective function. We can then call study.optimize, passing in our objective function, and Optuna will run trials in search of the configuration with the lowest cost. In this case, our objective function consists of training and testing our PPO2 model on our Bitcoin trading environment.
The cost we return from our function is the average reward over the testing period, negated. We need to negate the average reward because Optuna interprets a lower return value as a better trial. The optimize function provides a trial object to our objective function, which we then use to specify each variable to optimize. The search space for each of our variables is defined by the specific suggest function we call on the trial and the parameters we pass into that function. For example, trial.suggest_loguniform samples floats on a logarithmic scale between the bounds we provide, which suits parameters such as the learning rate.
Further, trial.suggest_uniform samples floats on a linear scale, and trial.suggest_int samples integers. The study keeps track of the best trial from its tests, which we can use to grab the best set of hyper-parameters for our environment. I have trained an agent to optimize each of our four return metrics: simple profit, the Sortino ratio, the Calmar ratio, and the Omega ratio.
Before we look at the results, we need to know what a successful trading strategy looks like. For this reason, we are going to benchmark against a couple of common, yet effective strategies for trading Bitcoin profitably. Believe it or not, one of the most effective strategies for trading BTC over the last ten years has been to simply buy and hold.
The other two strategies we will be testing use very simple, yet effective technical analysis to create buy and sell signals.
While this strategy is not particularly complex, it has seen very high success rates in the past. RSI divergence works as follows: when the closing price consecutively rises while the RSI consecutively drops, a negative trend reversal (sell) is signaled. A positive trend reversal (buy) is signaled when the closing price consecutively drops while the RSI consecutively rises.
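A minimal sketch of how an RSI divergence signal like this could be computed with pandas; the Wilder-style smoothing and the 5-step lookback window are illustrative assumptions, not the article's exact rules:

```python
import numpy as np
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Relative Strength Index from smoothed average gains and losses."""
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / period, min_periods=period).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / period, min_periods=period).mean()
    rs = gain / loss
    return 100 - 100 / (1 + rs)

def divergence_signal(close: pd.Series, period: int = 14, lookback: int = 5) -> pd.Series:
    """+1 = bullish divergence (price falls, RSI rises), -1 = bearish, 0 = none.
    The lookback window is a hypothetical choice for illustration."""
    r = rsi(close, period)
    price_up = close > close.shift(lookback)
    rsi_up = r > r.shift(lookback)
    signal = pd.Series(0, index=close.index)
    signal[price_up & ~rsi_up] = -1   # price rising while RSI drops: sell
    signal[~price_up & rsi_up] = 1    # price dropping while RSI rises: buy
    return signal

# Synthetic price series just to exercise the functions.
close = pd.Series(100 + np.sin(np.linspace(0, 12, 200)) * 5 + np.linspace(0, 10, 200))
r = rsi(close)
sig = divergence_signal(close)
print(sig.value_counts().to_dict())
```

A real benchmark would, of course, run this signal through the same fee and slippage assumptions as the RL agents to keep the comparison fair.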
The purpose of testing against these simple benchmarks is to prove that our RL agents are actually creating alpha over the market. I must preface this section by stating that the positive profits shown here are the direct result of incorrect code.
Due to the way dates were being sorted at the time, the agent was able to see the price 12 hours in advance at all times, an obvious form of look-ahead bias. This has since been fixed, though the time has yet to be invested to replace each of the result sets below. Please understand that these results are completely invalid and highly unlikely to be reproduced.
That being said, there is still a large amount of research that went into this article, and the purpose was never to make massive amounts of money, but rather to see what was possible with the current state of the art in reinforcement learning and optimization techniques. So, in an attempt to keep this article as close to the original as possible, I will leave the old, invalid results here until I have the time to replace them with new, valid ones. This simple cross-validation is enough for what we need, since when we eventually release these algorithms into the wild, we can train on the entire data set and treat new incoming data as the new test set.
Watching this agent trade, it was clear this reward mechanism produces strategies that over-trade and are not capable of capitalizing on market opportunities. The Calmar-based strategies came in with a small improvement over the Omega-based strategies, but ultimately the results were very similar.
Remember our old friend, simple incremental profit? If you are unaware of average market returns, these kinds of results would be absolutely insane. Surely this is the best we can do with reinforcement learning… right? When I saw the success of these strategies, I had to quickly check to make sure there were no bugs. Instead of over-trading and under-capitalizing, these agents seem to understand the importance of buying low and selling high, while minimizing the risk of holding BTC.
Regardless of what specific strategy the agents have learned, our trading bots have clearly learned to trade Bitcoin profitably. Now, I am no fool. I understand that the success in these tests may not [read: will not] generalize to live trading. It is truly amazing considering these agents were given no prior knowledge of how markets worked or how to trade profitably, and instead learned to be massively successful through trial and error alone along with some good old look-ahead bias.
Our observation space can only ever take on a discrete number of states at each time step. However, by randomly traversing slices of the data frame, we essentially manufacture more unique data points by creating more interesting combinations of account balance, trades taken, and previously seen price action for each time step in our initial data set.
Let me explain with an example. At time step 10 after resetting a serial environment, our agent will always be at the same time within the data frame, and would have had 3 choices to make at each time step: buy, sell, or hold. Now consider our randomly sliced environment. At time step 10, our agent could be at any of len(df) time steps within the data frame. While this may add quite a bit of noise to large data sets, I believe it should allow the agent to learn more from our limited amount of data.
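The reset logic described above can be sketched like this; the class and attribute names are hypothetical stand-ins, not the article's actual BitcoinTradingEnv internals:

```python
import numpy as np
import pandas as pd

class RandomSliceReset:
    """Sketch of serial vs. randomly-sliced episode starts (illustrative names)."""

    def __init__(self, df: pd.DataFrame, episode_len: int = 500, serial: bool = False):
        self.df = df
        self.episode_len = episode_len
        self.serial = serial
        self.rng = np.random.default_rng()

    def reset(self) -> int:
        if self.serial:
            # Serial mode: every episode starts at the beginning of the data,
            # so time step 10 is always the same row of the frame.
            self.frame_start = 0
        else:
            # Random slice: an episode may start anywhere that still leaves room
            # for a full episode, manufacturing more unique combinations of
            # balance, trades taken, and price history at each time step.
            self.frame_start = int(
                self.rng.integers(0, len(self.df) - self.episode_len)
            )
        self.current_step = 0
        return self.frame_start

df = pd.DataFrame({"Close": np.arange(1000, dtype=float)})
start = RandomSliceReset(df).reset()
print(f"random episode starts at row {start}")
```

With a 1,000-row frame and 500-step episodes, the random variant can begin at any of 500 offsets, which is exactly the extra variety the paragraph above is after.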
For example, here is a visualization of our observation space, rendered using OpenCV. The first 4 rows of frequency-like red lines represent the OHLC data, and the spurious orange and yellow dots directly below represent the volume.
If you squint, you can just make out a candlestick graph, with volume bars below it and a strange morse-code-like interface below that, which shows our trade history. Each time a trade is made, we render its amount and type into this trade-history portion of the observation, and append the trade to the environment's running list of trades.
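A rough, numpy-only sketch of how an observation image like this might be assembled; the row layout and scaling here are illustrative guesses, not the article's exact rendering code:

```python
import numpy as np

def observation_image(ohlc: np.ndarray, volume: np.ndarray, trades: np.ndarray) -> np.ndarray:
    """Stack OHLC rows, a volume row, and a trade-history row into a uint8 image.
    ohlc: shape (4, w); volume: shape (w,); trades: shape (w,) with values in {-1, 0, 1}.
    (Illustrative layout; OpenCV would only be needed to display the array.)"""
    def scale(row):
        # Normalize each row to [0, 1] so all features share one intensity range.
        lo, hi = row.min(), row.max()
        return np.zeros_like(row) if hi == lo else (row - lo) / (hi - lo)

    rows = [scale(r) for r in ohlc]        # 4 rows of price data
    rows.append(scale(volume))             # volume row
    rows.append((trades + 1) / 2.0)        # trade history: sell=0, hold=0.5, buy=1
    return (np.stack(rows) * 255).astype(np.uint8)

rng = np.random.default_rng(1)
img = observation_image(
    ohlc=rng.random((4, 50)),
    volume=rng.random(50),
    trades=rng.choice([-1, 0, 1], size=50),
)
print(img.shape, img.dtype)
```

Displaying such an array (e.g. with cv2.imshow) produces exactly the kind of dense, squint-to-read panel described above.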
Our agents can now initiate a new environment, step through that environment, and take actions that affect the environment. Our render method could be something as simple as printing the agent's current net worth. Instead, we are going to plot a simple candlestick chart of the pricing data with volume bars, and a separate plot for our net worth.
We are going to take the code in StockTradingGraph.py as our starting point. You can grab the code from my GitHub. The first change we are going to make is to update the graph to use our Bitcoin data frame. Next, in our render method, we are going to update our date labels to print human-readable dates instead of numbers.
Finally, after one last small change to the graph, back in our BitcoinTradingEnv we can now write our render method to display it. And voila! We can now watch our agents trade Bitcoin. The green ghosted tags represent buys of BTC, and the red ghosted tags represent sells.
Simple, yet elegant. One of the criticisms I received on my first article was the lack of cross-validation, or splitting the data into a training set and test set.
The purpose of doing this is to test the accuracy of your final model on fresh data it has never seen before. While this was not a concern of that article, it definitely is here. For example, one common form of cross-validation is k-fold validation, in which you split the data into k equal groups and, one by one, single out a group as the test group while using the rest of the data as the training group. However, time series data is highly time-dependent, meaning later data depends heavily on earlier data. If the model trains on data that comes after its test fold, it effectively learns from the future, a form of look-ahead bias. This same flaw applies to most other cross-validation strategies when applied to time series data. So we are left with simply taking a slice of the full data frame as the training set, from the beginning of the frame up to some arbitrary index, and using the rest of the data as the test set.
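The slice-based split described above is essentially a one-liner; this sketch just makes the convention explicit (the 80/20 ratio is an illustrative choice):

```python
import pandas as pd

def time_split(df: pd.DataFrame, train_frac: float = 0.8):
    """Split a time-ordered frame at an index: everything before the cut goes
    to training, everything after to testing. No shuffling, so the test set
    is strictly in the future relative to the training set."""
    split = int(len(df) * train_frac)
    return df.iloc[:split], df.iloc[split:]

df = pd.DataFrame({"Close": range(100)})
train, test = time_split(df)
print(len(train), len(test))
```

Because the split preserves ordering, none of the look-ahead leakage possible with k-fold validation can occur here.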
Next, since our environment is only set up to handle a single data frame, we will create two environments: one for the training data and one for the test data. Now, training our model is as simple as creating an agent with our environment and calling model.learn. Here, we are using TensorBoard so we can easily visualize our TensorFlow graph and view some quantitative metrics about our agents.