We know there is a lot of skepticism around building models that predict prices in the stock market and the crypto world. But given the reservoir of data available, this dilemma poses an interesting challenge for data aficionados like us at menervasoftware.com. So we asked ourselves: why not put data science to work here as well? And what might we find? As the Law of Serendipity goes, Lady Luck favors the one who tries, and our problem statements revolved around the following:
- Since the values of cryptocurrencies and tokens have dropped significantly over the past year, and given the volatile nature of this world, which ERC-20 tokens are doomed to go extinct soon?
- Which tokens are going to defy chance and go up in prices?
- Is there a glimmer of hope for the tokens whose current values are still above $0?
Our approach was simple and here are the steps we took.
Step 1 – Data Gathering
For a data exercise such as this, instead of rambling through the entire universe of crypto, we decided to focus on the second most interesting and stable cryptocurrency, Ethereum, and more specifically on its blockchain's ERC-20 token ecosystem. This “mini-universe” helped us focus on the life of some 800+ tokens listed on the Ethereum Blockchain Explorer hosted at etherscan.io. Using the names of the tokens listed there, we called the API supported by the Ethereum Tokens Explorer hosted at https://ethplorer.io/ and downloaded all the historical data for the 800+ tokens. Some tokens did not have any data to download, and in the end we collected historical data going back 1 to 3 years from March 12th, 2019, the date on which we downloaded the data. For instance, the ERC-20 token with the symbol DGD was just under 3 years old at the time of writing in March of 2019.
Step 2 – Data Modeling
Given that this is time-series data, we had to figure out which machine learning algorithm we could consult to model the ERC-20 Token values. To keep things simple, our focus was on the close price.
The second focus was to ensure that the tokens under consideration had enough data to be modeled. Only 515 tokens satisfied this requirement, so we proceeded with these.
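A minimal sketch of this kind of filter is shown here. The cutoff `MIN_OBSERVATIONS`, the function name, and the toy symbols are hypothetical; the actual threshold used to arrive at 515 tokens is not stated above.

```python
# Hypothetical cutoff: the real threshold used in the study is not stated.
MIN_OBSERVATIONS = 200


def tokens_with_enough_history(close_prices_by_token, min_observations=MIN_OBSERVATIONS):
    """Return the symbols whose close-price series is long enough to model."""
    return sorted(
        symbol
        for symbol, closes in close_prices_by_token.items()
        if len(closes) >= min_observations
    )


# Toy data: symbol -> list of daily close prices (made-up values and symbols).
history = {
    "AAA": [1.0] * 365,
    "BBB": [0.5] * 30,
    "CCC": [],
}
print(tokens_with_enough_history(history))
```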
Our first iteration used the Moving Average (MA), Autoregressive Integrated Moving Average (ARIMA) and Prophet (the forecasting library from Facebook) to model the data. These yielded Root Mean Square Error (RMSE) values of 22%, 10%, and 12% respectively. We quickly abandoned these approaches and decided to turn to Long short-term memory (LSTM) instead.
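To make the simplest of these baselines concrete, here is a rough sketch of a one-step-ahead moving-average forecast scored with RMSE. Expressing RMSE as a percentage by dividing by the mean actual price is an assumption on our part for this illustration, and the window size and prices are made up.

```python
import math


def moving_average_forecast(closes, window):
    """One-step-ahead forecasts: each prediction is the mean of the previous `window` closes."""
    return [
        sum(closes[i - window:i]) / window
        for i in range(window, len(closes))
    ]


def rmse(actual, predicted):
    """Root Mean Square Error in the same units as the prices."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rmse_percent(actual, predicted):
    """RMSE divided by the mean actual price, in percent (an assumed normalization)."""
    mean_actual = sum(actual) / len(actual)
    return 100.0 * rmse(actual, predicted) / mean_actual


# Made-up close prices; the first `window` points only seed the average.
closes = [10.0, 11.0, 12.0, 11.0, 10.0, 11.0, 12.0]
window = 3
predicted = moving_average_forecast(closes, window)
actual = closes[window:]
print(round(rmse_percent(actual, predicted), 2))
```

The same `rmse_percent` helper could score any of the other models, as long as the predictions are aligned with the actual closes they are meant to forecast.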
Our second iteration using the Long short-term memory (LSTM) method yielded a Root Mean Square Error (RMSE) value of 1.8%. Too good to be true? You bet it was! We were guilty of the dreaded “overfitting” phenomenon.
Our third iteration meant taking a fresh look at what might work best with our data, which is non-stationary, seasonal, unpredictable, and dependent on multiple latent variables, that is, variables that cannot be observed directly. What could model such a scenario?
Enter the “Hidden Markov Model”! According to Wikipedia, “Hidden Markov Model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (i.e. hidden) states”. To explain further, an HMM is a specific case of the state space model in which the latent variables are discrete, multinomial variables. Viewed as a graphical model, an HMM is a doubly stochastic process: a hidden Markov process (over the latent variables) that you cannot observe directly, and a second stochastic process that produces the sequence of observations given the first.
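As a toy illustration of the two layers, the sketch below defines a two-regime HMM with discrete observations and runs the standard forward algorithm to score an observation sequence. The regimes ("bear", "bull"), the daily-move symbols, and every probability are made up for this example; it is not the model we fitted to the token data.

```python
# Hypothetical two-regime model; all probabilities are made up for illustration.
states = ["bear", "bull"]
start_prob = {"bear": 0.5, "bull": 0.5}
# Hidden layer: how the unobserved regime evolves from one day to the next.
trans_prob = {
    "bear": {"bear": 0.8, "bull": 0.2},
    "bull": {"bear": 0.3, "bull": 0.7},
}
# Observed layer: how each regime emits a visible daily move.
emit_prob = {
    "bear": {"down": 0.6, "flat": 0.3, "up": 0.1},
    "bull": {"down": 0.1, "flat": 0.3, "up": 0.6},
}


def forward(observations):
    """Forward algorithm: return P(observations) and P(final hidden state | observations)."""
    alpha = {s: start_prob[s] * emit_prob[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_prob[s][obs] * sum(alpha[prev] * trans_prob[prev][s] for prev in states)
            for s in states
        }
    likelihood = sum(alpha.values())
    posterior = {s: alpha[s] / likelihood for s in states}
    return likelihood, posterior


likelihood, posterior = forward(["up", "up", "down"])
print(likelihood, posterior)
```

Only the sequence of moves is ever visible; the regime is inferred from it, which is the sense in which the hidden states are "unobserved".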
Given the above, we tried to fit the HMM model with an optimum number of transactions for all the ERC-20 tokens.
Taking AOA as an example, the plot below shows the actual and the predicted price of the token AOA from December 3rd, 2018 to March 12th, 2019.
We used MAPE (Mean Absolute Percent Error) to measure the model accuracy and found that for more than 70% of the tokens (375 out of 515), the predictions had 90% confidence or above, as shown in the following graph.
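For reference, MAPE can be computed as in the sketch below. Reading the "confidence" figure as 100 minus MAPE is an assumption made for this illustration, as are the helper names. The MAPE values for MOC, WISH, EURS and BNB are the ones reported in this article; "XXX" is a hypothetical token added to show one that misses the bar.

```python
def mape(actual, predicted):
    """Mean Absolute Percent Error, in percent. Points with a zero actual value are skipped."""
    terms = [abs((a - p) / a) for a, p in zip(actual, predicted) if a != 0]
    if not terms:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(terms) / len(terms)


def share_at_or_above(mape_by_token, min_score=90.0):
    """Fraction of tokens whose score (assumed here to be 100 - MAPE) reaches min_score."""
    scores = [100.0 - m for m in mape_by_token.values()]
    return sum(score >= min_score for score in scores) / len(scores)


# Made-up prices to exercise the formula.
example_mape = mape([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])

# MAPE per token, in percent; "XXX" is hypothetical.
mape_by_token = {"MOC": 1.0, "WISH": 1.0, "EURS": 2.0, "BNB": 6.0, "XXX": 25.0}
print(example_mape, share_at_or_above(mape_by_token))
```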
Our results were encouraging and it did seem like our third time had the charm! We decided to forecast the next 60 observations for all the tokens. This meant data would be forecast from March 13th to May 10th, 2019.
Shown next is the close price plot for the token AOA, depicting all of its historical data, the model predictions (which overlap with a section of the historical data) and the forecast data from March 13th to May 10th, 2019.
Following is the close price plot for the token MOC, for which the MAPE was 1%.
Following is the close price plot for the token WISH, whose MAPE was also 1%.
Following is the close price plot for the token EURS, for which the MAPE was 2%.
The token BNB had a MAPE of 6%. The following shows the close price plot for BNB.
Step 3 – Data Analysis
Using February 15th as the Reference Date and the close price of each token on that date as the Reference Price, we analyzed the following using the forecast data:
- The minimum value price of each token for the forecast period that ran from March 13th to May 10th, 2019, the date when this occurred, and the percent drop in price from the Reference Price
- The maximum value price of each token for the forecast period that ran from March 13th to May 10th, 2019, the date when this occurred, and the percent increase in price from the Reference Price
We filtered the rows for which the minimum and maximum value prices occurred in April and May of 2019. For those whose minimum value prices are in the negatives, the question we are yet to answer is whether they are doomed to go extinct soon. Time will tell!
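The two measurements above can be sketched as follows. The function names, the forecast values, and the reference price are made up for illustration, and the change is reported as a signed percentage, so a drop shows up as a negative number.

```python
from datetime import date, timedelta


def percent_change(price, reference_price):
    """Signed percent change of `price` relative to `reference_price`."""
    return 100.0 * (price - reference_price) / reference_price


def summarize_forecast(forecast, reference_price):
    """forecast: list of (date, close) pairs for one token.

    Returns the minimum and maximum forecast close, the dates on which they
    occur, and their percent change from the reference price.
    """
    min_day, min_price = min(forecast, key=lambda row: row[1])
    max_day, max_price = max(forecast, key=lambda row: row[1])
    return {
        "min_price": min_price,
        "min_date": min_day,
        "min_change_pct": percent_change(min_price, reference_price),
        "max_price": max_price,
        "max_date": max_day,
        "max_change_pct": percent_change(max_price, reference_price),
    }


# Made-up forecast closes starting on the first forecast day.
start = date(2019, 3, 13)
closes = [2.0, 1.5, 2.5, 2.2]
forecast = [(start + timedelta(days=i), close) for i, close in enumerate(closes)]
summary = summarize_forecast(forecast, reference_price=2.0)
print(summary)
```

Filtering by month then reduces to checking `summary["min_date"].month` and `summary["max_date"].month` for each token.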
Though this is a buyer’s market, at least for the tokens that are hitting maximum value prices in April and May of 2019, maybe the trend is changing to a seller’s market.
To wrap up, the HMM model presented us with some very interesting results. It definitely seems to have the potential to be researched further and put to further use.
The one caveat is that, in our case, the performance of the algorithm was tested by training HMMs on 12 different tokens over varying periods of time, and the models were assumed to be independent of one another. In reality, though, these very tokens may be heavily correlated with each other. As a future step, it would be helpful to build a model that takes these correlations into consideration to achieve better results.
Another takeaway is to study other existing models used in traditional finance, as well as in the crypto world.
 Link to the article, “Can Math Beat Financial Markets?” by David Biello. Retrieved from https://www.scientificamerican.com/article/can-math-beat-financial-markets/ on March 15th, 2019.
 Link to the Ethereum Blockchain Explorer. Retrieved from https://etherscan.io/ on March 12th, 2019.
 Link to the Ethereum Token Explorer. Retrieved from https://ethplorer.io/ on March 12th, 2019.
 Link to the definition of “Hidden Markov Model” in Wikipedia. Retrieved from https://en.wikipedia.org/wiki/Hidden_Markov_model on March 15th, 2019.
 Link to the article, “Cryptoasset Valuations” by Chris Burniske. Retrieved from https://medium.com/@cburniske/cryptoasset-valuations-ac83479ffca7 on March 15th, 2019.
Mahesh Divakaran, Eldho Shaji, Anish Mohammed, and Geetha Ramaswamy
We are not affiliated with any crypto agencies, nor do we have any financial incentive for publishing these findings. As developers, we are interested in the fluctuations of the crypto markets and will continue to study their data. At no point should the data or the analysis presented here be used for making real-life investments. We are not accountable or liable if this analysis is used for such unintended purposes.