Using Deep Learning to Estimate NFT Market Value!
Introduction: Estimated Value
At Chain Champs, we want you to pay less for your NFTs! The most common question we get asked is: “How can one marketplace be cheaper than another?” Technically, it’s true that all WAX NFTs are sourced from the Atomic Market shared liquidity pool and hence have the same price on all marketplaces. But at Chain Champs, we give you some HUGE advantages for getting the best price.
Firstly, our live feed is sub-second, so you see the sales that never even appear on other markets because the bots scoop them up.
Secondly, we use state-of-the-art Machine Learning technology to estimate the Market Value of each NFT so you can make that split-second decision to buy deals (often before they appear on other markets).
We feel that Estimated Value (EV) is very important. Not only do we use it to populate the Deals page, it also helps our users determine whether they’re getting good value for their NFTs. Furthermore, on each card, we display the EV beside the USD price, colour-coded red, yellow, or green. Good deals don’t last long, so the EV needs to be reliable enough for you to be confident when making quick purchasing decisions.
Estimating NFT Value
Finding the Market Value for NFTs is complex since there are a lot of factors that contribute to the price of an NFT on the secondary market. Some of these factors include:
- The price history of the NFT
- How frequently it’s bought and sold
- How long it has been since it last sold
- The minimum price for that NFT on the market
- The mint number (especially for collectables)
- The number of users holding the NFT
- The staking value for that NFT (when applicable)
- Crazy irrational behaviour, like people paying 100 WAX for 0.1 WAX NFTs (it happens more often than you think)
NERD ALERT: If words like “Machine Learning”, “Deep Learning”, or “Recurrent Neural Networks” make your head spin, you might not want to continue.
Finding the right price given all these factors is complex, so here’s how we went about solving this problem:
1st attempt: Synthesize these variables into features and use a regression method like LightGBM (Light Gradient Boosting Machine, a gradient-boosted decision tree regressor) to predict the next likely value for each NFT.
Where this model fell short: It’s very challenging to engineer time-based variables for these types of models. For example, the previous sale price is usually a good predictor, except when it’s not and then the model starts spitting out nonsense.
2nd attempt: Ridge Regression with outlier detection to remove outliers from the feature set. This approach produced a more stable model and was used on Chain Champs from November 2021 to early January 2022.
Where this model fell short:
This model performed well in most cases. However, it failed on low-liquidity NFTs, especially when an NFT had not sold for a long time. For example, if an NFT has not sold for 3 months, it has likely lost considerable value during that time, especially if it is flooding the market (like Capcom Street Fighter). The model handled these cases poorly and often hugely overestimated the value.
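For readers who want to see the shape of the second attempt, here is a minimal sketch of ridge regression fitted after an outlier filter. It is not our production code. The outlier rule (a modified z-score on the target price), the helper names `drop_outliers` and `fit_ridge`, the `alpha` value, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def drop_outliers(X, y, z_max=3.5):
    """Drop rows whose target is far from the median (modified z-score, assumed rule)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    med = np.median(y)
    mad = np.median(np.abs(y - med))  # median absolute deviation
    if mad == 0:
        return X, y
    z = 0.6745 * np.abs(y - med) / mad
    keep = z <= z_max
    return X[keep], y[keep]

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge: w = (Xc^T Xc + alpha*I)^-1 Xc^T yc, with an unpenalised intercept."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

# Toy data: feature = previous sale price, target = next sale price.
# One sale at 100.0 plays the role of the "crazy irrational" purchase.
X = [[1.0], [1.1], [0.9], [1.2], [1.0], [1.05]]
y = [1.0, 1.1, 0.95, 1.15, 100.0, 1.05]

X_clean, y_clean = drop_outliers(X, y)
w, b = fit_ridge(X_clean, y_clean, alpha=0.01)
pred = float(np.array([1.0]) @ w + b)  # estimated next price after a 1.0 sale
```

Without the filter, the single 100.0 sale drags the fitted line far away from the typical price, which is why the outlier step made this model more stable.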
Until now, we focused on synthesizing time-series data into a set of time-invariant features. This worked, but not well. Ultimately, we want to answer the question: If an NFT were to sell right now, what would be the fair market value?
Here are our limitations:
- We need to predict the value for multiple NFTs with a single model.
- NFTs are sold at irregular intervals.
- Unlike a stock market, we can’t assume that buyers are informed/rational.
Given the limitations, any model we choose needs to have the following characteristics:
- Ability to price multiple NFTs given a single model
- Must allow for multiple variables (i.e. Multivariate)
- Ability to predict the value of an NFT given a set of conditions (e.g. mint number, time since last sale)
- Needs to accept data at irregular intervals
- Needs to be trained on my MacBook Air M1 (Apple finally opened up a GPU core for training models on Mac and it runs impressively fast).
With these limitations and desired modelling characteristics in mind, we ultimately settled on a Deep Learning method known as a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells.
Engineering the Features for the Model
Each training example asks the model to predict the Nth price in a sequence given the N-1 previous measurements. By sliding a window along each NFT’s sale history, we get many examples per NFT, which gives us a larger training set and hence better model performance.
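The sliding window can be sketched in a few lines of plain Python. The function name `sliding_windows` and the example prices are hypothetical; the sketch only handles a single price series, while the real training set also carries the other features.

```python
from typing import List, Sequence, Tuple

def sliding_windows(prices: Sequence[float], n: int) -> List[Tuple[List[float], float]]:
    """Split one NFT's sale history into (previous n-1 prices, n-th price) pairs."""
    if n < 2:
        raise ValueError("n must be at least 2")
    samples = []
    # Each start index yields one window of n consecutive sales.
    for start in range(len(prices) - n + 1):
        window = list(prices[start:start + n])
        samples.append((window[:-1], window[-1]))  # inputs, target
    return samples

history = [1.0, 1.2, 1.1, 1.5, 1.4, 1.6]  # made-up sale prices, oldest first
samples = sliding_windows(history, n=4)
```

A history of 6 sales with N = 4 gives 3 training examples instead of 1, which is where the larger training set comes from.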
To create a more stable training set, we had to remove outliers from the training data. We found the best approach was to remove NFTs that were purchased for more than 3 times the minimum available market price. For example, look at the price of a Standard Shovel in Alien Worlds over time: it is mostly less than a few cents, but can jump to 20x its value for no reason.
Below is a plot of multiple NFTs with the outliers removed. (Low mints are not considered outliers and account for many of the spikes.)
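As a rough illustration of the 3x rule, the filter below keeps a sale only if its price is at most 3 times the lowest listing at the time. The `Sale` record, its `floor_price` field, and the example numbers are assumptions for this sketch, and it does not model the special handling of low mints.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Sale:
    price: float        # what the buyer paid
    floor_price: float  # lowest available listing at sale time (hypothetical field)

def remove_outliers(sales: List[Sale], max_ratio: float = 3.0) -> List[Sale]:
    """Keep only sales priced at or below max_ratio times the floor price."""
    return [s for s in sales if s.price <= max_ratio * s.floor_price]

# Made-up sales for a cheap tool NFT with a floor of 0.02.
sales = [Sale(0.02, 0.02), Sale(0.03, 0.02), Sale(0.40, 0.02), Sale(0.05, 0.02)]
kept = remove_outliers(sales)
kept_prices = [s.price for s in kept]  # the 0.40 sale (20x the floor) is dropped
```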
The data in this model has 3 features:
- Previous N sales in sequence
- Time between each sale
- Mint Number
The input to this model was a tensor with dimensions (~1M, N, 3), i.e. about 1M observations, N rows per observation, and 3 measurements per row.
This particular problem is a Many-to-One sequence problem: a sequence of observations predicts a single output value (the estimated market value).
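To make the tensor shape concrete, here is a small NumPy sketch that turns one NFT’s sale history into windows of (price, time since previous sale, mint number) rows. The function name `build_dataset`, the tuple layout, and the sample sales are hypothetical. How the target sale’s own mint number and time gap reach the model is not covered here; this sketch only uses the preceding sales.

```python
import numpy as np

def build_dataset(sales, window):
    """sales: list of (timestamp_seconds, price, mint), sorted oldest first.

    Returns X with shape (observations, window, 3) and y with shape (observations,).
    """
    rows = []
    prev_ts = None
    for ts, price, mint in sales:
        gap = 0.0 if prev_ts is None else float(ts - prev_ts)  # irregular interval
        rows.append((float(price), gap, float(mint)))
        prev_ts = ts
    if len(rows) <= window:
        return np.empty((0, window, 3), np.float32), np.empty((0,), np.float32)
    rows = np.asarray(rows, dtype=np.float32)
    X, y = [], []
    for end in range(window, len(rows)):
        X.append(rows[end - window:end])  # the `window` sales before the target
        y.append(rows[end, 0])            # price of the target sale
    return np.stack(X), np.asarray(y, dtype=np.float32)

# Made-up sales: (timestamp, price, mint number).
sales = [(0, 1.0, 5), (60, 1.1, 120), (180, 0.9, 42), (600, 1.2, 7), (660, 1.3, 300)]
X, y = build_dataset(sales, window=3)  # X.shape is (2, 3, 3)
```

Stacking the output of this function across every NFT in the training data is what produces the first dimension of roughly 1M observations.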
After testing a few architectures, we settled on a bidirectional LSTM model feeding into a single hidden layer, with a single value as output.
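In Keras, an architecture of this kind might look like the sketch below. The layer sizes, activation, optimizer, and loss are placeholders chosen for illustration, not the values used on Chain Champs, and `build_model` is a hypothetical helper name.

```python
import numpy as np
import tensorflow as tf

def build_model(window: int, n_features: int = 3,
                lstm_units: int = 32, hidden_units: int = 16) -> tf.keras.Model:
    """Many-to-one model: bidirectional LSTM, one hidden layer, one output value."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(window, n_features)),
        # The LSTM returns only its final state, so the sequence collapses to one vector.
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(lstm_units)),
        tf.keras.layers.Dense(hidden_units, activation="relu"),  # single hidden layer
        tf.keras.layers.Dense(1),                                # estimated value
    ])
    model.compile(optimizer="adam", loss="mae")  # assumed optimizer and loss
    return model

model = build_model(window=9)
dummy_batch = np.zeros((4, 9, 3), dtype="float32")  # 4 observations, 9 rows, 3 features
out = model(dummy_batch)
```

Training would then be a call such as `model.fit(X, y, epochs=150)` on the windowed tensor, with early stopping being a sensible addition.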
This model stopped improving after about 150 epochs and took about 6 hours to train on my MacBook Air M1.
This model is what we currently show to you on Chain Champs and is performing better than the previous model. However, we still think we can do better. In the coming weeks, we plan to add new features like rolling averages, NFT scarcity, collection performance, etc. and test new architectures including stacked architectures.
The current model only makes predictions for NFTs with 10 or more sales. Many NFTs have fewer than this. We would like to include low-volume NFTs in the predictions.
We would also like to investigate some of the state-of-the-art time series forecasting models, such as N-BEATS. However, we’re still waiting on a decent implementation for TensorFlow (and one that works on Apple Silicon).
Made it to the end? Congratulations!
We often discuss this and other interesting NFT data science topics in our community discord! We’ve made a #data-science channel so we can focus on deeper discussions there: https://discord.gg/xbZwqnebnc