
Player Churn Rate Prediction — Data Analysis and Visualisation (Part 1)

Christian Galea · Towards Data Science

In the world of gaming, companies strive not only to attract players but also to retain them for as long as possible, especially in free-to-play games that rely on in-game micro-transactions. These micro-transactions often involve the purchase of in-game currency, allowing players to acquire items for progression or customization while funding the game's development. Monitoring the churn rate, which represents the number of players who stop playing, is therefore crucial: a high churn rate means a significant loss in income, which in turn leads to higher stress levels for developers and managers.

This article explores the use of a real-world dataset acquired from a mobile app, specifically focusing on the levels played by users. Leveraging machine learning, which has become an essential part of the technology landscape and forms the basis of Artificial Intelligence (AI), businesses can extract valuable insights from their data. However, building machine learning models typically demands coding and data science expertise, making it inaccessible for many individuals and smaller companies that lack the resources to hire data scientists or the powerful hardware needed to run complex algorithms.

To address these challenges, low-code and no-code machine learning platforms have emerged with the aim of simplifying machine learning and data science processes, thereby mitigating the need for extensive coding knowledge. Examples of such platforms include Einblick, KNIME, Dataiku, Alteryx, and Akkio.

This article uses one low-code machine learning platform to train a model capable of predicting whether a user will stop playing a game. It also delves into the interpretation of the results and the techniques that can be used to improve the model's performance. The rest of this article is organised as follows:

Full disclosure: I am a data scientist with Actable AI at the time of writing this article, so it is the platform that will be used here. I am also involved in implementing new features in the ML library and maintaining them, so I was curious to see how the platform would fare on a real-world problem.

The platform provides a web application with a number of popular machine learning methods for the traditional applications of classification, regression, and segmentation. A number of less common tools are also available, such as time-series forecasting, sentiment analysis, and causal inference. Missing data can be imputed, statistics of a dataset can be computed (such as correlation among features, Analysis of Variance (ANOVA), and so on), and data can be visualized using tools such as bar charts, histograms, and word clouds.

A Google Sheets add-on is also available, enabling analyses and model training to be done directly within a spreadsheet. However, do note that newer features may not be available in this add-on.

The core library is open-source and available on GitHub, and is composed of several well-known and trusted frameworks, such as AutoGluon and scikit-learn, that are also open-source and freely available. This is not dissimilar to other related platforms, which also take advantage of existing open-source solutions.

However, this begs the question: why would you use such platforms at all, if most of the tools are already available and free to use? The main reason is that these tools require knowledge of programming languages such as Python, so anyone who is not familiar with coding may find them hard or impossible to use.
Hence, these platforms aim to provide all the functionality in the form of a Graphical User Interface (GUI) rather than as a set of programming commands. More experienced professionals can also benefit, saving time through an easy-to-use graphical interface that may additionally provide informative descriptions of the available tools and techniques. Some platforms may present tools you were not familiar with, or provide potentially helpful warnings when working with your data, such as flagging data leakage (when the model has access to features that will not be available once it is deployed to production on unseen data).

Another reason to use these kinds of platforms is that the hardware on which to run the models is also provided, so there is no need to buy and maintain computers and components such as Graphics Processing Units (GPUs).

The dataset, provided by a gaming company using the platform, can be viewed here and is released under a CC BY-SA 4.0 license, allowing sharing and adaptation as long as appropriate credit is provided. It has a total of 789,879 rows (samples), which is quite substantial and should help to reduce effects such as model over-fitting.

The dataset contains information about each level that a person has played in a mobile app. For example, there is information on the amount of time played, whether the player won or lost the level, the level number, and so on. The user IDs have been included but anonymised so as not to reveal the original players' identities, and some fields have been removed. Nevertheless, it should provide a solid basis to see whether the tools provided by the ML platform considered in this article can be useful in predicting whether a player will churn. The meaning of each feature is as follows:

The first step before training is to get an understanding of the data through Exploratory Data Analysis (EDA). EDA is a data analysis approach that involves summarizing, visualizing, and understanding the main characteristics of a dataset. The goal is to gain insights into the data and identify any patterns, trends, anomalies, or issues (e.g. missing values) that may be present, which can help inform the features and models to be used. Let's start by checking the main reasons for levels being ended:

In the image above, we can see that the predominant cause of a level ending (represented by EndType) is the player losing the game (63.6%), versus 35.2% of levels being won. We can also see that the UsedChangeCar column appears to be useless, since it contains the same value in all rows.

A very important observation is that our target variable is highly imbalanced, with only 63 samples out of the first 10,000 rows (i.e. 0.6% of the data) having a Churn value of 1 (i.e. the player has churned). This needs to be kept in mind, because our models can very easily become biased towards only predicting a value of 0 for Churn. The reason is that such a model can still attain very good values for some metrics, such as accuracy: a dummy model that simply selects the most prevalent class would be right 99.4% of the time! I invite you to read more about this in two great articles by Baptiste Rocca and Jason Brownlee.

Unfortunately, Actable AI does not yet offer any way to handle imbalanced data, such as the Synthetic Minority Oversampling Technique (SMOTE), class weights, or different sampling strategies.
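To illustrate why this matters, here is a minimal sketch (in Python with scikit-learn, not the platform's own code) showing that a dummy classifier which always predicts the majority class reaches roughly 99.4% accuracy on data with a 0.6% positive rate, while its ROC AUC is no better than chance:

```python
# Minimal illustration of the class-imbalance problem: a model that always
# predicts "no churn" looks excellent on accuracy but is useless by ROC AUC.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
y = (rng.random(10_000) < 0.006).astype(int)   # ~0.6% positives, as in the EDA sample
X = rng.normal(size=(10_000, 5))               # placeholder features

dummy = DummyClassifier(strategy="most_frequent").fit(X, y)
print(accuracy_score(y, dummy.predict(X)))              # ~0.994
print(roc_auc_score(y, dummy.predict_proba(X)[:, 1]))   # 0.5, i.e. chance level
```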
This means that we need to be careful about the metric chosen for optimisation. As mentioned above, accuracy would not be the best choice, given that a high value can be achieved even if the samples of one class are never labelled correctly.

Another useful type of analysis is the correlation between features, especially between the predictor features and the target. This can be done using the 'Correlation Analysis' tool, the results of which can be viewed directly on the Actable AI platform here:

In the chart above, the blue bars indicate positive correlation of a feature with a Churn value of 1, while the orange bars indicate negative correlations. Note that correlation lies between -1 and 1, where positive values mean that both features tend to change in the same direction (e.g. both increase or both decrease), whereas negative values indicate that when one feature increases, the other decreases. As such, the magnitude of the correlation (ignoring the sign) is perhaps the most important thing to note.

There are a number of takeaways, such as players who lose a level being more susceptible to churning (top-most blue bar) and, conversely, players who win a level tending to keep on playing (third orange bar). However, the values are fairly low, indicating that these features are only weakly correlated with the target. This means that it will probably be necessary to perform feature engineering, whereby the existing features are used to create new ones that capture more salient information and enable a model to make more accurate predictions. Feature engineering will be discussed in further detail later in this article.

Before creating new features, however, it is worth seeing what sort of performance can be achieved using just the original features in our dataset. The next step will thus probably be more exciting: training a model to see what sort of performance can be attained.

Since we would like to predict whether a user will stop playing, this is a classification problem, where one of a number of labels needs to be selected. In our case, the problem involves assigning one of two labels ('1' corresponding to 'Churn' and '0' corresponding to 'No Churn'), which makes it a binary classification problem.

This process is done primarily via the AutoGluon library, which automatically trains a number of models and then selects the one attaining the best performance. This avoids having to manually train individual models and compare their performance. A number of parameters need to be set in the Actable AI platform, with my choices shown below:

The metric to use for optimisation of the models can also be chosen. I used the Area Under the Receiver Operating Characteristic Curve (ROC AUC), since it is much less sensitive to the class imbalance issue discussed earlier. Values range from 0 to 1, the latter being a perfect score.

After some time, the results are generated and displayed, which can also be viewed here.
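For reference, training such a model outside the platform is roughly equivalent to the following hedged sketch using AutoGluon directly; the file name, train/test split, and time limit are assumptions for illustration rather than the platform's actual settings:

```python
# Hedged sketch of a binary churn classifier trained with AutoGluon,
# optimised for ROC AUC as discussed above. "churn_levels.csv" is a
# hypothetical export of the dataset.
import pandas as pd
from autogluon.tabular import TabularPredictor

df = pd.read_csv("churn_levels.csv")
train_df = df.sample(frac=0.8, random_state=0)   # assumed 80/20 split
test_df = df.drop(train_df.index)

predictor = TabularPredictor(
    label="Churn",           # target column: 1 = churned, 0 = still playing
    problem_type="binary",
    eval_metric="roc_auc",   # less sensitive to the class imbalance
).fit(train_df, time_limit=1800)  # time limit in seconds (an arbitrary choice)

print(predictor.evaluate(test_df))  # roc_auc together with other common metrics
```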
A number of different metrics are computed, which is not only good practice but practically necessary if we truly want to understand our model, given that each metric focuses on certain aspects of a model's performance. The first metric displayed is the optimisation metric, with a value of 0.675:

This is not great, but recall that the features were quite weakly correlated with the target during our EDA, so it is unsurprising that performance is unremarkable. This result also highlights the importance of understanding the results: we would normally be very happy with an accuracy of 0.997 (i.e. 99.7%), but this is largely due to the highly imbalanced nature of the dataset, as discussed earlier, so it shouldn't be given much importance. Meanwhile, scores such as precision and recall are based on a threshold of 0.5, which may not be the most suitable for our application.

ROC and precision-recall curves are also shown, which again clearly show that the performance is a bit poor:

These curves are also useful in determining what threshold could be used in the final application. For example, if it is desired to minimize the number of false positives, we can select a threshold at which the model obtains a higher precision and check what the corresponding recall will be.

The importance of each feature for the best model can also be viewed, which is perhaps one of the more interesting results. This is computed using permutation importance via AutoGluon. P-values are also shown to determine the reliability of the result:

Perhaps unsurprisingly, the most important feature is EndType (showing what caused the level to end, such as a win or a loss), followed by MaxLevel (the highest level played by a user, with higher numbers indicating that a player is quite engaged and active in the game). On the other hand, UsedMoves (the number of moves performed by a player) is practically useless, and StartMoves (the number of moves available to a player) could actually harm performance. This also makes sense, since the number of moves used and the number of moves available are not highly informative by themselves; a comparison between them would probably be much more useful.

We can also have a look at the estimated probabilities of each class (either 1 or 0 in this case), which are used to derive the predicted class (by default, the class having the highest probability is assigned as the predicted class):

Explainable AI is becoming ever more important for understanding model behaviour, which is why tools like Shapley values are increasing in popularity. These values represent the contribution of a feature to the probability of the predicted class. For instance, in the first row, we can see that a RollingLosses value of 36 decreases the probability of the predicted class (class 0, i.e. that the person will keep playing the game) for that player. Conversely, this means that the probability of the other class (class 1, i.e. that the player churns) is increased. This makes sense, because higher values of RollingLosses indicate that the player has lost many levels in succession and is thus more likely to stop playing the game out of frustration. On the other hand, low values of RollingLosses generally increase the probability of the negative class (i.e. that a player will not stop playing).

As mentioned, a number of models are trained and evaluated, following which the best one is selected.
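If you want to reproduce these outputs outside the platform, AutoGluon exposes the same kind of information directly; a minimal sketch, re-using the predictor and test_df from the earlier sketch, would be:

```python
# Permutation-based feature importance (with p-values), per-class probabilities,
# and a comparison of all trained models.
print(predictor.feature_importance(test_df))    # importance, stddev, p_value, ...
print(predictor.predict_proba(test_df).head())  # estimated probability of each class
print(predictor.leaderboard(test_df))           # scores and speeds of every model
```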
It is interesting to see that the best model in this case is LightGBM, which is also one of the fastest:

At this point, we can try improving the performance of the model. Perhaps one of the easiest ways is to select the 'Optimize for quality' option and see how far we can go. This option configures several parameters that are known to generally improve performance, at the expense of a potentially slower training time. The following results were obtained (which you can also view here):

Again focusing on the ROC AUC, performance improved from 0.675 to 0.709. This is quite a nice increase for such a simple change, although still far from ideal. Is there something else we can do to improve performance further?

As discussed earlier, we can use feature engineering. This involves creating new features from existing ones that capture stronger patterns and are more highly correlated with the variable to be predicted. In our case, the features in the dataset have a fairly narrow scope, since the values pertain to a single record (i.e. one level played by the user). Hence, it might be very useful to get a more global outlook by summarizing records over time, so that the model has knowledge of the historical trends of a user.

For instance, we could determine how many extra moves were used by the player, thereby providing a measure of the difficulty experienced: if few extra moves were needed, the level might have been too easy; on the other hand, a high number might mean that the level was too hard. It would also be a good idea to check whether the user is immersed and engaged in the game, by looking at the amount of time spent playing it over the last few days. If the player has not played much, it might mean that they are losing interest and may stop playing soon.

Useful features vary across different domains, so it is important to try to find any information pertaining to the task at hand. For example, you could find and read research papers, case studies, and articles, or seek the advice of companies or professionals who have worked in the field and are thus experienced and well-versed in the most common features, their relationships with each other, any potential pitfalls, and which new features are most likely to be useful. These approaches help reduce trial-and-error and speed up the feature engineering process.

Given the recent advances in Large Language Models (LLMs) (you may have heard of ChatGPT...), and given that the process of feature engineering might be a bit daunting for inexperienced users, I was curious to see whether LLMs could be useful in providing ideas on what features could be created. I did just that, with the following output:

ChatGPT's reply is actually quite good, and also points to a number of time-based features as discussed above. Of course, keep in mind that we might not be able to implement all of the suggested features if the required information is not available. Moreover, ChatGPT is well known to be prone to hallucination, and as such may not provide fully accurate answers. We could get more relevant responses, for example by specifying the features that we're using or by employing prompt-engineering techniques, but this is beyond the scope of this article and is left as an exercise to the reader.
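As a starting point for that exercise, here is a hypothetical sketch of how such a prompt could be scripted with the OpenAI Python SDK; the model name, the prompt wording, and the use of the API (rather than the ChatGPT web interface used above) are all assumptions for illustration:

```python
# Hypothetical sketch of asking an LLM for feature ideas programmatically.
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

prompt = (
    "I am predicting player churn from per-level game records with columns "
    "such as EndType, UsedMoves, StartMoves, MaxLevel, and RollingLosses. "
    "Suggest engineered features, especially time-based aggregates."
)
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```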
Nevertheless, LLMs can be considered an initial step to get things going, although it is still highly recommended to seek more reliable information from papers, professionals, and so on.

On the Actable AI platform, new features can be created using the fairly well-known SQL query language. For those less acquainted with SQL, approaches such as using ChatGPT to automatically generate queries may prove useful; however, in my limited experimentation, the reliability of this method can be somewhat inconsistent. To ensure accurate computation of the intended output, it is advisable to manually examine a subset of the results and verify that the desired output is being computed correctly. This can easily be done by checking the table displayed after the query is run in SQL Lab, Actable AI's interface for writing and running SQL code.

In the SQL code used to generate the new columns (a rough sketch of the same idea, in pandas, is given at the end of this section), 'windows' are created to define the range of time to consider, such as the last day, last week, or last two weeks. The records falling within that range are then used during the feature computations, which are mainly intended to provide some historical context on the player's journey in the game. The full list of features is as follows:

It is important that only past records are used when computing the value of a new feature for a particular row. In other words, the use of future observations must be avoided, since the model will obviously not have access to any future values when deployed in production. Once satisfied with the features created, we can save the table as a new dataset and train a new model that should (hopefully) attain better performance.

Time to see whether the new columns are of any use. We can repeat the same steps as before, with the only difference being that we now use the new dataset containing the additional features. The same settings are used to enable a fair comparison with the original model, with the following results (which can also be viewed here):

The ROC AUC of 0.918 is much improved compared with the original value of 0.675. It's even better than the model optimized for quality (0.709)! This demonstrates the importance of understanding your data and creating new features that provide richer information.

It would now be interesting to see which of the new features were actually the most useful; again, we can check the feature importance table:

It looks like the total number of losses in the last two weeks is quite important, which makes sense because the more often a player loses, the more likely they are to become frustrated and stop playing. The average maximum level across all users also seems to be important, which again makes sense because it can be used to determine how far off a player is from the majority of other players: a level much higher than the average indicates that a player is well immersed in the game, while a level much lower than the average could indicate that the player is not yet well motivated.

These are only a few simple features; others could be created to improve performance further, and I will leave it as an exercise to the reader to see what other features might help.

Training a model optimized for quality with the same time limit as before did not improve performance.
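Here is the pandas sketch promised above, illustrating the windowed-aggregation idea. It is only an illustration under assumed column names ("UserID", "Timestamp", "PlayTime") and an assumed EndType value of "Lose"; the actual features were created with SQL in SQL Lab as described earlier:

```python
# Rough pandas illustration of time-windowed features; the real pipeline used SQL.
import pandas as pd

df = pd.read_csv("churn_levels.csv", parse_dates=["Timestamp"])  # assumed columns
df = df.sort_values(["UserID", "Timestamp"]).reset_index(drop=True)

# Per-record quantities that the windows will aggregate.
df["lost_level"] = (df["EndType"] == "Lose").astype(int)  # "Lose" value is assumed
df["extra_moves"] = df["UsedMoves"] - df["StartMoves"]    # proxy for level difficulty

def trailing_sum(col, window):
    """Sum of `col` over the trailing time `window` per user, excluding the
    current row (closed='left') so that no future information leaks in."""
    rolled = (
        df.set_index("Timestamp")
          .groupby("UserID")[col]
          .rolling(window, closed="left")
          .sum()
    )
    # df is sorted by (UserID, Timestamp), so the grouped result lines up row-for-row;
    # rows with no history get 0.
    return rolled.fillna(0).to_numpy()

df["losses_last_14d"] = trailing_sum("lost_level", "14D")
df["playtime_last_7d"] = trailing_sum("PlayTime", "7D")
```

The closed='left' argument is what enforces the rule about using only past records, since the current row (and anything after it) is excluded from every window.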
The lack of improvement when optimising for quality is perhaps understandable, because a greater number of features is being used, so more time might be needed for optimisation. As can be observed here, increasing the time limit to 6 hours indeed improves performance to 0.923 (in terms of the AUC):

It should also be noted that some metrics, such as precision and recall, are still quite poor. However, this is because a classification threshold of 0.5 is assumed, which may not be optimal. Indeed, this is also why we focused on the AUC, which gives a more comprehensive picture of the performance across possible thresholds.

The performance of the trained models, in terms of the AUC, can be summarised as follows: 0.675 with the original features, 0.709 with the original features and the 'Optimize for quality' option, 0.918 with the engineered features, and 0.923 with the engineered features and a 6-hour training time limit.

It's no use having a good model if we can't actually use it on new data. Machine learning platforms may offer the ability to generate predictions on future unseen data with a trained model. For example, the Actable AI platform provides an API that allows the model to be used on data outside of the platform, as well as the option to export the model or insert raw values to get an instant prediction.

However, it is crucial to periodically test the model on future data to determine whether it is still performing as expected; it may be necessary to re-train the model with newer data. This is because data characteristics (e.g. feature distributions) may change over time, thereby affecting the accuracy of the model. For example, a company may introduce a new policy that affects customer behaviour (be it positively or negatively), but the model may be unable to take the new policy into account if it does not have access to any features reflecting the change. If there are such drastic changes but no features that could inform the model are available, it could be worth considering the use of two models: one trained and used on the older data, and another trained and used on the newer data. This would ensure that the models are specialised to operate on data with different characteristics that may be hard to capture with a single model.

In this article, a real-world dataset containing information on each level played by a user in a mobile app was used to train a classification model that can predict whether a player will stop playing the game in two weeks' time. The whole processing pipeline was considered, from EDA to model training to feature engineering. Discussions on the interpretation of the results and on how to improve them were provided, taking the AUC from 0.675 to 0.923 (where 1.0 is the maximal value).

The new features that were created are relatively simple, and there certainly exist many more features that could be considered. Moreover, techniques such as feature normalisation and standardisation could also be applied. Some useful resources can be found here and here.

With regards to the Actable AI platform, I may of course be a bit biased, but I do think that it helps simplify some of the more tedious processes that need to be done by data scientists and machine learning experts, with the following desirable aspects:

That said, there are a few drawbacks, and several aspects could be improved, such as:

In future articles, I will consider using other platforms to determine their strengths and weaknesses, and thereby which use cases best fit each platform. Until then, I hope this article was an interesting read! Please feel free to leave any feedback or questions that you may have!

Do you have any thoughts about this article?
Please feel free to post a note, comment, or message me directly on LinkedIn! Also, make sure to follow me so that you're notified when future articles are published.

The author was a data scientist with Actable AI at the time of writing this article.


