Behavior-Based Approaches for Detecting Cheating in
Online Games
Hashem A. Alayed
A dissertation submitted to the faculty of
the University of Southern California
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Clifford B. Neuman, Chair
Michael Zyda
Aiichiro Nakano
Yan Liu
Dennis Wixon
Department of Computer Science
University of Southern California
May 2015
Copyright © 2015 Hashem A. Alayed
All Rights Reserved
Dedication
To my beloved parents for their continuous support.
To my wife Layla, and my newborn son, Sultan.
To my brothers and sisters, and all the members of my great family.
Acknowledgment
I would never have been able to finish my dissertation without the guidance of my
committee members, support from my friends and family, and especially my wife.
I would like to express my gratitude to my advisor, Professor Clifford Neuman, for his guidance, support, and patience throughout my PhD years. He guided me through my research: from finding the right problems to work on, through researching and analyzing solutions to those problems, to writing this thesis and attaining my PhD. I would also like to thank my colleagues Arun Viswanathan and Anas Almajali for all their support.
I thank all the GamePipe lab members at the University of Southern California for their great support and help. I want to dedicate special thanks to Professor Michael Zyda, the director of GamePipe, for his continuous guidance. In addition, special thanks to Fotos Frangoudes for his great help in building the game “Battle Trojans” despite his busy schedule. Also, I would like to thank Balakrishnan (Balki) Ranganathan, Marc Spraragen, Powen Yao, Chun (Chris) Zhang and Mohammad Alzaid.
I would like to thank both Professor Yan Liu and Mohammad Taha Bahadori for their
valuable suggestions and discussions during my thesis work.
I would like to thank King Saud University (Saudi Arabia) for sponsoring my graduate studies during my Master's and PhD years.
I would like to thank my friends Dr. Sattam Alsubaiee, Hotham Altwaijry, Yassir
Altowaim, Dr. Bader Alzahrani, Hasan Alghabari, Karam Alyateem, Mohannad Aldughaim, John
Morse, Christian Potthast, Franziska Meier, Tara Klamrowski, Matt Ahle, Sultan Alshamri,
Ahmad Alonizi, Dr. Mubarak Alhajri, Dr. Abdulaziz Alhussien, Dr. Mishari Almishari, Dr. Majed
Alresaini, Abdulrahman Alzaid, Osama Alfalah, Nasser Alrayes, Bassam Alnemer, Nasser
Alhammdan and Abdulrahman Alsaudi for their support and friendship over the years.
Table of Contents
Dedication
Acknowledgment
List of Figures
List of Tables
Abstract
Chapter 1. Introduction
1.1. Motivation
1.2. Problem
1.3. Thesis Statement
1.4. Document Organization
Chapter 2. Literature Review
2.1. Online Games Security
2.1.1. Types of Attackers
2.1.2. Types of Attacks
2.1.3. Building a secure game
2.2. Non-Behavior Based Research
2.3. Behavior Based Research
2.3.1. Relation to Non-Games Research (IDS)
2.3.2. Cheat Modeling
2.3.3. Player Modeling
2.4. Conclusion
Chapter 3. Cheat Modeling
3.1. Model Description
3.2. Case Study: Detecting AimBots in First Person Shooters
3.2.1. Design and Implementation
3.2.2. Experiments and Results
3.2.3. Features Ranking
3.2.4. Analysis
3.3. Statistical Significance of the Cheat Modeling Data
Chapter 4. Player Modeling
4.1. Model Description
4.2. Case Study: Detecting AimBots in First Person Shooters
4.2.1. Design and Implementation
4.2.2. Experiments and Results
4.2.3. Analysis
4.3. Statistical Significance of the Player Modeling Data
Chapter 5. Decider
5.1. Model Description
5.1.1. The Decider's Input and Configurations
5.1.2. The Decision Making Process and the Output
5.2. Experiments and Results
5.2.1. Long-time players' results
5.2.2. New players' results
5.3. Analysis
Chapter 6. Discussion
6.1. Significance of the Results
6.2. Dealing with New Players' Incoming Data and Long-Time Players' Self-Improvements
6.3. Cost Effects
6.4. Setting the Threshold Values
Chapter 7. Conclusion and Future Work
7.1. Summary
7.2. Contributions
7.3. Future Work
7.4. Concluding Remarks
References
List of Figures
Figure 1: System Design
Figure 2: Cheat Modeling Training Phase
Figure 3: Cheat Modeling Application Phase
Figure 4: Trojan Battles in-game screenshot
Figure 5: Lobby screen with customization options
Figure 6: In-game AimBot toggle screen
Figure 7: Illustration of example #2
Figure 8: Bootstrapping histogram for “Lock-Based vs. AF” model on frame rate = 60 seconds
Figure 9: Bootstrapping histogram for “Lock-Based vs. AF” model on frame rate = 30 seconds
Figure 10: Player Modeling Training Phase
Figure 11: Player Modeling Application Phase
Figure 12: Epsilon values plotted against different measurements
Figure 13: Plots of different measurement results vs. different sizes of input data
Figure 14: Histogram for bootstrapping results (for overall accuracy) on frame rate = 30
Figure 15: Histogram for bootstrapping results (for overall accuracy) on frame rate = 60
Figure 16: Histogram for bootstrapping results (for F1-Score) on frame rate = 60
Figure 17: Histogram for bootstrapping results (for F1-Score) on frame rate = 30
Figure 18: An illustration of the Decider's configurations and inputs
List of Tables
Table 1: Multi-class Classification
Table 2: Lock-Based vs. AF Classification
Table 3: Two-class Classification
Table 4: Separated Cheats Classification
Table 5: Application Phase Results
Table 6: Top Five Features in Each Part of the Training Phase
Table 7: Mean, Standard Deviation, Error Margin and 95% Confidence Interval for “Lock-Based vs. AF” model
Table 8: Best Results Achieved by each Measurement for P1
Table 9: Results achieved using the best Overall-Accuracy-Based Epsilon for P1
Table 10: Results achieved using the best F1-Score-Based Epsilon for P1
Table 11: Best Results Achieved by each Measurement for P2
Table 12: Results achieved using the best Overall-Accuracy-Based Epsilon for P2
Table 13: Results achieved using the best F1-Score-Based Epsilon for P2
Table 14: Best Results Achieved by each Measurement for Avg-P
Table 15: Results achieved using the best Overall-Accuracy-Based Epsilon for Avg-P
Table 16: Results achieved using the best F1-Score-Based Epsilon for Avg-P
Table 17: Application Phase Results using the Overall-Accuracy-Based Epsilon as a threshold
Table 18: Application Phase Results using the F1-Score-Based Epsilon as a threshold
Table 19: Mean, Standard Deviation, Error Margin and 95% Confidence Interval for Player 1's model
Abstract
The online games industry has grown rapidly over the last decade. As a result of this rapid growth, many techniques have been created to support the game development process. One important aspect to consider is the prevention of cheating by a game's players. Cheating in online games carries consequences for both players and companies, so cheating detection and prevention is an important part of developing a commercial online game. Over the years, gaming companies have developed many anti-cheating solutions. However, many companies use cheating detection measures that may breach a user's privacy.

In this thesis, we provide a server-side, system-generic anti-cheating method that uses only game logs. The system consists of three main parts: cheat modeling, player modeling, and the decider. The cheat-modeling part focuses on defining a cheating behavior using classification techniques, building a model for each type of cheat. The player-modeling part, in contrast, focuses on defining a player's behavior using anomaly detection techniques, building a model for each player. Each part then gives a probability of detection to the decider, which produces an overall probability based on criteria discussed in this document. Many researchers have analyzed online game data; however, few have focused on the problem of cheating detection.
Chapter 1. Introduction
1.1. Motivation
The security of an online game is essential if the game is to generate revenue. This makes cheating detection and prevention an important part of developing a commercial online game. Cheating in online games carries consequences for both players and companies. Cheaters reduce the fun of playing the game and gain unfair advantages over honest players. Companies, in turn, are affected monetarily in different ways. One is the exodus of honest players from the game if cheaters are not policed, which costs the company those players' subscription money. Perhaps more importantly, the game can gain a bad reputation, leading to a plummet in sales. A company also loses revenue by building cheat detection methods, including the time spent and the employees hired to stop cheating.
Gaming companies have developed anti-cheating solutions to counter these problems. However, most of these solutions use cheating detection measures that may breach users' privacy, such as the Warden in “World of Warcraft” [1] and PunkBuster [2]. Several companies hire people to analyze game logs for the purpose of game improvement as well as cheating detection. However, manual monitoring is expensive, since it requires hiring more employees, and it is a time-consuming job.
For these reasons, there is an essential need for behavior-based cheating detection methods that automate game monitoring. Once a generic model of cheating behavior is defined, these methods can run either online or offline. Furthermore, they protect users' privacy, since they run on the server side and handle only gameplay log files.
1.2. Problem
Developing an online game cheating detection system is not a trivial task, and it becomes more difficult when trying to generalize specific cheating detection techniques. Our goal is to develop a system that detects a player's malicious behavior from his log data using machine-learning techniques; one example is detecting AimBots in online First Person Shooter (FPS) games.
In general, to develop a machine-learning-based system that detects online game cheats, we have to address these questions:
- Can we build an online-game security system without the need to modify (or with minimal modification of) the game client or state server, i.e., by using only the log traces of the gameplay?
- Can this system cause no or minimal performance overhead to the gameplay or the
network bandwidth compared to other techniques?
- Can this system be accurate with low error rate, false negatives, and false positives?
- Can it discover new cheats in the same category with a fairly high accuracy?
- Is the system centralized on the server, or distributed between the server and
clients?
1.3. Thesis Statement
Cheating detection can be achieved in many ways. However, most of the techniques used in commercial games require visibility into other activities on the client machine, thereby invading players' privacy. Therefore, I propose a model of cheating detection that uses only players' in-game data on the server. To detect malicious behavior in an online game using only the game logs on the server, players' data can be used to design a system that uses two different modeling techniques (as shown in Figure 1):
Figure 1: System Design (clients 1 through k send gameplay data to the server, where it is logged and pre-processed; the Player Modeling and Cheat Modeling components each pass a probability to the Decider)
1. Cheat Modeling: In this part of the system, detailed data about the player's gameplay is used to build a model for each type of cheat using classifiers; these models are then applied to unseen data to detect cheaters using cheats of a similar category.
2. Player Modeling: In this part of the system, I first use a player's gameplay data to design a model for that player. Then, I use anomaly detection techniques to detect any malicious behavior by that player.
3. Each modeling technique (the two parts above) gives the system a probability of malicious behavior. This probability tells the system how confident each part was in detecting the malicious behavior. The system then combines the two probabilities into an overall level of confidence and compares it to a threshold. The value of the threshold depends on criteria such as long-time players, new players, the performance of the system, and so on.
These two techniques are effective in detecting malicious behavior. The design is system-generic, but I apply it to an online First Person Shooter game that was developed in the GamePipe lab at USC. The system not only detects cheating behaviors; it also detects any unusual (malicious) behavior by a player or a group of players.
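To make the combination step in part 3 concrete, the following sketch shows one way a decider could merge the two probabilities. The weighted average, the weight, and the threshold values are assumptions for illustration only, not the scheme evaluated later in this document.

```python
# Illustrative sketch of the Decider's combination step; the weighted
# average, weight, and threshold values are invented for illustration.

def decide(p_cheat_model: float, p_player_model: float,
           threshold: float = 0.7, w_cheat: float = 0.5):
    """Return (overall_confidence, flagged) for one window of play.

    p_cheat_model  -- probability of cheating from the cheat-modeling part
    p_player_model -- probability of anomaly from the player-modeling part
    threshold      -- decision cutoff; in this design it would depend on
                      criteria such as long-time vs. new players
    w_cheat        -- weight given to the cheat model (assumed value)
    """
    overall = w_cheat * p_cheat_model + (1.0 - w_cheat) * p_player_model
    return overall, overall >= threshold

conf, flagged = decide(0.9, 0.6)  # both parts fairly confident -> flagged
```

A real decider could instead learn the weight from data or take the maximum of the two probabilities; the simple average here only illustrates the combine-then-threshold structure.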
1.4. Document Organization
The document is organized as follows: Chapter 2 covers the literature review. Chapter 3 explains the cheat modeling part of the system, with a case study from a published paper on this part. I then describe the design and experiments of the player modeling part in Chapter 4. After that, I explain the decider in Chapter 5. Chapter 6 contains the discussion of our work. Finally, I conclude the dissertation in Chapter 7.
Chapter 2. Literature Review
In this chapter, I discuss current and established work in the online games security field. First, I explain the types of attackers and attacks, as well as how to build a secure game. After that, I present work done on non-behavior-based cheat detection, and then I discuss research on behavior-based analysis of player data.
2.1. Online Games Security
2.1.1. Types of Attackers
While playing an online game for an extended period of time, people tend to lose interest in the original experience and look for other ways to improve it by modifying or hacking the game. In the early days of gaming, cheaters hacked games merely to show off their hacking skills. As time progressed, cheaters started to hack for other reasons: improving the game experience, getting attention, fun, glory, and (most dangerously) making money and compromising others' computers.
2.1.2. Types of Attacks
Several papers have discussed the kinds of attacks found in the online games world. However, each author has his or her own classification, which I consolidate in this section.
According to [3], there are six basic types of programmable attacks on online games. The first is bots, a client-side attack; these vary from a simple bot that repeats some common action to a complex AI bot that performs advanced actions on behalf of the player. Bots can be harmless when used for repetitive tasks, like the fishing bot in Final Fantasy XIV; however, many dangerous bots aim to outplay other humans or even to earn money (like poker bots). In either case, none of the well-known game companies allow them, and players who use them usually get banned by the developers. The second type of programmable attack drives the game's UI, such as keys, buttons, or mouse clicks; this is also a client-side attack. In general, these are another set of bots used to perform boring, repetitive actions in a game (such as farming in WoW or FFXI). The third type manipulates the game state by operating as a proxy between the client and the server. This is a man-in-the-middle attack, where the cheater intercepts the packets, analyzes them, and then modifies them for his own benefit. An obvious countermeasure is to use encryption. However, the type of encryption used should not affect the performance of the game. Moreover, the game client must know how to decrypt the packets, which often leads cheaters to reverse engineer the client to discover the decryption algorithm. This type of attack is commonly used in First Person Shooter games to manipulate aiming and death situations, and in MMOGs to gain more information about the world state.
The fourth type manipulates the game state by tampering with memory. This is a client-side attack, also common in FPS games. These attacks usually target the drivers (especially the graphics drivers) or even intercept calls to DLLs and other outside libraries. The fifth type is kernel-level debugger breakpoints: another client-side attack in which the cheater sets breakpoints to look for certain messages or functions. The final type is predicting a random number generator. This attack is commonly used against games that rely on chance and randomness, such as online poker. In poker, cheaters try to discover the random function used by the shuffling algorithm, and by doing so steal from innocent players without their noticing. Many security measures have been deployed to improve randomness in general.
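To illustrate the sixth attack, the toy example below (not from the thesis) shows how a deck shuffled with a guessable seed, such as a coarse timestamp, can be fully recovered by brute force, and how drawing the seed from the operating system's CSPRNG closes that shortcut.

```python
import random
import secrets

def shuffled_deck(seed):
    """Shuffle a 52-card deck; a guessable seed makes the deal predictable."""
    deck = list(range(52))
    random.Random(seed).shuffle(deck)
    return deck

# Vulnerable: a seed derived from a coarse timestamp is easy to brute-force.
weak_seed = 1_700_000_000            # stand-in for int(time.time())
dealt = shuffled_deck(weak_seed)

# The attacker tries candidate seeds until the observed cards match.
recovered = next(s for s in range(weak_seed - 5, weak_seed + 5)
                 if shuffled_deck(s)[:5] == dealt[:5])
assert recovered == weak_seed        # the whole deal is now known

# Safer: draw the seed from the OS CSPRNG, leaving no feasible search space.
strong_deal = shuffled_deck(secrets.randbits(128))
```

The point is not the shuffle itself but the seed's entropy: a deterministic PRNG is fine for gameplay randomness only if its seed cannot be guessed or searched.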
In [4], the authors mentioned 15 types of cheating; I summarize the most important ones in this paragraph. First, the developer's misplaced trust in the client is often exploited. Since some developers put too much trust in the client software, cheaters will modify and tamper with it, and since the cheater is in control of the client software, it is hard to protect against this kind of cheating. Another type of cheating is collusion, or rank boosting, which can happen in any game with a ranking system: unfair players collude with each other to increase their ranks among fair players, climbing illegally to the top of the ranking system. A third type is game procedure abuse, such as escaping from a losing situation by disconnecting. Trading virtual items for real money is another type of cheating, and it is considered illegal in most famous games. In addition, cheaters might steal from other players by offering an item, taking the money, and not delivering the item. The fifth type of cheating is exploiting game AI, which is common in board games such as chess, where cheaters use an AI program to calculate the next move against a human player. Timing cheating is also common; one example is the “look-ahead cheat,” in which cheaters try to learn the opponent's next move by delaying their own. Another common type of cheating is exploiting a bug in the game without tampering with the software code, which is very common in MMOGs. The authors also mentioned types of cheating that are not exclusive to online games but apply to other networked software as well, such as DoS attacks, compromising passwords, compromising servers, authentication exploits, insider misuse, and social engineering. Finally, many of the 15 cheats mentioned above can be combined to form a complex cheat.
2.1.3. Building a secure game
A McAfee white paper on securing online games [5] presents many hints for developing secure online games, such as limiting scripting persistency, auto-execution, and permissions, or preventing scripting altogether. Data transmission should be encrypted and authenticated to prevent tampering. Another tip is to avoid sending unnecessary or sensitive data to clients, such as debugging information. Furthermore, developers should prevent active code from being used during in-game communication. Neither input to the client software nor the environment the software runs on should be trusted: the server cannot trust the client software. Finally, developers need to use secure backup and logging techniques and deal with in-game DoS attacks.
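As a sketch of the "encrypt and authenticate" tip, the snippet below shows the authentication half using a standard-library HMAC over each state update. The key, field names, and wire format are invented for illustration; a real game would also encrypt the payload (e.g., with AES-GCM), which the Python standard library does not provide.

```python
import hashlib
import hmac
import json

# Hypothetical shared key; in practice derived per session via a handshake.
SESSION_KEY = b"demo-session-key-not-for-production"

def seal(update: dict) -> bytes:
    """Serialize a state update and append an HMAC tag so the server can
    reject tampered packets (authentication only; no encryption here)."""
    body = json.dumps(update, sort_keys=True).encode()
    tag = hmac.new(SESSION_KEY, body, hashlib.sha256).hexdigest().encode()
    return body + b"|" + tag

def open_sealed(packet: bytes) -> dict:
    """Verify the tag in constant time, then deserialize the update."""
    body, _, tag = packet.rpartition(b"|")
    expected = hmac.new(SESSION_KEY, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("packet failed authentication")
    return json.loads(body)

packet = seal({"player": 7, "x": 10.5, "y": 3.0, "hp": 100})
assert open_sealed(packet)["hp"] == 100

tampered = packet.replace(b'"hp": 100', b'"hp": 999')
try:
    open_sealed(tampered)
except ValueError:
    pass  # tampering detected and rejected
```

Using `hmac.compare_digest` rather than `==` avoids timing side channels when comparing tags.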
2.2. Non-Behavior Based Research
In this section, I present some online game security measures that are not based on
the behavioral study of the player, i.e. they were not based on using machine-learning
techniques.
Server-Side Bot Detection [6]:
Many companies detect bots using some sort of client-side protection mechanism, such as “The Warden” in WoW (mentioned above). Programs like “The Warden” detect bots efficiently; however, they cannot be trusted, since the player loses his privacy. Therefore, the authors of this paper suggest a server-side bot detection technique based on character movement only [6].
Since bots store waypoints that are used to move the character, the proposed technique uses this feature to detect them. It is difficult to detect bots by matching an exact path against the exact waypoints of a character. Instead, a route is constructed from a client's saved coordinates using the Douglas-Peucker line simplification algorithm [6]. Then, waypoints are created around high-density clusters of those coordinates (clustering). After that, a route representation is created from the set of waypoints, and repetition is sought with an extended suffix array by calculating the LCP table of the routes. Finally, an average LCP value is computed: if it is high, a bot is detected; if it is low, the player is human. Detection usually takes between 12 and 60 minutes, which is considered reasonable. However, this technique can be evaded simply by using long paths or by moving the character in a random fashion.
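The route-simplification step above can be sketched with the textbook Douglas-Peucker algorithm; the sample path and tolerance below are invented, and this is the standard recursive formulation rather than the paper's exact implementation.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through points a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)

def douglas_peucker(points, epsilon):
    """Simplify a polyline, keeping points deviating more than epsilon."""
    if len(points) < 3:
        return list(points)
    # Find the point farthest from the segment joining the endpoints.
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= epsilon:
        return [points[0], points[-1]]  # everything within tolerance
    # Recurse on both halves and merge, avoiding the duplicated split point.
    left = douglas_peucker(points[:index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right

# A noisy recorded path collapses to its corner points.
path = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1)]
simplified = douglas_peucker(path, 1.0)
```

The bot detector would feed each client's recorded coordinates through such a pass before clustering them into waypoints, shrinking the route to the turns that matter.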
Enforcing Semantic Integrity on Untrusted Clients [7]:
This technique applies to online games, or what the paper calls “networked virtual environments (NVEs)” [7]. It detects client modifications that violate the rules of the NVE, i.e., invalid commands. It is an optimistic approach that uses a fully trusted audit server to validate clients.

The paper introduces “the semantic gap,” defined as the difference between the server's abstract representation of the world and the client's concrete representation [7]. The client can use this gap to send updates to the server that are valid in the abstract view but invalid in the concrete view. For example, imagine a vehicle inevitably heading toward crashing into a wall, but then modified to make an unusual U-turn back to the safety of the road. The goal of this technique is to detect such unauthentic updates.
The paper uses the term "client cycle" to describe the client-server interaction for
each update. A client cycle is initiated by the client for every requested update (a change)
by sending that change to the server. The server then validates the change and sends it back
together with other clients' changes. The audit server runs its own cycles, called "audit
cycles," between the client and the audit server. Each audit cycle consists of x client cycles,
and at the beginning of each audit cycle, the client must send a hash of its full concrete
state to the audit server. At each client cycle, the client must also send the change (update)
to the audit server. The audit server then uses the Audit protocol to check whether the
client's concrete updates received thus far are valid and consistent with the NVE rules.
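The client-side bookkeeping for these cycles can be sketched as follows. The state layout, the cycle length x, and the JSON serialization are assumptions for illustration, not the protocol from [7].

```python
# Minimal sketch of the audit-cycle bookkeeping described above: a hash of
# the full concrete state at the start of each audit cycle, plus the change
# at every client cycle.
import hashlib
import json

CLIENT_CYCLES_PER_AUDIT = 5  # "x" client cycles per audit cycle (assumed)

class AuditedClient:
    def __init__(self):
        self.state = {"x": 0.0, "y": 0.0, "health": 100}
        self.cycle = 0

    def state_hash(self):
        # Hash of the full concrete state, sent at the start of each audit cycle.
        blob = json.dumps(self.state, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def client_cycle(self, change, audit_server):
        if self.cycle % CLIENT_CYCLES_PER_AUDIT == 0:
            audit_server.receive_hash(self.state_hash())
        self.state.update(change)
        audit_server.receive_change(change)  # change also goes to the audit server
        self.cycle += 1

class AuditServer:
    def __init__(self):
        self.hashes, self.changes = [], []
    def receive_hash(self, h): self.hashes.append(h)
    def receive_change(self, c): self.changes.append(c)

client, audit = AuditedClient(), AuditServer()
for i in range(12):
    client.client_cycle({"x": float(i)}, audit)
print(len(audit.hashes), len(audit.changes))  # → 3 12
```

Twelve client cycles at x = 5 yield three audit-cycle hashes, illustrating the per-client cost the paper's critics point to.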
The problem with this technique is that it introduces extra cost on the client,
since the client must send concrete updates (and compute the hash) to the audit server
each audit cycle. In addition, it requires extra storage to keep up to three full states in the
client's buffer (for the audit cycles) [7].
Server-Side Verification of Client Behavior [8]:
This technique detects invalid client commands in a client-server game model. It
validates a game client by checking whether the execution of the targeted client software
is consistent with that of a legitimate client. The technique detects a certain type of cheat
in which the player modifies the client's executable (or in-game memory data) to allow
in-game actions that a legitimate game client would not allow; for example, giving the
game character extra power, extra speed, or extra health. The verification can be performed
at selected moments, for example, after someone wins a match. Note that the technique
detects behaviors that are valid in some cases, but invalid in the current case based on the
history of the client's previous behaviors observed by the server.
This technique assumes that the client software is structured as an event loop that
processes inputs and updates the server about its status (the client's health, location, etc.)
[8]. The technique requires some changes to the client source code to enable symbolic
execution. Symbolic execution allows the server to verify the validity of an execution path
by checking client updates against a disjunction of round constraints and seeing whether
any of them is satisfied. These updates can be sent at each loop iteration, or accumulated
over a certain number of iterations (or sent at selected moments, as mentioned above).
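The disjunctive check can be sketched as follows. The constraints themselves are invented for illustration; the real system derives them symbolically from the client source.

```python
# Hedged sketch of the round-constraint check described above: the server
# accepts an update if it satisfies ANY of the constraints modeling the legal
# execution paths for one round.

MAX_SPEED = 5.0   # assumed per-round movement limit
MAX_HEAL = 10     # assumed per-round healing limit

def round_constraints(prev, cur):
    """Each entry models one legal execution path for a single round."""
    moved = abs(cur["x"] - prev["x"])
    healed = cur["health"] - prev["health"]
    return [
        moved <= MAX_SPEED and healed == 0,     # path: player moved
        moved == 0 and 0 < healed <= MAX_HEAL,  # path: player used a medkit
        moved == 0 and healed == 0,             # path: player idled
    ]

def verify_update(prev, cur):
    # Disjunction: valid if any legal path explains the observed update.
    return any(round_constraints(prev, cur))

prev = {"x": 0.0, "health": 50}
print(verify_update(prev, {"x": 3.0, "health": 50}))    # → True (legal move)
print(verify_update(prev, {"x": 50.0, "health": 50}))   # → False (speed hack)
print(verify_update(prev, {"x": 0.0, "health": 999}))   # → False (health hack)
```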
This technique is good for detecting invalid commands. However, it will not detect
invalid commands that a buggy server considers valid. Also, modifications that
make a tough (but legal) behavior easier go undetected by the server.
Mobile Guards [9]:
In [9], the authors present a technique that protects against both game client
modification (invalid commands) and exposure of secret information. The technique is
applied using a segment of code transferred from the server to the clients; this piece of code
is called a "mobile guard" [9]. Mobile guards prevent cheating by using checksums to
validate the integrity of the client, and by using masking functions to keep secret
information from being exposed.
In this client-server architecture, the game server creates a mobile guard—
which contains a particular set of protection mechanisms—and forces the clients to
download it on demand or at a certain interval. Each mobile guard therefore has a lifespan,
after which the server sends all clients a new mobile guard with a new set of protection
techniques.
To force clients to download and execute a mobile guard, a specific checksum (they
used RCCA [9]) must be calculated to obtain each game data update from the server: the
result of the RCCA is used as the key to decrypt the needed game data.
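The enforcement idea—update data decryptable only with the checksum of an unmodified guard—can be illustrated with a toy example. The SHA-256 checksum and XOR cipher are stand-ins, not the RCCA scheme from [9].

```python
# Toy sketch of the enforcement described above: the server encrypts each
# game update under a key derived from the expected checksum of the guard
# code, so only a client that ran the unmodified guard can decrypt it.
import hashlib

GUARD_CODE = b"def guard(): check_integrity()"  # hypothetical guard body

def checksum_key(code: bytes) -> bytes:
    return hashlib.sha256(code).digest()

def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# Server side: encrypt the update under the expected checksum.
update = b"player_pos=(10,20)"
encrypted = xor_cipher(update, checksum_key(GUARD_CODE))

# Honest client: computes the same checksum over the guard it executed.
print(xor_cipher(encrypted, checksum_key(GUARD_CODE)) == update)   # → True

# Tampered client: a modified guard yields a different checksum, garbage data.
print(xor_cipher(encrypted, checksum_key(b"tampered")) == update)  # → False
```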
To ensure the execution of the protection mechanisms, the authors "spatially
entangled" them with the enforcement mechanism described above [9]. Several protection
mechanisms can be used within mobile guards; the authors give three of them: data hiding
(masking and unmasking the secret information), code relocation (randomly assigning
secret code to memory locations), and verifying the integrity of the game using the RCCA
checksum.
While this technique might not prevent cheating perfectly, it makes the client
harder for cheaters to modify, since mobile guards are periodically replaced with new
protection mechanisms. On the other hand, developing these mobile guards is not an easy
job, and the protection mechanisms add extra cost on the clients.
EnsureIT [39]:
Arxan is a security company that provides security solutions for PC, mobile, and
console (PS3) games through its EnsureIT products. These provide protection against
several gaming attacks, such as tampering and reverse engineering. Their client-side tool
manually applies small pieces of code—called "Guards"—to critical sections of the game
code. These guards work in parallel with the game binaries. In addition, guards are added
to protect existing guards, providing multiple layers of protection. Arxan claims a
performance overhead of only 2-5% when guards are applied only to critical sections.
Fides [40]:
Fides is a monitoring system based on an anomaly detection approach. The
system has both client- and server-side components: the client-side monitor is called the
Auditor, and the server-side monitor is called the Controller. Both run in parallel with the
game, with little performance overhead (as the authors claim). The Controller emulates the
game client, mapping the player's virtual memory during the login process to learn the
client's execution patterns. If any deviation is detected, the player is considered a cheater.
The Auditor accepts the Controller's commands to audit the client's data, memory, stack
trace, and debugger, and then sends that information to the Controller. The authors also
claim that Fides does not invade the client's privacy like the other monitoring systems
mentioned before.
Like any other monitoring system, Fides is vulnerable to modification since it is
under the client's control. It also adds performance and network overhead to both client
and server. Finally, the scalability of the system was not mentioned or tested in the paper.
2.3. Behavior Based Research
It is well known that research in online game security is limited due to the secrecy
of gaming companies; however, several academic papers have dealt with the analysis of
online gamers' behavior. In this subsection, I divide work in this area into two categories:
one related to cheat modeling, and the other related to player modeling. Before presenting
related work in these two categories, however, I describe the relation between this work
and similar work outside the field of online games.
2.3.1. Relation to Non-Games Research (IDS)
An Intrusion Detection System (IDS) is security software that attempts to detect
attacks on a network (network-based IDS) or an information system (host-based IDS). The
research on IDSs began two decades ago, and several techniques have been presented
towards designing an ideal IDS. There are two main approaches to building an IDS: Misuse
Detection, and Anomaly Detection [41, 42].
In a misuse IDS [41, 42] (sometimes called a signature-based IDS), well-known
attacks are stored in a database and compared with the current flow of data; if a similarity
is found, an attack is flagged. There are several ways of designing the attack signatures:
some systems use rule-based approaches, and others use neural networks (NNs) trained on
attack behaviors. The NN approach is similar to the cheat modeling part of our system, in
which I train the system on well-known types of cheats; the system should then detect the
same cheats, or similar cheats with some variation. The drawback of this approach is
that new attacks go undetected if the database is not updated regularly, or if the system
is not trained on new attack types. In my approach, I define a general behavior for an
unusual gameplay type, and the system should detect new cheats that have some similarity
to the trained behavior.
In an anomaly-based IDS [29, 41, 42], on the other hand, the system must be taught
the normal behavior of users; the IDS then flags any deviation from normal behavior
as an attack. There are several ways of designing an anomaly-based IDS. Some use
neural networks trained on normal behavior data only (semi-supervised training). This NN
approach is similar to the player modeling part of our system, in which I build models of
normal behavior for general gameplay, or even for each player separately. One major
drawback of anomaly detection is its high false-alarm rate. This can be mitigated by
applying clustering methods to compare the gameplay of a certain player with the general
gameplay of a group of players (skill clusters). Moreover, I use a hybrid system that
combines misuse and anomaly detection techniques, with a decider giving the final
decision regarding the current behavior of the player.
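The hybrid decision can be sketched as follows. The scores, thresholds, and combination rule are assumptions for illustration, not the dissertation's exact decider.

```python
# Hedged sketch of a hybrid decider: a misuse score (similarity to known
# cheat models) and an anomaly score (deviation from the player's normal
# behavior) are combined into one verdict.

MISUSE_THRESHOLD = 0.8    # similarity to a known cheat signature (assumed)
ANOMALY_THRESHOLD = 0.9   # deviation from normal behavior (assumed)

def decide(misuse_score: float, anomaly_score: float) -> str:
    misuse_hit = misuse_score >= MISUSE_THRESHOLD
    anomaly_hit = anomaly_score >= ANOMALY_THRESHOLD
    if misuse_hit and anomaly_hit:
        return "cheater"      # both detectors agree
    if misuse_hit or anomaly_hit:
        return "suspicious"   # flag for further monitoring
    return "honest"

print(decide(0.95, 0.95))  # → cheater
print(decide(0.95, 0.10))  # → suspicious
print(decide(0.10, 0.10))  # → honest
```

Requiring agreement before a hard verdict is one way to keep the anomaly detector's false-alarm rate from directly producing false accusations.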
2.3.2. Cheat Modeling
Few papers have been published on detecting cheating by using server logs to
study players' behavior patterns. However, there are some notable works, such as:
Game Bot Detection Using Manifold Learning (via Avatar Trajectory Analysis) [17, 18]:
In these papers, the authors provide a bot detection method that uses manifold
learning and is based only on the avatar's movement trajectory. They claim the method is
general for any movement bot and has a detection accuracy of 98%. The method also works
on the server side only. They used the game Quake 2 (First Person Shooter) as their case
study, since many of its traces are available on the internet.
The authors collected several types of traces, including various human traces and
three of the most popular "Quake 2" bots on the internet. All the traces, for humans and
bots, were collected on a single map, and all were under 600 seconds long. For bot
detection, they used three methods: k-Nearest Neighbors (kNN), Smooth Support
Vector Machine (SSVM), and Isomap. Both kNN and SSVM gave good results; however,
they performed poorly due to the "curse of dimensionality" [17]. To solve that
problem, they used Isomap to reduce the dimensionality of the original space during
feature extraction, and then applied either kNN or SSVM for classification.
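The classification step can be illustrated with a tiny kNN over trajectory features. The step-size features and data are hypothetical, and the Isomap dimensionality-reduction stage the papers apply beforehand is omitted here.

```python
# Illustrative sketch of the kNN classification stage: majority vote over the
# k nearest labeled trajectories in feature space.
from collections import Counter
import math

def knn_predict(train_data, query, k=3):
    """train_data: list of (feature_vector, label). Majority vote over k nearest."""
    nearest = sorted(train_data, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical features: (mean step size, step-size variance). Bots move with
# near-constant step sizes; human movement is far more erratic.
train = [
    ((2.0, 0.01), "bot"), ((2.1, 0.02), "bot"), ((1.9, 0.01), "bot"),
    ((1.5, 0.90), "human"), ((2.6, 1.10), "human"), ((1.1, 0.75), "human"),
]
print(knn_predict(train, (2.05, 0.02)))  # → bot
print(knn_predict(train, (1.8, 0.95)))   # → human
```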
The method showed high detection accuracy on these bots. When noise was
added to the bots' step-size data [18], the error rate remained low (though no longer
as low as 2%). They also tested the method on different maps in [18], using traces for
humans and the same bots on each map; the error rate stayed at a low 1-2%
on each map. The method was very effective and performed well. However, it
was based on the character's movement positions only (simplified), and thus only worked
for detecting movement bots.
Using Dynamic Bayesian Network to detect Aiming Robots [15]:
In this paper, the authors provide a scalable method that uses a Dynamic Bayesian
Network (DBN) to detect aiming robots (AimBots) in First Person Shooter (FPS) games.
The authors used a game called Cube as their test-bed [15]. The basic idea of the method
is to decide whether a player is cheating by calculating the player's accuracy over
a period of time. The accuracy is calculated from several parameters (the player's
movement, the target's movement, the distance between the player and the target, and so
on). Then, based on the level of accuracy, the probability of cheating is computed; the
probability of cheating at time t also depends on the probability of cheating at time t-1.
Finally, they compute a player's cheating probability over a certain period of time
and check whether it exceeds the cheater threshold.
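The temporal dependence can be sketched with a simple recursive update, where the belief at time t blends the belief at t-1 with evidence from the current accuracy. The update rule and constants are illustrative, not the DBN from [15].

```python
# Hedged sketch: cheating probability at time t depends on the probability
# at t-1 plus evidence from the currently observed aiming accuracy.

PERSISTENCE = 0.7   # weight on the previous belief (assumed)
THRESHOLD = 0.6     # average probability above which a player is flagged (assumed)

def update_belief(prev_prob: float, accuracy: float) -> float:
    # Higher observed accuracy -> stronger evidence of an AimBot.
    evidence = accuracy ** 2
    return PERSISTENCE * prev_prob + (1 - PERSISTENCE) * evidence

def is_cheater(accuracies):
    prob, history = 0.0, []
    for acc in accuracies:
        prob = update_belief(prob, acc)
        history.append(prob)
    return sum(history) / len(history) > THRESHOLD

print(is_cheater([0.99] * 20))               # → True  (sustained perfect aim)
print(is_cheater([0.3, 0.5, 0.4, 0.2] * 5))  # → False (human-like accuracy)
```

Because the belief persists across time steps, a single lucky shot raises the estimate only slightly, while sustained superhuman accuracy drives it over the threshold.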
The authors applied this method to three types of AimBots: a basic AimBot, an auto-
switching AimBot, and an intentional-misses AimBot. While their method provided good
results, better results could perhaps have been obtained with more features.
A Cheating Detection Framework for Unreal Tournament III: a Machine Learning
Approach [16]:
In this paper, Galli et al. implemented a very interesting cheat detection framework.
The framework was based on a game developed using the Unreal Development Kit (UDK),
which uses the Unreal Tournament III game engine. To implement cheating, they built an
advanced cheating system that includes a triggerbot, three AimBots, a radar system, and a
warning system. The detection framework had three components [16]: a game server to
collect game logs, a processing backend to create features and apply classifiers (they used
five classifiers), and a frontend to monitor and analyze results.
The results were very promising, though it could be argued that they needed more
training and testing data: they had only two players in each match, one of them an AI,
and they collected only around 250 training samples. In addition, they should have
tried different frame sizes when collecting features from the game logs. Finally, some
features relied on there being only two players in a match, which is not the case in
commercial online games; for example, they calculate the distance between a player and
his target at all times, even when the target is not visible.
2.3.3. Player Modeling
Several interesting papers have analyzed human player behavior using machine
learning techniques, and used those techniques to distinguish or detect a human player
from a bot. In this section, I first present some of the work on developing AI bots through
the analysis of human players' behavior. Then, I present work that analyzes player
behavior to detect cheating in a racing game. Afterwards, I provide an example of a new
server-side commercial anti-cheating system. Finally, I present work on analyzing player
behavior in general.
Analyzing human player movements [10, 11, 12, 13]:
Thurau et al. did some of the earliest work in applying machine learning to
online games. Thurau and his team tried different approaches to developing a human-like
AI agent through the analysis of human player behavior, such as:
Learning Human-Like Movement Behavior in Computer Games [10]:
Due to the poor implementation of artificial agents (AI bots) in computer games,
the gaming world felt an increasing need for a better way to create those bots. Thurau et al.
developed an intelligent AI bot based on learning human behavior. They focused their
development and experiments on First Person Shooter (FPS) games, namely Quake II,
and used demo files to collect traces of human players.
To implement their idea, the authors divided a player's actions into strategies and
tactics [10]. The paper concentrated on game strategies, which include movements,
guarding certain areas, and collecting important items. The idea was implemented using
the Neural Gas waypoint learning algorithm to represent the virtual world, with
goal-oriented nodes based on the locations of different items.
The idea was a great step toward using machine learning to model human
players' behavior. However, they needed more well-known training examples, and the
idea focused on players' movements only.
Using Self-Organizing Maps [11]:
In earlier work [11], Thurau et al. introduced a neural-network-based bot that
learns human behavior. They used self-organizing maps to reduce the dimensionality of
the data, and focused only on the velocity and view angle of players to train AI bots.
Although the training was not perfect (and lacked an adequate number of features), it was
a good way to analyze and create an AI bot based on human behavior.
Applying Manifold Learning [12]:
Thurau et al. also proposed a performance improvement over [10] through the
application of manifold learning as a method to address the curse of dimensionality [12].
They used the Locally Linear Embedding (LLE) method to reduce the 3D space to a 2D
embedded space. However, the experiments showed good results only on smaller maps.
Bayesian Imitation Learning Game Bots [13]:
In this paper, to improve the bots' movement behavior from [10], Thurau et al.
used Bayesian imitation learning instead of the Neural Gas waypoint learning algorithm.
They produced bots with smoother and more human-like movements than those in [10].
Analytical Approach for Bot Cheating Detection in a Massive Multiplayer Online
Racing Game [44]:
In this paper, the author used player analysis to detect potential cheaters in a racing
game. He used data consisting of 2.2 million instances from over 45,000 players, and the
game mode used in this research was player vs. AI enemies (PvE).
The focus of this research is detecting players who use bots by catching the
bots' repetitive behavior. The author used different features to define a cheater, such as (per
player): mean time between races, total number of races, total number of races on a
certain track, mean duration per race per track, and so on. After removing garbage data, the
author identifies suspicious players in two separate steps: (a) catching players with a low
"in-between races" time and a high number of races played; (b) catching players with short
and similar race times per track. He then selects the players who appear in the intersection
of steps (a) and (b). Finally, he chooses players with a winning ratio above 70%
and assigns them weights depending on the winning ratio and other criteria, using decision
trees.
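The two-step intersection filter can be sketched as follows. The thresholds and per-player statistics are invented for illustration; the paper derives them from the real dataset.

```python
# Hedged sketch of the two-step filter: step (a) flags low between-race gaps
# with high race counts; step (b) flags near-identical race times; suspects
# are the intersection of both.
from statistics import pstdev

MAX_GAP = 30      # seconds between races considered suspiciously low (assumed)
MIN_RACES = 100   # suspiciously high race count (assumed)
MAX_SPREAD = 1.0  # near-identical race times, std dev in seconds (assumed)

players = {
    "bot_like": {"gap": 5,   "races": 500, "times": [61.0, 61.2, 60.9, 61.1]},
    "grinder":  {"gap": 10,  "races": 300, "times": [70.0, 95.0, 82.0, 64.0]},
    "casual":   {"gap": 600, "races": 20,  "times": [88.0, 92.0, 90.0, 85.0]},
}

step_a = {p for p, s in players.items()
          if s["gap"] <= MAX_GAP and s["races"] >= MIN_RACES}
step_b = {p for p, s in players.items() if pstdev(s["times"]) <= MAX_SPREAD}

suspects = step_a & step_b  # intersection of both criteria
print(sorted(suspects))  # → ['bot_like']
```

Note how the human "grinder" passes step (a) but not step (b), which is exactly why the author intersects the two criteria before applying the winning-ratio check.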
The author did a great job detecting cheaters in this online racing game. However,
the selected mode (PvE) has offline-game characteristics, and cheating in it was thus easier
to detect. In addition, the detection could be deceived if the bot used random "in-between
races" times.
FairFight [50]:
FairFight is a commercial server-side anti-cheating system used in several
games, such as the FPS game Battlefield 4 and the MMO game Infestation: Survivor
Stories. The system uses two different methods in parallel to detect cheating:
"algorithmic analysis of player statistics and server-side cheat detection" [50]. The
algorithmic-analysis part is an anomaly detection approach that checks a player's
match statistics (such as the kill/death ratio in FPS games) and compares them with the
average statistics of all players. If an anomaly is found, the player is suspected as a
cheater, pending the other part of the system. The second part, server-side cheat
detection, checks for invalid (or impossible) commands and actions by the
player (such as a faraway kill with a close-range weapon in FPS games) and validates them
against the first part of the system. If both parts validate the same cheat, action is taken.
The system runs on the server side only, with no interaction with the player's client
system. Moreover, it runs in real time while the player is in the match. It is
customizable and works with every game type by adjusting the data used for detection. The
system showed some problems in the blockbuster game Battlefield 4, where many
excellent players were repeatedly flagged as cheaters.
Player behavior analysis:
Several papers have discussed the analysis and modeling of player data. A book
called Game Analytics, published in 2013, discussed game data analysis. As
for the papers, many discussed player data analysis in general. One of the more interesting
papers, by Drachen et al. [27], presented a method to cluster the behavior of players
in two different online games: an MMORPG called Tera, and an online FPS called
Battlefield 2: Bad Company 2 (BF2BC2). They collected data for a total of 260,000 players
across both games; however, each player had only one record, containing a summary of his
gameplay up to the moment of collection. They used two different unsupervised clustering
methods: K-Means and Simplex Volume Maximization (SIVM). For Tera, they divided
players into four level ranges: 1-10, 11-20, 21-31, and 32. Then, they extracted 18 features
for clustering. Finally, for each level range, they created 6 clusters of players using K-Means,
and 6 different clusters using SIVM. As for BF2BC2, they extracted 11 gameplay features
for clustering, then created 7 clusters using K-Means and 7 different clusters using
SIVM.
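The K-Means side of such an analysis can be sketched on toy data. The two features and the player records are hypothetical; the papers used 11-18 features over hundreds of thousands of players.

```python
# Illustrative sketch of clustering players by gameplay features with a tiny
# hand-rolled K-Means.
import math

def kmeans(points, k, iters=20):
    # Deterministic init: spread the initial centers across the point list.
    centers = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        centers = [tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

# Hypothetical (hours played, kill/death ratio) summaries per player.
players = [(5, 0.4), (6, 0.5), (4, 0.3),        # casual-looking players
           (200, 1.8), (220, 2.1), (190, 1.9)]  # hardcore-looking players
clusters = kmeans(players, k=2)
print(sorted(len(c) for c in clusters))  # → [3, 3]
```

In practice a library implementation (e.g. scikit-learn's KMeans) with feature normalization would replace this sketch.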
Drachen et al. also published an earlier paper [32] that analyzed the behavior of
1365 players of the game Tomb Raider: Underworld (TRU). The players' data was
aggregated over the whole game, and they extracted six features from it. Using
Emergent Self-Organizing Maps (ESOM), they then created four clusters based
on how players die in the game.
Another paper on the same game (TRU) was then published by Mahlmann et al.
[33], in which they used several classification techniques (such as linear regression and
decision trees) to predict whether a player will stop playing, or how long it will take him/her
to finish the game. They extracted 11 features, six of which had been used in the earlier
study [32]. In [34], Sifa et al. improved on the work in [32] by adding more players and
collecting more detailed data: 62,000 players, with data aggregated per level instead of
over the whole game. Here, they extracted 10 features, all of which had been used in [33].
They used SIVM to generate clusters, and—across the seven levels of the game—generated
six clusters for each level.
Several other papers focused on player modeling. Missura et al. [35] used K-
means clustering and SVM to dynamically predict the "just right" difficulty for players. By
"just right," they meant that the game should be neither too easy (boring) nor too difficult
(frustrating). They ran their experiments on a simplified 2D game of their own creation,
using only three features. Their method clusters existing players' data into three clusters
based on the selected features (including the difficulty set manually by players in the
training phase); they then predict the difficulty setting for new players by learning which
cluster those players belong to in the game. Another paper [36] used random forests to
predict a player's skill, using that information for better matchmaking in FPS games. They
defined skill as how many FPS games a player has played and how many total hours he/she
has spent on games. The features used to predict skill were based on a questionnaire plus
mouse and key inputs from players. Drachen et al. [37] suggested that spatial outlier
detection techniques could help detect anomalies in open-world games (e.g., finding
design weaknesses in certain map locations).
The book [38] covered most of the studies on the analysis of game data. Of
particular interest were the parts focused on available data collection tools, game data
mining (variables, metrics, features, and models), and currently available visualization
tools.
2.4. Conclusion
As we can see from the literature, academia struggles with online games
research: many of the published papers were based on home-developed games, old games,
or open-source games. The academic community is pushing game publishers to release
their data to enable more useful research. In the field of cheat detection, more work needs
to be done, especially in the area of player modeling; the techniques developed in that
area should be utilized for malicious behavior detection.
Chapter 3. Cheat Modeling
In chapter 2, I reviewed work that has been done in the online game security field, as
well as related work in the cheat detection field. In this chapter and the upcoming chapters,
I discuss my work on behavior-based cheat detection analysis and how I modeled my
system. This chapter discusses the Cheat Modeling part of our system. Cheat Modeling is
similar to signature-based intrusion detection systems, in which models are built for (the
system is trained on) malicious behaviors, and new cheats (attacks) are then detected based
on the trained models. In this chapter, I explain the Cheat Modeling part of the system in
detail: I describe the model first, with the phases it includes, and then present our work
applying this model to an online FPS game.
3.1. Model Description
This part of the system focuses on building models of cheating (malicious)
behavior. Labeled data are therefore collected for both honest players and cheaters. Then,
using classification techniques, I build a detection model for each available cheat type.
This part of the system consists of two phases: the training phase and the application phase
(shown in Figures 2 and 3).
In the training phase (Figure 2), data (game records) are collected from the server's
log for both honest and cheating players. This requires built-in cheats that represent a
cheating behavior for each specific cheat type. For example, in First Person Shooter (FPS)
games, we can create different types of AimBots to represent the unusual aiming cheating
behavior.
Figure 2: Cheat Modeling Training Phase
I then preprocessed the game records for these players (both honest and cheaters)
to create labeled features files using the feature extractor. The feature extractor is a very
important component of a behavior-based cheating detection technique: I need to extract
the features that best distinguish the cheating behavior of an unfair player from the normal
behavior of an honest player. These features depend on the data sent from the client to the
server. In online games, the data sent between clients and servers is limited by the available
bandwidth and the need for smooth gameplay. Therefore, finding the proper balance
between game performance and data-collection accuracy is key, as it leads to increased
detection accuracy. The features are extracted in terms of time frames: if the frame size is
60 seconds, for example, every 60 seconds of the game records (log files) is represented as
one record in the features file.
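The frame-based grouping can be sketched as follows. The log schema and the two features shown are simplified assumptions; the actual extractor computes many more features per frame.

```python
# Hedged sketch of frame-based feature extraction: log rows are grouped into
# fixed-size time frames, and each frame becomes one record in the features
# file.

FRAME_SIZE = 60.0  # seconds per frame, as in the example above

def extract_features(log_rows):
    """log_rows: list of (timestamp, target_visible) tuples from the server log."""
    frames = {}
    for ts, visible in log_rows:
        frames.setdefault(int(ts // FRAME_SIZE), []).append(visible)
    records = []
    for frame in sorted(frames):
        rows = frames[frame]
        records.append({
            "frame": frame,
            "num_records": len(rows),                # rows in this frame
            "visible_ratio": sum(rows) / len(rows),  # visible-to-total ratio
        })
    return records

log = [(1.0, True), (30.0, False), (59.0, True), (61.0, True), (118.0, False)]
for rec in extract_features(log):
    print(rec)
# frame 0: 3 rows, ratio 2/3; frame 1: 2 rows, ratio 1/2
```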
The data analyzer takes the generated features files and applies a selected
machine learning classifier (such as SVM) to them to generate detection models. These
models are then used on a testing set of data to confirm the confidence of the cheat
detection model, and stored in a depository to be used later on new, unseen data.
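The train/evaluate/store flow can be sketched with a trivial one-feature threshold classifier standing in for SVM or logistic regression. The feature records, labels, and threshold rule are all invented for illustration.

```python
# Hedged sketch of the data-analyzer flow: train a model on labeled feature
# records, evaluate it on a test set, and store it in a model depository.

def train(records):
    """Learn a threshold on 'aim_accuracy' separating cheat from honest."""
    cheat = [r["aim_accuracy"] for r, label in records if label == 1]
    honest = [r["aim_accuracy"] for r, label in records if label == 0]
    return (min(cheat) + max(honest)) / 2  # midpoint threshold

def classify(model, record):
    return 1 if record["aim_accuracy"] >= model else 0

train_set = [({"aim_accuracy": 0.95}, 1), ({"aim_accuracy": 0.90}, 1),
             ({"aim_accuracy": 0.40}, 0), ({"aim_accuracy": 0.55}, 0)]
test_set  = [({"aim_accuracy": 0.92}, 1), ({"aim_accuracy": 0.35}, 0)]

model_depository = {}
model_depository["aimbot"] = train(train_set)  # stored for the application phase

correct = sum(classify(model_depository["aimbot"], r) == y for r, y in test_set)
print(correct / len(test_set))  # → 1.0
```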
In the Cheat Modeling application phase (Figure 3), new, unseen data (game
records) are collected from the server's logs. For the purposes of this study, we assumed
that data collection occurs offline, though data could be collected online or offline
depending on the developer's detection policy. The feature extractor then extracts the same
features used in the training phase and passes the resulting files to the data analyzer. The
data analyzer applies the models created in the training phase to the features files and
classifies each features record. Finally, it produces a number of cheating records that are
used to decide whether the player was cheating. The decision is made
by specifying a threshold to identify a cheater: the number of flagged records that, once
reached, determines a player to be a cheater. The threshold value depends on the detection
accuracy of the model and on the developer's policy for cheat detection. It also depends on
when the detection takes place: online while the cheater is playing (real-time detection),
or offline after a match is finished.
Figure 3: Cheat Modeling Application Phase
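The thresholding step can be sketched as follows. The threshold value is a stand-in; in practice it would be set from the model's accuracy and the developer's policy.

```python
# Hedged sketch: a player is flagged when the number of feature records
# classified as cheating reaches the policy threshold.

CHEAT_RECORD_THRESHOLD = 3  # flagged records needed to call a player a cheater

def judge_player(record_labels):
    """record_labels: per-frame classifier outputs (1 = cheating record)."""
    flagged = sum(record_labels)
    return "cheater" if flagged >= CHEAT_RECORD_THRESHOLD else "honest"

print(judge_player([0, 1, 1, 0, 1, 1]))  # → cheater (4 flagged records)
print(judge_player([0, 0, 1, 0, 0, 1]))  # → honest  (2 flagged records)
```

For real-time detection the count would be checked as records arrive; for offline detection it is checked once per finished match.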
3.2. Case Study: Detecting AimBots in First Person Shooters
In August 2013, we published a paper [14] on "Behavioral-Based Cheating
Detection in Online First Person Shooters using Machine Learning Techniques." The paper
focused on applying the cheat-modeling method above to FPS games in order to detect a
common type of cheating in those games: AimBots. We chose to apply our method to an
FPS game, called Trojan Battles (Figure 4), that we developed using the Unity3D game
engine [19]. The game is fully online, using a client-server architecture, and game logs
were collected on the server side. After collection, the game logs were pre-processed,
and then several supervised machine learning techniques were applied—such as Support
Vector Machines and Logistic Regression—in order to create detection models.
Figure 4: Trojan Battles in-game screenshot
In the training phase, we provided different experimental organizations for creating the detection
models: multi-class classification, in which each cheat is represented as a class; two-class
classification, in which all cheats are labeled as "yes"; grouping similar cheats together as
one class; and, finally, a separate model for each cheat. In the application phase, the
resulting detection models were used on new unlabeled data to classify cheats. We used
several unique features to classify cheats. Finally, we ranked the features and analyzed the
results, then suggested proper ways for developers to use the resulting models.
3.2.1. Design and implementation
Our system consisted of four components: a game client with access to an AimBot
system; the game server, which handles communication between all the clients as well as
logging; the feature extractor, which pre-processes log data; and the data analyzer, which
was responsible for training classifiers and generating models for cheats.
Figure 5: Lobby screen with customization options
1) The Game Client: The game client was developed using Unity3D and is used by
players to log in to the server using their unique names. Players can then
create a new game or find an existing game on the lobby screen (Figure 5), join
it, and play a timed death-match. Each game can be customized by choosing a
different map, changing the game settings, and altering the network protocol to be
used (TCP or UDP) and the number of heartbeat messages sent each second.
Changing the number of heartbeat messages per second gave us the flexibility to
choose the best balance between performance and accuracy of data collection.
Game clients also give the player an AimBot utility, which enables them to
activate or deactivate any cheats they want during each match. The AimBots we
implemented were similar to commercial AimBots [20]; each cheat is implemented
as a feature that can be triggered in-game (shown in Figure 6). These features can
be combined to apply a certain AimBot, or to create more complicated and harder-
to-detect cheats. We implemented five different types of AimBots in our game:
• Lock (L): A typical AimBot that, when enabled, causes the crosshairs to aim at
a visible target continuously and instantly.
Figure 6: In-game AimBot toggle screen
• Auto-Switch (AS): Must be combined with Lock to be activated. It switches
the Lock on and off to confuse the cheat detector. This AimBot was used in
[15].
• Auto-Miss (AM): Must be combined with Lock to be activated. It creates
intentional misses to confuse the cheat detector. This AimBot was also used in
[15].
• Slow-Aim (SA): Must be combined with Lock to be activated, and it can also
be combined with Auto-Switch or Auto-Miss. This well-known cheat works
like Lock, but instead of snapping to the target instantly, it eases in from the
current aim location.
• Auto-Fire (AF): Also called a TriggerBot. When the player's crosshair aims at a
target, it fires automatically. It can be combined with Lock to create an
ultimate cheat.
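The difference between Lock and Slow-Aim can be sketched with a per-frame aim update. The easing factor is an assumption; our actual Unity3D implementation differs.

```python
# Hedged sketch contrasting Lock and Slow-Aim: Lock snaps the aim to the
# target instantly, while Slow-Aim covers only a fraction of the remaining
# angle each frame.

EASE = 0.2  # fraction of the remaining angle covered per frame (assumed)

def lock_aim(current_angle, target_angle):
    return target_angle  # instant snap

def slow_aim(current_angle, target_angle):
    return current_angle + EASE * (target_angle - current_angle)

angle = 0.0
for _ in range(10):                 # ten frames of Slow-Aim easing
    angle = slow_aim(angle, 90.0)
print(lock_aim(0.0, 90.0))          # → 90.0
print(round(angle, 1))              # → 80.3 (still easing toward 90)
```

The gradual convergence is what makes Slow-Aim harder to detect than Lock: its aim trajectory looks closer to a human correcting toward a target.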
2) The Game Server: The game server is a non-authoritative server, mainly
responsible for communication between clients and for logging game data. The
server also manages game time, ensures all clients are synchronized, and notifies
all players when each match ends. Communication is event-based and achieved by
sending specific messages on in-game actions, such as when a player spawns, dies,
or fires. The server also receives a heartbeat message at intervals set before the
start of each match, which includes position and state data.
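A heartbeat payload might be built as sketched below. The field names and JSON encoding are assumptions for illustration only; the actual wire format of our game is not specified here.

```python
import json
import time

def make_heartbeat(player_id: str, position: list, state: dict, seq: int) -> str:
    """Build one heartbeat message as a client might send it. All field names
    (type, seq, player, pos, state, ts) are hypothetical for this sketch."""
    return json.dumps({
        "type": "heartbeat",
        "seq": seq,                # sequence number, useful for ordering over UDP
        "player": player_id,
        "pos": position,           # (x, y, z) world position
        "state": state,            # e.g. aiming target, firing flag, animation
        "ts": time.time(),         # client timestamp for time-to-hit features
    })
```

Sending such a message 15 times per second (the rate we settled on) keeps the per-client bandwidth small while giving the feature extractor dense position and state samples.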
3) Features Extractor: In the features extractor, we used time frames of 10, 30, 60,
and 90 seconds to observe their effect on the detection. Most of the features are
extracted from the HeartBeat messages logged into our system. If a feature depends
on a target, then it will be calculated only when a target is visible. In the case of
multiple visible targets, it will consider the closest target as the current target. For
example, ‘Mean Distance’ calculates the distance between the player and his closest
visible target. We will explain the features extracted below, and what kind of data
they need from the logs. Note that features which are specific to FPS games will be
marked as (FPS), and Generic Features that can be used in other types of games
will be marked as (G):
- Number of Records (G): Contains the total number of log rows during the current
time frame. It uses the data from the HeartBeat messages.
- Number of Visible-Target Rows (G): Contains the number of rows in which a target
was visible. It also uses the data from the HeartBeat messages.
- Visible-To-Total Ratio (G): Calculates the ratio of the number of rows where a
target was visible, to the total number of rows in the current frame.
- Aiming Accuracy (FPS): Tracks the aiming accuracy based on the visible targets at
each frame. If a target is visible and the player is aiming at it, the aiming accuracy
increases exponentially with each frame. When the player loses the target, the
accuracy starts decreasing linearly. This feature uses the aiming target ID and the
visible targets list from the logs.
- Mean Aiming Accuracy (FPS): While a target is visible, this simply divides the number
of times the player aimed at a target by the number of times a target was visible. This
feature uses the aiming target ID and the visible target lists from the logs.
- Player’s Movement (G): While a target is visible, this feature will define the effect
of a player’s movement on aiming accuracy. It will use player’s movement
animation from the logs.
- Target’s Movement (G): While a target is visible, this feature will define the effect
of a target’s movement on a player’s aiming accuracy. It will look up the target’s
movement animation from the logs at the current timestamp.
- Hit Accuracy (FPS): This simply divides the number of hits by the total number of
shots within the specified time frame. Head shots are given higher points than body
shots. It uses the target ID from the ‘Actions’ Table in the logs (the target is zero
on a miss).
- Weapon Influence (FPS): This complex feature is defined using weapon type, distance,
and zooming. While a target is visible, it calculates the distance between the player
and the closest target. Then, it calculates the influence of the weapon used at this
distance, and whether the player is zooming or not. This feature uses the weapon type,
player’s position, target’s position, and zooming value from the logs, looking up the
target’s position at the current timestamp.
- Mean View Directions Change (G): This feature defines the change of the view
direction vectors during the specified time frame. It will be calculated using the
mean of the Euler Angles between vectors during that time frame. Note that players
will change view directions drastically when the target dies; therefore, we take
re-spawn directions into account.
- Mean Position Change (G): This feature will define the mean of distances between
each position and the next during the specified time frame. Similar to View
Directions above, this feature takes re-spawns into account.
- Mean Target Position Change (G): This feature will define the target’s position
changes only when a target is visible.
- Mean Distance (G): While a target is visible, it calculates the distance between the
player and the target. Then, it will calculate the mean of these distances. It uses a
player’s position, and its closest target’s position, from the logs.
- Fire On Aim Ratio (FPS): This feature will define the ratio of firing while aiming
at a target. It will use the firing flag, as well as the aiming target, from the logs.
- Fire On Visible (FPS): This feature will define the ratio of firing while a target is
visible. It will use the firing flag from the logs.
- Instant On Visible (FPS): This feature will define the ratio of using instant weapons
while a target is visible. It will use the Instant-Weapon flag from the logs.
- Time-To-Hit Rate (FPS): This feature will define the time it takes a player from
when a target is visible, until the target gets a hit. It will use client’s timestamps
from the HeartBeat Table in addition to Hit Target from the Action Result Table.
- Cheat: This is the labeling feature that specifies the type of cheat used. If a cheat
is used for more than 50% of the time frame, the frame is labeled with that cheat; if
no cheat was used for more than 50% of the time frame, it is labeled “normal.”
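To make the Aiming Accuracy dynamic concrete, here is a minimal sketch of its exponential-increase/linear-decrease behavior. The growth and decay constants are illustrative, not the values used in our extractor.

```python
def aiming_accuracy(frames, growth=1.1, decay=0.05):
    """Compute the Aiming Accuracy feature over per-frame samples.

    Each sample is a (target_visible, aiming_at_target) pair. While the player
    keeps a visible target in the crosshairs, accuracy grows exponentially;
    once the target is lost, it decays linearly, as described in the text.
    growth and decay are illustrative constants, not values from this thesis.
    """
    acc = 0.0
    for visible, aiming in frames:
        if visible and aiming:
            acc = max(acc, 0.01) * growth   # exponential increase per frame
        else:
            acc = max(0.0, acc - decay)     # linear decrease per frame
    return acc
```

The exponential rise rewards sustained tracking (characteristic of Lock-based AimBots), while the linear decay keeps brief losses of the target from erasing the signal immediately.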
4) Data Analyzer: The Data Analyzer takes the generated features files as an input,
and then applies two classifiers to them to detect the cheats used in our game. The
classifiers we used were Logistic Regression, which is simple, and Support
Vector Machines (SVM), which is complex but more powerful. We used a
data-mining tool called Weka [21] to apply the classifiers to our data. To apply SVM in
Weka, we used the SMO algorithm [22]. For SVM, we used two different kernels,
namely the Linear (SVM-L) Kernel and the Radial Basis Function (SVM-RBF) Kernel.
SVM has different parameters that can be changed to obtain different results.
One of those parameters is the Complexity Parameter (C), which controls the
softness of the margin in SVM; i.e., a larger ‘C’ leads to harder margins [23]. For the
Linear Kernel, we set the complexity parameter (C) to 10. The complexity
parameter of the RBF Kernel was set to 1000. We trained the classifiers and created
the models using 10-fold cross-validation [24].
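The evaluation protocol can be sketched in code. We used Weka's Logistic and SMO implementations, so the tiny from-scratch logistic-regression with 10-fold cross-validation below is only an illustrative stand-in for that protocol, not our actual analyzer.

```python
import numpy as np

def logistic_cv(X, y, folds=10, lr=0.1, epochs=200):
    """10-fold cross-validated accuracy for a minimal logistic-regression
    classifier. Mirrors the evaluation protocol only; the thesis used Weka's
    implementations, not this code."""
    idx = np.arange(len(y))
    accs = []
    for f in range(folds):
        test = idx[f::folds]                    # every folds-th row forms one test split
        train = np.setdiff1d(idx, test)
        w, b = np.zeros(X.shape[1]), 0.0
        for _ in range(epochs):                 # batch gradient descent on log-loss
            p = 1.0 / (1.0 + np.exp(-(X[train] @ w + b)))
            g = p - y[train]
            w -= lr * X[train].T @ g / len(train)
            b -= lr * g.mean()
        pred = (1.0 / (1.0 + np.exp(-(X[test] @ w + b))) >= 0.5).astype(int)
        accs.append(float((pred == y[test]).mean()))
    return float(np.mean(accs))
```

Each fold holds out a tenth of the instances, trains on the rest, and the reported accuracy is the mean over the ten held-out splits, exactly as in the Weka setup above.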
3.2.2. Experiments and Results:
To produce the training data, we played eighteen different death-matches. The
matches were between 10 and 15 minutes long. Each match consisted of two players
playing over the network. We collected around 460 minutes of data. During each match,
one player used one of the cheats from section 3.2.1 above, while the other player played
normally (except for two matches, which were played without cheats). The messages
between the clients and the server were exchanged at a rate of 15 messages/second. This
rate was selected to accommodate delays and to avoid overuse of bandwidth. In
addition, this rate gave us the best balance between performance and accuracy of data
collection. In the application phase, we collected data for a new match that contained three
different players. One player was using the Lock cheat, another player was using the Auto-
Fire cheat, and the third player was playing without any cheats (honest).
3.2.2.1. Training Phase:
To display the results of the training phase more clearly, we present them in four
parts:
1) All cheats together, using multi-class classification: First, we analyzed data using
multi-class classification, in which we had all of the instances mixed together using
specific labels for each cheat class. The accuracy obtained using this method is
shown in Table 1 (a); the best accuracy was obtained using Logistic Regression
with frame size 60. The confusion matrix of the best classification test is shown in
Table 1 (b). In this table, it can be observed that there were many misclassifications
within the four “Lock-Based” cheats (shown in gray): L, AS, AM, and SA. That is
due to the similarity between those four cheats. They are all based on Locking, with
Table 1: Multi-class Classification. (a) Accuracy values for each classifier using different time frames; (b) Confusion matrix for Logistic Regression and frame size 60
some confusion added in AS, AM, and SA. Therefore, in Part 2 of the results below,
we combined those four cheats into one to observe the difference in the resulting
accuracy.
2) Lock-Based cheats classified together versus Auto-Fire (Three-Class): Once we
combined the Lock-Based cheats, we achieved a 17% jump in overall
accuracy compared to Part 1 above. As shown in Table 2 (a), the best value
was obtained using SVM-RBF with frame size 60. The confusion matrix of
this model is shown in Table 2 (b).
3) All cheats combined, using two classes (“yes” or “no”): We analyzed the data by
combining all the cheats together as a single cheat; i.e., an instance is labeled “yes”
if any cheat occurred. As shown in Table 3, instead of posting the confusion matrix
for each classifier, we show different accuracy measurements: Overall Accuracy
(ACC), True Positive Rate (TPR), and False Positive Rate (FPR). There are two
reasons for showing TPR and FPR in addition to overall accuracy:
Table 2: Lock-Based vs. AF Classification. (a) Accuracy values for each classifier using different time frames; (b) Confusion matrix for SVM-RBF and frame size 60
a. Developers who want to detect cheaters aggressively will focus on a model
with a very high TPR. They might flag some expert players as cheaters, but
they can add an extra level of detection.
b. Developers who care about not flagging any (or very few) honest
players as cheaters will focus on a model with a very low FPR.
The best values were distributed among all the classifiers; therefore, the choice
depends on what the game developers care about most.
4) Each cheat classified separately: By separating each cheat, we achieved the
highest accuracy. However, each cheat has its own preferred classifier, as shown in
Table 4. Again, the choice of classifier depends on which accuracy measurement
the game developer cares about most. We should clarify that the larger the frame
size, the fewer the cheat instances; this happens because we used the same data for
the different frame sizes.
Table 3: Two-class Classification
Table 4: Separated Cheats Classification. (a) Lock; (b) Auto-Switch; (c) Auto-Miss; (d) Slow-Aim; (e) Auto-Fire
3.2.2.2. Application Phase
After we generated the models in the previous four parts, we played a 30-min long
death-match that contained three players: one honest player, and two cheaters (Lock and
Auto-Fire). By looking at the results in Parts 1-4 above, we decided to choose classifiers
SVM-L and SVM-RBF with frame sizes 30 and 60. Therefore, we applied the models
created from the previously trained data to the new unseen data (the 30-min, three-player
gameplay). We present the results in Table 5 as follows: Table 5 (a) shows the best
results obtained using the models from Parts 1-3 above, and Table 5 (b) shows the best
results obtained using the models from Part 4. In this table, we show the detection
accuracy as TPR for each cheat type and for normal gameplay. We chose TPR to capture
the detection accuracy of each model when provided with an unknown cheat type.
Table 5: Application Phase Results. (a) Accuracy results for Parts 1, 2 and 3; (b) Accuracy results for Part 4
3.2.3. Features Ranking
In the paper, we also identified the features that were important and useful for the
detection process. We provide the ranking using only the SVM-L models on all the
training parts above. To rank the features, we calculated the squares of the weights
assigned by the SVM classifier [25]. The feature with the highest squared weight is
the most informative. Table 6 shows the ranking of the features for each part of the
training phase above. We show only the top five features, since there is a large
difference in weights between the first three to five features and the rest. In
addition, we noticed some differences in the rankings between different frame sizes;
however, the top feature is always the same across frame sizes. The most common
top feature is Mean Aim Accuracy, which is the most informative feature for
detecting AimBots. For the Auto-Fire cheat, Fire-on-Visible and Fire-on-Aim (both
with very high weights) were the most informative features, since we were looking
to detect instant firing when a target is in the crosshairs.
Table 6: Top Five Features in Each Part of the Training Phase. (a) Part 1; (b) Part 2; (c) Part 3; (d) Part 4: Lock; (e) Part 4: Auto-Switch; (f) Part 4: Auto-Miss; (g) Part 4: Slow-Aim; (h) Part 4: Auto-Fire
3.2.4. Analysis
As we can observe from the results above, applying supervised machine learning
methods requires full knowledge of our data. In our context of FPS AimBots, we
noticed that awareness of the type of cheat can increase the accuracy of the
classifier. However, the multi-class classifier did not prove as accurate as we had
hoped, because of confusion among the Lock-Based cheats. On the other hand, by
separating the cheats and training a classifier for each type, we achieved higher
accuracy in the four parts of the training phase. This shows that separated cheat
detection can help developers by applying all the models simultaneously to each
instance. In the application phase, we noticed that the models obtained from Parts 2
and 4 gave the highest accuracy. However, when these models are applied to an online
game, there is no labeled set (i.e., the data is unseen), so a method is needed to confirm
the presence of cheating players. For that reason, we need to specify the value of the
threshold as mentioned in section 3.1 (Model Description).
The following is an example of how this detection technique can be used by
setting the threshold, based on the data from the application phase above. For the
purposes of this example, the model from Part 2 is used with frame size 60. The
threshold is set to 50%, which provides loose detection to avoid some false positives.
Then, using the following formulas, the actual number of cheating records that will
flag a player as a cheater can be determined:
Number of Records = Match Length (in seconds) / Frame Size (in seconds)

Then,

Number of Cheating Records (NCR) = Number of Records × Threshold

In this example:

Number of Records = 1800 / 60 = 30

Therefore,

NCR = 30 × 0.5 = 15
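The threshold calculation above can be transcribed directly into code; the function and variable names are illustrative:

```python
def cheating_record_threshold(match_seconds: int, frame_seconds: int,
                              threshold: float) -> int:
    """Number of flagged records (NCR) needed to mark a player as a cheater:
    records per match times the detection threshold."""
    records = match_seconds // frame_seconds   # records produced over the match
    return int(records * threshold)
```

For the 30-minute match with 60-second frames and a 50% threshold, this yields an NCR of 15 flagged records.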
On the other hand, if we look at the application-phase results for the models
from Part 4, we notice that all the Lock-Based models detected the Lock cheat
accurately; however, only the Auto-Fire model detected the Auto-Fire cheat accurately.
Therefore, using these models, applied simultaneously to the gameplay data,
can provide the required detection mechanism. The threshold is again set at 50%, i.e., NCR
= 15 using the previous formula; however, the following rule needs to be added to detect
a cheater (see Figure 7 for an illustration):
NCR(Lock-Based) = max{ NCR(L), NCR(AS), NCR(AM), NCR(SA) }

Then,

NCR(Overall) = NCR(Lock-Based) + NCR(AF)
The reason for using an overall threshold is to detect players who use different
cheats during the same match. Note that the value of the threshold depends on the
developer’s policy for cheat detection; it can be low (harsh) or high (loose). However, it
should not be too harsh, or it will flag many experienced honest players as cheaters.
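A minimal sketch of the combined Part 4 decision rule, with hypothetical per-model flagged-record counts passed in as a dictionary:

```python
def is_cheater(flagged: dict, ncr: int) -> bool:
    """Combine the per-cheat model outputs: take the maximum over the four
    Lock-Based models, add the Auto-Fire count, and compare the overall total
    against the NCR threshold. The dictionary keys (L, AS, AM, SA, AF) follow
    the cheat abbreviations used in the text."""
    lock_based = max(flagged[c] for c in ("L", "AS", "AM", "SA"))
    overall = lock_based + flagged["AF"]
    return overall >= ncr
```

Taking the maximum over the Lock-Based models (rather than their sum) avoids counting the same Lock-derived behavior four times, while adding the Auto-Fire count catches players who mix cheats within one match.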
During the analysis of the game data, we noticed that some results were not accurate
enough. There are several reasons for this inaccuracy. First, in online games, delays (lags)
are frequent and can cause the message exchange rate to drop (in some cases to
less than 5 messages/sec). This delay reduces the accuracy of data collection and causes
some features to take odd values; in this case, low accuracy is expected. Another reason
for inaccuracy is the amount of cheating data collected compared to normal-behavior data.
In some cases, especially with separated cheats, the ratio between cheating and normal data
exceeds 40:60, which in our opinion is the reasonable ratio between abnormal and normal
data for classification. Finally, improving the feature set, either by adding new
features or by modifying some of the existing ones, can also help increase accuracy.
Figure 7: Illustration of example #2
Overall, the accuracy achieved is very high, especially with the AimBot
confusions added (Auto-Switch and Auto-Miss). Frame size affects accuracy; frame
size 60 yielded the highest accuracy in most of the experiments. However, as
mentioned before, larger frames result in fewer instances. In addition, we
consider any frame larger than 60 seconds too large, since smart cheaters activate cheats
only when they need them; a larger frame size might therefore fail to identify
cheating behavior accurately. We suggest using a frame size between 30 and 60
seconds.
3.3. Statistical Significance of the Cheat Modeling Data
In this section, I show the statistical significance of the data used during the Cheat
Modeling training phase. To test statistical significance, I used bootstrapping with
resampling [45] on my data and derived the 95% confidence interval [48]. The
resampling is based on the data for the model from section 3.2.2.1 Part 2 (Lock-Based
cheats classified together versus Auto-Fire), using the SVM-RBF kernel. Moreover, I show
only the overall-accuracy-based bootstrapping results, with frame sizes of 30 and 60 seconds.
The resampling is applied to the original training data without changing the data
set size: to resample, I repeated some records and omitted others. I then
trained the model on each sample using 10-fold cross-validation to produce the overall
accuracy value for that sample. The data was resampled 10,000 times, and the
overall accuracy values were stored to calculate the 95% confidence interval at the end.
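The resampling procedure can be sketched as a percentile bootstrap. In the sketch below, the statistic is computed directly on the resampled values; this stands in for the expensive per-sample re-training with 10-fold cross-validation performed in the actual experiments.

```python
import numpy as np

def bootstrap_ci(values, stat=np.mean, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample with replacement (keeping the sample
    size fixed, as in the text), recompute the statistic each time, and take
    the 2.5th and 97.5th percentiles as the 95% confidence interval."""
    rng = np.random.default_rng(seed)
    vals = np.asarray(values)
    boots = np.array([stat(rng.choice(vals, size=len(vals), replace=True))
                      for _ in range(n_boot)])
    lo, hi = np.percentile(boots, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)
```

With 10,000 resamples, the interval between the 2.5th and 97.5th percentiles of the bootstrap distribution gives the 95% confidence interval reported in Table 7.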
The confidence interval includes the true population parameter, which is the Cheat
Modeling cheating probability, 95% of the time [48].
Table 7 below shows the Mean, Standard Deviation, Error Margin, and the
Confidence Interval for bootstrapping the “Lock-Based vs. AF” model during the training
phase. Figures 8 and 9 show the histograms of the bootstrapping results for frame rates
60 and 30 seconds, respectively.
Table 7: Mean, Standard Deviation, Error Margin and 95% Confidence Interval for the “Lock-Based vs. AF” model

Model               Frame Size   Mean    Standard Deviation   Error Margin   95% Confidence Interval
Lock-Based vs. AF   30           95.85   0.0058               0.000114       95.8491 - 95.8493
Lock-Based vs. AF   60           98.34   0.0055               0.000107       98.3415 - 98.3417
Figure 8: Bootstrapping histogram for “Lock-Based vs. AF” model on frame rate = 60 seconds
Figure 9: Bootstrapping histogram for “Lock-Based vs. AF” model on frame rate = 30 seconds
Chapter 4. Player Modeling
The goal here is to study the player's behavior and find the best way to detect
abnormalities in his or her gameplay. There are many ways to study a player’s behavior. One
way is to use anomaly detection techniques, in which we study each individual player’s
behavior, build a model for each, and find anomalies within that player’s behavior. In this
section, I explain the player modeling part of the system in detail. I first describe the model
and its phases. Then, I show my experiments and results for this model on an
online FPS game, the same FPS game used in the Cheat Modeling section.
4.1. Model Description
This part of the system focuses on studying the individual player’s behavior in order
to detect player abnormalities. Therefore, there is no labeled data for each player
differentiating normal gameplay from cheating gameplay. Instead, a model is built from
a player’s regular gameplay once enough data has been collected for that player. I define
“enough” as the amount of game logs (normal gameplay data) whose collection provides
acceptable accuracy values in the training phase; collecting more game logs yields the same
accuracy value. I work under the assumption that the model represents normal gameplay,
and not cheating gameplay.
Similarly to Cheat Modeling, Player Modeling consists of a training phase and an
application phase (shown in Figures 10 and 11 below). In the training phase (Figure 10),
data (game records) are collected from the server’s log for honest players only (normal
gameplay). Then, I used the feature extractor to generate features files. The features
selected in this part of the system are different from the features selected in the Cheat
Modeling part. The features of the Player Modeling part will serve the purpose of anomaly
detection and building player’s models—such as means, standard deviations, and ratios of
Figure 10: Player Modeling Training Phase
certain values. The features here also depend on the data sent between the client and
the server, and are likewise extracted in terms of time frames.
The data analyzer takes the generated features files for a certain player with enough
data (as defined previously) and builds a model for this player using statistical anomaly
detection techniques, such as the Gaussian model. Then, I select a threshold value, Epsilon,
for this player that maximizes the accuracy values and lowers the error rate. Finally, I store
this player’s model in the Models Depository. To handle new players who lack enough data,
I built an average model that contains data from all the players with enough data stored,
and saved this model in the depository.
In the application phase (Figure 11), a certain player’s new and unseen data (game
records) are collected from the server’s logs. The data may contain both cheating and
normal gameplay records. The feature extractor then extracts the same features used in the
Figure 11: Player Modeling Application Phase
training phase and passes the files to the data analyzer. The data analyzer retrieves the
player’s model from the Models Depository (if it exists), or the average player’s model if
this is a new player (or a player without enough data). Then, it applies the model and
produces the number of cheating records, which is used to decide whether the player was
cheating. The decision in this part of the system is also made by specifying a threshold
that identifies a cheater.
Similarly to Cheat Modeling, the threshold is based on the number of flagged
records that, when reached, can determine a player as a cheater. The threshold value
depends on the detection accuracy of the model, the developer’s policy for cheat detection,
and whether the player is old or new.
4.2. Case Study: Detecting AimBots in First Person Shooters
Using the same FPS game that we developed for the paper we published in 2013
[14], I collected more data for the purpose of player modeling and anomaly detection. After
collecting the game logs, they were pre-processed, and a Gaussian distribution model was
built for each player (or for the average player) for the purpose of detection. In the training
phase, I provide results for creating the detection models for each individual player and for
new players. In the application phase, the resulting detection models were used on new
unlabeled data to detect cheaters. I used several unique features to detect anomalies
(cheats). Finally, I analyze the results, my model design, and my choices in this section.
4.2.1. Design and Implementation
As I am using the same game from [14], the game client and server are the same.
The other two components, the feature extractor and the data analyzer, differ from
those in the Cheat Modeling part of the system.
1) Feature Extractor:
The features here mostly consist of means and ratios. In this features extractor, I
use time frames of 30 and 60 seconds to observe their effect on detection. As with
the Cheat Modeling part, most of the features are extracted from the HeartBeat
messages logged into our system. If a feature depends on a target, it is
calculated only when a target is visible. In the case of multiple visible targets, the
closest target is considered the current target. The features are explained
below, along with the types of data they need from the logs.
- Mean Aiming Accuracy (FPS): While a target is visible, this simply divides the number
of times the player aimed at a target by the number of times a target was visible.
This feature uses the aiming target ID and the visible target lists from the logs.
- Mean Hit Accuracy (FPS): This divides the number of hits by the number of total
shots within the specified time frame. Headshots are given higher points than
body shots. It uses the target ID from the ‘Actions’ Table in the logs.
- Mean View Directions Change (G): This feature defines the change of the view
direction vectors during the specified time frame. It will be calculated using the
mean of the Euler Angles between vectors during that time frame. Note that players
will change view directions drastically when the target dies; therefore, we take
re-spawn directions into account.
- Mean Position Change (G): This feature will define the mean distance between each
position and the next during the specified time frame. Similar to View Directions
above, this feature will account for re-spawns.
- Mean Target Position Change (G): This feature will define the target’s position
changes, only when a target is visible.
- Mean Distance (G): While a target is visible, this will calculate the distance
between the player and the target. Then, it will calculate the mean of these
distances. It will use a player’s position and its closest target’s position from the
logs.
- Distance Standard Deviation (G): This feature will calculate the standard deviation
of the distances within the current frame using the mean distance calculated above.
- Fire On Aim Ratio (FPS): This feature will define the ratio of firing while aiming
at a target. It will use the firing flag as well as the aiming target from the logs.
- Fire On Visible (FPS): This feature will define the ratio of firing while a target is
visible. It will use the firing flag from the logs.
- Time-To-Hit Rate (FPS): This feature will define the time it takes a player from
when a target is visible to when the target gets a hit. It will use the client’s
timestamps from the HeartBeat Table in addition to Hit Target from the Action
Result Table.
2) Data Analyzer: The data analyzer of the Player Modeling part uses the
Gaussian-model-based statistical anomaly detection method [29, 43]. After
generating the features files, I used Matlab [46] to analyze the data and build a
model for each player. First, I needed to define the threshold value (Epsilon) for
each player’s model. I tested the accuracy results for different values of
Epsilon and selected the value that maximized the preferred accuracy measurement.
Figure 12 shows the results of using different values of Epsilon versus overall
accuracy, TPR, FPR, and F1-Score for a certain player (the figure shows the
Epsilon indices in a sorted list). After finding the best Epsilon value for a certain
player (or for the average model), I saved it as the threshold probability testing
value for that player. Finally, I used the stored model and the chosen threshold
value (Epsilon) on new data for the selected player, in order to detect any future
behavioral abnormalities.
Figure 12: Epsilon Values Plotted against different measurements
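The Gaussian-model detector and the Epsilon scan described above can be sketched in Python. The experiments used Matlab's mvnpdf; the function names below are illustrative stand-ins.

```python
import numpy as np

def fit_gaussian(X):
    """Fit the multivariate Gaussian player model: mean vector and covariance."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def gaussian_pdf(X, mu, sigma):
    """Multivariate normal density, a stand-in for Matlab's mvnpdf."""
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(sigma)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    return norm * np.exp(-0.5 * np.einsum("ij,jk,ik->i", diff, inv, diff))

def best_epsilon(p_val, y_val):
    """Scan candidate thresholds over the observed densities and keep the one
    maximizing F1 on a labeled validation set (1 = cheating/anomalous)."""
    best_eps, best_f1 = 0.0, -1.0
    for eps in np.unique(p_val):
        pred = (p_val < eps).astype(int)        # low density means anomaly
        tp = np.sum((pred == 1) & (y_val == 1))
        fp = np.sum((pred == 1) & (y_val == 0))
        fn = np.sum((pred == 0) & (y_val == 1))
        if tp == 0:
            continue
        prec, rec = tp / (tp + fp), tp / (tp + fn)
        f1 = 2 * prec * rec / (prec + rec)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return float(best_eps), float(best_f1)
```

Records whose density under the player's Gaussian falls below Epsilon are flagged as anomalous; scanning the observed density values for the threshold mirrors the Epsilon-index sweep shown in Figure 12.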
4.2.2. Experiments and Results
In this part of the system, I focus on collecting enough data for a player that can
characterize his or her behavior accurately. I was able to collect enough data for two
players: 376 minutes for Player 1 and 347 minutes for Player 2 (the rest of the players had
30 minutes or less). The data comprises more normal gameplay than cheating gameplay.
In this section, I first show how I selected and defined “enough” data. Then, I explain the
training phase, which uses normal data only. After that, I explain the testing phase, which
contains both normal and cheating test cases for long-time and new players.
4.2.2.1. Selecting “Enough” value:
There are two factors that affect this selection. The first is checking the elbow plot
obtained by applying anomaly detection to different input data sizes. The second is the
developer’s decision regarding what counts as “enough” collected data. The second factor
cannot be addressed until the first is checked. In my case, for the first factor, I applied
the following procedure:
1. Shuffle the order of input data rows for a certain player.
2. Start with a small size portion of the input data.
3. Apply training on the data and generate a model.
4. Test the model and obtain the results (Accuracy, F1-Score, TPR, and FPR).
5. Store the results.
6. Increase the portion size of the input data and repeat steps 3 to 6 until you reach the
maximum amount of data you have for this player.
7. Plot the results against the different sizes of input data.
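The procedure above can be sketched as a simple loop; `train_and_score` is a placeholder for whatever builds the model on a portion of the data and returns an accuracy-like measurement.

```python
def elbow_curve(X, y, train_and_score, start=15, step=5):
    """Train on growing prefixes of the (pre-shuffled) input and record the
    score at each size, producing the data for an elbow plot. start and step
    follow the row counts used in the text (multiples of 5, from 15 rows)."""
    sizes, scores = [], []
    n = start
    while n <= len(X):
        sizes.append(n)
        scores.append(train_and_score(X[:n], y[:n]))   # steps 3-5 of the procedure
        n += step                                      # step 6: grow the portion
    return sizes, scores
```

Plotting `scores` against `sizes` (step 7) reveals the elbow after which additional data no longer improves the measurements, which is what the "enough" decision is based on.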
Figure 13 shows an example of the elbow plots of different measurements for
frame rate = 30 on a certain player’s data. Note that the input size increases in multiples
of 5, starting from an input size of 15 rows, i.e., 7.5 minutes.
Figure 13: Plots of different measurement results vs. different sizes of input data.
From Figure 13, it can be seen that after 50 minutes the results are almost
identical, especially for the overall accuracy and the F1-scores. Then comes the second
factor, the decision-making: I selected 4500 seconds, or 75 minutes, as enough data
for each player to detect cheats via the anomaly detection process.
4.2.2.2. Training Phase:
In Player Modeling there are two types of players: long-time players that satisfied
the “enough” amount of normal gameplay, and new players whose amount of data is
still under the threshold. Therefore, the training data collection for each type is
different, but the model building is the same.
a. For long-time players, I collected 75 minutes of normal gameplay and stored it in
a matrix. Then, I calculated the mean and the covariance matrix (Sigma) of the
normal data matrix. After that, the Matlab function ‘mvnpdf’ [47] was applied,
using the calculated mean and Sigma, to the test data matrix. The test data matrix
contains normal and cheating data records for the current player. Using a list of
Epsilon values, I then looped over this list to select the Epsilon that produced the best
accuracy value (Overall Accuracy, TPR, FPR, or F1-Score), as shown in Figure 12. This
Epsilon is the threshold value for the current player. Finally, I stored the model and
the selected Epsilon for the current player.
b. Typically, a new player’s data consists of fewer minutes of gameplay than the
threshold. For these cases, I collected all the normal data from all the
players that satisfied the “enough” threshold. Then, I shuffled the row order of the
data and selected a portion of the shuffled data of 150 minutes (“Enough”
* 2). Then, similarly to (a) above, I calculated the Mean and Sigma and applied
‘mvnpdf’ to the test data. Finally, I selected the threshold value (Epsilon) in the
same way as in (a) above, and stored the Average-Model and Average-Epsilon.
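Step (b) can be sketched as follows; `build_average_model` and its parameters are illustrative names, not the actual Matlab implementation.

```python
import numpy as np

def build_average_model(player_datasets, enough_rows, seed=0):
    """Build the Average-Player model for new players: pool normal data from
    all players that met the 'enough' threshold, shuffle the rows, and keep a
    portion of size enough_rows * 2, as described in step (b)."""
    pooled = np.vstack([d for d in player_datasets if len(d) >= enough_rows])
    rng = np.random.default_rng(seed)
    rng.shuffle(pooled)                       # shuffle the row order
    sample = pooled[: enough_rows * 2]        # the "Enough" * 2 portion
    mu = sample.mean(axis=0)                  # Mean of the pooled normal data
    sigma = np.cov(sample, rowvar=False)      # Sigma for the Gaussian model
    return mu, sigma
```

The resulting mean and covariance play the same role as an individual player's model: they parameterize the Gaussian density against which a new player's records are scored until enough personal data accumulates.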
Next, I show the results of training and testing the system. First, I show the
maximum achieved result for each measurement. I then display the results by selecting an
overall-accuracy-based epsilon. Finally, I show the results by selecting an F1-Score-Based
Epsilon. Note that I ignored FPR-based and TPR-based Epsilons, since they show poor
results (less than 70%) for the other measurements (visible in Figure 12).
The training phase used the normal gameplay data of Player 1 (P1), Player 2 (P2),
and the average of both players (Avg-P, which is applied to new players’ test data). In
addition, I show the results for time frames of 30 and 60 seconds only, since time frames
of 10 and 90 seconds displayed poor results. The distribution of the data was as follows:
the training set contains 60% of the normal gameplay data; the testing set contains 20% of
the normal gameplay data and 50% of the cheating gameplay data; and the application set
contains the remaining 20% of the normal gameplay data and 50% of the cheating
gameplay data.
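The split above can be sketched as follows (a hypothetical helper; the row counts in the usage are illustrative):

```python
import numpy as np

def split_data(normal, cheat, seed=0):
    """Split per the distribution above: normal gameplay -> 60% train /
    20% test / 20% application; cheating gameplay -> 50% test / 50% application."""
    rng = np.random.default_rng(seed)
    normal = rng.permutation(normal)       # shuffle rows
    cheat = rng.permutation(cheat)
    n, c = len(normal), len(cheat)
    train = normal[: int(0.6 * n)]                                       # 60% normal
    test = np.vstack([normal[int(0.6 * n): int(0.8 * n)], cheat[: c // 2]])
    application = np.vstack([normal[int(0.8 * n):], cheat[c // 2:]])
    return train, test, application
```

Only the training set is purely normal data, since the anomaly-detection model must be fit on normal behavior alone.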
1) Building a model for Player 1: This player had a total of 376 minutes of gameplay
data, including normal and cheating gameplay. First, Table 8 shows the best results
achieved by each measurement individually. I then used the Epsilon (threshold) of
the best overall accuracy to obtain the results of the other measurements, as shown
in Table 9 below. Similarly, I also used the Epsilon of the best F1-Score to obtain
the results of the other measurements, as shown in Table 10. From the results of
this player, I noticed that both frame sizes had advantages and disadvantages. For
example, frame size 30 had a better overall accuracy, but frame size 60 had a better
F1-Score.
Best Results for Each Measurement
Frame Size | Overall Accuracy | TPR | FPR | F1-Score
30 | 91 % | 100 % | 0 % | 90.9 %
60 | 92.4 % | 100 % | 0 % | 92.9 %
Table 8: Best Results Achieved by each Measurement for P1
Accuracy-Based Epsilon Results
Frame Size | Overall Accuracy | TPR | FPR | F1-Score
30 | 91 % | 91.4 % | 9.4 % | 90.9 %
60 | 92.4 % | 96.2 % | 11.5 % | 92.7 %
Table 9: Results achieved using the best Overall-Accuracy-Based Epsilon for P1
F1-Score-Based Epsilon Results
Frame Size | Overall Accuracy | TPR | FPR | F1-Score
30 | 91 % | 91.4 % | 9.4 % | 90.9 %
60 | 92.4 % | 98.1 % | 13.5 % | 92.9 %
Table 10: Results achieved using the best F1-Score-Based Epsilon for P1
2) Building a model for Player 2: This player had a total of 347 minutes of gameplay
data, including normal and cheating gameplay. First, Table 11 shows the best
results achieved by each measurement individually. Then, I used the Epsilon
(threshold) of the best overall accuracy to obtain the results of the other
measurements, as shown in Table 12 below. In the same manner, I also used the
Epsilon of the best F1-Score to obtain the results of the other measurements, as
shown in Table 13. From the results of this player, I noticed that the 60-second
frame size had better results in general.
Best Results for Each Measurement
Frame Size | Overall Accuracy | TPR | FPR | F1-Score
30 | 90.8 % | 100 % | 2.5 % | 92.7 %
60 | 96.4 % | 100 % | 2.4 % | 97.1 %
Table 11: Best Results Achieved by each Measurement for P2
Accuracy-Based Epsilon Results
Frame Size | Overall Accuracy | TPR | FPR | F1-Score
30 | 90.8 % | 92.9 % | 12.5 % | 92.6 %
60 | 96.4 % | 98.6 % | 7.1 % | 97.1 %
Table 12: Results achieved using the best Overall-Accuracy-Based Epsilon for P2
F1-Score-Based Epsilon Results
Frame Size | Overall Accuracy | TPR | FPR | F1-Score
30 | 90.8 % | 94.5 % | 15 % | 92.7 %
60 | 96.4 % | 98.6 % | 7.1 % | 97.1 %
Table 13: Results achieved using the best F1-Score-Based Epsilon for P2
3) Building a model for New Players: I used an average model built from P1 and
P2’s normal gameplay data. The test set is based on P1 and P2’s data as well, since
I am still in the training phase and do not have new players’ data. In a similar way
to parts 1 and 2 above, tables 14, 15, and 16 show the best results, the overall-
accuracy-based Epsilon results, and the F1-Score-based results, respectively. In
this case, I noticed that the 60-second frame size had better results in general.
Best Results for Each Measurement
Frame Size | Overall Accuracy | TPR | FPR | F1-Score
30 | 90.2 % | 100 % | 0.5 % | 91.1 %
60 | 93.1 % | 100 % | 4.3 % | 94 %
Table 14: Best Results Achieved by each Measurement for Avg-P
Accuracy-Based Epsilon Results
Frame Size | Overall Accuracy | TPR | FPR | F1-Score
30 | 90.2 % | 90.9 % | 10.7 % | 91.1 %
60 | 93.1 % | 96.7 % | 11.7 % | 94 %
Table 15: Results achieved using the best Overall-Accuracy-Based Epsilon for Avg-P
F1-Score-Based Epsilon Results
Frame Size | Overall Accuracy | TPR | FPR | F1-Score
30 | 90.2 % | 90.9 % | 10.7 % | 91.1 %
60 | 93.1 % | 96.7 % | 11.7 % | 94 %
Table 16: Results achieved using the best F1-Score-Based Epsilon for Avg-P
4.2.2.3. Application Phase:
After building a model for each player and the average model, I applied them to
new unseen data. Player 1 had a total of 105 minutes of new data, Player 2 had 105 minutes
as well. I had two new players on which to apply the average model: new player 1 had 27
minutes of normal and cheating gameplay data, and new player 2 had 30 minutes of normal
and cheating gameplay data (but mostly cheating data). I show the results of applying the
models for
time frames 30 and 60 seconds in Table 17 and Table 18 below. Table 17 shows the results
by using the Overall Accuracy-Based Epsilon as the threshold. On the other hand, Table
18 shows the results by using the F1-Score-Based Epsilon as a threshold.
Results for Player 1
Frame Size | Overall Accuracy | TPR | FPR | F1-Score
30 | 86.7 % | 89.2 % | 15.7 % | 86.7 %
60 | 89.4 % | 92.3 % | 13.5 % | 89.7 %

Results for Player 2
Frame Size | Overall Accuracy | TPR | FPR | F1-Score
30 | 87.8 % | 88.9 % | 13.9 % | 90 %
60 | 90.5 % | 93.7 % | 14.3 % | 92.2 %

Results for New Player 1 (using Average Model)
Frame Size | Overall Accuracy | TPR | FPR | F1-Score
30 | 94.3 % | 100 % | 12.5 % | 95.1 %
60 | 96.3 % | 100 % | 7.7 % | 96.6 %

Results for New Player 2 (using Average Model)
Frame Size | Overall Accuracy | TPR | FPR | F1-Score
30 | 81.5 % | 80 % | 11.1 % | 87.8 %
60 | 90 % | 100 % | 60 % | 94.3 %
Table 17: Application Phase Results using the Overall-Accuracy-Based Epsilon as a threshold
Results for Player 1
Frame Size | Overall Accuracy | TPR | FPR | F1-Score
30 | 86.7 % | 89.2 % | 15.7 % | 86.7 %
60 | 89.4 % | 96.2 % | 17.3 % | 90.1 %

Results for Player 2
Frame Size | Overall Accuracy | TPR | FPR | F1-Score
30 | 88.3 % | 89.7 % | 13.9 % | 90.4 %
60 | 90.5 % | 93.7 % | 14.3 % | 92.2 %

Results for New Player 1 (using Average Model)
Frame Size | Overall Accuracy | TPR | FPR | F1-Score
30 | 94.3 % | 100 % | 12.5 % | 95.1 %
60 | 96.3 % | 100 % | 7.7 % | 96.6 %

Results for New Player 2 (using Average Model)
Frame Size | Overall Accuracy | TPR | FPR | F1-Score
30 | 81.5 % | 80 % | 11.1 % | 87.8 %
60 | 90 % | 100 % | 60 % | 94.3 %
Table 18: Application Phase Results using the F1-Score-Based Epsilon as a threshold
As you can see from tables 17 and 18, the results were close, since both
thresholds (Epsilons) were close (or the same in most of the cases). Moreover, the results
were good in general, except for new player 2’s results with frame size 60: the FPR was
high because he had very little normal gameplay (he was cheating most of the time).
Therefore, every mistake in the normal gameplay records carries a very high weight in the
FPR value (since FPR = number of falsely detected cheats / actual normal gameplay
records).
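The FPR arithmetic above can be made concrete. The counts below are invented purely to illustrate why a small normal-gameplay pool inflates the FPR; they are not taken from the experiment:

```python
def false_positive_rate(falsely_detected_cheats, actual_normal_records):
    """FPR = number of falsely detected cheats / actual normal gameplay records."""
    return falsely_detected_cheats / actual_normal_records

# With very little normal gameplay, each mistake weighs heavily:
# 3 false flags out of only 5 normal records already gives an FPR of 60%.
```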
4.2.3. Analysis
The observations in the Player Modeling part of the system are different from the
observations in the Cheat Modeling part. In this part, I only have one organizational part
for building the models, whereas in the Cheat Modeling part I had four different
organizational parts. However, within this single organizational part, each model here is
different from the other models since it relies on each player’s gameplay behavior. I noticed
from the results that there is a slight difference between using the overall-accuracy-based
threshold (Epsilon) and using the F1-Score-based threshold, and in most of the cases, the
threshold is the same. Therefore, deciding which threshold the developer should use will
not make a big difference. Moreover, I noticed that the results from the two frame rates
(30 and 60) were different: in most cases, a frame rate of 60 seconds gives better results
than a frame rate of 30 seconds. In general, though, both gave good results.
When I apply these models to new unseen data, the results will be in the form
of a number of cheating records. In a similar way to the Cheat Modeling part,
I use a threshold that defines whether a player is considered cheating or not. The threshold
calculation will use the same formula:
probability of cheating records = number of detected cheating records / total number of gameplay records

Then,

cheating threshold = threshold percentage × total number of gameplay records
Once this threshold is reached, the player is considered a cheater. If the number of
cheating records is lower than the threshold, the current player will have a partial cheating
score. This will be explained later in Chapter 5.
In this part, I gave lower priority to selecting an Epsilon based on TPR or
FPR. The reason is that, for TPR, the highest value reached is usually 100%, but the
Epsilon value at that level will be very high (a higher probability value), meaning the model
will consider most of the players cheaters. Thus, the overall Accuracy, F1-Score, and
FPR values will be very poor. For FPR, on the other hand, the lowest value is usually obtained
from the lowest Epsilon (a lower probability value). Therefore, the model will be very loose,
and it will lower the values of overall Accuracy, F1-Score, and TPR as well. Figure 12
illustrates this discussion of selecting for TPR and FPR.
In this part of the system, the inaccuracy of most of the models became
noticeable. The reasons for the lack of accuracy are varied. Some are similar to those found
in Cheat Modeling, such as delays and the need to improve the feature set. Other reasons
include the ratio of data, which is considered an issue in this part. In Player
Modeling, I train the system on normal gameplay only (a few cheating records could be
thrown into the training set as well); therefore, in the training phase, the ratio of normal
data to cheating data must be very high. For example, in our case, I suggested that 60% of
the normal data be used for training, 20% for validating the model, and 20% for testing
the model. On the other hand, 50% of the cheating data is used for validating the model,
and the other 50% for testing the built model.
Another issue that could affect the accuracy of the models is the value of “enough”
data collected to consider a player a long-time player. In this case, a low value of “enough”
will cause under-fitting of the models, and thus low accuracy. Alternately, a very high
value of “enough” will cause over-fitting, again leading to inaccurate models.
For new players, the average model can be expected to cause inaccuracies. The
main reason is that the model was built using other players’ data. This can be solved by
generating different average models based on the gameplay of groups of players (skilled,
good, poor, etc.). Then, we can fit each new player to a model depending on how close his
gameplay is to one of these groups. This solution will need an extensive amount of data
and players.
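A minimal sketch of that group-matching idea follows. The group names, centroids, and distance metric are assumptions for illustration only; this is not implemented in the thesis:

```python
import numpy as np

def closest_group_model(player_features, group_centroids):
    """Return the name of the group (e.g., skilled, good, poor) whose average
    gameplay feature vector is nearest to the new player's features; the
    caller would then apply that group's average model instead of a single
    global average model."""
    names = list(group_centroids)
    distances = [
        np.linalg.norm(np.asarray(player_features) - np.asarray(group_centroids[name]))
        for name in names
    ]
    return names[int(np.argmin(distances))]
```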
4.3. Statistical Significance of the Player Modeling data
In this section, I show the statistical significance of the data used during the training
phase of the Player Modeling segment. To test the statistical significance, I used
bootstrapping—coupled with resampling [45]—on my data. I then derived the 95%
confidence interval [48]. This is similar to what I have done in the Cheat Modeling part.
The resampling is based on the data for Player 1 model using the F1-Score-Based Epsilon
(during the training phase). Moreover, I only showed the overall accuracy and F1-Score
bootstrapping results with frame rates 30 and 60 seconds.
The resampling is applied on the original training data without changing the data
set size. To resample the data, I repeated some records and omitted others. Then, I trained the
model on this sample, selected the best Epsilon using a test set, and produced the overall
accuracy and F1-Score values for the current sample. The data was resampled 10,000
times. The overall accuracy and F1-Score values were then stored to calculate the 95%
confidence interval at the end. The confidence interval includes the true population
parameter, which is the Player Modeling cheating probability, 95% of the time [48].
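The bootstrap procedure can be sketched as follows. This is a simplified Python version: the full pipeline also re-selects the best Epsilon for every sample, which is elided here, and the metric function is a stand-in:

```python
import numpy as np

def bootstrap_ci(records, labels, metric_fn, n_resamples=10_000, seed=0):
    """Resample rows with replacement at the original size (so some records
    repeat and others are omitted), recompute the metric each time, and
    return the mean, standard deviation, error margin, and a 95%
    confidence interval on the mean (margin = 1.96 * std / sqrt(n_resamples))."""
    rng = np.random.default_rng(seed)
    n = len(records)
    scores = np.empty(n_resamples)
    for i in range(n_resamples):
        idx = rng.integers(0, n, size=n)          # sample row indices with replacement
        scores[i] = metric_fn(records[idx], labels[idx])
    mean, std = scores.mean(), scores.std(ddof=1)
    margin = 1.96 * std / np.sqrt(n_resamples)
    return mean, std, margin, (mean - margin, mean + margin)
```

With 10,000 resamples and a standard deviation near 0.01, this margin formula lands in the same order of magnitude as the error margins reported in Table 19 below.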
Table 19 below shows the Mean, Standard Deviation, Error Margin, and the
Confidence Interval for bootstrapping Player 1’s model during the training phase. Figures
14 and 15 show the histograms for bootstrapping results (overall accuracy) on frame rates
30 and 60 respectively. Moreover, figures 16 and 17 show the histograms for bootstrapping
results (F1-Score) on frame rates 30 and 60 respectively.
Model | Frame Size | Mean | Standard Deviation | Error Margin | 95% Confidence Interval
Player 1 (ACC) | 30 | 0.9008 | 0.0105 | 0.000206 | 0.9006 - 0.9010
Player 1 (ACC) | 60 | 0.9172 | 0.0109 | 0.000214 | 0.9170 - 0.9174
Player 1 (F1-Score) | 30 | 0.9023 | 0.0097 | 0.0001903 | 0.9021 - 0.9025
Player 1 (F1-Score) | 60 | 0.9226 | 0.0098 | 0.0001920 | 0.9224 - 0.9228
Table 19: Mean, Standard Deviation, Error Margin and 95% Confidence Interval for Player 1’s model
Figure 14: Histogram for bootstrapping results (for overall accuracy) on frame rate = 30
Figure 15: Histogram for bootstrapping results (for overall accuracy) on frame rate = 60
Figure 17: Histogram for bootstrapping results (for F1-Score) on frame rate = 30
Figure 16: Histogram for bootstrapping results (for F1-Score) on frame rate = 60
Chapter 5. Decider
The Decider is the third and last part of the system. It is considered the final step toward
cheating detection in our system. The Decider summarizes the results obtained from both
the Cheating Modeling and Player Modeling parts of the system. Then it gives a final result
based on certain criteria that will be discussed in 5.1 below. In this chapter, I describe the
Decider model first, including the input and output. Then, I explain the experiments
performed and analyze their results.
5.1. Model Description
As its name suggests, the Decider is responsible for making the final decision for a certain
player’s data—whether s/he is cheating or not. The Decider receives results from the
Cheating Modeling part and the Player Modeling part as an input. Then, it produces a final
decision, in the form of a cheating probability, as an output. Figure 18 shows the kinds of
configurations and inputs that the Decider expects.
5.1.1. The Decider’s Input and Configurations
As I explained before, the Decider receives its input as the result from the Cheating
Modeling and Player Modeling parts of the system—regardless of the size of the time
frame and type of model used. The Cheating Modeling part sends the probability of
cheating achieved by applying the preferred model. It also sends the results acquired from
the training phase, such as the overall accuracy, TPR, FPR and the F1-Score for the current
model. I call these results the “level of confidence” of the model, because they show how
well the model performed during the training phase, and how confident we are in this
model. Moreover, the Cheating Modeling part sends the selected threshold that will be used
to define a cheater.
Figure 18: An Illustration of the Decider's Configurations and Inputs.
The Player Modeling part, on the other hand, sends the probability of cheating by
applying the current player’s model, or by applying the average player’s model in case of
a new player. It also sends the level of confidence for each accuracy measurement (overall
accuracy, TPR, FPR, F1-Score). Note that the level of confidence is based on each player’s
model. However, for new players, it will be based on a fixed level of confidence. It also
sends the threshold selected for the Player Modeling part to define a cheater. Finally, it
sends a flag that indicates if this current player is a new player or not. This flag is based on
the player’s total number of minutes he/she played.
After receiving all of the results, the users of the system can then configure the
Decider based on their preference. They can select their favorite type of measurement,
which will affect the level of confidence probability. Note that the FPR level of confidence
is reversed. For example, if the model gave an FPR of 4.5 %, then the level of confidence
based on FPR will be 95.5 %.
Another important configuration for the Decider is the weight of each part of the
system. This weight depends on two factors: the level of confidence for each part, and the
experience of the player (long-time vs. new). The first factor is determined by which model
had the better results in the training phase. Therefore, based on this factor, the weight is
calculated as follows:
I. Add the Levels of Confidence (LCs) from the Cheating Modeling and Player
Modeling parts to form the “Total Level of Confidence,” or TLC.
II. Calculate the weight of the Cheat Modeling LC within the TLC.
III. Assign the resulting value as the Cheat Modeling weight; the Player
Modeling weight then automatically equals 100% − the Cheat Modeling weight.
The second factor, the player’s total experience, affects the weight by assigning a
higher value to the Cheat Modeling weight. The value depends on the results of the
average model on new players’ data, and on the developer’s decision. I suggest that the
Cheat Modeling weight in this case should be between 70% and 90%, depending on how
the average model performed during training and testing. In my system, the Cheat
Modeling weight (CMw) was set to 75%, since the average model performed reasonably
well in the application phase.
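Steps I through III and the new-player override can be sketched as follows (values are fractions; the two-decimal rounding mirrors the worked examples in section 5.2):

```python
def decider_weights(cm_lc, pm_lc, new_player=False, new_player_cm_weight=0.75):
    """Weight each part by its level of confidence (LC); for new players,
    the Cheat Modeling weight is instead fixed by the developer
    (75% was the value used in this system)."""
    if new_player:
        cm_w = new_player_cm_weight
    else:
        tlc = cm_lc + pm_lc              # I. total level of confidence
        cm_w = round(cm_lc / tlc, 2)     # II. CM share of the TLC, rounded
    pm_w = round(1.0 - cm_w, 2)          # III. PM weight = 100% - CM weight
    return cm_w, pm_w
```

For Player 1’s levels of confidence (98.2% and 92.9%), this yields weights of 51% and 49%, matching the calculation in section 5.2.1.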
5.1.2. The Decision Making Process and the Output
When the input is received, and the configurations are set, the Decider is ready to
make the final decision for the current player’s gameplay data. The decision making
process depends on eight different variables, as follows:
- The probability of cheating records from the Cheating Modeling part (CM%).
- The Threshold value for the Cheating Modeling part (CMth).
- The Level of Confidence from the Cheating Modeling part (CMLC).
- The Weight of the Cheating Modeling part (CMw).
- The probability of cheating records from the Player Modeling part (PM%).
- The Threshold value for the Player Modeling part (PMth).
- The Level of Confidence from the Player Modeling part (PMLC).
- The Weight of the Player Modeling part (PMw).
To calculate the final cheating probability (i.e., the final decision), several steps are
performed based on the variables above. First, the threshold value from each part
defines when to flag a player as a cheater. Therefore, any probability of cheating records
over the threshold carries no extra information, since the player is already considered
100% cheating based on the current model results. As a result, I convert any value of CM%
and PM% between zero and the thresholds (CMth and PMth) to be between 0% and 100%
(normalization), and thus, any value over the thresholds is set to 100%. To do so, I apply
the following formulas:
CM% = CM% / CMth, for 0 ≤ CM% ≤ CMth; CM% is set to 100% when CM% > CMth … (1)

In the same way,

PM% = PM% / PMth, for 0 ≤ PM% ≤ PMth; PM% is set to 100% when PM% > PMth … (2)

Now, I convert the values of CM% and PM% so they do not exceed the level of
confidence achieved from each modeling part (clamping):

CM% = CM% × CMLC … (3)

Also,

PM% = PM% × PMLC … (4)

Finally, using the weights (CMw and PMw), the overall cheating probability (OC%)
value is calculated as follows:

OC% = CM% × CMw + PM% × PMw … (5)
The value of OC% will be the final decision of the system, and the output of the
Decider.
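Formulas (1) through (5) combine into a short function. This is a Python sketch with illustrative parameter names; all probabilities are expressed as fractions in [0, 1]:

```python
def decide(cm_pct, cm_th, cm_lc, cm_w, pm_pct, pm_th, pm_lc, pm_w):
    """Compute the overall cheating probability OC%.
    (1)/(2): normalize each cheating-record probability by its threshold,
             capping at 100% when the threshold is exceeded;
    (3)/(4): clamp by the level of confidence of each part;
    (5):     weighted combination of the two parts."""
    cm = min(cm_pct / cm_th, 1.0) * cm_lc
    pm = min(pm_pct / pm_th, 1.0) * pm_lc
    return cm * cm_w + pm * pm_w
```

Plugging in Player 1’s numbers from section 5.2.1 (CM% = 53.57%, PM% = 67.86%, with the thresholds, confidences, and weights given there) reproduces the 95.6% final decision.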
5.2. Experiments and Results
In this section, I performed the experiments using models already built in the
Cheat Modeling and Player Modeling parts of the system. As a result, there was no need
for further training and testing of the models. The goal of the experiments here is to show the way
the Decider performs using the formulas from section 5.1 above, and the models from
chapters 3 and 4.
I show results for Long-time players with enough data, as well as for new players.
I also show results for Player 1, Player 2, New Player 1, and New Player 2 using the unseen
data from the application phases in chapters 3 and 4. The model I selected from the Cheat
Modeling part is the one in section 3.2.2.1 part 2 (Lock-Based cheats classified together
versus Auto-Fire), and using the SVM-RBF with frame rate of 60 seconds. The threshold
selected for this part (CMth) is 50%. From the Player Modeling part, I used each selected
player’s own model (and the average model for new players) with a frame rate of 60
seconds and the F1-Score-based Epsilon. The threshold selected for this part (PMth) is
60%, because I have a higher FPR in this part of the system. For long-time players, I used
the level of confidence achieved from each player’s model during the training phase. As
for new players, I used the level of confidence of the Average-Model.
5.2.1. Long-time players’ results
In this section, I show the results of players with enough data. These players are
Player 1 and Player 2. For Player 1, I applied the Cheat Modeling model to their data to
get a cheating records probability (CM%) of 53.57% from the gameplay data. Following
this, I applied their own model from the Player Modeling part to get a cheating records
probability (PM%) of 67.86%. I stored the results in an XML file to be used by the Decider.
After gathering the results from each part, I calculated the weights CMw and PMw using
the levels of confidence (CMLC and PMLC) achieved as follows:
I. Calculate TLC = CMLC + PMLC = 98.2% + 92.9% = 191.1%
II. Calculate the CMLC weight on the TLC and round it:
CMw = CMLC / TLC = 98.2% / 191.1% = 51.4% → 51%
III. PMw = 100% − CMw = 49%
III. 59 =100%−9 =49%
Next, I normalize the values of CM% and PM% using formulas (1) and (2); but
first, because both CM% and PM% are greater than the thresholds for each part, I cut them
to be: CM% = 50% and PM% = 60%. Then, I applied the formulas to get:
CM% = 0.5 / 0.5 = 1, i.e., 100%

In the same way,

PM% = 0.6 / 0.6 = 1, i.e., 100%

Using formulas (3) and (4), I clamped the values of CM% and PM% so they do not exceed
the level of confidence of the overall accuracy in our case (CMLC = 98.2%):

CM% = 1 × 0.982 = 0.982, i.e., 98.2%

For PM%, the accuracy-based PMLC = 92.9%:

PM% = 1 × 0.929 = 0.929, i.e., 92.9%

Finally, I calculated the overall cheating probability (OC%) using formula (5):

OC% = CM% × CMw + PM% × PMw = 0.982 × 0.51 + 0.929 × 0.49 = 0.956
Therefore, the system is 95.6% confident that Player 1 was cheating during this match.
For Player 2—after applying the models from each part on his data—the achieved
CM% was 13.04%, while the achieved PM% was 13.04%. Next, I calculated the weights
CMw and PMw in the same way, which gives:

I. Calculate TLC = CMLC + PMLC = 98.2% + 97.1% = 195.3%
II. Calculate the CMLC weight on the TLC and round it:
CMw = CMLC / TLC = 98.2% / 195.3% = 50.3% → 50%
III. PMw = 100% − CMw = 50%

Next, I normalized the values of CM% and PM% using formulas (1) and (2) to get:

CM% = 0.1304 / 0.5 = 0.2608, i.e., 26.08%

In the same way,

PM% = 0.1304 / 0.6 = 0.2173, i.e., 21.73%

Using formulas (3) and (4), I clamped the values of CM% and PM% so they did not exceed
the level of confidence of the overall accuracy in my case (CMLC = 98.2%):

CM% = 0.2608 × 0.982 = 0.2561, i.e., 25.61%

For PM%, the accuracy-based PMLC = 97.1%:

PM% = 0.2173 × 0.971 = 0.211, i.e., 21.1%

Finally, I calculated the overall cheating probability (OC%) using formula (5):

OC% = CM% × CMw + PM% × PMw = 0.2561 × 0.5 + 0.211 × 0.5 = 0.2335
Therefore, the system was 23.35% confident that Player 2 was cheating during this match.
5.2.2. New players’ results
In this section, I show the results for players that did not reach the data threshold of
“enough”. These players are New Player 1 and New Player 2. For New Player 1, I applied
the Cheat Modeling model on their data to calculate a cheating records probability (CM%)
of 51.85% from the provided gameplay data. Then, I applied the average model from the
Player Modeling part to get a cheating records probability (PM%) of 55.56%. Since this
was a new player, the weights were set to 75% for CMw and 25% for PMw.
Next, I normalized the values of CM% and PM% using formulas (1) and (2) to get
(I set CM% to 50% since it is greater than CMth):
CM% = 0.5 / 0.5 = 1, i.e., 100%

In the same way,

PM% = 0.5556 / 0.6 = 0.926, i.e., 92.6%

Using formulas (3) and (4), I clamped the values of CM% and PM% so they do not exceed
the level of confidence of the overall accuracy (CMLC = 98.2%):

CM% = 1 × 0.982 = 0.982, i.e., 98.2%

For PM%, the accuracy-based PMLC = 93.1%:

PM% = 0.926 × 0.931 = 0.862, i.e., 86.2%

Finally, I calculated the overall cheating probability (OC%) using formula (5):

OC% = CM% × CMw + PM% × PMw = 0.982 × 0.75 + 0.862 × 0.25 = 0.952
Therefore, the system is 95.2% confident that New Player 1 was cheating during this match.
For New Player 2, after applying the models from each part on their data, the
achieved CM% was 83.33%, and the achieved PM% was 93.33%. Using the already
set weights for average models, I normalized the values of CM% and PM% using formulas
(1) and (2) to get (I set CM% to 50% and PM% to 60%, since they are both greater than
CMth and PMth, respectively):

CM% = 0.5 / 0.5 = 1, i.e., 100%

In the same way,

PM% = 0.6 / 0.6 = 1, i.e., 100%

Using formulas (3) and (4), I clamped the values of CM% and PM% so they do not exceed
the level of confidence of the overall accuracy in my case (CMLC = 98.2%):

CM% = 1 × 0.982 = 0.982, i.e., 98.2%

For PM%, the accuracy-based PMLC = 93.1%:

PM% = 1 × 0.931 = 0.931, i.e., 93.1%

Finally, I calculated the overall cheating probability (OC%) using formula (5):

OC% = CM% × CMw + PM% × PMw = 0.982 × 0.75 + 0.931 × 0.25 = 0.969
Therefore, the system is 96.9% confident that New Player 2 was cheating during this match.
5.3. Analysis
The Decider is an important part of our proposed cheating detection system. The
design I proposed is applicable to any online game. However, the configurations can be
changed depending on the developer’s policy. These configurations are based on the
business model of the game, and how (in the developer’s opinion) it should treat players.
For example, the weights can be adjusted manually instead of calculated automatically to
satisfy the needs of the developer’s detection policy. It depends on which modeling part
they prefer to rely on.
Another configuration that can be adjusted is the threshold (as we discussed in the
analysis of chapters three and four above). As I mentioned before, this value depends on
the detection policy of the developer. If they want harsh detection, they set a very low
threshold value. Alternately, if they want a loose detection, they need to set a very high
threshold value. The threshold value could also be affected by the detection accuracies
during the training phase: the better the model performs, the higher the threshold value
can be set. Moreover, the FPR value should be an important factor in selecting a threshold,
since a high FPR means many honest players’ gameplay records are flagged as cheating
records. Thus, the threshold value in that case should be high.
The level of confidence is also an important configuration. Setting the probability
of cheating records to the level of confidence will give us a more accurate result
presentation. This setting assumes that this is the best the current model (or average of
models) performed, and it should not exceed this accuracy (or other measurements) value.
This configuration also depends on the developer’s policy, and how they want to show
the results to the users of the system (or the business-side developers).
Chapter 6. Discussion
In this chapter, I discuss the system against several criteria. I first discuss the significance
of the results and interpret their meaning. Then, I explain how to deal with incoming data
for new players, and how to decide whether their data can be considered normal or
cheating data. In addition, I explain how to deal with long-time players’ self-improvement.
After that, I describe the cost effects in monetary and performance terms. Finally, I explain
the factors that usually affect the selection of thresholds in the system.
6.1. Significance of the Results
Each result (probability) from each part of the system carries a different meaning.
The meaning of the results during the training phase is different from the meaning of the
results during the application phase. The results of the training phases represent our level
of confidence in each model, while the results of the application phases represent our
certainty of whether the current player is cheating or not.
In the training phase, I am concerned with the results of the Cheat Modeling part, the
Player Modeling part with long-time players, and the Player Modeling part with new
players. For the Cheat Modeling part, a model training result of 95% means that the system
is 95% confident that this model will detect any AimBots, Auto-Fire cheats, and any
similar behavioral cheat for any player, using classification. On the other hand, for the Player
Modeling part, a long-time player’s model training result of 95% means that the system is
95% confident that this model will detect any abnormal behavior done by this certain
player, including cheating. Furthermore, a new player’s model training result of 95%
means that the system is 95% confident that the average model will detect any abnormal
behavior done by any new player, including cheating.
The application phase results (probabilities) have different meanings than the
training phase results. In this phase, I am concerned with the results of the Cheat Modeling
part, the Player Modeling part with long-time players, the Player Modeling part with new
players, and the Decider. For the Cheat Modeling part, a model result of 90% means that:
based on the classification model/s of cheats, the system is 90% certain that the player had
cheated during the match. For the Player Modeling part, a long-time player’s model result
of 90% means that: based on the player's normal behavior model, the system is 90% certain
that this player was playing abnormally (or cheating) during this match. Moreover, a new
player’s model result of 90% means that: based on the average normal gameplay model,
the system is 90% certain that this player was playing abnormally (or cheating) during this
match. Finally, for the Decider, I have an interpretation for the long-time players results,
as well as new players results. For long-time players, a result of 90% means that: based on
the level of confidence obtained from the training phase, the certainty probability obtained
from the application phase, and the weights assigned, the system is 90% certain that the
player cheated during the match. For new players, on the other hand, a result of 90% means
that: based on the level of confidence obtained from the training phase using the average
gameplay, the certainty probability obtained from the application phase and the weights
assigned, the system is 90% certain that the player cheated during the match.
6.2. Dealing with New Players’ Incoming Data and Long-Time Players’ Self-Improvement
In the Player Modeling part of the system, after finishing training the system and
building the models, I had a model for each player with enough data (i.e., long-time
players). However, for new players that did not reach the “enough” threshold, I kept
collecting their assumed normal gameplay data until they reached the “enough” threshold
of gameplay minutes. Therefore, I need to define what is considered normal gameplay for
new players, to be added to their model.
Since I am not 100% sure that the current new player is playing normally, I need to
set a probability of cheating records limit that defines a normal gameplay match from a
cheating match. In my case, this is done by following the steps below (using a 30%
cheating-record limit):
- First, test the records as we do for new players, using the average model.
- If the probability of cheating records is less than 30%, add the records as normal data.
- If the probability of cheating records is greater than or equal to 30%, do not add
them as normal data (ignore these records and report the cheating probability).
- Once we reach the “enough” data threshold for this player, build a model using the
assumed normal gameplay data, and store the model.
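The admission steps above can be sketched as follows (the bookkeeping and names are illustrative; building and storing the per-player model is left to a separate step):

```python
def admit_new_player_data(records, cheat_record_probability, normal_pool,
                          enough_minutes, minutes_per_record, limit=0.30):
    """Add a match's records to the player's assumed-normal pool only when
    the average model's cheating-record probability stays under the limit
    (30% here); report whether the 'enough' threshold is now reached."""
    accepted = cheat_record_probability < limit
    if accepted:
        normal_pool.extend(records)       # treat as normal gameplay
    # otherwise: ignore these records and report the cheating probability
    ready_to_build = len(normal_pool) * minutes_per_record >= enough_minutes
    return accepted, ready_to_build
```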
The selection of the cheating record limit depends on the developer’s policy. The
higher the value, the looser the developers are in defining a cheating behavior.
For long-time players, their skills will improve over time. After a period of
time, their stored normal gameplay model will therefore start detecting abnormalities in
their current gameplay. As a result, we need to recollect data for these players periodically
and build a new (updated) model for them. This is done by collecting normal gameplay
data in the same way as above every week, then building a new (updated) model at the
end of the week.
6.3. Cost Effects
One of the advantages of building a behavioral-based cheating detection system is
eliminating the reliance on a client-side third-party program to monitor players.
This increases a company’s revenue by saving the money spent on buying and
maintaining external monitoring programs such as PunkBuster [2]. These third-party
programs are a single point of failure in many cases and can cause an online game
to stop providing services when they fail. This happened to PunkBuster when its
servers crashed in 2013, causing many online games that depended on it to halt their
services [49]. In addition, my system detects cheaters accurately and in a manner
safe from privacy-related legal issues.
Another type of cost is the system’s performance effect on the game. Since the
system is mostly used offline, it is considered an “after the fact” detection. The training is
done offline, before applying any model to new incoming data. The application of
the system and its models is likewise assumed to happen offline, after a match finishes.
Therefore, the system should not affect any ongoing match, or even the current gameplay
stream. The only effect is the storage of additional data when more complex
features are used.
6.4. Setting the Threshold Values
When I get the probability of cheating records from each part of the system, the
probability is adjusted depending on a threshold value. The analysis section of each part
discussed these threshold values and how they depend on model accuracy; however, it
did not cover in detail what affects these values.
One factor in choosing a threshold is the False Positive Rate (FPR) of the
selected model during the training phase. When a model has a high FPR, it will flag
many honest players as cheaters, so the probability of cheating detection during the
application phase will be high. Hence, I need to set the threshold to a high value to
accommodate the resulting high FPR.
Another factor affecting the threshold choice is the True Positive Rate (TPR) of
the selected model during the training phase. If a model has a low TPR, it will fail to
detect many cheaters and will assign them as honest players instead, so the probability
of cheating detection during the application phase will be low. Hence, I need to set the
threshold to a low value to accommodate the resulting low TPR.
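The two adjustments above can be sketched as a simple rule: push the threshold up with the FPR and down with the miss rate (1 − TPR). The linear scaling and the 0.5 factors are illustrative assumptions, not the dissertation's exact rule:

```python
def adjust_threshold(base_threshold, fpr, tpr):
    """Shift a decision threshold to compensate for training-phase error
    rates: raise it when the false-positive rate is high (so honest players
    are not over-flagged), and lower it when the true-positive rate is low
    (so under-scored cheaters are still caught).  The linear 0.5 scaling is
    an illustrative assumption."""
    t = base_threshold + 0.5 * fpr       # accommodate a high FPR
    t -= 0.5 * (1.0 - tpr)               # accommodate a low TPR
    return min(max(t, 0.0), 1.0)         # clamp to a valid probability
```

For example, under this sketch a model with FPR 0.2 and perfect TPR would move a 0.5 base threshold up to 0.6, while a model with no false positives but TPR 0.8 would move it down to 0.4.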
Chapter 7. Conclusion and Future Work
In this chapter, I summarize my work in 7.1 and discuss its contributions in 7.2.
Afterwards, I describe future work on behavioral-based cheating detection research in 7.3.
Finally, I give my concluding remarks in 7.4.
7.1. Summary
Cheating detection is a very important part of any online game development
process, since cheating in online games has consequences for both players
and companies. Cheating reduces gameplay enjoyment, and it affects a company’s
revenue in many ways. Over the years, several anti-cheating solutions have been
developed by gaming companies. However, most of these companies use cheating
detection measures that may involve breaches of a user’s privacy. Therefore, the need
has arisen for a server-side solution that analyzes gameplay logs only, without
invading players’ privacy. In this dissertation, I first surveyed research on
cheating detection in online games and on game data analysis in general. Some researchers
have published non-behavioral cheating detection techniques, while others
have published behavioral-based studies.
My solution is a system-generic approach based on the behavior of players.
It is a server-side solution that consists of three parts: Cheat Modeling, Player Modeling,
and the Decider. The Cheat Modeling part builds a model for each cheat type, and then
detects players who have a behavior similar to that cheat’s behavior. On the other hand,
the Player Modeling part creates a model, and detects any anomalous behavior, for
individual players. Finally, the decider takes the probability of detection from both parts of
the system and produces an overall probability. This probability depends on the level of
confidence of each part, as well as other criteria such as the gameplay age of the player.
The system experiments show positive results for all three parts, indicating
promising progress in the development of behavioral-based cheating detection
systems.
7.2. Contributions
I summarize the contributions of the system in the following points:
1. One of the major contributions of my work is giving developers the ability to
detect cheating without invading their clients’ machines. Since the system is
designed as a server-based system, there is no interaction with the players’
clients or machines; the system relies only on the gameplay data transmitted
between each client and the game server. Therefore, there is no invasion of a
client’s privacy of the kind that occurs with the Warden [1] or PunkBuster [2].
2. The system requires minimal modifications to the game. These modifications, if
any, relate to the data exchanged between clients and the server; this data
helps to build useful features for detection.
3. The performance overhead of the system is very low. My design is based on
“after-the-fact” detection: the server stores the gameplay data in the log
database, and my system then works on the new data from the log database without
interacting with the server or the client. Therefore, the performance overhead on
gameplay is minimal and can even be neglected.
4. My system can serve as a first level of detection for companies that use manual
analysis of player data. The system filters out players with abnormal gameplay by
placing a large weight on the Player Modeling part, leaving the developers a
smaller number of players with gameplay abnormalities to review.
7.3. Future Work
Several additions can be made to each part of the system. For the Cheat Modeling
segment, I could collect more data using a mixture of cheats for each player instead of a
single cheat for the whole match. In addition, more features could be generated to
improve cheat identification.
For the Player Modeling part, on the other hand, I faced a limitation: the amount
of data. I was not able to obtain a large amount of detailed gameplay data due to the
secrecy policies of online game companies. Therefore, more work could be done in the
Player Modeling part of the system, such as clustering and time-series analysis.
Clustering techniques require a large amount of available data. After collecting
enough player data, I can group players into skill-based clusters using an algorithm
appropriate for the data on hand. For example, I can try to cluster them into five
clusters: Very Skilled, Skilled, Average, Poor, and Very Poor. Clustering will help
the developers analyze player behavior and improve the anomaly detection process.
Several clustering techniques, such as K-means, EM clustering, and hierarchical
clustering, can be used to generate the most useful clusters for the system.
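As a sketch of the K-means option, a minimal pass over skill-based feature vectors might look like this. The features, the sample data, and k are illustrative; in practice a library implementation (e.g. scikit-learn's KMeans) would be used:

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Minimal K-means for grouping players into skill clusters.

    points: (n_players, n_features) array of skill-based features,
    e.g. (accuracy, kills per minute).  Returns a cluster label per
    player and the final centroids."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # assign each player to the nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its cluster (keep it if empty)
        centroids = np.array([points[labels == j].mean(axis=0)
                              if np.any(labels == j) else centroids[j]
                              for j in range(k)])
    return labels, centroids
```

With k = 5, the resulting labels would correspond to the five skill tiers above once the clusters are ranked by, say, mean accuracy.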
Detecting anomalies in the time-series representation of the data is also called detecting
contextual (or conditional) anomalies. In this type of detection, I can try to find
anomalies within a specific context using a contextual feature, which is the match number
in FPS games, in addition to the normal behavioral feature, which is the skill (or skill-based
features) in my case. There are several techniques for detecting contextual anomalies. Some of
them reduce a contextual anomaly problem to a point anomaly problem; this is achieved
by defining a context and then using a regular anomaly detection technique on the data
within that context [29]. Other techniques extend time-series modeling techniques
for anomaly detection purposes, such as regression-based techniques [29] and
Symbolic Aggregate approXimation (SAX) [30].
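A minimal SAX discretization, of the kind the techniques in [30] build on, can be sketched as follows. The four-symbol alphabet, the segment count, and the requirement that the series length be divisible by the segment count are illustrative simplifications:

```python
import numpy as np

# N(0,1) quantile breakpoints for a 4-symbol alphabet (standard SAX table)
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])

def sax(series, n_segments, breakpoints=BREAKPOINTS):
    """Convert a per-match time series (e.g. a skill feature over match
    numbers) into a SAX word: z-normalize, average over equal-length
    segments (PAA), then discretize against Gaussian breakpoints.

    Assumes len(series) is divisible by n_segments and the series is not
    constant.  Two matches producing very different words for the same
    player are candidate contextual anomalies."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()                  # z-normalize the series
    paa = x.reshape(n_segments, -1).mean(axis=1)  # PAA: piecewise means
    symbols = np.searchsorted(breakpoints, paa)   # index into the alphabet
    return "".join("abcd"[s] for s in symbols)
```

A series that jumps from low to high values maps to a word like "ad", and the reversed series to "da", so word distance captures the shape of the behavior within its context.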
Clustering could also serve as an anomaly detection layer by applying clustering-based
anomaly detection techniques. Some of these techniques work by finding data points
that do not belong to any cluster and defining them as anomalies, for example using the
DBSCAN algorithm [29]. Other techniques define anomalies as data points that lie far
from a cluster centroid, or data points that form small clusters compared to the others [29].
In addition, the SIVM algorithm could be used to find clusters in
extreme cases [27].
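The first of those options, flagging DBSCAN noise points as anomalies, can be sketched as follows. The parameters and data are illustrative, and a library implementation (e.g. scikit-learn's DBSCAN, where noise points get the label -1) would be used in practice:

```python
import numpy as np

def dbscan_outliers(points, eps, min_pts):
    """Return indices of players whose feature vectors DBSCAN leaves
    unclustered.  A point is a core point if at least min_pts points
    (itself included) lie within eps of it; anything not density-reachable
    from a core point is noise, i.e. a behavioral anomaly."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    core = [i for i in range(n) if len(neighbors[i]) >= min_pts]
    reached = set()
    stack = list(core)
    while stack:                          # expand clusters from core points
        i = stack.pop()
        if i in reached:
            continue
        reached.add(i)
        if len(neighbors[i]) >= min_pts:  # only core points keep expanding
            stack.extend(neighbors[i])
    return [i for i in range(n) if i not in reached]  # noise = anomalies
```

A lone player whose feature vector sits far from every dense group of players is returned as an outlier and can then be passed on for closer inspection.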
In the case of a randomized farming bot with human-like behavior, a natural
language processing component can be added to detect whether the player controlling the
avatar is a human or a bot. This can be done by letting the detection system hold a
conversation with the player while keeping a list of expected sensible answers. If the
player does not reply, or gives irregular answers, the player is considered a bot.
This system design could also be used to predict a player’s skill. This can be done
by using all parts of the system: first define skill levels (say, five levels), and then detect
the skill level of a new player. The classification part (Cheat Modeling) would be
trained on gameplay of each skill level, building a model for each. In the application
phase, a new player’s gameplay would be classified by all of the skill-level models to
produce the probability of the player belonging to each skill level.
On the other hand, the anomaly part (Player Modeling) would build a model
for each skill level during the training phase. Then, during the application phase, the
gameplay of a new player would be compared against each model, and the model that does
not report an anomaly indicates the skill of the current player. The Decider would
compare the results from both parts of the system and produce the final skill level of the
current player, based on the weight of each model and the probability given by each
skill-level model from both parts.
The Player Modeling part could also be used to define each player’s skill behavior
individually. This helps detect changes in the current player’s skill level. If a
change occurs, we can check the skills of the players in his or her friend list: if the new
behavior matches a friend’s skill, the current player’s character is likely being played by
that friend; if not, the player might be cheating.
7.4. Concluding Remarks
In this dissertation, I proposed a system design for a behavioral-based cheating
detection method. This method preserves players’ privacy by dealing only with
server-side gameplay logs. The system was tested using a First-Person Shooter (FPS)
game, and proved to be accurate and useful. However, more designs could be implemented
and tested if gaming companies cooperated more with researchers by providing
detailed gameplay data.
References
[1] G. Hoglund and G. McGraw, Exploiting online games: cheating massively distributed
systems, 1st ed. Addison-Wesley Professional, 2007.
[2] S. Webb and S. Soh, “A survey on network game cheats and P2P solutions,” Australian
Journal of Intelligent Information Processing Systems, vol. 9, no. 4, pp. 34–43, 2008.
[3] G. Hoglund and G. McGraw, Exploiting Online Games, Addison-Wesley, 2007.
[4] Jeff Yan and Brian Randell. 2009. An Investigation of Cheating in Online Games. IEEE
Security and Privacy 7, 3 (May 2009), 37-44. DOI=10.1109/MSP.2009.60
http://dx.doi.org/10.1109/MSP.2009.60
[5] Igor Muttik, Securing Virtual Worlds Against Real Attacks: White Paper, McAfee, Inc., 2008
[6] Stefan Mitterhofer, Christopher Kruegel, Engin Kirda, Christian Platzer, "Server-Side Bot
Detection in Massively Multiplayer Online Games," IEEE Security and Privacy, pp. 29-
36, May/June, 2009
[7] Hermann, U., Katzenbeisser, S., Schallhart, C., & Veith, H. (2005). Enforcing Semantic
Integrity on Untrusted Clients in Networked Virtual Environments. 2007 IEEE
Symposium on Security and Privacy (SP ’07), pp. 179–186. IEEE.
[8] Bethea, D., Cochran, R. A., & Reiter, M. K. (2010). Server-side Verification of Client
Behavior in Online Games. Work, (February), 1-27.
[9] Mönch, C., Grimen, G., & Midtstraum, R. (2006). Protecting online games against
cheating. Proceedings of 5th ACM SIGCOMM workshop on Network and system
support for games NetGames 06, 20. ACM Press.
[10] C. Thurau, C. Bauckhage, and G. Sagerer. Learning human-like movement behavior for
computer games. In Proc. 8th Int. Conf. on the Simulation of Adaptive Behavior
(SAB’04), pages 315–323. IEEE Press, 2004.
[11] C. Thurau, C. Bauckhage, and G. Sagerer. Combining self organizing maps and multilayer
perceptrons to learn bot-behavior for a commercial game. In Proceedings of the GAME-
ON03 Conference, pages 119–123, 2003.
[12] C. Thurau and C. Bauckhage. Towards manifold learning for gamebot behavior modeling.
In Proc. Int. Conf. on Advances in Computer Entertainment Technology (ACE’05), pages
446–449, 2005.
[13] C. Thurau, T. Paczian, and C. Bauckhage. Is Bayesian imitation learning the route to
believable gamebots? In Proc. GAME-ON North America, pages 3–9, 2005.
[14] H. Alayed, F. Frangoudes, C. Neuman. Behavioral-Based Cheating Detection in Online
First Person Shooters using Machine Learning Techniques. In Proc. of the IEEE
Conference on Computational Intelligence and Games (CIG 2013), Niagara Falls,
Canada, pages 33-40, 2013.
[15] S. Yeung and J. C. Lui, “Dynamic Bayesian approach for detecting cheats in multi-player
online games,” Multimedia Systems, vol. 14, no. 4, pp. 221–236, 2008.
[16] L. Galli, D. Loiacono, L. Cardamone, and P. Lanzi, “A cheating detection framework for
Unreal Tournament III: A machine learning approach,” in CIG, 2011, pp. 266–272.
[17] K. Chen, H. Pao, and H. Chang, “Game Bot Identification based on Manifold Learning,” in
Proceedings of ACM NetGames 2008, 2008.
[18] Pao, H. K., Chen, K. T., & Chang, H. C. (2010). Game Bot Detection via Avatar Trajectory
Analysis. Computational Intelligence and AI in Games IEEE Transactions on, 2(3), 162–
175. IEEE.
[19] Unity3d - game engine. [Online]. Available: http://www.unity3d.com/
[20] Artificial Aiming. [Online]. Available: http://www.artificialaiming.net/
[21] Weka 3: Data Mining Software in Java. [Online]. Available:
http://www.cs.waikato.ac.nz/ml/weka/
[22] J. Platt, “Advances in kernel methods,” B. Schölkopf, C. J. C. Burges, and A. J. Smola,
Eds. Cambridge, MA, USA: MIT Press, 1999, ch. Fast training of support vector
machines using sequential minimal optimization, pp. 185–208.
[23] M. Rychetsky, Algorithms and Architectures for Machine Learning Based on Regularized
Neural Networks and Support Vector Approaches. Germany: Shaker Verlag GmbH, Dec.
2001.
[24] R. Kohavi, “A study of cross-validation and bootstrap for accuracy estimation and model
selection,” in Proceedings of the 14th International Joint Conference on Artificial
Intelligence – Volume 2, ser. IJCAI’95. San Francisco, CA, USA: Morgan Kaufmann
Publishers Inc., 1995, pp. 1137–1143.
[25] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, “Gene selection for cancer classification
using support vector machines,” Mach. Learn., vol. 46, no. 1–3, pp. 389–422, Mar. 2002.
[26] USC GamePipe Laboratory. [Online]. Available: http://gamepipe.usc.edu/
[27] Drachen, A.; Sifa, R.; Bauckhage, C.; Thurau, C., “Guns, swords and data: Clustering of
player behavior in computer games in the wild,” in Computational Intelligence and Games
(CIG), 2012 IEEE Conference on, pp. 163–170, 11–14 Sept. 2012.
[28] T. Warren Liao, “Clustering of time series data—a survey,” Pattern Recognition, Vol. 38,
No. 11. (November 2005), pp. 1857-1874.
[29] V. Chandola, A. Banerjee, and V. Kumar. 2009. Anomaly detection: A survey. ACM
Comput. Surv. 41, 3, Article 15 (July 2009), 58 pages.
[30] L. Wei, N. Kumar, V. Lolla, E. J. Keogh, S. Lonardi, and C. Ratanamahatana. 2005.
Assumption-free anomaly detection in time series. In Proceedings of the 17th
international conference on Scientific and statistical database
management (SSDBM'2005), James Frew (Ed.). Lawrence Berkeley Laboratory,
Berkeley, CA, US, 237-240.
[31] J.G. Dias and M.J. Cortinhal, The SKM algorithm: A K-means algorithm for clustering
sequential data, Advances in Artificial Intelligence IBERAMIA 2008, Lecture Notes in
Artificial Intelligence, pp. 173-182, Springer–Verlag, Berlin 2008.
[32] Drachen, A.; Canossa, A.; Yannakakis, G.N., “Player modeling using self-organization in
Tomb Raider: Underworld,” in Computational Intelligence and Games (CIG), 2009 IEEE
Symposium on, pp. 1–8, 7–10 Sept. 2009.
[33] Mahlmann, T.; Drachen, A.; Togelius, J.; Canossa, A.; Yannakakis, G.N., “Predicting player
behavior in Tomb Raider: Underworld,” in Computational Intelligence and Games (CIG),
2010 IEEE Symposium on, pp. 178–185, 18–21 Aug. 2010.
[34] R. Sifa, A. Drachen, C. Bauckhage, C. Thurau, A. Canossa. “Behavior Evolution in Tomb
Raider Underworld,” In Proc. of the IEEE Conference on Computational Intelligence and
Games (CIG 2013), Niagara Falls, Canada, pages 161-168, 2013.
[35] O. Missura and T. Gartner, “Player modeling for intelligent difficulty adjustment,” In
Proceedings of the ECML–09 Workshop From Local Patterns to Global Models (LeGo–
09), J. F. Arno Knobbe, Ed., Bled, Slovenia, September 2009.
[36] D. Buckley, K. Chen, J. Knowles. “Predicting Skill from Gameplay Input to a First-Person
Shooter,” In Proc. of the IEEE Conference on Computational Intelligence and Games
(CIG 2013), Niagara Falls, Canada, pages 105-112, 2013.
[37] A. Drachen, M. Schubert. “Spatial Game Analytics and -Visualization,” In Proc. of the
IEEE Conference on Computational Intelligence and Games (CIG 2013), Niagara Falls,
Canada, pages 169-176, 2013.
[38] Seif El-Nasr, M., Drachen, A. and Canossa, A. 2013. Game Analytics – Maximizing the
Value of Player Data. Springer.
[39] Arxan Game Security Solutions. [Online]. Available:
http://www.arxan.com/solutions/gaming-protection/
[40] E. Kaiser, W. Feng, and T. Schluessler, “Fides: Remote anomaly-based cheat detection
using client emulation,” Proceedings of the 16th ACM Conference on Computer and
Communications Security, pp. 269-279, 2009.
[41] K. Jones and R. S. Sielken, Computer System Intrusion Detection: A Survey, Technical
Report, Department of Computer Science, University of Virginia, 2000.
[42] Sandip Sonawane, Shailendra Pardeshi and Ganesh Prasad, “A survey on intrusion detection
techniques”, World Journal of Science and Technology, 2012.
[43] Barnett V., Lewis T., Outliers in Statistical Data. John Wiley, 1994.
[44] A. Villanes, Analytical Approach for Bot Cheating Detection in a Massive Multiplayer
Online Racing Game. In Proc. SESUG 2013.
[45] Efron, B. & Tibshirani, R. J. An Introduction to the Bootstrap (Chapman & Hall, 1993).
[46] MATLAB® [Online], http://www.mathworks.com/
[47] Multivariate normal probability density function, MATLAB® [Online],
http://www.mathworks.com/help/stats/mvnpdf.html
[48] Cox D.R., Hinkley D.V. Theoretical Statistics, (Chapman & Hall, 1974).
[49] Punkbuster Service Goes Down, Hundreds of Online Games Offline [Online]. Available:
http://hothardware.com/News/Punkbuster-Service-Goes-Down-Hundreds-of-Online-Games-Offline/
[50] FairFight® Anti-Cheating System [Online], http://gameblocks.com/
Abstract

The online games industry has grown rapidly over the last decade. As a result of this rapid growth, many techniques have been created in response to the process of game development. One of the important aspects to consider is the prevention of cheating by a game's players. Cheating in online games comes with many consequences for both players and companies; therefore, cheating detection and prevention is an important part of developing a commercial online game. Over the years, many anti-cheating solutions have been developed by gaming companies. However, many companies use cheating detection measures that may involve breaches of a user's privacy.

In this thesis, we provide a server-side, system-generic anti-cheating method that uses only game logs. The system consists of three main parts: cheat modeling, player modeling, and the decider. The cheat-modeling segment focuses on defining a cheating behavior using classification techniques; this is achieved by building a model for each type of cheat. The player-modeling part, on the other hand, focuses on defining a player's behavior using anomaly detection techniques, which is done by building a model for each player. Each part then gives a probability of detection to the decider, which produces an overall probability based on different criteria discussed in the document. Many researchers have analyzed online game data; however, few have focused on the problem of cheating detection.