Transcript
UNIVERSITY OF MINNESOTA
This is to certify that I have examined this copy of a master’s thesis by
Kristy Sue VanHornweder
and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final examining committee have been made.
Timothy R. Colburn ____________________________________________ Name of Faculty Advisor
____________________________________________ Signature of Faculty Advisor
____________________________________________ Date
Applying the Partial Order Bounding Method of Game Tree Search to Connect 4
A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA
BY
Kristy Sue VanHornweder
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
Department of Computer Science University of Minnesota Duluth
June 2002
© Kristy Sue VanHornweder 2002
Abstract Game playing is of interest to artificial intelligence researchers because of the complexity involved in its exponential search. Standard exhaustive search algorithms available for graphs can be used for only the most trivial of games. The methods for dealing with this complexity are the use of heuristic evaluation functions and game tree pruning. Game tree pruning starts with the fundamental two-person game-playing algorithm called minimax and then applies pruning methods, such as Alpha-Beta pruning, to the resulting game tree. Researchers have devised new methods to enhance the minimax search algorithm. Two current approaches have emerged called Partial Order Bounding (POB), based on the theory of partially ordered sets and null window search, and game decomposition. POB compares partially ordered values in the leaves of a game tree, and backs up boolean values through the tree to the root. The advantage of POB compared to traditional weighted sum evaluation methods is that it avoids the problem of dealing with incomparable evaluations. So far, POB has only been applied, with some success, to the game of Go. In order to determine the general applicability of POB, this thesis tests methods for applying POB to the game Connect 4. The issues involved in partial order evaluation and partial order bounding in Connect 4 are discussed, and the performance of a Connect 4 game-playing program using POB is compared to one using traditional methods. Initial ideas on how to apply game decomposition to Connect 4 are also presented.
i.
Contents 1. Motivation for Game Playing Research………………………………..1 2. The Role of Evaluation Functions in Game Playing………………….. 2 2.1. Limitations of Weighted Sum Evaluation………………………. 2 2.2. Motivation for This Research…………………………………… 4 3. The Method of Partial Order Bounding………………………………. 5 3.1. Basic Definition of Partial Order Bounding…………………….. 5 3.2. Partial Order Evaluation: Vector Dominance……………………8 3.3. Partial Order Bounding Algorithm……………………………… 8 3.4. Efficiency of Partial Order Bounding: Null Window Search….. 10 4. The Application of Partial Order Bounding to the Game Connect 4.12 4.1. Weighted Linear Sum Evaluation for Connect 4……………….12 4.2. Static Evaluation in Connect 4………………………………….14 4.3. Partial Order Evaluation for Connect 4………………………... 16 4.4. Move Ordering and Pruning in Connect 4……………………...21 4.5. Initial Experiments for Connect 4………………………………27 4.6. Searching for Bounds…………………………………………...29 4.7. Searching to a More Shallow Depth…………………………... 41 4.8. Threshold Pruning…………………………………………….. 42 4.9. Summary of Results…………………………………………… 45 4.10. Observations and Conclusions……………………………….. 47 4.11. Future Work………………………………………………….. 48 5. The Method of Decomposition Search and its Application to the Game Connect 4……………………………………………………….. 50 5.1. Partitioning a Game……………………………………………. 50 5.2. Local Search…………………………………………………… 51 5.3. Decomposition Search Algorithm……………………………... 51 5.4. The Application of Decomposition Search to Connect 4……… 51 6. Related Work………………………………………………………….. 54 Appendix: Rules of Connect 4…………………………………………... 55 References………………………………………………………………....57 ii.
List of Figures Figure 1. Example of a partial order lattice………………………………….6 Figure 2. Example of an antichain………………………………………….. 6 Figure 3. Example of success and failure set for the antichain {E,G}…...….7 Figure 4. Partial order lattice for the game of Go…………………………... 8 Figure 5. Example game tree illustrating the POB search algorithm………..9 Figure 6. Example of the efficiency of null window search……………… 11 Figure 7. Connect 4 position illustrating block of four containing B1……. 13 Figure 8. Connect 4 position illustrating the concept of gravity…………...13 Figure 9. Connect 4 position illustrating the concept of gravity…………...14 Figure 10. Connect 4 position illustrating static evaluation………………..15 Figure 11. Connect 4 states illustrating the partial order evaluation……….19 Figure 12. Partial order lattice for Connect 4………………………………20 Figure 13. Connect 4 position illustrating 3 different types of moves……..22 Figure 14. Move ordering for Connect 4…………………………………...24 Figure 15. Move ordering and pruning for Connect 4 position in Figure 13……………………………………………………. 25 Figure 16. Process of relaxing bounds…………………………………….. 33 Figure 17. Process of tightening bounds…………………………………... 34 Figure 18. Effectiveness of tightening bounds……………………………..34 Figure 19. Further enhancements for pruning……………………………...37 Figure 20. Further enhancements for non-pruning…………………………38 Figure 21. Further enhancements for non-pruning…………………………38 Figure 22. Different choice of moves in Stage 1 and Stage 2 for Threshold Pruning…………..……………………………... 44 Figure 23. The partition of a 3 pile game of NIM………………………….50 Figure 24. Partitioning a Connect 4 position……………………………….52 Figure 25. The game Connect 4…………………………………………… 55
iii.
List of Tables Table 1. Results of initial tests for Connect 4……………………………... 28 Table 2. Results of naïve approach for pruning…………………………… 31 Table 3. Results of naïve approach for non-pruning……………………… 32 Table 4. Results of reusing bounds (Stage 1) for pruning………………….35 Table 5. Results of reusing bounds (Stage 1) for non-pruning……………. 36 Table 6. Results for further enhancements for pruning (Stage 2)…………. 39 Table 7. Results for further enhancements for non-pruning (Stage 2)……..40 Table 8. Node counts for ideal case for non-pruning………………………41 Table 9. Results of searching to depth 7 for non-pruning………………….42 Table 10. Results of Threshold Pruning method…………………..……….44 Table 11. Summary of total and average node counts for pruning………... 45 Table 12. Summary of total and average node counts for non-pruning…... 45 Table 13. Summary of total and average node counts for Threshold Pruning……………………………………………….46
iv.
Acknowledgements There are some acknowledgements I would like to make of people who have contributed to the success of this thesis. I would like to thank Dr. Tim Colburn, my thesis advisor, for giving me guidance throughout this project and for offering me encouragement and reassurance during the difficult times. I would like to thank Dr. Ted Pedersen and Dr. Robert McFarland for taking the time to carefully read a preliminary version of this thesis and to offer their suggestions for improvement. I am grateful to the UMD Department of Computer Science for giving me the opportunity to perform graduate research under them. Lastly, I would like to thank Dr. Martin Mueller, whose work I have extended, for providing a reference to his research papers so that I would have a foundation for my own research.
v.
1 Motivation for Game Playing Research For decades, game-playing researchers have attempted to devise game searching methods that are both intelligent and efficient. In general, there is a trade-off between intelligence of play and efficiency. Traditional methods such as minimax with Alpha-Beta Pruning, described in textbooks such as [8], have been able to improve efficiency by pruning unimportant portions of the game search tree, thus allowing the search to be taken deeper into the tree for the same cost.
In theory, deeper search should increase the
intelligence of play, as the search is able to examine game positions that are closer to the end of the game.
1
2 The Role of Evaluation Functions in Game Playing The major concern for building an intelligent game-playing program is the evaluation of a game position. A player will determine "how good" a game position is by applying an evaluation function to that position. In standard Alpha-Beta search, a common evaluation function used is a weighted linear sum, w1*f1 + w2*f2 + ... + wn*fn, where f1, f2, ..., fn are game features and w1, w2, ..., wn are weights associated with the features. The features are characteristics of a game, for example, in chess, the number of rooks on the board, the number of knights, etc. The features should be chosen so they represent the critical characteristics of a game, the ones that have a strong influence on a game's outcome. The weights chosen should be appropriate for the features. Features that are considered more important tend to have higher weights, whereas less significant features will in general have lower weights.
2.1 Limitations of Weighted Sum Evaluation The traditional method of Alpha-Beta search and the typical application of the weighted sum evaluation have proven to be successful in many cases. However, the weighted sum evaluation has several shortcomings.
The
problems with evaluation functions discussed here are loss of information, unstable positions, long-term vs. short-term plans, and near-terminal states.
2
2.1.1 Loss of Information In a standard weighted sum evaluation, there are several features that are added together and compared to one another. However, it does not always make sense to compare and add certain features, for example, when the relative importance of features changes during the course of the game, such as the case of an unstable position, discussed below. This combination approach results in a loss of information. Because of this, the estimate of the value of a game position may no longer be accurate.
2.1.2 Unstable Positions Another issue of evaluation is that of unstable positions.
An unstable
position is one that is likely to dramatically change very quickly. Such a position may appear favorable at one point, but only a move or two later, it could prove to be disastrous for a player. A position such as this and a more stable one that does not change so drastically could end up with the same evaluation.
2.1.3 Short-Term Plans vs. Long-Term Plans Another important consideration of evaluation is the need to maintain a balance between short-term plans and long-term plans. Sometimes a move may look good immediately, that is, there will be a considerable change in the outlook of a position if this type of move is chosen. In other cases, one move may have a lower evaluation value than another, but it may prove to
3
be a successful choice in the long run. A player will need to keep a balance between the status of the game immediately and the status of the game further into the future, i.e. tactics vs. strategy.
2.1.4 Near-Terminal Positions One last concern of evaluation is the case of near-terminal positions. In many such cases, a position can be evaluated statically rather than having to perform a search to determine a move. If a static analysis of a game position finds that a certain move will result in a sure win, the player certainly does not need to perform a search.
2.2 Motivation for this Research In an attempt to avoid these problems with weighted sum evaluation functions, a game search method called Partial Order Bounding (POB) has recently been developed by Martin Mueller[4]. This method has proven to be successful in applications of Semeai Races in the game of Go. To determine the general applicability of POB, this thesis develops methods for applying POB to the game Connect 4. To determine the success of the application of POB to Connect 4, the performance of a Connect 4 playing program using POB is compared with a Connect 4 playing program using a traditional Alpha-Beta search.
4
3 The Method of Partial Order Bounding Partial Order Bounding avoids the collapsing of several features into one number by only comparing features and game positions when it makes sense for them to be compared. The main goal of POB is to define an inequality between a fixed bound and a game position's evaluation.
3.1 Basic Definition of Partial Order Bounding Definition A partially ordered set (poset) P is a set with a binary relation ≥, with the three properties reflexive, anti-symmetric, and transitive. Two elements, x and y, of P, are considered comparable if the binary relation between them holds: that is, x ≥ y or y ≥ x. Otherwise, x and y are considered incomparable. A binary relation having the property of antisymmetric means the following: if x ≥ y and y ≥ x, then x=y. The definitions of the properties reflexive and transitive are straight-forward and are given in textbooks such as [7]. A poset is represented by a lattice diagram, as in Figure 1. To say that two elements are comparable means that there is a strictly downward leading path from one element to the other. In Figure 1, the elements I and D are comparable since there is a strictly downward leading path from I to D. The elements G and F, however, are incomparable since it is not possible to get from one to the other without heading both upward and downward.
5
Figure 1. Example of a partial order lattice (Taken from [4])
Definition An antichain B is a subset of P in which every element is incomparable to every other element. A subset containing only one element is an antichain. A bound is an antichain of a poset that will represent the minimal acceptance level for a player. The player will want evaluations that are at least as good as some element in the bound.
Figure 2. Example of an antichain {E,G} (Taken from [4])
6
Definition The success set of the antichain B, S(B), is the set of all elements in P which are ≥ to some element in B. All other elements in P are in the failure set of B.
Figure 3. Example of success set and failure set for the antichain {E,G} (Taken from [4])
The success set will represent all evaluations that are acceptable with respect to the bound. These are all of the evaluations that are considered "good enough" for the player. As an example of a poset in the context of game-playing, consider Figure 4. This figure contains the lattice used by Mueller[4] for the game of Go. Each element in the poset is a possible evaluation for a game position. Each evaluation is a vector containing two Go-specific game features. The elements (9,0) and (2,5) represent the bound chosen for the Semeai problems. The shaded region consists of all of the elements that are in the success set of the chosen bound. All other elements are in the failure set of the bound. 7
Figure 4. Partial order lattice for the game of Go (Taken from [4])
3.2 Partial Order Evaluation: Vector Dominance There are several different types of partial order evaluations. The type that is applied to Connect 4 in this thesis is called vector dominance. The definition is as follows: Given two m-dimensional vectors x = {x1, …, xm} and y = {y1, …, ym}, x ≥ y iff xi >= yi, for i = 1, …, m. In words, this states that if the relation holds between each corresponding element of both vectors, then the relation holds between the vectors themselves, and vice versa.
3.3 Partial Order Bounding Algorithm An attractive aspect of Partial Order Bounding is that it can be built on top
8
of an ordinary game search method, such as Alpha-Beta.
The POB
algorithm is as follows: Given a poset P, a bound B, and a minimax search method M: 1. The leaf nodes of the game tree are evaluated using the partial order evaluation, and the result is an element in P. 2. The evaluation is considered success (1) if it is in S(B), otherwise it is considered failure (0). 3. The boolean values of the leaf nodes are backed up the search tree using minimax method M. As an example, consider the game tree in Figure 5. The levels that contain
Figure 5. Example game tree illustrating the POB search algorithm (Taken from [4]) (Note that a “+” is used for success and a “-“ is used for failure)
9
square nodes denote that it is MAX's turn, and the levels of the tree that contain circular nodes denote that it is MIN's turn. The evaluations at the leaf nodes were obtained using an unspecified partial order evaluation. Consider the bounds B1={(5,7), (10,3)} and B2={(5,8), (6,4)}. Using vector dominance, the boolean success values are obtained for each leaf node for each bound. These success values are backed up the tree in minimax fashion. Thus the boolean value at node Q is the minimum of the boolean values at the two leaf nodes below Q. The boolean value at P is the maximum of the boolean values at nodes Q and R. There is a total ordering of success > failure. Based on this evaluation, the player is able to achieve B1 but fails to achieve B2 at node P.
3.4 Efficiency of Partial Order Bounding: Null Window Search The method of Partial Order Bounding is based on the notion of null window search[6]. In standard Alpha-Beta, the window of possible state evaluations is [-∞, ∞]. However, in POB, the window is [0,1] since the only possible evaluations for a game position are success (1) and failure (0). The window is as small as possible since no value can be inserted in between the upper and lower limits of the window; thus POB uses a null window. In theory, this explains why POB should be more efficient than standard Alpha-Beta search. As an illustration of POB's efficiency, consider Figure 6. Suppose at the top node it is MAX's turn. The nine nodes branching from the top are the
10
successors of the top node. The first four have been evaluated as failure, while the fifth successor evaluates as a success. Because of the null window, the highest possible evaluation in a game tree using POB is 1.
Figure 6. Example of the efficiency of null window search
Thus, it is not possible to find a higher value using POB as it could be using Alpha-Beta. Therefore, the rest of the successor branches can be pruned away. This results in many more branch cutoffs, and many fewer node expansions, making POB more efficient.
11
4 The Application of Partial Order Bounding to the Game Connect 4 Before discussing the application of partial order evaluation to Connect 4, general evaluation methods for Connect 4 will be presented. Readers not familiar with the game Connect 4 are referred to the Appendix for a description of the game.
4.1 Weighted Linear Sum Evaluation for Connect 4 The evaluation function applied to Connect 4 for Alpha-Beta search is a standard weighted sum. The function is defined as w1*B3 + w2*B2 + w3*B1 - (w1*R3 + w2*R2 + w3*R1), where B3, B2, B1, R3, R2, and R1 are features with B denoting black and R denoting red. The feature Bn is the number of blocks of four consecutive board squares that have n black pieces and no red pieces. The feature Rn is defined in a similar way. Every possible consecutive four squares on the board is considered for calculating these feature values. For example, consider the highlighted region in Figure 7. This area denotes one block of four, and it is considered a B1 since there is one black piece on the bottom, and no red pieces in the other three squares. There is a restriction on some of the features, however, and that is the element of gravity. For the features B2, B1, R2, and R1, a block of four containing one of these is only counted if the block of four can be affected one move later. Consider the highlighted region in Figure 8. Suppose the columns are numbered left to right as 0, …, 6. This block of four is
12
Figure 7. Connect 4 position illustrating block of four containing B1
considered to not contain an R2 since a move in either of columns 0 or 2 would result in the piece dropping below the block of four.
Figure 8. Connect 4 position illustrating the concept of gravity
This region is not as threatening, since it cannot be affected immediately,
13
only some time into the future. However, consider the region indicated in Figure 9. This block can be affected in the next move by a player dropping a piece down column 5. Either black will move there and promote it to a B2, or red will move there and block black from connecting four within that block. This region is more meaningful since its status can immediately be altered. For the features B3 and R3, however, the element of gravity is not taken into consideration. Even if the block cannot be affected in the next move, the block poses much more of a threat since it is close to a win for one of the players.
Figure 9. Connect 4 position illustrating the concept of gravity
4.2 Static Evaluation in Connect 4 It was mentioned previously in Section 2.1.4 that in some cases, a game position could be evaluated statically without search. This section will
14
discuss the application of static evaluation to non-terminal and terminal Connect 4 positions.
4.2.1 Static Evaluation of Non-terminal Positions In the Connect 4 position in Figure 10, the second column from the left contains three consecutive red pieces and a blank square above them. If it is black's turn in this position, then black had better perform a move in column 1 to block red from moving there and winning. Of course, if it is red's turn then red wins.
Figure 10. Connect 4 position illustrating static evaluation
The non-terminal positions that can be evaluated statically are the positions in which the player needs to block the opponent to avoid a loss, the case of a sure win for the player where the player can simply move and win, and the case of a sure loss-no matter what the player does, the opponent can win on
15
the next move. In terms of Partial Order Bounding and evaluation, if these positions are encountered in the leaf nodes of the tree, their success values are as follows: A position that indicates a sure win is considered a success. A position where there is a sure win for the opponent and it is the opponent's turn from that position is evaluated as failure. The case where a player needs to block the opponent evaluates to failure. The reason for this is that such a position is unfavorable for the player, since he/she must defend him/herself, rather than try to advance his/her position. Note that a leaf node is not necessarily a terminal position. It could be a non-terminal position occurring at the ply that is considered the depth cutoff in the search.
4.2.2 Static Evaluation of Terminal Positions Another class of game positions that can be evaluated statically are terminal positions. As is expected, a position that is a win for the player has the evaluation success, whereas a position that is a loss for the player has the evaluation failure. The case of a draw could be considered either way, but since this type of position is encountered so late in the game, it is not likely that its evaluation makes a difference in the play of the game. In this thesis, a draw position will be considered a failure.
4.3 Partial Order Evaluation for Connect 4 The partial order evaluation that is applied to Connect 4 uses the same six features that are used in the Alpha-Beta evaluation. One difference from
16
Alpha-Beta is that the values are not collapsed into one value. The status of each player is kept separately. The evaluation vector contains two features, one for black's status, and the other for red's status. The basic idea behind the partial order evaluation is to consider how much a player can improve his/her position and how much the player can damage his/her opponent's position. Another difference from the Alpha-Beta evaluation is that the partial order evaluation considers how much the status for both players has changed since the state being evaluated. The leaf nodes in the game tree are compared to the current root node of the tree and the result represents the progress made from the current root state to the leaf state. The evaluation takes into account how much a player's position might have improved or worsened.
4.3.1 Implementation of the Partial Order Evaluation for Connect 4 This section describes how the partial order evaluation works. The weighted sum is divided into two parts: w1*B3+w2*B2+w3*B1 to represent black's status for a state, and w1*R3+w2*R2+w3*R1 to represent red's status. The black and red status values are calculated for the leaf state under consideration. The same is done for the root state of the tree. It is assumed that black is the player making the decision at the root. To obtain the value for black's progress, black's root status is subtracted from black's leaf status. Red's leaf status is subtracted from red's root status to obtain red's progress value.
17
The interpretation of the feature vector is as follows: A positive progress value for either player is favorable for black and a negative progress value is unfavorable for black. If black's progress value is positive, it means that black's leaf status is greater than black's root status; this implies that black's position has improved. A positive value for red's progress indicates that red's position has worsened, which is favorable for black. Similarly, a negative value for black denotes that black's position worsened, and a negative value for red means that red's position has improved. If a progress value is zero, then the player's status has not changed; it is the same for both the root and leaf nodes. Following are a few examples of partial order evaluations and their interpretations. The evaluation (0,0) indicates that there is no change for either of the two players. An evaluation of (1,0) means that black's position improves by 1 and red's position does not change. An evaluation value of (2,1) means that black improves by 2 and red's position worsens by 1. The evaluation (1,-1) indicates that black improves by 1 and red also improves by 1. The evaluation (-1,1) means that black worsens by 1 and red worsens by 1.
4.3.2 Applying the Partial Order Evaluation to a Connect 4 Position As an example of applying this evaluation in game tree search, consider the game states in Figure 11. The lines through the squares represent the potential ways that one or the other of the players can win. Suppose the state on
18
Figure 11. Connect 4 states illustrating the partial order evaluation
the left is the root of the game tree and the state on the right is a leaf node. The individual feature values for the root state are B3=1, B2=0, B1=2, R3=1, R2=1, R1=1. The values of the features for the leaf state are B3=1, B2=1, B1=2, R3=1, R2=1, R1=1. The difference of the two states is that there is now a B2 in column 6 of the second state. Note that column 6 now contributes a one to both B1 and B2. At this point, the progress values for black and red are calculated. Suppose the weights for B3/R3, B2/R2, B1/R1 are 3, 2, 1, respectively. For black's value: black's leaf status-black's root status = ((3*1+2*1+1*2)-(3*1+2*0+1*2)) = 2. For red's value: red's root status-red's leaf status = ((3*1+2*1+1*1)-(3*1+2*1+1*1)) = 0. Thus the evaluation for the leaf node in this example is (2,0). The interpretation of this is that black improves by 2 (gains a B2) and red's status does not change. In terms of Partial Order Bounding, a bound in this context will represent
19
how much a player wants to improve his/her position and how much the player wants to damage the opponent's position. The bound indicates a subgoal in the game; a player will want to choose a move which leads to a position whose evaluation is at least as good as the bound.
4.3.3 Partial Order Lattice for Connect 4 The lattice for the Connect 4 partial order evaluation is illustrated in Figure 12. The feature vectors comprising this lattice represent a portion of the full
Figure 12. Partial Order Lattice for Connect 4
lattice; this is simply the "core" of the full lattice. The evaluations with
20
higher positive values appear higher in the lattice, whereas evaluations with greater negative values appear lower in the lattice. The structure of the lattice indicates the incomparability of some evaluations. For example, consider the evaluations (0,1) and (1,0). The first of these evaluations means red worsens by 1 and the second means that black improves by 1. Intuitively, these positions seem incomparable since it is not clear which of the two is better. On the one hand, a player can improve his/her position a little bit; on the other hand, a player can block the opponent from doing so. It is difficult to compare these two choices. The lattice also shows the possible static evaluations for a state. The cases of a win and a sure win are obviously the most favorable positions, so they appear at the top of the lattice. Similarly, the need to block the opponent, and the cases of sure loss and loss are the worst possible positions for a player, so they appear at the bottom of the lattice.
4.4 Move Ordering and Pruning in Connect 4 One useful application of partial order evaluation in this research is move ordering and pruning. If static analysis of a position can immediately reveal bad moves, then search can prune away these moves and not even consider them in the search. Pruning allows for the potential to reduce the search space.
21
4.4.1 Three Classes of Moves for Connect 4 In Connect 4, there are three different classes of moves that a game position could contain. This section discusses these different types of moves in detail.
4.4.1.1 Moves Leading Directly to a Loss One type of move is a move that leads directly to a loss. If a player makes this move, the opponent will be able to win from the resulting position. In the position in Figure 13, if it is black's turn and black plays in column 1, red will be able to place a piece on top of black and make a diagonal connection of four with the existing three red pieces. A player will want to prune this type of move away, unless there is no other choice.
Figure 13. Connect 4 position illustrating 3 different types of moves
22
4.4.1.2 Moves That Affect the Status of Either Player Another type of move is one in which the status of either or both players will be affected. This is the case where the status of at least one of the players will change after the move is performed. These are the potentially "good" moves for either player. Moves of this type will have the highest priority since they make the most difference in a game position. Consider the position in Figure 13. If one of these lines passes through a square, then a move in that square will change the status of the position. The moves in columns 0, 2, and 6 fall into the category of "potential good moves" since a line passes through the squares for the moves, thus affecting the status of one of the players. If it is black's turn and black moves in column 0, then red will be blocked in the diagonal leading in the upward right direction. If black plays in column 6, then black will be promoted to a B2.
4.4.1.3 Moves That Affect Neither Player The third type of move is a move that results in no change in the status for either of the players. Since neither player is affected, the moves in this class have a lower priority than the potential good moves. However, these moves can be used as "escape routes". If there are no good moves available, that is, the only possible moves are moves which do not affect either player and moves which directly lead to a loss, then the player can use one of these "escape routes" to avoid losing to the opponent. Thus, these moves function as a last resort when no good moves are available. In Figure 13, the top
23
square in column 5 does not contain a line passing through it. This means that neither player is affected by a move in this column, therefore the move is considered an "escape route".
4.4.2 Partial Move Ordering for Connect 4 From these three types of moves, a partial move ordering can be defined, which is given in Figure 14. The moves with first priority are the potential good moves, where either of the player's status is affected by the move. The moves with next highest priority are the "escape routes" which result in no change for either player, but can be used as a last resort to escape a loss. The moves with the lowest priority are the moves leading directly to a loss (hence "doom"). These moves will not be chosen unless there is no other choice and the player is forced to lose.
Figure 14. Move ordering for Connect 4
24
The partial move ordering for the position in Figure 13 is shown in Figure 15. The moves for black in the columns 0, 2, and 6 are considered potential good moves, column 5 can be used as an escape route, and column 1 is a move of doom. Since there are good moves available, the escape route (5) and the doomed move (1) can be pruned away. The only moves that will be considered are 0, 2, and 6.
Figure 15. Move ordering and pruning for Connect 4 position in Figure 13
The concept of move ordering and pruning resembles the way a human player would analyze the position. He/she would consider all of the possible moves and try to determine which one may be the best choice. A human could easily observe that a move like column 1 in Figure 13 is disastrous, and that a move in column 5 is useless. It might be more difficult to determine which of 0, 2, or 6 is the best, so he/she would have to think harder and try to look further ahead in the game.
25
4.4.3 Further Pruning of Moves Affecting Either Player The above discussion leads to further move pruning that was implemented in this research. The potential good moves left after pruning away the bad or useless moves are examined more closely. The goal is to find a move among these that is optimal. Each of these remaining moves is evaluated using the partial order evaluation. Any moves that are strictly worse than any other move are pruned away. If after this pruning process only one move remains, then that will be the move chosen. Thus the position is evaluated statically. For example, suppose the evaluations of the good moves are (1,0), (2,0), (4,1), (6,1), (4,0), (2,0), and (2,0). It is clear that the best of these evaluations is (6,1), and so the move that will be chosen is the move with this evaluation. If after pruning away suboptimal moves two or more remain, then a search will need to be performed. Suppose the remaining evaluations are (2,1), (0,2), and (3,0). All of these are incomparable to each other, hence it is not clear which one is the best. In this situation, search is required. This further move pruning is an example of the need for balance between short-term and long-term plans. The algorithm attempts to find a move that will make the most difference immediately. If a lot can be accomplished in one move, then that move will be favored. If there are several moves that do not reveal a clear winner, then a long-term plan will be chosen. A look further into the game may reveal an optimal move. This strategy also resem-
26
bles the way a human would play. If he/she can see that one move will strongly advance his/her position or block the opponent in several ways, then that move will be chosen. If there are several moves where one appears as good as any other, then the human would have to think further ahead in the game.
4.5 Initial Experiments for Connect 4 For all of the experiments performed, the Connect 4 playing program was pitted against itself: one player using standard Alpha-Beta search, and the other using Partial Order Bounding. The bound chosen for POB contains one element from the poset.
4.5.1 Evaluation Parameters There are several parameters which affect the performance of POB: weights of the game features in the evaluation, depth cutoff in the search tree, and whether or not further enhancements to move pruning were used. Several tests were performed by adjusting parameters. The goal is to find a successful combination of parameters that results in a win for POB. The performance metrics used to determine the success of POB are the outcome of the game and the number of nodes examined during search. In Table 1, the results of these initial tests are shown. There are parameter combinations that clearly lead to a loss for POB. When the weight on B1
27
Table 1. Results of initial tests for Connect 4
and R1 is 0, POB performs poorly. Although it seems that B1 and R1 are rather insignificant, the two features do have an impact on intelligence of play. The feature weight combination of 8, 4, 1 leads to failure for POB. Another case with unsuccessful results is when the feature weights are 1, 1, 1 and no further move pruning is performed. In two instances, a win was found for POB. These initial tests determined which parameter combinations seem promising, and the focus of the research is concentrated on these areas, in particular, the two cases of a win for POB.
28
4.6 Searching for Bounds An important component in the implementation of POB is the method to determine the bounds to use for the search. A desirable bound is one in which a move is evaluated as a success when the search is performed with that bound.
When a bound finds a move to be success, the search termi-
nates and that move is chosen. The challenge is to find a bound that will find a good enough move and do so efficiently. In Mueller's experiments with Go[4], it is almost always the case that a bound can be determined by a static analysis of the game position. However, in games such as Connect 4 that use a heuristic evaluation, it cannot be easily determined which bounds to use during search. Several may need to be tested in order to find one that is able to find a good move. This paper will now discuss approaches to finding suitable bounds for Connect 4. A window of bounds can be defined which will represent all of the possible bounds to be considered during search. For example, in the case of a bound containing one element, the window could be defined as (0,0) to (-9,-9). These are the only bounds that the search will consider.
4.6.1 Searching for Bounds: Naive Approach A naive approach to finding a bound is as follows: Start by choosing the bound (0,0), and perform search using this bound. If a success move is
29
found, then search will terminate, and the success move will be chosen. If all moves evaluate as failure, then an iteration process will begin. The bound is relaxed by one (to (-1,-1)) and search is performed using the refined bound. The process is repeated until there is a move evaluated as a success. If the search is taken to the bottom of the bound window, then the iteration terminates and the first available move is chosen. This approach is rather naive since the bound starts over at (0,0) for each move, thus performing an excessive number of iterations. This results in inefficient search and relatively high node counts as compared with Alpha-Beta. The bound searching method is applied to the two cases where POB wins the game. In one instance, further move pruning is used, and in the other case, no further move pruning is performed.
4.6.1.1 Results for Naive Approach The results of the experiments are given in Tables 2 (pruning) and 3 (nonpruning). The first column in the table shows the number of nodes examined for each POB move; the third column shows the same thing for Alpha-Beta. These represent the alternating moves by POB and Alpha-Beta until the end of the game at the bottom of the table. The word “Static” indicates that no search was performed for that move.
The word “Static” followed by
“(prune)” means that a move was determined by the further move pruning. The middle column indicates the number of iterations taken to determine the bound. As a result of several iterations for many moves, the node counts are
30
relatively high compared to Alpha-Beta.
Table 2. Results of naive approach for pruning
4.6.2 Searching for Bounds: More Informed Approach Rather than starting search at the top of the bound window for each move, a more informed method was applied, where bounds from previous moves are reused. The player will use the bound from the previous turn and refine the bounds throughout the game. The player performs search with the bound from the previous turn, and the success/failure values of the moves are calculated. From these, it is determined whether the bound should be relaxed or tightened. The higher a bound appears in the lattice, the more selective it
31
Table 3. Results of naive approach for non-pruning
is, and the more evaluations that will result in failure. A bound appearing low in the lattice has the opposite effect; many evaluations will result in success. Relaxing a bound will cause it to appear lower in the lattice, and tightening a bound will cause it to appear higher in the lattice. The approach of refining bounds emulates the way a human would play a game. A bound can represent a subgoal at some point in the game. A player will attempt to choose moves which lead to positions at least as good as the bound. A bound can be thought of as a player's aspirations. If they are too high, no good move will be found; the bound is too selective for the avail-
32
able positions. This results in a need for the player to lower his/her expectations. On the other hand, if a player observes that a position can easily be achieved, he/she should strive to find something better.
4.6.2.1 Relaxing Bounds In the case where a bound must be relaxed, it indicates that the player's aspirations are too high, the bound was unable to find a successful move. If all of the moves evaluate as failure, then an iteration process begins, where bounds are relaxed each time until a successful move is found. In Figure 16, using the bound (0,0) it is unable to find a move that is evaluated as a success. The bounds are relaxed for each iteration, and in the fourth iteration, the bound (-3,-3) finally finds a success move. The player then chooses this move.
Figure 16. Process of relaxing bounds
4.6.2.2 Tightening Bounds In the case where a bound needs to be tightened, it indicates that the player's aspirations may be too low. If using the bound yields a success move in the
33
first iteration, the player will strive for reaching a position that is more favorable. In this iteration process, the bounds are tightened for each iteration, in hopes of finding a better move. The process is repeated until all moves result in failure. At this point, the player's aspirations have gone too high. The player then chooses the success move found by the highest bound that found a success (the one from before the iteration which evaluated all the moves as failure). In Figure 17, the starting bound is (-5,-5). This results in a success move being found immediately, thus the bounds are
Figure 17. Process of tightening bounds
tightened in an attempt to find a better move. In this case, a better move is not found before (-3,-3) found all failure. The move that is chosen is the first move, which is found by the bound (-4,-4). The example in Figure 18 shows that, indeed, the implementation of tightening bounds is effective in many cases. The bound (-7,-7) is able to find a
Figure 18. Effectiveness of tightening bounds
34
better move than the bound (-8,-8). The first move was good enough for (-8,-8), but not for (-7,-7). The bound (-7,-7) rejected two moves before finding a success in the third move. As bounds are tightened, the search is more selective and it is possible to find better moves for the player.
4.6.2.3 Results of More Informed Approach The results of the implementation of reusing previous bounds are shown in Tables 4 (pruning) and 5 (non-pruning). In the case of pruning, the node counts are significantly lower than those for the naive approach to finding
Table 4. Results of reusing bounds (Stage 1) for pruning
35
bounds. This is explained by the fact that fewer iterations are performed for most of the moves. The case of non-pruning does not reveal as impressive results, although there are a few cases (turns 1, 5, 6, 7) that result in fewer node counts. In several cases, the node counts increased. The explanation for this is that performing fewer iterations for some moves forced more iterations to be performed in other moves.
Table 5. Results of reusing bounds (Stage 1) for non-pruning
4.6.3 Searching for Bounds: Further Enhancements To further reduce the node counts, additional enhancements were made to
36
the implementation of searching for bounds. There are refinements specific to both the case of pruning and the case of non-pruning. All of them are applied to the case where bounds are tightened. These enhancements are discussed in detail, as are their results.
4.6.3.1 Enhancements for Pruning If after pruning all of the suboptimal good moves only 2 or 3 moves remain and one of them is a success, it is not likely that a better move can be found by further tightening bounds. The fact that very few moves are left makes the chances of finding better moves highly unlikely. If this case happens, the search can terminate and the success move will be chosen. This is illustrated in Figure 19. Successive refinements of bounds do not reveal a better move, thus the success move found by (-6,-6) will be sufficient.
Figure 19. Further enhancements for pruning
4.6.3.2 Enhancements for Non-Pruning It may suffice to terminate the search for bounds if a predetermined number
37
of moves are rejected before a success move is found. Consider the example in Figure 20. Suppose a player wants at least three of the moves rejected before a move is found which evaluates as success. The bound (-3,-3) achieves this; there is no need to continue searching for a higher bound to find all the moves being failure, as is the case with bound (-2,-2). Once a pattern like this occurs, search can terminate and the success move will be chosen.
Figure 20. Further enhancements for non-pruning
If a pattern of success/failure values for the moves occurs for consecutive iterations, it is much less likely that a better move will be found. Consider the example in Figure 21, and suppose the player wants at least two moves rejected before revealing a success. The bounds (-4,-4) and (-3,-3) result in
Figure 21. Further enhancements for non-pruning
the same pattern of success values for the moves, and a sufficient number of moves have been rejected; it is not necessary to continue the search with bound (-2,-2). The search can terminate with bound (-3,-3), and the third move will be chosen. This is considered "good enough" for the player.
38
4.6.3.3 Results of Further Enhancements The results of these further enhancements appear in Tables 6 (pruning) and 7 (non-pruning), under the columns labeled as POB (Stage 2). In the case of pruning, the node counts for the earlier moves have improved. There is no
Table 6. Results for further enhancements for pruning (Stage 2)
iteration column in this case because each move only needed one iteration to determine a move. The same bound could be reused throughout the entire game, and still result in the same behavior as the two previous approaches. In the case of non-pruning, the node counts for the second and third moves have decreased. This is not a significant improvement, although the few
39
places that resulted in improvement are also the cases where the node counts for Stage 1 are the highest. In Table 8, the node counts for the ideal non-pruning case are given. If the implementation is ideal, then there is no need to search to determine the bounds. If the bounds to use for each move were previously known, then only one iteration would be required for each move. The pruning case has no ideal node counts since in Stage 2 it is already possible to achieve the ideal node counts. Thus, only one iteration is required for each move.
Table 7. Results of further enhancements for non-pruning (Stage 2)
40
Table 8. Node counts for ideal case for non-pruning
4.7 Searching to a More Shallow Depth In the case of non-pruning, the initial tests performed to determine the outcome of the game only involved even depth cutoffs. As Table 1 shows, a win is obtained by POB when it searches to a depth of 8, whereas a search to a depth of 6 for POB results in a loss. It was of interest to determine whether or not POB would be successful if it performed a search only to the depth of 7. Further experimentation showed that POB was, in fact, able to win when it searched to a depth cutoff of 7. The resulting node counts for Stage 1, Stage 2, and the ideal case are shown in Table 9. In comparison with the results given in Tables 7 and 8, it is clear that POB searching to the
41
more shallow depth performs significantly more efficiently.
The node
counts are still much higher than those for Alpha-Beta, but there has been a significant improvement from the case where POB searches to a depth of 8.
Table 9. Results of searching to depth 7 for non-pruning
4.8 Threshold Pruning In this thesis, another approach to move pruning, called Threshold Pruning, was implemented that takes a different approach to pruning the moves that affect at least one of the players. The earlier method prunes away all moves that are strictly worse than any other move. With this approach, it may seem
42
that moves could be pruned away prematurely. A move may be considered inferior at the time of evaluation of the root position, but if that move is chosen, it may prove to be a good move later in the game. Threshold Pruning only prunes away the moves that are the least optimal, and still leaves moves that appear semi-optimal. The evaluations for each move are calculated as before. The average of the evaluations is then computed, which is used as a threshold evaluation. Any move whose evaluation is strictly worse than the threshold is pruned away. As an example, suppose the evaluations of the good moves are (1,0), (2,0), (4,1), (6,1), (4,0), (2,0), and (2,0). The average of these evaluations is (3,0) (any remainder is truncated) and this represents the threshold evaluation. After pruning inferior moves, the remaining moves are those whose evaluations are (4,1), (6,1), and (4,0). The earlier method pruned all moves except for the one with evaluation (6,1). In this approach, (4,1) and (4,0) are considered to be potentially good moves. In the short-term they are suboptimal, but in the long-term they may prove to lead to a favorable position for the player. There is a case where a win can be found for POB using Threshold Pruning. The depth cutoff is 4 for Alpha-Beta and 8 for POB, and the weights of the features B3/R3, B2/R2, B1/R1 are 3,2,1 respectively. The results of the implementation of the new approach to move pruning are given in Table 10. Only the results for Stage 1 and the ideal case are shown. When Stage 2 is implemented in this case, the outcome of the game is a loss for POB. The use of further bound searching enhancements results in POB choosing a 43
Table 10. Results of Threshold Pruning method
different sixth move than the case where no further enhancements are implemented. In Stage 1, two moves are rejected before POB chooses a move. However, in Stage 2, POB chooses the first available move. This is illustrated in Figure 22. The different move choice of POB in Stage 2 is enough to alter the course of the game from this point, and results in a loss for POB.
Figure 22. Different choice of moves in Stage 1 and Stage 2 for Threshold Pruning
44
This is a prime example that illustrates the tradeoff between intelligence and efficiency. Even though fewer nodes are expanded in Stage 2, the result is that POB is not selective enough in choosing a move. It is not as likely that a better move will be found by tightening bounds in the case of pruning, but there are situations where indeed that is the case.
4.9 Summary of Results The results of the experiments for searching for bounds for pruning and nonpruning are summarized in Tables 11 and 12. The column labeled “Total”
Table 11. Summary of total and average node counts for pruning
Table 12. Summary of total and average node counts for non-pruning
45
indicates the total number of nodes examined in the entire game. The right-most column indicates the average number of nodes examined per move in the game. On the left side of the table are listed the different methods that were implemented. The results for Alpha-Beta are given in the top row for comparison. A consistent pattern occurs in both the cases of pruning and non-pruning. As the method for bound search is refined, there is a significant improvement in the efficiency of search, as measured by the node counts. In the case of pruning, the node counts achieved are reasonable compared with Alpha-Beta. In the case of non-pruning, the node counts are still significantly higher than those for Alpha-Beta. Two key factors explaining this behavior are that POB searches deeper in the case of nonpruning than in the case of pruning and that the branching factor is higher since no moves are pruned. When POB searches to a depth of 7, the efficiency improves significantly from when POB searches to a depth of 8. The most dramatic improvement is in the ideal case, in which the node counts are much better as compared with those for Alpha-Beta. The results of Threshold Pruning are given in Table 13. The node counts for
Table 13. Summary of total and average node counts for Threshold Pruning (Note: the Alpha-Beta node counts result from the Threshold Pruning test, rather than the nonpruning test)
46
non-pruning are given for comparison. In both methods, POB searches to a depth of 8. Threshold Pruning is indeed able to achieve node counts that are somewhat better. The results of several experiments have shown that POB can be improved. Further enhancements may give POB the potential to achieve even better results.
4.10 Observations and Conclusions There are several observations that can be made from the results of these experiments. The first observation is that there is a trade-off between the intelligence of play and the efficiency of the search. Many times, in order to find a better move, the node count must be increased. This is evident in the fact that POB needs to search deeper in the tree in order to obtain a win. Another case where this observation is evident is the fact that in many cases several iterations must be performed in the search for bounds that sometimes produce a more favorable move. Partial Order Bounding simulates the way a human would play the game much more than Alpha-Beta search does. In the implementation of the partial order evaluation and the cases of move pruning, it is required that the game position be analyzed closely. Rather than blindly computing a number for a position, qualitative assessments are given to the position. The use of bounds also resembles human play, since they can function as subgoals throughout a game. The method for determining the bounds to use requires game specific
47
analysis, both in terms of the nature of the game, and specific game positions. Partial order evaluation and move pruning play a role in determining bounds, and the sequence of success/failure values for moves must be closely analyzed. There are many more experiments that could be conducted using POB. There are many possibilities for the various evaluation parameters and bounds. With additional experimentation, POB has the potential to be more successful. More work needs to be done in order to find the appropriate combination that leads to an efficient win for POB.
4.11 Future Work There are many promising directions for future work using POB. A major consideration is the method for determining bounds used during search. In the game of Go, Mueller[4] was able to statically analyze a game position and automatically determine the bounds without having to perform a search. It would be of interest to know if this is possible for Connect 4, or if the heuristic evaluation prohibits this possibility. Another approach to determining bounds is to implement a learning algorithm that would learn a set of bounds to use throughout a game. A schedule of bounds for each move could be stored in a database and each bound could be used directly without the need for a search. There are some variations in the parameters that could be implemented.
48
Rather than using a bound that contains only one element, try experimenting with bounds that contain several evaluations. A partial order evaluation other than vector dominance could also be implemented. An evaluation could be implemented which keeps the six features B3, B2, B1, R3, R2, R1 separate, rather than collapse them for each player. To determine the extent of the success of POB, several experiments could be performed on different variations of the Alpha-Beta method. If POB is only able to be victorious with one specific version of Alpha-Beta, then the extent of POB is rather limited. However, if POB fares well with many Alpha-Beta versions, then POB should prove to be a promising new approach to game playing.
49
5 The Method of Decomposition Search and its Application to the Game Connect 4 Another game search method recently developed by Mueller[5] is called Decomposition Search. This method is most useful in games that have a large branching factor. Mueller has successfully applied this method to the game of Go. The general idea of Decomposition Search is to partition a game into several independent subgames, and perform a localized search on each subgame.
5.1 Partitioning a Game An example of partitioning a game is given in Figure 23, which shows the
Figure 23. The partition of a 3 pile game of NIM (Taken from [5])
decomposition of the game NIM. In the game of NIM, there are several piles of stones from which two players alternate turns taking one or more stones from any one of the piles. The player who takes the last stone is the winner. Each subgame is independent in that a move in one subgame does not change the status of any of the other subgames. The game ends when each of the subgames has ended.
50
5.2 Local Search For each of the subgames, a local search is performed to find the value of that subgame. The evaluation of a subgame is based on Combinatorial Game Theory[2,3]. The evaluations of the subgames are analyzed in order to determine the evaluation of the entire game. The decomposition method is an example of a divide and conquer algorithm, in which a problem is divided into several simpler subproblems when global minimax search is too inefficient.
5.3 Decomposition Search Algorithm The basic algorithm for Decomposition Search is as follows: 1. Partition a game into a sum of independent subgames. 2. Local game search: find the game graph of each subgame. 3. Evaluation: For each game graph, evaluate the terminal states, then find the evaluation for the interior states using CGT. 4. From the evaluations of the subgames, choose the best move (which will be the best move in the entire game).
5.4 The Application of Decomposition Search to Connect 4 Although this research does not focus on the full application of Decomposition Search to Connect 4, some initial ideas are presented below. Figure 24 illustrates how a Connect 4 position can be partitioned. The lines through
51
Figure 24. Partitioning a Connect 4 position
the squares indicate all of the possible ways that either player can eventually win. If there are two sections of the board in which no lines cross one another, then those two sections are considered to be independent subgames. A move in one of them will not affect the status of the other subgames. In this example, there are three independent subgames, as denoted by the highlighted regions. The squares that are not highlighted can simply be ignored, as they do not affect the ways in which a player can win. The application of local search and CGT to Connect 4 is only in its earliest stages. One concern is the possible outcomes of a Connect 4 game. In traditional combinatorial games, the first player who cannot move loses the game. There is no such situation in a Connect 4 game. Another consideration is the case of a draw in Connect 4. In standard combinatorial games, there is always a clear winner. Much research needs to be done to determine how to deal with the aspects of Connect 4 that deviate from the characteristics of a typical combinatorial game.
52
One requirement for applying Decomposition Search and CGT to a game is that the game must be able to be partitioned into independent subgames. Since there is a method of partitioning that can be applied to Connect 4, the full application of Decomposition Search to Connect 4 seems quite promising. The nature of Decomposition Search has potential to prove successful since it restricts the search to smaller, localized regions of the board, and it reduces the exponential branching factor that exists in full global search. If the application of Decomposition Search to Connect 4 would prove to be successful, a possibility to further enhance game playing would be to combine the methods of Partial Order Bounding and Decomposition Search. An initial idea could be to apply POB to the local game graphs resulting from the partitioning of a game position. The combination of the efficiency of POB and the human-like aspects of Partial Order Evaluation with the localized focus of Decomposition Search may prove to be an effective, innovative approach to game playing.
53
6 Related Work In addition to the work on POB and Decomposition Search done by Mueller in [4] and [5], there has been work done on the actual game of Connect 4. In [1], an actual solution to the game Connect 4 is proposed. It is claimed that if the first player chooses moves correctly, then he/she can be guaranteed a win.
54
Appendix: Rules of Connect 4 The game of Connect 4 is played on a 6 row by 7 column board of squares. There are two players black and red, who place pieces of their respective colors on the board. A legal move consists of a player dropping a chip of his/her color down one of the 7 columns.
The piece will fall to the
bottom-most square that is not already occupied. For example, in Figure 25, if a player drops a piece into the third column from the left, the piece will
Figure 25. The game Connect 4
fall to the second row from the bottom, and be placed in the square above the square with the 'B'. When a column becomes full, a player can no longer place pieces in that column. The players alternate turns until the game has ended. A player must play his/her turn; a turn cannot be skipped. The objective of the game is for a player to connect four pieces in a row of his/her color, either horizontally, vertically, or diagonally. It is possible that the
55
board could become full with neither player connecting four in a row; in this case, the game ends in a draw.
56
References [1]V. Allis. A Knowledge-based Approach of Connect-Four. Report IR-163 by the faculty of Mathematics and Computer Science at the Vrije Universiteit Amsterdam, The Netherlands. 1988. [2]E.Berlekamp, J.Conway, and R.Guy. Winning Ways. Academic Press, London, 1982. [3]J.Conway. On Numbers and Games. Academic Press, London/New York, 1976. [4]M.Mueller. Partial Order Bounding: A New Approach to Evaluation in Game Tree Search. Artificial Intelligence Journal. 2001. Also available from: http://www.cs.ualberta.ca/~mmueller/publications.html (this thesis references the online version). Pages 4-5, 8-11, 17, 24. [5]M.Mueller. Global and Local Game Tree Search. Information Sciences. 2000. Also available from: http://www.cs.ualberta.ca/~mmueller/publications.html (this thesis references the online version). Pages 1-8. [6]A.Plaat, J.Schaeffer, W.Pijls, and A.De Bruin. Best-first fixed-depth minimax algorithms. Artificial Intelligence, 1996. [7]K. Rosen. Discrete Mathematics and its Applications : Third Edition. McGraw Hill, Inc. 1995. [8]S.Russell and P.Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 1995.
57