- 11,281
- Sweden
- eran0004
I started programming a basic machine learning tic-tac-toe AI. It knows how to recognise patterns, keeps a memory of moves it has made in previous games and is basing its decisions on the outcomes from those games.
When it faces a game state it has never seen before, it will append it to its memory, determine which moves are possible and give these moves a starting weight of 0.5. It then randomly* selects the move to make and adds it to the list of moves it made. At the end of the game the AI will adjust the weight of the moves it made according to this:
Win: +(1-w)/e
Draw: +0
Loss: -w/e
(where w is the original weight and e is Euler's number).
The memory is then saved to a file so the next game won't have to start with a completely blank AI.
* The random selection is weighted based on the value of each move, so if there are two available moves, one with a value of 0.8 and the other with a value of 0.2, then the first move will have an 80% probability of being selected.
If it does work (which it should), then the clever thing is that I could create two instances of the AI and let them play against each other, like, a hundred thousand games or something like that, by which point they should be more or less impossible to beat. And if I can export their memory after, say, 20 games, 100 games, 1000 games and 10000 games, I should get different difficulty options for the AI depending on which memory I choose to load.
What I have so far is the functions for recognising and remembering game states and for choosing which move to make. I just let the computer play its very first game (proud father moment!), and it went like this:
So victory for X!
The code so far looks like this (Python 3.6):
Edit: Apparently, 10000 games takes like two seconds to play, and there seems to be around 830 possible gamestates (excluding rotated or mirrored duplicates, but including gamestates where the game continues even after it has been won).
When it faces a game state it has never seen before, it will append it to its memory, determine which moves are possible and give these moves a starting weight of 0.5. It then randomly* selects the move to make and adds it to the list of moves it made. At the end of the game the AI will adjust the weight of the moves it made according to this:
Win: +(1-w)/e
Draw: +0
Loss: -w/e
(where w is the original weight and e is Euler's number).
The memory is then saved to a file so the next game won't have to start with a completely blank AI.
* The random selection is weighted based on the value of each move, so if there are two available moves, one with a value of 0.8 and the other with a value of 0.2, then the first move will have an 80% probability of being selected.
If it does work (which it should), then the clever thing is that I could create two instances of the AI and let them play against each other, like, a hundred thousand games or something like that, by which point they should be more or less impossible to beat. And if I can export their memory after, say, 20 games, 100 games, 1000 games and 10000 games, I should get different difficulty options for the AI depending on which memory I choose to load.
What I have so far is the functions for recognising and remembering game states and for choosing which move to make. I just let the computer play its very first game (proud father moment!), and it went like this:
Code:
[ ][ ][ ]
[ ][ ][ ]
[X][ ][ ]
[ ][ ][O]
[ ][ ][ ]
[X][ ][ ]
[ ][X][O]
[ ][ ][ ]
[X][ ][ ]
[ ][X][O]
[ ][ ][ ]
[X][O][ ]
[ ][X][O]
[ ][ ][X]
[X][O][ ]
[ ][X][O]
[ ][ ][X]
[X][O][O]
[ ][X][O]
[X][ ][X]
[X][O][O]
[ ][X][O]
[X][O][X]
[X][O][O]
[X][X][O]
[X][O][X]
[X][O][O]
So victory for X!
The code so far looks like this (Python 3.6):
Code:
import math, random
class Memory():
def __init__(self):
self.gameStates = {} # gamestate: [weight, weight, weight...]
self.moves = {} # gamestate: move
memory = Memory()
startValue = 0.5
def rotateMatrix(s="", cw = True):
'''Rotates a 3x3 matrix 90 degrees (clockwise if cw == True)'''
if len(s) != 9:
print("Matrix error: length %d, expected 9" %len(s))
return s
rotated = ""
k = 1-(2*cw)
index = -1+(10*cw)
for i in range(3):
for j in range(3):
index += 3*k
rotated += s[index]
index -= 10*k
return rotated
def mirrorMatrix(s=""):
'''Mirrors a 3x3 matrix'''
mirrored = s[6:]+s[3:6]+s[:3]
return mirrored
def matchGameState(gameState):
'''Searches the memory for the gameState (including rotation and mirror)'''
def addToMemory(gameState):
'''Adding a missing gamestate to the memory.'''
memory.gameStates[gameState] = []
for pos in gameState:
if pos == '-':
memory.gameStates[gameState].append(startValue)
else:
memory.gameStates[gameState].append(None)
keys = memory.gameStates.keys()
for mirror in range(2):
for rotation in range(4):
if gameState in keys:
print("Gamestate found in memory with mirror %d, rotation %d" %(mirror, rotation))
return gameState, mirror, rotation
gameState = rotateMatrix(gameState)
gameState = mirrorMatrix(gameState)
addToMemory(gameState)
return gameState, 0, 0
def chooseMove(gameState, mirror=0, rotation=0, sign='X'):
moves = []
weights = []
for i in range(9):
if gameState[i] == '-':
moves.append(i)
weights.append(memory.gameStates[gameState][i])
move = random.choices(moves, weights)[0]
memory.moves[gameState] = move
gameState = gameState[:move]+sign+gameState[move+1:]
for i in range(rotation):
gameState = rotateMatrix(gameState, False)
for i in range(mirror):
gameState = mirrorMatrix(gameState)
return gameState
def go(gameState, sign='X', print_=True):
gameState, mirror, rotation = matchGameState(gameState)
gameState = chooseMove(gameState, mirror, rotation, sign)
if print_: printGame(gameState)
return gameState
def printGame(gameState):
s = ''
for i in range(len(gameState)):
c = gameState[i]
if c == '-':
c = ' '
s += '[%s]' %c
if (i+1)%3==0:
s+='\n'
print(s)
Edit: Apparently, 10000 games takes like two seconds to play, and there seems to be around 830 possible gamestates (excluding rotated or mirrored duplicates, but including gamestates where the game continues even after it has been won).
Last edited: