First we need to formulate the problem: We want to know the
expected value of the random variable V "length of coin flip head-tail-sequence
X ending with a sequence
S (the chosen sequence) of length
l and not having
S anywhere else as subsequence". V can take any integer value between
l and ∞ (not smaller as X cannot end with S when it's smaller than S), so we can use this formula for the expected value:
E[V] =
l * P("X ends with S after
l coin flips") + (
l+1) * P("X ends with S after
l+1 coin flips") + ...
That's an infinite sum, so we need a trick to express it in a shorter form. Let's look at the probability that X ends with S after
l coin flips. The probability doesn't depend on the whole sequence, just the last
l characters of X. In particular, if we are just one flip away from getting a sequence that ends in S, we have a 1/2 chance of getting the right flip. If we are two flips away, we have 1/2 a chance to get to a sequence that is only one flip away, and so on. If we fail that's were the magic happens (or where the concrete sequence S finally matters): Suppose our S is HHTHH and we have a sequence ending in HH, so we are 3 coins away from success. If our next flip is T then we are just two flips away, but if our next flip is H, then we are still 3 coins away from success, as the sequence still ends with HH. This is in contrast with S=HHHHH as every T brings us back to being 5 coin flips away from success.
So we need to consider l states: S0 : 0 coins away from success, S1: 1 coins away from success, S2 : 2 coins away from success, ..., Sl: l coins away from success. Then we need to check for each state and considering our sequence S what happens when we flip another coin. For example with S=HHTHH:
S0 means we have a sequence X=...HHTHH and we are done
S1 means we have a sequence X=...HHTH and H gets us to S0 while T gets us to S5
S2 means we have a sequence X=...HHT and H gets us to S1 while T gets us to S5
S3 means we have a sequence X=...HH and H gets us to S3 while T gets us to S2
S4 means we have a sequence X=...H and H gets us to S3 while T gets us to S5
S5 means we have a sequence X=... and H gets us to S4 while T gets us to S5
How we can write down how the probabilities P
i to be in each state S
i change when we flip coins: When we haven't flipped a coin, we are in state S5:
P0(0)=0, P1(0)=0, P2(0)=0, P3(0)=0, P4(0)=0, P5(0)=1
If we flip a coin after k flips, the probabilities change like this:
P0(k+1) = 0.5*P1(k)
P1(k+1) = 0.5*P2(k)
P2(k+1) = 0.5*P3(k)
P3(k+1) = 0.5*P4(k) + 0.5*P3(k)
P4(k+1) = 0.5*P5(k)
P5(k+1) = 0.5*P5(k) + 0.5*P4(k) + 0.5*P2(k) + 0.5*P1(k)
The expected value E[V] is then calculated using P0(k), the probability of getting our sequence S after k flips:
E[V] = 0*P0(0) + 1*P0(1) + 2*P0(2) + ...
Here is a python script that first does some trials of our coin flipping game, and then calculates the expected value with the method described above. If you want to try it with another sequence, you have to find out the probability changes in one step like i did for HHTHH above. I guess you could come up with a closed form solution for E[V] with more advanced math, but i leave it at that.
Code:
import random as rnd
coin=['H','T']
S='HHTHH'
trials = 10000
avg = 0
for t in range(trials):
X=''
while 1:
X += rnd.choice(coin) # add a flip
if X[-len(S):] == S: # check end of X for S
avg += len(X)
break
avg /= trials
print ('average # trials',avg)
P = [0,0,0,0,0,1]
p = [0,0,0,0,0,0]
EV = 0
for k in range(trials):
if (k > 6 and k*P[0] < 0.0001):
print('after ',k,' flips:')
break
EV += k*P[0] # update expected value
p[0]=P[1]
p[1]= P[2]
p[2]= P[3]
p[3]= P[3]+P[4]
p[4]= P[5]
p[5]=P[1]+P[2]+ P[4]+P[5]
P[0] = 0.5*p[0]
P[1] = 0.5*p[1]
P[2] = 0.5*p[2]
P[3] = 0.5*p[3]
P[4] = 0.5*p[4]
P[5] = 0.5*p[5]
print('expected value for ',S,' is ~',EV)