Tuesday, August 19, 2008

HTH or HTT?

Assuming we've got a long random string composed of the letters H and T, and only those two letters, which of the patterns "HTH" or "HTT" (overlap not allowed) do you think will occur more frequently?

For example, if overlap is not allowed, in the string "HTHTHTTH", both "HTH" and "HTT" occur once. However, if we allow for overlap, then "HTH" occurs twice and "HTT" once.

So, the question is, which of the patterns "HTH" or "HTT" (without overlap) do you think will occur more frequently? Will "HTH" occur at a higher, the same or lower frequency than "HTT".

You must give an answer.

To find out the answer run the Simulation application written in Java.

This question is an adaptation of the question asked by Peter Donnelly in his TED presentation on "How juries are fooled by statistics."

6 comments:

Anonymous said...

Some ruby guru will probably make a fool of the following code, but here it is :

class String
def count_occur(pattern)
self.scan(pattern).size
end
end

len = 1e5

10.times do
chain=Array.new(len){rand>=0.5?'H': 'T'}.to_s
puts [/HTH/, /HTT/].map{|pat|
"#{pat.inspect} -> #{chain.count_occur(pat)}" }.join(',')
end

I vote for HTT :

/HTH/ -> 9726,/HTT/ -> 12628
/HTH/ -> 10041,/HTT/ -> 12462
/HTH/ -> 10125,/HTT/ -> 12323
/HTH/ -> 9988,/HTT/ -> 12534
/HTH/ -> 9857,/HTT/ -> 12592
/HTH/ -> 10026,/HTT/ -> 12499
/HTH/ -> 10183,/HTT/ -> 12418
/HTH/ -> 10079,/HTT/ -> 12364
/HTH/ -> 10093,/HTT/ -> 12402
/HTH/ -> 9987,/HTT/ -> 12531

Ceki said...

And when overlap is allowed?

Anonymous said...

class String
def count_occur(expression, overlap)
expression = '(?=(' + expression + '))' if overlap
pattern = Regexp.new(expression)
self.scan(pattern).size
end
end

len = 1e5

def show(chain, expressions, overlap)
expressions.map{|expr|
"#{expr} -> #{chain.count_occur(expr, overlap)}" }.join(',')
end

10.times do
chain=Array.new(len){rand>=0.5?'H': 'T'}.to_s
expressions = ['HTH', 'HTT']
puts "no_overlap = #{show(chain, expressions, false)}"
puts "overlap = #{show(chain, expressions, true)}"
end

Ceki said...

Ruby is sure more succinct than Java which, by common sense, should contribute to better legibility. On the other hand, I must say that I did not find the Ruby version more legible, because 1) of bad indentation 2) I don't know Ruby, none of which is of course Ruby's fault.

Anonymous said...

The fact that ruby code is shorter than java code is only one of the reason I enjoy this language so much.
You shouldn't forget dynamic typing, meta-programming possibilities, open classes, ...
Yes, brevity can lead to ugliness. (From my small experience, ruby encourages you to write readable code in many ways).
For 2), it's your fault ;-) : Give it a try !

Ceki said...

Most of the problems that really interest me have to do with data structures, and Java is good enough a language for their implementation. So, I don't feel the need to try other languages.