Undecidability Proof of Halting Problem without Diagonalization

As a TA for a graduate-level Theory of Computation course, I don't understand why we have to teach the diagonalization proof of the undecidability of the halting problem. It is much easier if we just use an ordinary argument about functions in a programming language, since functions correspond to Turing machines in a theory of computation context. To improve the situation, I came up with a simpler formulation based on the view of Turing machines as functions.

Suppose that we have a function Accept(M,w) which takes two arguments, a function M and its input w, and tells you whether or not M(w) will output True. Basically, suppose that Accept is the hypothetical solver for the halting problem. (Strictly speaking this decides acceptance rather than halting, but the two problems are interreducible and the argument is the same.) To connect to the usual theory of computation terminology, the function M corresponds to a Turing machine, True corresponds to reaching an accept state, and False corresponds to reaching a reject state. Note that although the argument M may go into an infinite loop, Accept never will: it always returns True or False in a finite amount of time. Accept must determine this without actually running M, because M may go into an infinite loop and never return. This way of finding properties of programs without running them is usually called static analysis. It has the same essence as fortune-telling and all other sciences.
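To make the interface concrete, here is a minimal sketch in Python of the signature Accept is supposed to have. The type annotations are my own addition, and the body of course cannot actually be written, which is the whole point:

from typing import Any, Callable

def Accept(M: Callable[[Any], bool], w: Any) -> bool:
    """Hypothetical decider: return True iff M(w) would return True.

    Accept must answer in finite time even when M(w) loops forever,
    so it would have to work by static analysis rather than by running M(w).
    """
    raise NotImplementedError("no such function exists; see the argument below")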

Now we come to our simple contradiction: does the following expression (call it F) return True or False?

Accept(λm.not(Accept(m,m)), λm.not(Accept(m,m)))

It turns out that this question cannot be answered at all! If F returns True, then when we actually apply the function λm.not(Accept(m,m)) to the argument λm.not(Accept(m,m)), the application should return True, right? But when we carry it out, we get

not(Accept(λm.not(Accept(m,m)), λm.not(Accept(m,m))))

Notice that this is exactly not(F), the negation of the original expression. So if F returns True, the actual run evaluates to not(True) = False, contradicting Accept's prediction. On the other hand, if F returns False, the actual run evaluates to not(False) = True, and Accept is wrong again. Either way, Accept gives the wrong answer, so no such Accept can exist. QED.
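To spell out the two cases in code, here is the same construction in Python, assuming the hypothetical Accept from above; the name D for the self-negating function is mine, introduced only for readability:

def D(m):
    # Return the opposite of whatever Accept predicts m(m) returns.
    return not Accept(m, m)

# F is Accept's prediction about D applied to itself.
F = Accept(D, D)

# Actually running D on itself gives D(D) == not Accept(D, D) == not F.
# If F == True:  Accept predicts D(D) returns True,  but D(D) == not True  == False.
# If F == False: Accept predicts D(D) is not True,   but D(D) == not False == True.
# Either way Accept is wrong about D(D), so no such Accept can exist.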

Some may argue that I'm implicitly using diagonalization here. Well, you can say the two are equivalent in effect, but that equivalence doesn't mean diagonalization is a good word for describing what I'm doing. Actually I'm just making a loop in a circuit. I don't see a diagonal, so why call it that? The diagonal argument and the loop argument may have the same essence, but they are very different in comprehensibility. I feel that the loop is much easier to understand than diagonalization.

An interesting final question may be: what is the connection between the above expressions and the Y combinator?

λf.(λx.f(x x)) (λx.f(x x))
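One way to see a connection: since Accept(m,m) is supposed to agree with m(m), the application of λm.not(Accept(m,m)) to itself has the shape (λx.not(x x)) (λx.not(x x)), which is the Y combinator applied to not. Y builds fixed points, so the paradox amounts to demanding a boolean b with b = not(b), and no such boolean exists. Here is a sketch of the fixed-point behavior in Python; since Python evaluates eagerly, the plain Y above loops forever, so the sketch uses the eta-expanded strict variant, usually called the Z combinator:

# Y = lambda f: (lambda x: f(x(x)))(lambda x: f(x(x)))  # diverges under eager evaluation
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Sanity check: a recursive function defined without naming itself.
fact = Z(lambda rec: (lambda n: 1 if n == 0 else n * rec(n - 1)))
print(fact(5))  # 120

With f = not there is no value for the fixed point to settle on, which is exactly the contradiction above.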