Active Learning

(An essay from Cybernetic Ruminations)

May, 1995

Brandyn Webb / brandyn@sifter.org

I composed this letter for the Inductive Learning mailing list, but I quit the list before I finished the letter. Here's how far I got:

An active learner receiving both positive and negative evidence and feedback may, in fact, be composed of a number of sub-units which operate in quite different manners. When considering the usefulness or necessity of a particular type of evidence, we have to consider what path through these units is responsible for the task under consideration, and further, which unit is the current learning bottleneck. Active learning is a highly composite problem, so it is fairly confounding to argue over the necessity of this or that kind of evidence without specifying which components of the "active" learning process we are addressing. Further, the apparent need for "negative" feedback may diminish when we consider the units independently, rather than viewing the entire process as homogeneous.

I am not familiar with current models of active learning, but let me just make one up for the purposes of discussion:

	U -> P -> C
	^    v    v
	M <- A <- P'
		
	U = Universe
	P = Perception
	C = Consciousness/Control
	A = Action Decision
	M = Muscle/Mechanics

Here, the Universe is Perceived, the Control decides what it would _like_ to Perceive, and that decision causes Mechanical Action, thereby changing the Universe. (I will not pretend to justify this model; it is just a thought experiment!) In this model, there are three trainable units: P, A, and C. Each of them necessarily has a qualitatively different mode of learning.
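Before taking each unit in turn, it may help to see the whole loop as toy code. This is purely a sketch -- the function names and trivial dynamics are invented for illustration; only the wiring is meant to match the diagram:

	# A minimal sketch of the U -> P -> C -> P' -> A -> M -> U loop.
	# Every function here is a placeholder; only the wiring matters.
	import random

	def perceive(u):             # P: compress the Universe into a percept
	    return round(u, 1)

	def control(p):              # C: decide what it would _like_ to perceive
	    return p + 0.5           # toy goal: "a little more of that"

	def act(p, p_prime):         # A: map (current, desired) percept to motion
	    return p_prime - p       # toy inverse model (assumes M adds to U)

	def mechanics(u, m):         # M: motion changes the Universe, noisily
	    return u + m + random.gauss(0, 0.05)

	u = 0.0
	for _ in range(5):
	    p = perceive(u)          # U -> P
	    p_prime = control(p)     # P -> C -> P'
	    m = act(p, p_prime)      # P, P' -> A -> M
	    u = mechanics(u, m)      # M -> U, and around again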

P = Perception

The job of P is to create a compact and utilitarian representation of U. I would argue that the most efficient means of learning P is for P to attempt to _model_ U. This learning process can happen independently of A and C, and has the ability to "bootstrap" itself from a state where P, A, and C are all cold. However, there are a number of possible feedback paths from C to P: Obviously, C, through A, affects U, and therefore has an influence on what P is given to learn from. I.e., if C chooses to "attend" to a particular subset of U, P will gain extra experience in that subset, U'. Further, there is the possibility of a direct feedback from C to P in the form of relevance assessment: "This is important, tell me more". That is, if C needs a finer analysis of U' than is inherently warranted by the U<->P relationship, C can, with one signal, force P to learn (and hence ultimately convey) more about U' [1]. C could, in principle, explicitly supervise P, but I would argue that this introduces a paradox when you consider the ultimate source of information.... Finally, we could introduce a loop back from C to P, so that P learns to model not only U but also C, which would allow P to model (and hence represent back to C and A) the state information in C as well as U. This consideration is perhaps an unnecessary complication for this example, but it does make a few of the assumptions below slightly more reasonable....

So, P learns passively (locally speaking), unsupervised, and only from positive evidence.
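To make that concrete: one way to read "P models U" is as next-observation prediction. A toy sketch -- the linear predictor is my arbitrary choice, not something the model dictates:

	# P learns unsupervised, from positive evidence alone: it predicts
	# the next observation of U and corrects itself on the difference.
	def train_perception(observations, lr=0.1):
	    w, b = 0.0, 0.0                      # predict u[t+1] from u[t]
	    for u_now, u_next in zip(observations, observations[1:]):
	        err = u_next - (w * u_now + b)   # self-generated error signal:
	        w += lr * err * u_now            # no teacher, no negative
	        b += lr * err                    # examples, just more of U
	    return w, b

	stream = [0.1 * t for t in range(20)]    # a smoothly drifting toy U
	print(train_perception(stream))          # heads toward w=1.0, b=0.1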

A = Action

The job of A, in this simple model, is to map P and P' (C's output) to M, such that the subsequent perception P+ matches P'. For example, you may "envision" a dart in the bullseye (P'), and then call on A to make it happen (P+ ~= P'). It is relevant here that P (and hence P') may represent U at many different levels of abstraction. So, in the dart example, it may only be necessary for P' to represent the _concept_ that the dart is in the bullseye, which is a much easier task for C than explicitly synthesizing, say, the actual visual details of that event.

The immediate implication is that A is a mapping function that receives negative feedback in the form of a differential between P' and the final P->P+. But this interpretation makes the implementation of A paradoxical! It assumes that, somehow, A can translate P'-P+ to M'-M. I.e., in order for A to _utilize_ P' as a training signal, it must map P' to M' (the M that _would_ have resulted in P'). But if we could do that, we wouldn't need A in the first place. From the dart example: you envision the dart in the bullseye, but instead, your leg twitches and you kick your coffee cup. Now, without cheating by using your knowledge of mechanics, how can your brain know that if you had moved your arm instead of your leg, you would have had a better chance?

In fact, it is much simpler if A simply learns to map from (P+, P) to M. That is, regardless of what was _intended_ (P'), there is always a positive training example to be gained from what _happened_ (P, M -> P+). For instance, if you aim at the bullseye but hit the 20, you have just gained experience on how to hit the 20.
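A toy sketch of this learn-from-what-happened scheme -- the table-lookup learner and one-dimensional world are invented for illustration; any supervised learner would serve in place of the table:

	# A learns the mapping (P, P+) -> M purely from what _happened_,
	# never from what was intended. Every action, hit or miss, yields
	# a positive training example.
	import random

	experience = {}                        # (p, p_plus) -> m

	def act(p, p_prime):
	    # Reuse a transition we've actually achieved; otherwise flail.
	    return experience.get((p, p_prime), random.choice([-2, -1, 0, 1, 2]))

	p = 0                                  # toy percept: position on a line
	for _ in range(50):
	    p_prime = random.randint(-5, 5)    # whatever C happens to request
	    m = act(p, p_prime)
	    p_plus = p + m                     # what actually happened
	    experience[(p, p_plus)] = m        # aim at the bullseye, hit the
	    p = p_plus                         # 20, learn how to hit the 20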

Notice that this removes C->P' from A's learning loop. A is only directly dependent on P. That is, you can at least _begin_ to learn the Action mapping as soon as you are able to Perceive the consequences of your Actions -- independent of any conscious intentions.

In this simple model, the only effect C has on A is via P' -- by controlling the intended consequences of an action. There are a couple of ways, offhand, that C might pursue a learning goal through this route. Both of them, unfortunately, require some higher-level reasoning on the part of C. For one, C may intentionally _misguide_ A. For example, your darts keep hitting the board higher than you intend, so you consciously decide to aim low. When you succeed in hitting your target, you have created a _positive_ training example based on the _actual_ result (P+), which means eventually you will learn to hit where you aim. At a more basic level, C may choose to present P' at a lower level of abstraction. For example, you envision the dart hitting the bullseye, but it just sits there in your hand. So you consciously decide to extend your arm rapidly and in the general direction of the dart board, and to release the dart at the end. Here, you are using more primitive knowledge in A to generate an action which will ultimately create a positive training example for a more abstract goal.
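The first tactic, consciously aiming low, amounts to C subtracting A's observed systematic error from the goal it hands down through P'. A toy sketch, assuming the error is a simple additive bias:

	# C deliberately misguides A: it estimates A's systematic error and
	# compensates for it in the requested target. Each successful hit
	# is then a _positive_ example for A at the true target.
	def throw(aim):
	    return aim + 3                   # A+M+U: everything lands 3 high

	target = 50                          # the bullseye
	bias = 0.0                           # C's running estimate of the error
	for _ in range(6):
	    request = target - bias          # consciously "aim low"
	    hit = throw(request)
	    bias += 0.5 * (hit - target)     # refine the estimate
	    print(request, hit)              # hits converge on 50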

So, A too learns passively (locally speaking) and only from positive examples -- though since A is essentially learning a supervised mapping, one could say that the basic learning law may utilize the output "error" and hence call that "negative" feedback.

C = Control

Finally, there is the big C, which Conveniently subsumes everything P and A can't do. But let's take a stab at what C might do, anyway. Recalling our model:

	U -> P -> C
	^    v    v
	M <- A <- P'

C presumably has some hard-wired goals and notions of what is good and bad in P. E.g., lots of activity from the Pain receptors is probably a bad thing. The algorithm in C might induce which temporal paths of P ultimately lead to good, and might then constantly impose this expectation on P'. When P+ fails to match P', the path is broken, and C might set out to re-establish the path via a variety of exploratory or corrective means, some of which were hinted at above. In the process, C is training A, by example, to Anticipate the implementation of particular goals, so that C may be freed up to apply its relatively linear-type processing to more abstract issues. E.g., so you can contemplate the topology of a Klein-watermelon while you are winning a game of darts.
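In loop form, that story might look something like this -- a sketch only, with toy stand-ins for the path, the world, and the repair tactic:

	# C imposes an expected path on P' and repairs breaks when P+
	# fails to match. The "repair" here is the crudest tactic (try
	# again); misguiding A or decomposing the goal would slot in at
	# the same place.
	import random

	def try_step(p_prime):               # P' -> A -> M -> U -> P -> P+
	    return p_prime if random.random() < 0.7 else None

	def control_loop(path):
	    for expected in path:            # a temporal path leading to "good"
	        while try_step(expected) != expected:
	            pass                     # path broken: re-establish it

	control_loop(["grasp", "raise", "throw", "bullseye"])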

Now, while we've hardly begun to explore how C might work, we can begin to analyze some of the recent issues within the context of this simple model.

Language acquisition is both an active and a passive task, simply because language usage is both active and passive. Passively, P acquires a model of speech perception, including the correlations between spoken phrases and other observable events in U. I.e., the U<->P relationship can bootstrap a basic understanding of rudimentary language without involving A or C. (There is a strong correlation between the observable presence of that mug and the phonetic sequence k-ah-f-ee.) By this model, there is no inherent need for the active path, P'->A->M, in the acquisition of speech _perception_, except perhaps as a means of directing focus (turn your head to hear) or forcing repetition ("It's called a _what_?"), both of which are merely accelerators, not necessities.

Speech production (A), on the other hand, is dependent on P -- simply because A cannot learn to produce what it cannot perceive. But, since A->M->U->P->A forms a complete cycle, A can be bootstrapped without C. E.g., the random outputs (M) of an untrained A might produce babble which would serve as rudimentary training examples for the M->(U->)P mapping. Ultimately, however, C contributes to this process by corralling A into domains of interest. E.g., C decides it wants to experience the goodness of the Ball. So it sets forth upon the Path of expectation, which includes the expected perception of the word "ball". This expectation reaches A through P', and, based on A's babble training, A generates through M->U the word "ba".
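Here is the babble bootstrap as a toy sketch (the phoneme inventory and the transparent you-hear-what-you-say universe are invented); the "ba" scenario picks up again below:

	# An untrained A emits random motor output; each utterance, once
	# perceived, is a free positive example of the M -> (U ->) P
	# mapping. No C is involved anywhere in this loop.
	import random

	phonemes = ["ba", "da", "ma", "ll"]    # invented motor inventory
	inverse = {}                           # percept -> motor command

	for _ in range(50):
	    m = random.choice(phonemes)        # babble
	    p_plus = m                         # toy U: you hear what you say
	    inverse[p_plus] = m                # A can now reproduce it on demand

	print(inverse.get("ba"))               # later, when P' asks for "ba"...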

One of two things can happen at this point. If P is sufficiently advanced as to perceive "ba" differently from "ball", then C may observe the broken path (P'/P+ mismatch) and try again (to attain the perception "ball"). Through incidental or induced random variations in A, repetition in the vicinity of the desired target _may_ lead to improvement in A. Alternately, C may break down the "ball" expectation into smaller units, and present them through P' in succession ("ba" "ll"), which will provide A with a positive training example for the joint "ball" concept. [Note that C may actually use P as a tool in these tasks: e.g., present the phonetic-unit concept "ball" to P, and ask for a perceptual breakdown, which C could then forward to P'. It is not clear what, if any, learning must actually happen in C.]

On the other hand, if P is _not_ able to distinguish "ba" from "ball", then neither A nor C can learn anything further from this error. It remains to wait for such time as P has advanced sufficiently. For example:

>CHILD: Other...spoon. Now give me other one spoon.

It may be the case here that the child simply does not perceive the difference. Both, in his mind, lead to the same concept: other-spoon. In that case, the father's example would not serve as negative feedback per se -- the child perceives no discrepancy. Rather, it would simply serve as a positive example for purposes of perception -- many more of which will be required before the statistical evidence outweighs the inherited "one spoon" bias (overgeneralization). Whether we choose to call this evidence positive or negative is mostly a semantic issue. It is _not_ negative in the sense that the child never receives an explicit example of what is not correct. It _is_ negative in the sense that the child's expectation is never realized as positive evidence.
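As a toy illustration of statistics outweighing the bias (the counts are invented, and a bare frequency tally is of course a crude stand-in for whatever the child actually computes):

	# The inherited overgeneralization acts like a prior; each adult
	# usage is one more positive count against it, and no explicit
	# negative example is ever needed for the preference to flip.
	counts = {"other one spoon": 10,       # the compositional bias
	          "the other spoon": 0}        # the adult form
	for _ in range(25):                    # 25 overheard positive examples
	    counts["the other spoon"] += 1

	print(max(counts, key=counts.get))     # flips to the adult form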

...

Point is, when we're dealing with active learning, we can't just assume that the only task at hand is to learn the correct action. Perception and evaluation are necessary prerequisites. Before you can learn to speak, you must learn to listen.

-Brandyn (brandyn@sifter.org)

footnotes: [1] This idea is credited to the work of Gary Lynch and Richard Granger, who have found strong evidence for signals of this type between the hippocampus (?) and olfactory cortex.

