The primary topics of research during the past
year were the modeling of reflexive reasoning in the CIC task,
the contribution of metacognitive / reflective skill to reflexive
learning, and modifications of SHRUTI to handle reasoning about
uncertainty. A knowledge base was derived from critical incident
interviews with active-duty Naval officers and from analysis
of experimental protocols of officers at the Surface Warfare
Officers Schools and Naval Postgraduate School. The knowledge
base was then encoded within a SHRUTI network. One of the core
scenarios selected to serve as a testbed for CIC-related research
(Korea) was also encoded within SHRUTI, for use in testing
the effects of training.
A machine learning study was conducted using the CIC knowledge
base and the Korea scenario. Scenarios for training and testing
were generated by creating a probabilistic event tree, with
branching alternatives for all relevant features of tactical
situations similar to the Korean scenario. The event tree
included general information about the context (e.g., the
level of hostility between countries to which various platforms
belonged, the appropriateness of different platforms for attacking
an AEGIS cruiser, and the degree of danger a platform was
in that might motivate protective action). The event tree
also included different possible intents (e.g., intent to
attack, intent to protect), and branches for observable actions
that might result from those intents (e.g., whether or not
a platform was closing on another platform, flying at high
or low altitude, and flying at high or low speed). Encoding
the information in these scenarios made critical use of the
dynamic variable binding capabilities of SHRUTI. The scenarios
generated in this way were randomly divided into training
and test sets. Backpropagation in the SHRUTI network was used
to adapt the weights in the knowledge base based on exposure
to the training scenarios.
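To make the scenario-generation procedure concrete, the following Python sketch shows one way a probabilistic event tree of this kind could be sampled and split into training and test sets. The feature names, branch probabilities, and helper functions are purely illustrative; they are not drawn from the actual knowledge base or the Korea scenario.

import random

def pick(alternatives):
    """Choose one branch of a feature according to its probability."""
    values, weights = zip(*alternatives)
    return random.choices(values, weights=weights, k=1)[0]

# Context features of the tactical situation (illustrative values only).
CONTEXT = {
    "hostility_level":    [("high", 0.5), ("low", 0.5)],
    "suitable_attacker":  [("yes", 0.6), ("no", 0.4)],   # suited to attacking an AEGIS cruiser
    "platform_in_danger": [("yes", 0.3), ("no", 0.7)],   # danger that might motivate protection
}

# Possible (latent) intents and the observable actions branching from them.
INTENT = [("attack", 0.5), ("protect", 0.5)]
ACTIONS = {
    "attack":  {"closing":  [("yes", 0.8), ("no", 0.2)],
                "altitude": [("low", 0.7), ("high", 0.3)],
                "speed":    [("high", 0.7), ("low", 0.3)]},
    "protect": {"closing":  [("yes", 0.5), ("no", 0.5)],
                "altitude": [("low", 0.4), ("high", 0.6)],
                "speed":    [("high", 0.4), ("low", 0.6)]},
}

def sample_scenario():
    """Walk the event tree once, producing a complete scenario."""
    scenario = {feature: pick(alts) for feature, alts in CONTEXT.items()}
    intent = pick(INTENT)
    scenario["intent"] = intent
    for action, alts in ACTIONS[intent].items():
        scenario[action] = pick(alts)
    return scenario

def generate_split(n=200, train_fraction=0.8, seed=0):
    """Generate n scenarios and divide them at random into training and test sets."""
    random.seed(seed)
    scenarios = [sample_scenario() for _ in range(n)]
    random.shuffle(scenarios)
    cut = int(train_fraction * n)
    return scenarios[:cut], scenarios[cut:]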
Two conditions were compared: simple reflexive learning,
and learning with a metacognitive "hint" midway
through training. The hint consisted of the suggestion that
the network consider the possibility of non-hostile intent
of an approaching platform. The hint by itself provided no
evidence for or against any particular hypothesized intent.
The function of this hint was simply to shift the model's
attention slightly, in order to overcome limitations on the
spread of reflexive activation. Dependent variables in the
experiment were the changes in knowledge base weights
and the performance of the trained system in the Korea scenario.
Both reflexive training and reflexive + metacognitive training
changed the weight on rules for hostile intent. But only reflexive
+ metacognitive training changed the model's prior beliefs
in the likelihood of non-hostile and hostile intent. In addition,
the two training manipulations led to different results in
the Korea scenario. There was significant support for the
possibility of intent to protect in that scenario after reflexive
+ metacognitive training, but no support at all for intent
to protect after reflexive training. Thus, the metacognitive
hint would lead an officer to take more time before engaging
in this test scenario.
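The essential character of the hint, attention without evidence, can be sketched as follows (reusing generate_split from the sketch above; the hypothesis names, reach set, and update rule are illustrative stand-ins for the actual SHRUTI encoding). The hint simply adds the non-hostile-intent hypothesis to the set that activation, and hence learning, can reach; it contributes no evidential weight of its own.

def train(scenarios, priors, initial_reach, hint_at=None, lr=0.05):
    """Adapt hypothesis priors, but only for hypotheses within reflexive reach."""
    reach = set(initial_reach)
    for i, scenario in enumerate(scenarios):
        if hint_at is not None and i == hint_at:
            reach.add("intent_protect")   # the metacognitive hint: attention shift, not evidence
        for hypothesis in priors:
            if hypothesis in reach:
                target = 1.0 if scenario["intent"] == hypothesis.removeprefix("intent_") else 0.0
                priors[hypothesis] += lr * (target - priors[hypothesis])
    return priors

train_set, _ = generate_split()
base = {"intent_attack": 0.5, "intent_protect": 0.5}
reflexive_only = train(train_set, dict(base), {"intent_attack"})
with_hint = train(train_set, dict(base), {"intent_attack"}, hint_at=len(train_set) // 2)
# Only the hinted run can change the prior on intent_protect at all.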
The following enhancements have been made to Shruti: encoding
of schemas and their integration with rules; encoding of taxon
facts -- distillations of prior observations and inferences;
new computational encoding of instances, types, and sub- and
super-ordinate relations; exception conditions for rules;
abductive and defeasible inference (partially completed);
and learning of strengths associated with rules and taxon
facts (via backpropagation). In addition, work on integrating
the Shruti architecture with adaptive critics (value function
decompositions and reflexive planning) and support for metacognitive
attention shifting is continuing.
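The backpropagation-based learning of rule strengths can be illustrated, in a deliberately reduced form, by a single soft rule whose strength is adjusted by gradient descent on a squared-error loss. This is only a sketch of the idea; the actual Shruti encoding of rules and taxon facts is considerably richer.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def learn_rule_strength(examples, w=0.0, lr=0.1, epochs=50):
    """examples: (antecedent_activation, observed_consequent) pairs.
    Adjust the rule strength w so that sigmoid(w * antecedent) tracks the consequent."""
    for _ in range(epochs):
        for a, target in examples:
            y = sigmoid(w * a)                       # predicted belief in the consequent
            grad = (y - target) * y * (1.0 - y) * a  # gradient of squared error w.r.t. w
            w -= lr * grad
    return w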
Recent accomplishments
We have been working with a core scenario, common
to two other projects funded under the same initiative, in which
we have examined the structural changes which reflective reasoning
can introduce into the Long-Term Memory (LTM) and the reflexive
reasoning of an agent. In the simulations of the model, we explored
how a training hint, modeled after metacognitive training done
with experienced Naval officers, would influence the agent's
perception of alternative explanations of evidence. Without
the metacognitive training intervention, the computational agent
would interpret a novel, but related, scenario as predominantly
supporting the conclusion of hostile intent for an approaching
air track. While the anomalous evidence in the scenario did
lend some support to the negation of that conclusion, the agent
was not able to explain the evidence in terms of an alternative
hypothesis. This behavior is very close to that observed in
more junior CIC officers, who tend to interpret evidence more
readily in favor of hostile intent and give less consideration
to alternative explanations and to the broader context of events
(political motives and ramifications of actions).
With the metacognitive training intervention,
the simulation learned to recognize an alternative explanation
of the evidence in the novel scenario: that the approaching
air track might be intending to provide protection and coordinate
rescue for crew in a downed helicopter near ownship. By recognizing
such alternative explanations, as do more experienced officers,
the agent is able to focus its resources on the structural uncertainty
in the various explanations and can attempt to construct a novel
explanation that best fits both prior experience and the novel
aspects of the current situation.
Perhaps the most interesting aspect of this
result lies in how metacognition acts to structure LTM, and,
in doing so, changes the reflexive (first blush) reasoning of
the agent. One of the predictions of the neural model is that
only a limited set of inferences can possibly be computed
in the time scale of reflexive reasoning (~500ms). This has
the effect, among others, that only knowledge within a limited
inferential distance (in terms of chaining from one relation
to another through learned patterns of causal relationships)
can participate in reflexive reasoning. That is, while the greater
body of knowledge in LTM is unable to participate in any given
query, all inferentially / structurally close knowledge does
participate. In the above example, this plays out where the
more junior officer (and the agent without metacognitive training)
fails to consider the role of the downed helo near ownship with
respect to the intent of the approaching air track. However,
the more experienced officer (agent with metacognitive training
intervention) notices this relationship explicitly because LTM
has been structurally changed to bring such relationships into
a closer inferential proximity.
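The role of inferential distance can be made concrete with a small sketch: spreading activation over learned causal links, cut off after the few propagation steps that fit inside the reflexive time scale. The relation names and the particular restructuring shown are hypothetical; they only illustrate how a structural change to LTM can bring the downed-helo relationship within reflexive reach.

from collections import deque

def reflexive_closure(ltm, query, max_steps=1):
    """Relations reachable from the query within the reflexive horizon.
    ltm maps each relation to the relations it is directly linked to."""
    reached, frontier = {query}, deque([(query, 0)])
    while frontier:
        relation, depth = frontier.popleft()
        if depth == max_steps:
            continue
        for neighbor in ltm.get(relation, ()):
            if neighbor not in reached:
                reached.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return reached

before = {"track_closing": ["intent_attack"],
          "intent_attack": ["turn_on_targeting_radar"],
          "downed_helo_nearby": ["rescue_in_progress"],
          "rescue_in_progress": ["intent_protect"]}
# After the metacognitive intervention, LTM is restructured so that the
# alternative explanation sits one link from the triggering observation.
after = dict(before, track_closing=["intent_attack", "intent_protect"])

print("intent_protect" in reflexive_closure(before, "track_closing"))  # False
print("intent_protect" in reflexive_closure(after, "track_closing"))   # True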
In related work, we are uniting reinforcement
learning with the Shruti architecture by an appeal to the mathematical
formalism underlying reinforcement learning (dynamic programming
realized in approximate and incremental algorithms), and to
the macro-scale of cortical brain structure, especially to the
organization of the sensory and motor projections. At a gross
structural level, cortical organization divides into sensory
and motor circuits. As we move centripetally in brain structure,
the brain exchanges overtly spatial representations for innately
temporal organizations encoding situational awareness and intentionality.
Also, we find increasing interconnection between the sensory
and motor pathways.
Reinforcement learning is concerned with translating
an estimate of expected future value (EFV) (or related measures,
such as cost/benefit) into reflexive behavior that optimizes
EFV over time. In situations with inherent uncertainty,
such as the decision making facing the CO/TAO in an Aegis CIC,
both the situation estimate and the generated plans must be
novel in response to the novel features of the environment.
We have pursued an approach where abstract relations in the
world, such as the causal linkage between an intention (to attack)
and observed actions (localizing ownship, turning on targeting
radar) are explicitly encoded within the reflexive reasoning
mechanism of Shruti. Shruti then, in response to observations
in its world, and in response to top-down priming concerning
its own and others' intentions, reflexively elaborates patterns of
activity that are both explanations of the evidence and predictions
concerning the world. To integrate this mechanism with reflexive
planning in such relational networks, we are working with the
notion of value function decompositions (VFD).
A value function decomposition serves as a projection
of an agent's intention into actions that seek to achieve desirable
world states in service of that intent. We have already defined
patterns of interconnection in the sensory circuit that encode
causal relations and which are concerned with modeling the world,
that is, with belief. We extend this by generating, in the motor
circuit, patterns of hierarchical relationships among those
same relations that encode the value function decomposition.
These patterns of interconnection serve to differentiate intention
into goals, that is, the desire to make a relation true (or
false).
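A minimal sketch of such a decomposition, under an entirely illustrative encoding of intentions and relations, is the following: each intention projects through tunable weights onto (relation, desired truth value) pairs, which thereby become goals in proportion to those weights.

# Illustrative value function decomposition: intention -> weighted goals.
VFD = {
    "intend_self_defense": {("illuminate_track", True): 0.9,
                            ("maintain_weapons_hold", True): 0.2},
    "intend_deescalate":   {("issue_warning", True): 0.8,
                            ("maintain_weapons_hold", True): 0.7},
}

def differentiate_intention(intentions, vfd):
    """Project active intentions into goal activations over relations."""
    goals = {}
    for intention, strength in intentions.items():
        for goal, weight in vfd.get(intention, {}).items():
            goals[goal] = max(goals.get(goal, 0.0), strength * weight)
    return goals

print(differentiate_intention({"intend_deescalate": 1.0}, VFD))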
We use the interconnections between the motor
and sensory circuits to pass activity into the sensory system
(world model), which automatically initiates the same reflexive
reasoning process to uncover causal relations that bear on
the goal and that can be exploited as plans to achieve the
goal. Sub-goals are discovered for every causally relevant relation
that bears on any active goal. Once stable patterns of activity
are discovered, the elaborated goal (and plan components) can
be actualized by removing top-down inhibition, reflecting the
decision to act. This model of reflexive planning is adapted,
in response to reinforcement, by tuning the weights on the centrifugal
projections that are the value function decomposition.
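Continuing the same illustrative encoding, the planning cycle described above can be sketched as two steps: goal activation spreads into the world model along causal links, recruiting a sub-goal for every relation that can bring an active goal about, and a scalar reinforcement signal afterwards tunes the decomposition weights. The causal links and the simple update rule are, again, stand-ins rather than the actual mechanism.

# Illustrative causal knowledge in the world model: effect -> enabling relations.
CAUSES = {
    "issue_warning": ["radio_contact_established"],
    "radio_contact_established": ["tune_bridge_to_bridge"],
}

def elaborate_subgoals(goals, causes):
    """Every causally relevant relation bearing on an active goal becomes a sub-goal."""
    subgoals = dict(goals)
    frontier = list(goals)
    while frontier:
        relation, desired = frontier.pop()
        for cause in causes.get(relation, ()):
            key = (cause, True)          # assume making the cause true serves the goal
            if key not in subgoals:
                subgoals[key] = subgoals[(relation, desired)]   # inherit goal activation
                frontier.append(key)
    return subgoals

def reinforce(vfd, intention, outcome_value, lr=0.1):
    """Tune the decomposition weights for one intention toward the observed value."""
    for goal, weight in vfd[intention].items():
        vfd[intention][goal] = weight + lr * (outcome_value - weight)
    return vfd

plan = elaborate_subgoals(differentiate_intention({"intend_deescalate": 1.0}, VFD), CAUSES)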
One of the more interesting aspects of this
approach is that it makes explicit the functional illusion of
goal/sub-goal based planners. Relations, as used to model the
world, become goals when they are highly activated in intention
(as contrasted with activation in belief).
That is, the differentiation of intention via the value function
decomposition is such that an intentional state activates relations,
as goals, in proportion to, and via, weights which are tuned
by reinforcement learning. The process of sub-goaling is merely
that of spreading activation (of differentiated intention) along
causal chains and the interconnection patterns of the value
function decomposition. As the intentional state shifts, or
the situation estimate shifts, the system automatically, and
reflexively, re-plans.
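In terms of the same sketch, re-planning under a shifted intentional state is simply re-running the elaboration; no explicit goal stack is stored or revised.

for intention in ("intend_self_defense", "intend_deescalate"):
    goals = differentiate_intention({intention: 1.0}, VFD)
    print(intention, "->", sorted(elaborate_subgoals(goals, CAUSES)))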
The central limitation of this model, and, arguably,
one to which human decision makers are subject as well,
is the limited scope of coherent and reflexive inference. This
is the same issue discussed above with respect to the structural
organization of LTM over time. The design responds to this issue
by providing a reflective, or metacognitive, adaptive process
that is concerned with shifting internal attention to elaborate
and compare explanations and plans.