Why use multiple markers?

For our dog to select the appropriate behavior for a given cue or context, their brain needs to have established associations among sensory stimuli, selected behaviors, and rewards. In training scenarios, the rewards are typically treats, toys, personal play, or a behavior the dog enjoys, and we associate these with the specific behavior(s) we desire.

One part of the brain plays an important role in learning such stimulus-action-reward (antecedent-behavior-consequence) associations. However, another part of the brain is focused on reward-prediction error.

So, what is reward prediction error (RPE)?

A common analogy used to describe RPE centers on a typical human's response to the delivery of an initially unexpected treat.

You are sitting at your desk typing when a friend taps you on the shoulder. You turn around to see what they want, and they hand you a candy. Your dopamine neurons fire at this unexpected reward delivery. Once this same scenario has happened many times, though, you turn around after being tapped on the shoulder expecting a candy. Now the dopamine neurons fire at the tap on the shoulder instead of at the presentation of the candy. This is because the tap on the shoulder has become a cue that reliably predicts the candy, yet the tap itself still arrives with no specific warning and on no particular schedule. The tap on the shoulder is now the unexpectedly good thing, whereas initially the candy itself was the unexpectedly good thing. (Dopamine neurons fire at reward delivery before learning has occurred, but the activity shifts forward in time to the presentation of a reward cue (marker) once the reward is predictable from the cue.)

We can see the outward expression of this process in our dogs. After many repetitions of delivering a reward immediately after a marker, we see their positive response shift from occurring only at reward delivery to occurring at the marker.
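For readers who like to see the idea in numbers, the shift from reward to marker can be sketched with a very simple learning rule. This is only an illustration under stated assumptions: the reward value, the learning rate, and the function name `simulate_marker_learning` are all hypothetical, and the update rule is a generic error-driven one rather than a measured model of any dog's brain.

```python
def simulate_marker_learning(trials=30, reward=1.0, learning_rate=0.3):
    """Track where the 'surprise' falls on each marker -> reward repetition.

    Assumption: the marker itself always arrives unannounced, so the
    surprise at the marker equals whatever reward the marker has come
    to predict.
    """
    predicted = 0.0  # reward currently expected after hearing the marker
    history = []
    for _ in range(trials):
        surprise_at_marker = predicted           # unannounced good news
        surprise_at_reward = reward - predicted  # received minus expected
        history.append((surprise_at_marker, surprise_at_reward))
        # Nudge the expectation toward what was actually received.
        predicted += learning_rate * surprise_at_reward
    return history


history = simulate_marker_learning()
first_marker, first_reward = history[0]
last_marker, last_reward = history[-1]
print(f"first repetition: marker={first_marker:.2f}, reward={first_reward:.2f}")
print(f"last repetition:  marker={last_marker:.2f}, reward={last_reward:.2f}")
```

On the first repetition all of the surprise sits at reward delivery. After many repetitions it sits almost entirely at the marker, which mirrors the change we observe in our dogs.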

Now, going back to our human analogy: what happens if you turn around and your friend gives you something that you perceive to be better than candy? Conversely, what if you turn around and they give you nothing, or something you perceive to be less enjoyable than candy? In each case, a reward prediction error occurs.

  • Rewards that are surprising or unexpected elicit strong increases in dopamine neuron firing. (Positive error.)
  • Anticipated rewards produce little or no change. (No error.)
  • The lack of materialization of an anticipated reward results in dopamine neuron firing being reduced below baseline. (Negative error.)

Dopamine neurons encode the discrepancy between the predicted reward and the reward actually received. They then send this information to downstream brain regions involved in reward learning.
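The three cases listed above come down to one subtraction: the reward received minus the reward predicted. The sketch below applies that to the candy analogy. The numeric values are hypothetical points on an arbitrary "enjoyment" scale, chosen only so that the sign of each error is easy to see.

```python
def reward_prediction_error(predicted, actual):
    """RPE = reward actually received minus reward predicted."""
    return actual - predicted


def classify(rpe, tolerance=1e-9):
    """Label an RPE as a positive error, a negative error, or no error."""
    if rpe > tolerance:
        return "positive error"
    if rpe < -tolerance:
        return "negative error"
    return "no error"


# Hypothetical values: the tap on the shoulder has come to predict candy.
CANDY = 1.0
scenarios = {
    "something better than candy": 2.0,
    "candy, as expected": 1.0,
    "something less enjoyable than candy": 0.4,
    "nothing at all": 0.0,
}

for label, actual in scenarios.items():
    rpe = reward_prediction_error(CANDY, actual)
    print(f"{label}: RPE = {rpe:+.1f} ({classify(rpe)})")
```

Note that the lesser reward still produces a negative error even though something was delivered. What matters is the comparison with what was predicted, not whether a reward appeared.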

Going back to our human analogy once more: we may enjoy candy and we may also enjoy cheese, but we will likely have a preference. That preference may even vary at different times and in different contexts. Accurately predicting what is being offered before turning around eliminates the possibility of disappointment. So perhaps a tap on the right shoulder means candy and a tap on the left means cheese.

Similarly for our dogs, it is ideal if the specific marker predicts the actual reward that will be delivered. Using multiple markers allows the dog to predict the type of reward (treat, tug, ball throw, release, etc.) that will be made available. The most effective markers tell the dog not only what type of reward will be delivered, but also exactly how and where they will access it. Used consistently, multiple markers greatly reduce the chance of inadvertently causing a negative RPE.
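A rough way to see why reward-specific markers help is to compare one marker that is followed by two different rewards with two markers that each predict their own reward. The sketch below uses the same simple error-driven update as before. The marker words ("yes", "get it"), the reward values, and the learning rate are hypothetical and chosen purely for illustration.

```python
def mean_abs_error_after_training(trials, learning_rate=0.2, repetitions=50):
    """Repeat a list of (marker, reward_value) pairs `repetitions` times.

    Returns the mean absolute prediction error over the final pass,
    i.e. how much surprise is left once learning has settled.
    """
    predicted = {}  # one learned expectation per marker
    errors = []
    for _ in range(repetitions):
        errors = []
        for marker, reward in trials:
            rpe = reward - predicted.get(marker, 0.0)
            predicted[marker] = predicted.get(marker, 0.0) + learning_rate * rpe
            errors.append(abs(rpe))
    return sum(errors) / len(errors)


TREAT, TUG = 1.0, 2.0  # hypothetical values; this dog prefers tug

one_marker = [("yes", TREAT), ("yes", TUG)]      # same marker, mixed rewards
two_markers = [("yes", TREAT), ("get it", TUG)]  # one marker per reward

print(f"one marker:  {mean_abs_error_after_training(one_marker):.3f}")
print(f"two markers: {mean_abs_error_after_training(two_markers):.3f}")
```

With a single marker the prediction settles somewhere between the two rewards, so the treat keeps arriving as a negative error and the tug as a positive one. With a marker per reward, each prediction can match its reward and the leftover error shrinks toward zero.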

Understanding how the RPE system functions can help us ensure that our dog's learning is not negatively impacted by receiving a lesser reward than anticipated. We can also use this concept to influence behavior extinction in a kind way, by providing a reward that is substantially lower than the predicted reward.

Sign-tracking versus Goal-tracking

Another concept associated with markers is that of "sign-tracking" (ST) behavior versus "goal-tracking" (GT) behavior. This has been extensively studied in rodents, and in more recent years the research has expanded to humans and other species.

As we know from our training scenarios, cues (markers) that are associated with rewards acquire predictive value and influence behavior. However, there is significant individual variation in the propensity to assign motivational value (incentive salience) to the reward cues (markers) themselves.

In classical conditioning studies in rats, where a cue (extension of a lever) predicts a food reward being delivered in a bowl in another location, some of the rats favor approaching and interacting with the lever (sign-trackers), whereas other rats favor approaching and interacting with the site of delivery (goal-trackers). All rats eat the food once it is delivered; it is the behavior that occurs between the conditioned cue (marker) and the delivery of the reward itself that differs.

This concept has been demonstrated across many species. The cue (marker) predicts the reward for both sign-trackers and goal-trackers, but the individual responses are quite different. Some individuals (STs) indicate they have detected a relationship between the cue and the reward by focusing their attention and behavior on the cue. In these individuals, the cue itself becomes attractive and desired, and elicits reward-seeking behavior. Other individuals (GTs) indicate they have detected the same relationship by focusing their attention and behavior on the reward delivery area.

We see the results of this phenomenon when training our dogs. For some dogs (GTs), the motivation to work is tightly linked to receiving the reward, so their behavior may change once they lose interest due to satiation. Arousal will likely drop, and performance may also decline unless the behavior is already trained through to habit (automatic response). Other dogs (STs) appear unable to shift their thoughts and actions away from rewards and the cues associated with them, even once satiated. Dogs displaying ST behavior may also show obsessive responses to a clicker, for example, becoming hyper-aroused just from the sight or sound of the marker or marking device.

Even in our day-to-day management, we note that during food preparation some dogs (STs) primarily focus their interactions on the cues that indicate food will be available soon, whereas other dogs (GTs) primarily focus their interactions on the area where the food will be made available for consumption.

Overall, sign-trackers place excessive incentive value on the cues associated with the reward; the feelings associated with the cues surpass the pleasure associated with the reward itself. For example, when the reward is food, sign-trackers push on in their attempts to elicit the cues associated with the reward, even when the pleasure of consuming the food diminishes. In contrast, goal-trackers lose interest in attempting to attain the reward as they become satiated.

Sign-tracking behavior has been linked to impulsivity, deficits in attentional processing, an increase in compulsive behaviors, and susceptibility to addiction in humans.

The difference between ST and GT is not just observable as a difference in behavior; several differences have also been identified in the neurobiological mechanisms underlying these distinct associative learning strategies (Kuhn et al., 2018).

To complicate matters further, some studies show that sign-tracking and goal-tracking behavior can vary within a single individual across different reinforcers. This suggests that ST and GT behavior may be influenced by individual differences in the value placed on specific reinforcers, as opposed to reflecting dispositional differences (Patitucci et al., 2016).