From Silicon Labs - Timing 101: The Case of the Ouroboros Clock
Hello and welcome to another Timing 101 blog article.
In this post, I will go over an interesting and curious clock chip feedback arrangement that comes up from time to time. It can arise accidentally, or as an attempted recovery or test mode, but should generally be avoided, as explained below. Further, understanding the Ouroboros clock might help explain some odd behavior in a complicated timing application. Before diving into exactly what I mean by an "Ouroboros" clock, let's review some basic clock switching terminology and the standard input clock switching configuration.
Some Basic Clock Switching Terminology
Clock chips often support switching from one input clock to another based on some qualifying criteria such as LOS (Loss of Signal) or an OOF (Out of Frequency) condition. Here’s the terminology most often used:
Freerun Mode:
Output clock based on an attached crystal, or other resonator, or substitute external reference clock. The output clock's frequency stability, wander, and jitter characteristics are determined by the chip's crystal oscillator for example, independent of an input clock.
Holdover Mode:
Output clock based on historical frequency data of a selected input clock and employed when the input clock is lost and no valid alternate is available. Usually historical data must be collected over some minimum time window to be considered valid. The frequency accuracy is only as good as the data collected.
Locked Mode:
Output clock frequency and phase locked to a selected input clock, i.e. normal operation.
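The three modes above can be read as a simple priority order. The following is a minimal sketch of that ordering only, not the selection logic of any particular clock chip; the function and flag names are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    FREERUN = "freerun"
    HOLDOVER = "holdover"
    LOCKED = "locked"


def select_mode(valid_input_available: bool, holdover_history_valid: bool) -> Mode:
    """Pick an operating mode from two status flags (hypothetical names)."""
    if valid_input_available:
        # Normal operation: frequency and phase locked to the selected input.
        return Mode.LOCKED
    if holdover_history_valid:
        # Input lost, but enough historical frequency data has been collected.
        return Mode.HOLDOVER
    # No input and no valid history: fall back to the crystal or reference.
    return Mode.FREERUN


print(select_mode(False, True).value)
```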
The Standard Input Clock Switching Configuration
Consider the illustration in the figure below where two jitter attenuator clock ICs are cascaded. This could be for additional jitter attenuation or for optimizing frequency plans and distribution. For the purposes of illustration, the devices are depicted as very simplified Si5345 block diagrams. In this figure there are two input clocks supplied to Device #1, IN0 and IN3. In typical applications one clock may be regarded as the "primary" clock and the other as the "secondary" or backup clock. The primary clock might be recovered from network data while the secondary clock relies on a local oscillator. If the primary clock fails or is disqualified by LOS or OOF, then the clock chip switches to the secondary clock. This is usually intended to keep "downstream" devices up and running. If the primary clock returns and is valid then the clock IC may revert to it, or not, depending on the option selected.
The presumption here is that as long as either of these two clocks is present then a valid locked mode clock will be yielded at OUT0, supplying an input clock to downstream Device #2. In fact, if both input clocks to Device #1 were lost the device could go into holdover mode, as described above, or even freerun mode, and still yield a reasonable temporary output clock.
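The primary/secondary switching behavior described above, including the revertive option, can be sketched as follows. This is an illustrative model under simplifying assumptions (an input is valid when neither LOS nor OOF is asserted); the class and function names are hypothetical and do not correspond to any device register or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputStatus:
    los: bool  # Loss of Signal alarm asserted
    oof: bool  # Out of Frequency alarm asserted

    @property
    def valid(self) -> bool:
        return not (self.los or self.oof)


def select_input(primary: InputStatus, secondary: InputStatus,
                 current: Optional[str], revertive: bool) -> Optional[str]:
    """Return "primary", "secondary", or None when no input is valid.

    None means the device would have to fall back to holdover or freerun.
    """
    if current == "secondary" and secondary.valid and not revertive:
        # Non-revertive: stay on the backup for as long as it remains valid.
        return "secondary"
    if primary.valid:
        return "primary"
    if secondary.valid:
        return "secondary"
    return None
```

For example, with the primary clock restored while the device is running on the backup, `select_input(ok, ok, "secondary", revertive=False)` stays on `"secondary"`, whereas `revertive=True` returns `"primary"`.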
The Ouroboros Clock Configuration
In standard applications, downstream clocks are not fed back to upstream clock inputs. Rather they are usually scaled or jitter attenuated versions of upstream independent stable or data-derived clocks.
But what if we did attempt the configuration shown in Figure 2 below? In this case, one of the outputs of downstream Device #2 is being fed back into upstream Device #1. This might be intended as a temporary expedient backup clock.
Now what happens when we lose our primary clock IN0 as suggested by Figure 3 below? The secondary or backup clock IN3 to Device #1 relies on the output of Device #2. Note that this is just a locked version of Device #1's output. We generally do not see this sort of connection with one device, but it is proposed occasionally in applications involving two devices. Even then, engineers will usually intuit that we are trying to get away with something.
This is the Ouroboros clock configuration. (And yes, it does sound almost like a Big Bang Theory episode title.) The Ouroboros clock configuration is so named because its feedback resembles the mythological symbol of a snake chasing (or biting) its tail. According to the Wiktionary entry, the word comes from the Greek words ourá for "tail" and bóros for "devouring or swallowing". See the illustration below in Figure 4. It is an ancient symbol for cyclic infinity and the term fits this application.
A Gedanken
Let’s consider a simplified gedanken or thought experiment consisting of a single basic PLL. Then assume that it has successfully been placed into the Ouroboros configuration as shown in Figure 5 below.
Now we can think through the probable consequences. If everything is ideal and there is no PFD (Phase Frequency Detector) error output then the situation is at least marginally stable. However, even ignoring loop noise, it is most likely in a practical PLL that there is a fixed phase offset between the clocks presented at PFD (+) and PFD (-). In normal PLL operation the VCO can be adjusted so as to frequency and phase lock the output clock to the independent input clock. In the Ouroboros configuration, there is nothing the VCO can do to reduce phase error.
Assume the output clock is measured with its phase just a little bit ahead at PFD (+) versus PFD (-). The loop will then attempt to correct for that by tuning the VCO to a higher frequency. But a relative phase difference will still be present. So, the loop will continue attempting to correct for the measured phase error until the VCO is “railed” at its highest frequency. Note that, to generalize, the VCO could be tuned either higher or lower in frequency depending on the polarity of the phase difference. All that matters is that the PFD sees a phase delta, which leads to a runaway condition.
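The runaway can be reproduced with a toy discrete-time loop. The sketch below is not a model of any real PLL: the proportional and integral gains, the fixed PFD offset, the 200 ppm rail, and the arbitrary units are all assumptions chosen only to make the behavior visible. With an independent reference, the loop absorbs the fixed offset and settles; with its own output as the reference, the PFD error never changes and the integrator drives the VCO to a rail.

```python
def clamp(x: float, limit: float) -> float:
    return max(-limit, min(limit, x))


def simulate_pll(steps: int, ouroboros: bool, phase_offset: float = 1.0,
                 f_ref_ppm: float = 0.0, kp: float = 0.5, ki: float = 0.05,
                 rail_ppm: float = 200.0) -> list:
    """Return the VCO frequency offset (ppm) at each step of a toy PI loop."""
    phase_acc = 0.0  # accumulated phase difference between reference and VCO
    integ = 0.0      # loop filter integrator state
    f_vco = 0.0      # VCO frequency offset from nominal, in ppm
    history = []
    for _ in range(steps):
        # In the Ouroboros case the "reference" is the VCO's own output,
        # so the frequency difference is always zero.
        f_in = f_vco if ouroboros else f_ref_ppm
        phase_acc += f_in - f_vco
        # The PFD sees the accumulated phase plus a fixed offset (assumed).
        err = phase_acc + phase_offset
        integ = clamp(integ + ki * err, rail_ppm)
        f_vco = clamp(kp * err + integ, rail_ppm)
        history.append(f_vco)
    return history


normal = simulate_pll(5000, ouroboros=False, f_ref_ppm=10.0)
looped = simulate_pll(5000, ouroboros=True)
print(round(normal[-1], 6), looped[-1])
```

Flipping the sign of `phase_offset` sends the looped case to the lower rail instead, which matches the point that only the polarity of the phase delta decides the direction.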
Trying to accomplish this with two Si5345s is just this problem writ large, albeit with further complications due to clock validation and switching logic. In addition, there will always be slight part-to-part variations in output frequency and calculated holdover (HO) frequency. Where two separate devices are involved, these can also drive the PFD in one direction or the other.
Lab Confirmation
So, what really happens in the lab? Consider a project plan with these attributes:
- IN0: 100 MHz
- IN1: 100 MHz
- OUT0: 100 MHz
- Nominal Bandwidth: 100.000 Hz
- Fastlock Enable Off
- Ramped Exit from Holdover
- OOF IN0 and IN1:
- Assertion Threshold 100 ppm
- De-assertion Threshold 98 ppm
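The two OOF thresholds in this plan form a hysteresis band: the alarm asserts once the offset exceeds 100 ppm and clears only after it falls back below 98 ppm. For a 100 MHz clock, 100 ppm corresponds to 10 kHz. Here is a minimal sketch of that behavior, assuming the offset is measured against an ideal nominal frequency; the class name, the strict comparisons, and the measurement method are illustrative assumptions, not the device's actual implementation.

```python
def ppm_offset(f_meas_hz: float, f_nom_hz: float) -> float:
    """Fractional frequency offset in parts per million."""
    return (f_meas_hz - f_nom_hz) / f_nom_hz * 1e6


class OofMonitor:
    """Out-of-frequency alarm with hysteresis (hypothetical helper)."""

    def __init__(self, assert_ppm: float = 100.0, deassert_ppm: float = 98.0):
        self.assert_ppm = assert_ppm
        self.deassert_ppm = deassert_ppm
        self.oof = False

    def update(self, offset_ppm: float) -> bool:
        magnitude = abs(offset_ppm)
        if not self.oof and magnitude > self.assert_ppm:
            self.oof = True   # crossed the assertion threshold
        elif self.oof and magnitude < self.deassert_ppm:
            self.oof = False  # dropped back below the de-assertion threshold
        return self.oof


monitor = OofMonitor()
for offset in (50.0, 99.0, 101.0, 99.0, 97.5):
    print(offset, monitor.update(offset))
```

Note that an offset of 99 ppm reads as in-frequency on the way up but still out-of-frequency on the way down, which is the purpose of the gap between the two thresholds.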
Now take such a project plan and apply it to two Si5345 evaluation boards, configured as shown in Figure 2 above, except using IN1 instead of IN3 as the secondary or backup input clock.
Apply a signal generator to Device #1 IN0 and let both boards run until HOLD_HIST_VALID is true. What happens when you remove the 100 MHz input clock at IN0?
Initially only LOS[0] is reported by Device #1. Otherwise all is well. However, the output clock from Device #2 starts ramping in frequency. (In general it can ramp up or down, but it happened to ramp up in my particular experiment.)
Eventually the output clock from Device #2 being used as the backup input clock goes far enough out of frequency that it fails Device #1’s OOF criterion. The settled conditions are as follows:
- Device #1 goes into holdover mode
- Device #2 operates in locked mode.
Note that in general there is no reason why the devices could not be stable with each in the opposite states. Our experience has been that most of the time there is a preferred set of states but you will see the alternate set from time to time, almost as if there is a chaotic element to the results.
In this case, the Ouroboros configuration didn’t really buy us anything except perhaps a little time. However, note that the output frequency was ramping the entire time until Device #1’s OOF[1] asserted, and Device #2 still ends up relying on Device #1's HO clock. That’s just one potential issue for this impractical configuration. But there’s another potentially worse effect.
Ouroboros Oscillation
This configuration can also result in a positive feedback system that can be made to oscillate, leading to puzzling and odd behavior. In particular, this can happen if one of the devices can be made to enter and exit HO. For example, this phenomenon can be observed if the project plan OOF specs are tightened as follows.
- OOF IN0 and IN1:
- Assertion Threshold 000 ppm
- De-assertion Threshold 9375 ppm
Now the two devices will interact with each other and may never settle. Below is an annotated frequency plot of Device #2 output clock data from a logging frequency recorder. You can see that the Device #2 output frequency is slowly oscillating, with a varying period on the order of 8 or 9 seconds.
There are three features noted on the plot above about the state of Device #1 as Device #2's output frequency varies:
- Device #1 is in holdover or HO mode
- Device #1 is in ramped exit from HO
- Device #1 is entering HO
During this time period no alarms are issued by Device #2. This state can last indefinitely. I started one trial of this experiment on a Friday afternoon and it was still cycling on Monday morning. The devices can even exchange roles as to which one is in the HO state!
Having a device constantly entering and exiting HO is even worse than simply going straight into HO.
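The enter/exit cycle can be illustrated with a toy relaxation-oscillator model. To be clear, this is not a model of the Si5345 and it does not reproduce the plot discussed above. It assumes that Device #2 tracks Device #1 with a simple first-order lag, that the runaway appears as a constant push while Device #1 is locked to the feedback clock, that the holdover frequency equals the original nominal (0 ppm), and that ramped exit from HO can be ignored. The thresholds and gains are arbitrary illustrative values, not taken from the project plan.

```python
def simulate_ouroboros(steps: int, assert_ppm: float = 2.0,
                       deassert_ppm: float = 1.5, push_ppm: float = 0.5,
                       track_gain: float = 0.1, hold_ppm: float = 0.0):
    """Return (Device #2 offset history in ppm, number of HO entries)."""
    f2 = 0.0          # Device #2 output offset, also Device #1's backup input
    holdover = False  # True while Device #1 is in HO
    entries = 0
    history = []
    for _ in range(steps):
        # OOF check on the fed-back clock, with hysteresis.
        if not holdover and abs(f2) > assert_ppm:
            holdover = True
            entries += 1
        elif holdover and abs(f2) < deassert_ppm:
            holdover = False
        # Device #1 output: held value in HO, otherwise feedback plus the
        # runaway push caused by the residual phase error.
        f1 = hold_ppm if holdover else f2 + push_ppm
        # Device #2 follows Device #1 with a first-order lag (assumed).
        f2 += track_gain * (f1 - f2)
        history.append(f2)
    return history, entries


history, entries = simulate_ouroboros(1000)
print(entries, round(min(history[200:]), 3), round(max(history[200:]), 3))
```

In this sketch the output never settles: it ramps until OOF asserts, falls back toward the held frequency until OOF clears, and then starts ramping again. Narrowing the gap between the two thresholds shortens each cycle.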
Conclusion
The bottom line is that the Ouroboros clock configuration either does nothing useful except delay entering HO or can even trigger an oscillation which produces repetitive wander in the output clock. Downstream clocks should generally stay downstream.
Hope you have enjoyed this Timing 101 article and will understand the implications if you spot an Ouroboros clock.
As always, if you have topic suggestions, or there are questions you would like answered, appropriate for this blog, please send them to kevin.smith@silabs.com with the words Timing 101 in the subject line. I will give them consideration and see if I can fit them in. Thanks for reading.
Keep calm and clock on,
Kevin