Part 1 · AI infrastructure timing

State of timing in AI infrastructure.

21 May 2026

A growing vendor stack is selling timing as the missing piece of AI infrastructure.

SiTime calls it the foundation of the AI data center. Clockwork.io raised a $21M Series A around clock sync and latency visibility; funding databases list about $41.5M total. Syncworks positions itself as a precision-timing integrator.

Before asking whether they're right (that's the next post), let me show you what they're actually selling and what the rack actually has.

Grandmaster (PTP source) · GNSS antenna, OCXO; Microchip / Meinberg
↓ IEEE 1588 / PTP
Switch · boundary clock, ~100 ns
NIC PHC · PTP hardware clock, ~100 ns
Host kernel · phc2sys, single-digit µs
The PTP tree terminates here. Outside it, each plane runs its own clock:
BMC · boot RTC, NTP optional
PDU · SNMP sysUpTime, manager wall-clock
Optic (CMIS) · CMIS telemetry, ePPS optional
GPU · CUDA / CUPTI, NVML / DCGM
Also: CDU cooling, K8s scheduler, OOB mgmt, NVMe, PSU, UPS
Nine streams. No causal order from timestamps alone.
Overview. Rack timing architecture, top-down. PTP discipline reaches NICs and switches. The rest of the rack runs its own clocks.

Where time begins

Atomic standard · 10⁻¹⁵ to 10⁻¹⁶ (fractional frequency uncertainty)
GNSS satellite · tens of ns
Grandmaster (PTP) · ~100 ns
NIC PHC · sub-µs
CPU TSC · µs–ms
Diagram 1. The time stack. Five layers. Cosmos to CPU. Accuracy degrades with each link.

In a PTP-equipped data center, the precise-time path usually looks like this. Five links.

At the top, the caesium and rubidium atomic standards. NIST and the US Naval Observatory (USNO) in the United States, with equivalents in other countries, each maintaining its own ensemble. Modern caesium fountains realize the SI second to fractional uncertainties around one part in 10¹⁵ to 10¹⁶: roughly one second of error over tens to hundreds of millions of years. The ensemble does better than any single clock.
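The conversion from fractional uncertainty to accumulated error is plain arithmetic: a constant fractional frequency error u adds u seconds of error per elapsed second, so one full second of error takes 1/u seconds. A minimal sketch, assuming the error stays constant (real clocks wander, which is part of why ensembles exist):

```python
SECONDS_PER_YEAR = 365.25 * 86_400  # Julian year


def years_to_accumulate_one_second(fractional_uncertainty: float) -> float:
    """Years until a clock with this constant fractional frequency error
    has gained or lost one full second."""
    return (1.0 / fractional_uncertainty) / SECONDS_PER_YEAR


for u in (1e-15, 1e-16):
    print(f"{u:.0e}: ~{years_to_accumulate_one_second(u) / 1e6:.0f} million years")
# 1e-15: ~32 million years
# 1e-16: ~317 million years
```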

Below them, the satellites. GPS, Galileo, GLONASS, BeiDou. Each carries an atomic clock disciplined against (synced to) the ground ensembles. Each broadcasts a time signal you can recover to within tens of nanoseconds.

Below the satellites, the grandmaster in the data center's timing closet. A box from Microchip, Meinberg, or Oscilloquartz. GNSS antenna on the roof, oven-controlled crystal oscillator (OCXO) inside, PTP packets out. It pushes time downward over the network using IEEE 1588.
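IEEE 1588 moves time with a two-way exchange. In the delay request-response mechanism the master timestamps a Sync message on departure (t1), the slave timestamps its arrival (t2), the slave sends a Delay_Req (t3), and the master timestamps that arrival (t4). A minimal sketch of the offset and path-delay arithmetic, assuming a symmetric path; the `simulate` helper and all numbers are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class DelayReqExchange:
    """One IEEE 1588 delay request-response exchange, timestamps in ns.

    t1: Sync leaves the master        (master's clock)
    t2: Sync reaches the slave        (slave's clock)
    t3: Delay_Req leaves the slave    (slave's clock)
    t4: Delay_Req reaches the master  (master's clock)
    """
    t1: int
    t2: int
    t3: int
    t4: int

    def offset_from_master(self) -> float:
        # Positive: the slave's clock is ahead. Exact only on a symmetric path.
        return ((self.t2 - self.t1) - (self.t4 - self.t3)) / 2

    def mean_path_delay(self) -> float:
        return ((self.t2 - self.t1) + (self.t4 - self.t3)) / 2


def simulate(true_offset: int, fwd_delay: int, rev_delay: int) -> DelayReqExchange:
    """Hypothetical helper: timestamps for a slave `true_offset` ns ahead of
    the master, with the given one-way delay in each direction (ns)."""
    t1 = 0
    t2 = t1 + fwd_delay + true_offset     # read on the slave's clock
    t3 = t2 + 500                         # slave answers 500 ns later
    t4 = t3 - true_offset + rev_delay     # read on the master's clock
    return DelayReqExchange(t1, t2, t3, t4)


sym = simulate(true_offset=500, fwd_delay=1_000, rev_delay=1_000)
print(sym.offset_from_master(), sym.mean_path_delay())  # 500.0 1000.0

asym = simulate(true_offset=500, fwd_delay=1_200, rev_delay=800)
print(asym.offset_from_master(), asym.mean_path_delay())  # 700.0 1000.0
```

Any asymmetry between the two directions lands in the offset at half its size: 400 ns of asymmetry reads as 200 ns of clock error, and nothing in the exchange itself can detect it.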

The NIC takes it from there. Data-center NICs from Intel, Mellanox, and Broadcom each carry a PHC (PTP Hardware Clock) sitting next to the Ethernet MAC, the silicon block that handles frame transmission on the wire. The PHC disciplines its own oscillator against the grandmaster's PTP packets. Sub-microsecond accuracy on hardware you can buy.

At the bottom, the CPU's TSC (Time Stamp Counter). A 64-bit register that, on CPUs with an invariant TSC, increments at a constant rate regardless of the core's current frequency. The kernel uses it as a clocksource that clock_gettime() can read; phc2sys can steer the host system clock against the NIC PHC, if you've configured it to.
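The kernel doesn't read nanoseconds off the TSC. It reads cycles and scales them with a fixed-point multiply and shift, the same form as the kernel's clocksource_cyc2ns() helper. A minimal sketch, assuming a hypothetical 3 GHz TSC and a hand-picked shift of 24 (the kernel chooses the shift with clocks_calc_mult_shift(), which this does not reproduce):

```python
NSEC_PER_SEC = 1_000_000_000


def calc_mult(freq_hz: int, shift: int) -> int:
    """Fixed-point multiplier so that ns ~= (cycles * mult) >> shift,
    rounded to nearest. The shift is supplied by hand here."""
    return ((NSEC_PER_SEC << shift) + freq_hz // 2) // freq_hz


def cyc2ns(cycles: int, mult: int, shift: int) -> int:
    # Same form as the kernel's clocksource_cyc2ns(). Python ints never
    # overflow; the kernel's 64-bit product can, so it only converts
    # bounded cycle deltas.
    return (cycles * mult) >> shift


TSC_HZ = 3_000_000_000  # hypothetical invariant TSC rate
mult = calc_mult(TSC_HZ, shift=24)
print(mult, cyc2ns(TSC_HZ, mult, 24))  # 5592405 999999940
```

One second's worth of cycles comes back 60 ns short because mult is rounded to an integer. As I understand it, the timekeeping core applies frequency corrections, including the ones phc2sys requests through clock_adjtime(), by nudging that effective multiplier.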

Five links. Cosmos to CPU. Each one introduces uncertainty. Each one is a vendor.


Inside the rack

NIC PHC · PTP-disciplined, ~100 ns
Top-of-rack switch · PTP boundary clock, ~100 ns
PTP signal terminates here.
BMC · set at boot, runs free
PDU · SNMP sysUpTime, no wall clock
QSFP-DD optic · CMIS telemetry; ePPS optional
Cooling (CDU) · BMS-set or free-running
K8s scheduler · host system clock
GPU (CUDA / NVML) · local timers, not disciplined
OOB mgmt switch · separate clock domain
NVMe storage · embedded controller, SMART timestamps
PSU controllers · per-PSU embedded clock
UPS controller · failover timer, free-running
Diagram 2. Rack cross-section. Two planes inside the PTP tree. Ten examples of planes outside it.

The stack stops where the rack starts. Inside the rack, it gets complicated.

A frontier AI rack built from Spectrum / ConnectX-class components exposes roughly a dozen distinct timebases: physical clocks on silicon, boot-relative counters, and event streams that inherit time from whatever upstream component happens to set theirs. Two classes sit inside the PTP tree as a coherent timing source: the NIC PHCs and the switch boundary clocks. The rest usually don't expose the same rack epoch.

The NIC is inside the tree. The PHC on the NIC is the anchor. PTP packets arrive on the network port, and two Linux daemons from the open-source linuxptp project do the rest: ptp4l runs the PTP wire conversation and steers the PHC to the grandmaster, and phc2sys keeps the host system clock in step with the PHC.
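Both daemons are feedback loops: measure the offset, then adjust the clock's frequency so the offset shrinks. linuxptp's default servo is a PI (proportional-integral) controller. Below is a textbook PI loop of the same shape, not linuxptp's code; the gains, the 20 ppm oscillator and the absence of timestamp noise are all illustrative assumptions:

```python
from dataclasses import dataclass


@dataclass
class PIServo:
    """Textbook PI clock servo: phase offset (ns) in, frequency correction
    (ppb) out. Gains are illustrative."""
    kp: float = 0.7
    ki: float = 0.3
    integral: float = 0.0

    def sample(self, offset_ns: float) -> float:
        # The integral term accumulates until it cancels the oscillator's
        # own frequency error; the proportional term pulls the phase in.
        self.integral += self.ki * offset_ns
        return -(self.kp * offset_ns + self.integral)


def simulate_lock(freq_error_ppb: float, steps: int = 60, interval_s: float = 1.0):
    """Hypothetical scenario: an oscillator running `freq_error_ppb` fast,
    measured once per `interval_s`, no measurement noise. Returns the offset
    after each interval (ns) and the last frequency correction (ppb)."""
    servo = PIServo()
    offset_ns = 0.0
    history = []
    correction_ppb = 0.0
    for _ in range(steps):
        correction_ppb = servo.sample(offset_ns)
        # ppb is ns of drift per second, so residual frequency error
        # integrates directly into phase offset over the interval.
        offset_ns += (freq_error_ppb + correction_ppb) * interval_s
        history.append(offset_ns)
    return history, correction_ppb


hist, corr = simulate_lock(20_000)  # a 20 ppm oscillator
print(f"first {hist[0]:.0f} ns, last {abs(hist[-1]):.3f} ns, correction {corr:.0f} ppb")
# first 20000 ns, last 0.000 ns, correction -20000 ppb
```

Once locked, the integral term holds the oscillator's frequency error, which is roughly what lets a disciplined clock keep reasonable time for a while after it loses its reference.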

The switch is inside the tree. Top-of-rack switches from NVIDIA, Arista, Broadcom, and Cisco implement PTP boundary clocks. They take time from the grandmaster on one port and redistribute it on the others. The spine repeats one layer up.

That's it. After NIC and switch, almost nothing is inside the tree.

The BMC, the small ARM SoC on every server motherboard that owns sensor telemetry (power, temperature, fan speed), has its own clock. It is typically set at boot from the host BIOS and then runs free, undisciplined by the rack's PTP tree.

Many PDUs at the bottom of the rack expose only an SNMP sysUpTime counter: centiseconds since the device last rebooted, not wall-clock time. Even when a PDU has an internal RTC for its own logs, the wall-clock timestamp on a PDU event in your monitoring stack typically belongs to the manager that polled it, not the PDU itself.
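Turning a sysUpTime value into wall-clock time takes a reference point, and that reference comes from the manager. A minimal sketch, assuming the manager records its own clock at poll time alongside the PDU's sysUpTime (SNMPv2 notifications carry sysUpTime.0 as their first variable binding, so traps have the same shape); the function names and readings are hypothetical:

```python
TIMETICKS_MODULUS = 2 ** 32  # SNMP TimeTicks: unsigned 32-bit, hundredths of a second


def ticks_elapsed(earlier: int, later: int) -> int:
    """Centiseconds between two sysUpTime readings. Tolerates one counter
    wrap; a device reboot also resets the counter and looks identical."""
    return (later - earlier) % TIMETICKS_MODULUS


def event_wall_time(poll_wall_s: float, uptime_at_poll: int, uptime_at_event: int) -> float:
    """Estimate an event's wall-clock time (Unix seconds) by walking back
    from the manager's own clock reading taken at poll time. Any error in
    poll_wall_s, and the polling latency, passes straight through."""
    return poll_wall_s - ticks_elapsed(uptime_at_event, uptime_at_poll) / 100.0


# Hypothetical readings: the event happened 20 s before the poll, across a wrap.
print(event_wall_time(1_700_000_000.0, uptime_at_poll=500, uptime_at_event=2**32 - 1_500))
# 1699999980.0
print(f"wrap period: {TIMETICKS_MODULUS / 100 / 86_400:.1f} days")
# wrap period: 497.1 days
```

The counter wraps after about 497 days, and a reboot resets it outright; the modular arithmetic cannot tell those two cases apart, so the manager has to track reboots separately.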

The back-end fabric runs over QSFP-DD optical modules, the dense pluggable transceivers that carry the high-speed links. Each module runs a microcontroller with CMIS firmware. CMIS (the Common Management Interface Specification) is the sideband control plane modern optical modules expose to their host. The microcontroller has a clock. Some form-factor specifications include an optional ePPS/Clock path for host-to-module timing alignment, but public module documentation often treats that path as optional or unimplemented rather than as a universal rack epoch.

The cooling distribution unit has its own clock. Some are set by a building management system. In the systems I've examined, many run free.

The Kubernetes scheduler uses the host system clock for its event log. Whether that host is inside the PTP tree depends on how the cluster was provisioned.

The GPU has timers. CUDA events measure elapsed GPU work to sub-microsecond resolution; CUPTI can expose device timestamps in nanoseconds since device reset. NVML (the NVIDIA Management Library) and DCGM (Data Center GPU Manager) expose GPU telemetry and events. None are disciplined against the rack's PTP tree by default.
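The usual way to pull a free-running timebase onto the host clock is to capture back-to-back pairs of readings (device counter, host clock) and fit a line through them: an offset plus a rate, with the rate absorbing the oscillator's frequency error. A minimal least-squares sketch; how you capture the pairs is an assumption, not a CUPTI, NVML, or DCGM API, and the 50 ppm device below is synthetic:

```python
from statistics import fmean


def fit_clock_map(pairs):
    """Least-squares fit of host time against device time.

    `pairs` holds (device_ns, host_ns) readings taken back to back. Both
    axes are re-based on the first pair so the float math works on small
    deltas instead of ~1e18 ns epoch values. Returns a mapping function
    and the fitted rate (host ns per device ns)."""
    device_ref, host_ref = pairs[0]
    xs = [d - device_ref for d, _ in pairs]
    ys = [h - host_ref for _, h in pairs]
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    rate = sxy / sxx
    offset = my - rate * mx

    def to_host_ns(device_ns: int) -> int:
        return host_ref + round(offset + rate * (device_ns - device_ref))

    return to_host_ns, rate


# Hypothetical samples: a device counter running 50 ppm fast, read once a second.
H0 = 1_700_000_000_000_000_000
samples = [(d, H0 + d - d // 20_000) for d in range(0, 10_000_000_000, 1_000_000_000)]
to_host, rate = fit_clock_map(samples)
print(f"{rate:.5f}", to_host(5_500_000_000) - H0)  # 0.99995 5499725000
```

The mapping is only as good as the host clock it lands on, which is inside the PTP tree only if phc2sys is running, and each pair carries whatever latency separates its two reads.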

A dozen distinct timebases. Two disciplined. The rest typically aren't.

The rule book has gaps

Several standards govern parts of this. None covers all of it.

IEEE 1588-2019
What it covers: The canonical PTP wire protocol. Boundary-clock and transparent-clock state machines (§9).
Where it stops: Handles the wire beautifully. Stops the moment you leave the NIC.
IEEE 802.1AS-2020
What it covers: The TSN (Time-Sensitive Networking) profile of PTP.
Where it stops: Built for AV bridging and automotive Ethernet. Not designed with the AI rack as a target.
White Rabbit
What it covers: A CERN extension combining SyncE (Synchronous Ethernet, a frequency reference in the line code) with PTP phase to reach sub-nanosecond accuracy.
Where it stops: Mature, open. Rarely deployed in commercial AI today.
OCP-TAP v1r1
What it covers: The 2023 Datacenter PTP reference, written by Microsoft, Meta, and Google under the Open Compute Project's Time Appliances Project.
Where it stops: Voluntary. NVIDIA, Cisco, and Arista implement against it; nothing forces the rest of the rack to come along.
Redfish DateTimeSource
What it covers: A DMTF (Distributed Management Task Force) field on the BMC or manager REST API that records the source of its DateTime value.
Where it stops: Records RTC, firmware, host, NTP, or PTP source. It does not by itself make every management-plane event part of the rack PTP epoch.
CMIS 5.x
What it covers: The Common Management Interface Specification for QSFP-DD/OSFP-class modules. Defines the host-to-module management interface; form-factor specs can include optional ePPS/Clock signals.
Where it stops: Useful management plane, not a guarantee that module telemetry is exposed in the host's PTP epoch.
NIC PHC · IEEE 1588, OCP-TAP
Switch · IEEE 1588, OCP-TAP
BMC · ~Redfish
PDU · no standard
Optic · ~CMIS
Cooling · no standard
Scheduler · no standard
GPU · no standard
Diagram 3. Standards coverage matrix. Most cells empty. ~ marks optional fields rarely populated.

The standards that exist work. The rest of the rack isn't built out of them. Where the rule book stops, vendors arrive to sell solutions.


Who's selling timing

Physical clock and oscillator IP

SiTime is the loudest. NASDAQ-listed, MEMS oscillators. In June 2025 it launched TimeFabric™, an IEEE 1588 software suite with 24-hour holdover and "up to 9× more accurate" sync than quartz, for "AI data centers, SmartNICs, switches, routers, accelerators." Microchip, which absorbed Microsemi in 2018, sits adjacent with a longer history.

Grandmasters

Meinberg (founded 1979), Microchip TimeProvider, Oscilloquartz (now Adva / Adtran). Boxes in a closet: GNSS in, PTP out. Telecom and finance for twenty years. Now repositioning for AI clusters.

Switch and NIC silicon

NVIDIA Spectrum-X, Mellanox ConnectX, Marvell, Broadcom Thor, Intel E810. PTP boundary clocks and PHCs on silicon. NVIDIA's Spectrum PTP technical blog documents the boundary-clock and transparent-clock modes. Marvell holds a granted patent on time-aware link-level telemetry (US 12,375,256).

Pure-software timing

Clockwork.io, founded 2018, raised a $21M Series A around clock synchronization and latency visibility; funding databases list ~$41.5M total. Software-only timing stack.

Standards bodies and integrators

Syncworks operates in the precision-timing integrator space. IETF, ITU-T, and IEEE working groups maintain the underlying specifications.

Service-layer

Equinix operates a Precision Time service for terrestrial time-feed distribution to interconnected data centers (US patent application 20240214101A1).

Diagram 4. Six tiers of vendors, clustered by where in the time stack they sell.

Six tiers, clustered by where in the stack they touch.

Across these tiers, the pitch is some variant of "precise time improves AI infrastructure." What "improves" means varies by vendor and by quarter. The next post asks whether any of those mechanisms actually exist.


Nine clocks, one incident

Two of the rack's timebases agree to within hundreds of nanoseconds. Most of the others don't agree to anything in particular.

A NIC PHC inside the PTP tree is good to roughly a hundred nanoseconds in steady state. The host kernel clock, disciplined via phc2sys, is good to single-digit microseconds. A BMC clock that was set at boot and never disciplined typically drifts on the order of seconds per day, because commodity RTC oscillators carry tens of ppm of frequency error. A PDU's SNMP sysUpTime counts hundredths of a second since the last reboot, and the wall-clock interpretation of that count is whatever the polling manager thinks the time is.

Nanoseconds to seconds. Seven to nine orders of magnitude across the planes.
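The drift figure is a one-liner: a constant frequency error of p ppm gains or loses p × 10⁻⁶ × 86,400 seconds per day. A minimal sketch, assuming a constant error (real RTC error also moves with temperature), that also checks the orders-of-magnitude spread:

```python
import math

SECONDS_PER_DAY = 86_400


def drift_per_day_s(ppm: float) -> float:
    """Seconds gained or lost per day by an oscillator with a constant
    frequency error of `ppm` parts per million."""
    return ppm * 1e-6 * SECONDS_PER_DAY


def orders_of_magnitude(coarse_s: float, fine_s: float) -> float:
    return math.log10(coarse_s / fine_s)


for ppm in (20, 50):
    print(f"{ppm} ppm -> {drift_per_day_s(ppm):.2f} s/day")
# 20 ppm -> 1.73 s/day
# 50 ppm -> 4.32 s/day

# A ~100 ns PHC against a plane that is 1 s off, then 100 s off.
print(f"{orders_of_magnitude(1, 100e-9):.1f} to {orders_of_magnitude(100, 100e-9):.1f}")
# 7.0 to 9.0

print(f"{100 / drift_per_day_s(20):.0f} days")  # 58 days
```

At 20 ppm, a BMC that is never resynchronized reaches 100 s of error in about 58 days.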

The result is what an operator sees during a cross-plane incident. Here's an illustrative timeline. Nine streams from one rack: the BMC's System Event Log, the cooling controller, the NIC's PTP fabric reports, the Kubernetes scheduler, the switch's syslog, NCCL's own logs, GPU XID reports via DCGM, the optics network management system, the PDU's SNMP traps. None agreeing on what happened first.

NIC fabric · 14:32:07.483
Switch syslog · 14:32:07.612
BMC SEL · 14:32:07.124
PDU SNMP · 14:32:08.901
Optics NMS · 14:32:07.890
Cooling · 14:32:07.412
Scheduler · 14:32:07.501
NCCL log · 14:32:07.621
DCGM · 14:32:07.745
Diagram 5. Same incident. Nine clocks. None agree.

Read them in the order the timestamps suggest and you get one story. Read them in the order the events physically occurred and you get a different one. The timestamps alone can't tell you which is which.
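One way to see why: attach an error bound to each stream and ask, for every pair, whether one event precedes the other under every clock error inside the bounds. Sorting timestamps gives a total order; the bounded version gives only a partial one. A minimal sketch using the Diagram 5 timestamps. The error bounds are hypothetical, since none of the streams publishes one: 1 ms (the print resolution) for streams assumed to sit on PTP-disciplined hosts, and seconds for the planes outside the tree.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Stamp:
    stream: str
    hhmmss_ms: str  # as printed in the stream's log, e.g. "14:32:07.483"
    err_ms: float   # hypothetical bound on that stream's clock error

    @property
    def ms(self) -> int:
        hms, frac = self.hhmmss_ms.split(".")
        h, m, s = (int(p) for p in hms.split(":"))
        return ((h * 60 + m) * 60 + s) * 1000 + int(frac)


def happened_before(a: Stamp, b: Stamp) -> bool:
    """True only if a precedes b for every clock error within the bounds."""
    return a.ms + a.err_ms < b.ms - b.err_ms


STAMPS = [
    Stamp("NIC fabric", "14:32:07.483", 1),
    Stamp("Switch syslog", "14:32:07.612", 1),
    Stamp("BMC SEL", "14:32:07.124", 2_000),
    Stamp("PDU SNMP", "14:32:08.901", 1_000),
    Stamp("Optics NMS", "14:32:07.890", 1_000),
    Stamp("Cooling", "14:32:07.412", 5_000),
    Stamp("Scheduler", "14:32:07.501", 1),
    Stamp("NCCL log", "14:32:07.621", 1),
    Stamp("DCGM", "14:32:07.745", 1),
]

naive = sorted(STAMPS, key=lambda s: s.ms)
pairs = list(combinations(STAMPS, 2))
defensible = [(a, b) for a, b in pairs if happened_before(a, b) or happened_before(b, a)]

print("naive first event:", naive[0].stream)                         # BMC SEL
print(f"{len(defensible)} of {len(pairs)} pairs have a defensible order")  # 15 of 36
```

Under these bounds the naive sort opens the story with the BMC, yet every defensible pair involves only the assumed in-tree streams and the PDU; anything touching the BMC, cooling, or optics stays unordered.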


What comes next

A five-layer time stack that terminates at NICs and switches. Roughly a dozen distinct timebases in a frontier rack, two classes disciplined and the rest typically not. A standards rule book that covers the disciplined part comprehensively and the rest barely at all. Six tiers of vendors selling into the gap. And the operational consequence: nine streams that don't agree on what happened.

The next post asks the question this one didn't.

The vendors selling into the gap have a thesis: that precise time, pushed further into the rack, changes something material about how AI clusters are built and operated. The next post checks whether that thesis is true.

I read the source.

References

  1. SiTime Corporation. "SiTime Enhances AI Data Center Performance and Utilization with TimeFabric Software Suite." Press release, June 30, 2025.
  2. SiTime Corporation. TimeFabric™ Software Suite product page.
  3. NVIDIA. "Calculating and Synchronizing Time with the Precision Timing Protocol on the NVIDIA Spectrum Switch." Developer blog (accessed 2026-05-21). See also the Mellanox IEEE 1588 PTP design guide.
  4. IEEE Standards Association. IEEE 1588-2019: Standard for a Precision Clock Synchronization Protocol. 2019.
  5. IEEE Standards Association. IEEE 802.1AS-2020: Timing and Synchronization for Time-Sensitive Applications. 2020.
  6. Open Compute Project, Time Appliances Project. Datacenter PTP Profile v1r1. 2023.
  7. Open Compute Project, Time Appliances Project. Simple PTP (SPTP) specification. October 2023.
  8. CERN. White Rabbit project documentation.
  9. DMTF. Redfish Manager schema, DateTimeSource values.
  10. IETF. RFC 3418, SNMPv2-MIB: sysUpTime.
  11. OIF. Common Management Interface Specification (CMIS), Revision 5.x.
  12. NVIDIA. CUDA event and CUPTI timestamp documentation.
  13. Marvell Technology Group. US Patent 12,375,256: time-aware link-level telemetry.
  14. Equinix Inc. US patent application 20240214101A1.
  15. Clockwork Systems Inc. Funding history per PitchBook (accessed 2026-05-21). See also Clockwork's Series A announcement, March 16, 2022.

Post 2 asks: does any of this fragmentation matter for training performance? The answer involves reading some NCCL source code. It's going to be uncomfortable for vendors.