State of timing in AI infrastructure

A growing vendor stack is selling timing as the missing piece of AI infrastructure.

SiTime calls it the foundation of the AI data center. Clockwork.io raised a $21M Series A around clock sync and latency visibility; funding databases list about $41.5M total. Syncworks positions itself as a precision-timing integrator.

Whether they're right is a question for the next post. First, here's what they're actually selling, and what the rack actually has.

Grandmaster (PTP source): GNSS antenna · OCXO · Microchip / Meinberg

↓ IEEE 1588 / PTP

Inside the PTP tree: switch (boundary clock, ~100 ns) ↓ NIC PHC (PTP hardware clock, ~100 ns) ↓ host kernel (phc2sys, single-digit µs). The PTP tree terminates here.

Outside the tree, each plane runs its own clock: BMC (boot RTC; NTP optional) · PDU (SNMP sysUpTime; manager wall-clock) · optic (CMIS telemetry; ePPS optional) · GPU (CUDA / CUPTI; NVML / DCGM) · plus CDU cooling, K8s scheduler, OOB mgmt, NVMe, PSU, UPS.

Nine streams. No causal order from timestamps alone.

Overview. Rack timing architecture, top-down. PTP discipline reaches NICs and switches. The rest of the rack runs its own clocks.

Where time begins

Atomic standard (10⁻¹⁵ to 10⁻¹⁶ fractional uncertainty) → GNSS satellite (tens of ns) → grandmaster, PTP (~100 ns) → NIC PHC (sub-µs) → CPU TSC (µs–ms)

Diagram 1. The time stack. Five layers. Cosmos to CPU. Accuracy degrades with each link.

In a PTP-equipped data center, the precise-time path usually looks like this. Five links.

At the top, the caesium and rubidium atomic standards. In the United States, NIST and the U.S. Naval Observatory (USNO) each maintain their own ensemble, and other countries run equivalents. Modern caesium fountains realize the SI second to fractional uncertainties around one part in 10¹⁵ to 10¹⁶: roughly one second of error over tens to hundreds of millions of years. The ensemble does better than any single clock.
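The "one second over tens to hundreds of millions of years" figure is back-of-envelope arithmetic: treat the fractional uncertainty as a constant frequency offset and ask how long it takes to pile up one second. A minimal Python sketch of that arithmetic; the constant-offset reading is a simplifying assumption, since real clock error accumulates less predictably.

```python
SECONDS_PER_YEAR = 365.25 * 86_400  # Julian year


def years_per_second_of_error(fractional_uncertainty: float) -> float:
    """Years for a constant fractional frequency offset to accumulate 1 s of error."""
    return 1.0 / fractional_uncertainty / SECONDS_PER_YEAR


for u in (1e-15, 1e-16):
    print(f"{u:.0e}: ~{years_per_second_of_error(u) / 1e6:.0f} million years")
# → 1e-15: ~32 million years
# → 1e-16: ~317 million years
```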

Below them, the satellites. GPS, Galileo, GLONASS, BeiDou. Each carries an atomic clock disciplined against (synced to) the ground ensembles. Each broadcasts a time signal you can recover to within tens of nanoseconds.

Below the satellites, the grandmaster in the rack's timing closet. A box from Microchip, Meinberg, or Oscilloquartz. GNSS antenna on the roof, oven-controlled oscillator inside, PTP packets out. It pushes time downward over the network using IEEE 1588.

The NIC takes it from there. Data-center NICs from Intel, Mellanox, and Broadcom each carry a PHC (PTP Hardware Clock) next to the Ethernet MAC, the silicon block that frames packets for the wire. The NIC timestamps PTP packets in hardware against the PHC, and PTP software (typically ptp4l on Linux) steers the PHC toward the grandmaster. Sub-microsecond accuracy on hardware you can buy.
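On Linux, each PHC shows up as a character device such as /dev/ptp0, and user space reads it with clock_gettime() using a "dynamic" clock id derived from the open file descriptor. A minimal Python sketch, assuming Linux and assuming /dev/ptp0 is the NIC's clock (`ethtool -T <iface>` reports which PTP hardware clock index an interface uses). The fd-to-clock-id arithmetic mirrors the FD_TO_CLOCKID macro from the kernel's PTP test tool; the function names are illustrative.

```python
import os
import time

CLOCKFD = 3  # low bits the kernel uses to tag fd-based (dynamic) clock ids


def fd_to_clockid(fd: int) -> int:
    """Dynamic clock id for an open PHC device, mirroring FD_TO_CLOCKID(fd)."""
    return ((~fd) << 3) | CLOCKFD


def read_phc_ns(path: str = "/dev/ptp0") -> int:
    """Read a PTP hardware clock in nanoseconds (Linux only; needs read access)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return time.clock_gettime_ns(fd_to_clockid(fd))
    finally:
        os.close(fd)
```

Don't be surprised if the reading sits about 37 seconds ahead of CLOCK_REALTIME: the PTP timescale is TAI, and phc2sys applies the UTC offset when it steers the system clock.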

At the bottom, the CPU's TSC (Time Stamp Counter): a 64-bit register that, on modern x86 CPUs, increments at a constant (invariant) rate regardless of frequency scaling. The kernel typically uses it as the clocksource behind clock_gettime(); if you've configured it to, phc2sys steers the host system clock against the NIC PHC.
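To steer the host clock, phc2sys first has to measure how far it is from the PHC. The basic trick is a sandwich: read the system clock, read the PHC, read the system clock again, and assume the PHC read landed in the middle. The kernel's PTP_SYS_OFFSET ioctl, which phc2sys can use, returns interleaved system and PHC readings for exactly this purpose. The Python below is a sketch of the idea, not phc2sys's code; the PHC reader is passed in as a function (hypothetical; on Linux it could wrap clock_gettime on /dev/ptpN).

```python
import time
from typing import Callable, Optional, Tuple

Reader = Callable[[], int]  # returns a clock reading in nanoseconds


def estimate_offset_ns(read_phc: Reader,
                       read_sys: Reader = time.time_ns,
                       samples: int = 5) -> Tuple[int, int]:
    """Return (offset, bracket) in ns, where offset = PHC minus system clock.

    Each sample brackets one PHC read between two system-clock reads. The
    narrowest bracket is least disturbed by scheduling and bus latency, so
    that sample is the one kept.
    """
    if samples < 1:
        raise ValueError("need at least one sample")
    best: Optional[Tuple[int, int]] = None
    for _ in range(samples):
        t1 = read_sys()
        phc = read_phc()
        t2 = read_sys()
        bracket = t2 - t1
        offset = phc - (t1 + bracket // 2)  # assume the PHC read landed mid-bracket
        if best is None or bracket < best[1]:
            best = (offset, bracket)
    assert best is not None
    return best
```

Passing the same clock for both readers, for example `estimate_offset_ns(time.monotonic_ns, time.monotonic_ns)`, should give an offset near zero and shows how narrow a bracket your machine achieves.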

Five links. Cosmos to CPU. Each one introduces uncertainty. Each one is a vendor.

Inside the rack

Inside the PTP tree: NIC PHC (PTP-disciplined, ~100 ns) · top-of-rack switch (PTP boundary clock, ~100 ns). The PTP signal terminates here.

Outside it: BMC (set at boot, runs free) · PDU (SNMP sysUpTime, no wall clock) · QSFP-DD optic (CMIS telemetry; ePPS optional) · cooling CDU (BMS-set or free-running) · K8s scheduler (host system clock) · GPU, CUDA / NVML (local timers, not disciplined) · OOB mgmt switch (separate clock domain) · NVMe storage (embedded controller, SMART timestamps) · PSU controllers (per-PSU embedded clock) · UPS controller (failover timer, free-running).

Diagram 2. Rack cross-section. Two planes inside the PTP tree. Ten examples of planes outside it.

The stack stops where the rack starts. Inside the rack, it gets complicated.

A frontier AI rack built from Spectrum / ConnectX-class components exposes roughly a dozen distinct timebases: physical clocks on silicon, boot-relative counters, and event streams that inherit time from whatever upstream component happens to set theirs. Only two classes sit inside the PTP tree and share a coherent timescale: the NIC PHCs and the switch boundary clocks. The rest usually don't report time in that shared rack epoch.

The NIC is inside the tree, and the PHC on the NIC is the anchor. PTP packets arrive on the network port and are timestamped in hardware against the PHC. Two Linux daemons from the open-source linuxptp project do the rest: ptp4l runs the PTP wire conversation and steers the PHC toward the grandmaster, and phc2sys keeps the host system clock in step with the PHC.
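In IEEE 1588's delay request-response mechanism, the "wire conversation" boils down to four timestamps per exchange. A minimal Python sketch, assuming a symmetric path and ignoring correctionField and the rest of the protocol machinery; ptp4l would feed the resulting offset into a servo (PI by default) that steers the PHC. The class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DelayReqExchange:
    """One delay request-response exchange; all timestamps in nanoseconds."""
    t1: int  # Sync leaves the master (master's clock)
    t2: int  # Sync reaches the slave (slave's clock)
    t3: int  # Delay_Req leaves the slave (slave's clock)
    t4: int  # Delay_Req reaches the master (master's clock)


def offset_and_delay(x: DelayReqExchange) -> Tuple[float, float]:
    """Return (slave offset from master, mean path delay), assuming a symmetric path."""
    forward = x.t2 - x.t1  # path delay + offset
    reverse = x.t4 - x.t3  # path delay - offset
    return (forward - reverse) / 2, (forward + reverse) / 2


# Slave 500 ns ahead, 1000 ns each way: recovered exactly.
print(offset_and_delay(DelayReqExchange(t1=0, t2=1500, t3=2000, t4=2500)))
# → (500.0, 1000.0)
# Same slave, but 1100 ns out and 900 ns back.
print(offset_and_delay(DelayReqExchange(t1=0, t2=1600, t3=2000, t4=2400)))
# → (600.0, 1000.0)
```

The second case is the catch: half of any path asymmetry shows up as offset error, and nothing in the four timestamps can reveal it.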

The switch is inside the tree. Top-of-rack switches from NVIDIA, Arista, Broadcom, and Cisco implement PTP boundary clocks. They take time from the grandmaster on one port and redistribute on others. The spine repeats one layer up.

That's it. After NIC and switch, almost nothing is inside the tree.

The BMC, the small ARM SoC on every server motherboard that owns sensor telemetry (power, temperature, fan speed), has its own clock. It is typically set at boot from the host BIOS and then runs free. NTP is optional; discipline against the rack's PTP tree is rare.

Many PDUs at the bottom of the rack expose only an SNMP sysUpTime counter: hundredths of a second since the device's management agent last restarted (usually its last reboot), not wall-clock time. Even when a PDU has an internal RTC for its own logs, the wall-clock timestamp on a PDU event in your monitoring stack typically belongs to the manager that polled it, not to the PDU itself.
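sysUpTime is an SNMP TimeTicks value: a 32-bit unsigned count of hundredths of a second, so it wraps after about 497 days. Putting a PDU event on a wall clock means anchoring its uptime stamp to the manager's clock, which is exactly why the timestamp "belongs" to the manager. A minimal Python sketch with hypothetical function names; it tolerates one counter wrap but cannot detect a reboot between the two readings.

```python
TIMETICKS_MODULUS = 2 ** 32  # TimeTicks: unsigned 32-bit, hundredths of a second


def ticks_elapsed(earlier: int, later: int) -> int:
    """Hundredths of a second from `earlier` to `later`, tolerating one wrap."""
    return (later - earlier) % TIMETICKS_MODULUS


def event_wall_time(manager_now_s: float, uptime_now: int, uptime_event: int) -> float:
    """Place a PDU event on the manager's wall clock (Unix seconds).

    manager_now_s is the manager's own clock at the moment it read uptime_now,
    so the result inherits that clock's error plus any polling latency.
    """
    return manager_now_s - ticks_elapsed(uptime_event, uptime_now) / 100.0


print(TIMETICKS_MODULUS / 100 / 86_400)  # days until sysUpTime wraps (about 497)
```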

The back-end fabric runs on QSFP-DD optical modules, the dense pluggable transceivers that carry the high-speed links. Each module runs a microcontroller with CMIS firmware; CMIS (the Common Management Interface Specification) is the sideband control plane every modern optical module exposes to its host. The microcontroller has a clock. Some form-factor specifications include an optional ePPS/Clock path for host-to-module timing alignment, but public module documentation often treats that path as optional or unimplemented, not as a universal rack epoch.

The cooling distribution unit has its own clock. Some are set by a building management system. In the systems I've examined, many run free.

The Kubernetes scheduler uses the host system clock for its event log. Whether that host is inside the PTP tree depends on how the cluster was provisioned.

The GPU has timers. CUDA events measure elapsed GPU work to sub-microsecond resolution; CUPTI can expose device timestamps in nanoseconds since device reset. NVML (the NVIDIA Management Library) and DCGM (Data Center GPU Manager) expose GPU telemetry and events. None are disciplined against the rack's PTP tree by default.

A dozen distinct timebases. Two disciplined. The rest typically aren't.

The rule book has gaps

Several standards govern parts of this. None covers all of it.

Standard	What it covers	Where it stops
IEEE 1588-2019	The canonical PTP wire protocol. Boundary-clock and transparent-clock state machines (§9).	Handles the wire beautifully. Stops the moment you leave the NIC.
IEEE 802.1AS-2020	The TSN (Time-Sensitive Networking) profile of PTP.	Built for AV bridging and automotive Ethernet. Not designed with the AI rack as a target.
White Rabbit	A CERN extension combining SyncE (Synchronous Ethernet, a frequency reference in the line code) with PTP phase to reach sub-nanosecond accuracy.	Mature, open. Rarely deployed in commercial AI today.
OCP-TAP v1r1	The 2023 Datacenter PTP reference, written by Microsoft, Meta, and Google under the Open Compute Project's Time Appliances Project.	Voluntary. NVIDIA, Cisco, and Arista implement against it; nothing forces the rest of the rack to come along.
Redfish DateTimeSource	A DMTF (Distributed Management Task Force) field on the BMC or manager REST API that records the source of its DateTime value.	Records RTC, firmware, host, NTP, or PTP source. It does not by itself make every management-plane event part of the rack PTP epoch.
CMIS 5.x	The Common Management Interface Specification for QSFP-DD/OSFP-class modules. Defines the host-to-module management interface; form-factor specs can include optional ePPS/Clock signals.	Useful management plane, not a guarantee that module telemetry is exposed in the host's PTP epoch.


Plane	1588	802.1AS	WR	OCP-TAP	Redfish	CMIS
NIC PHC	✓	·	·	✓	·	·
Switch	✓	·	·	✓	·	·
BMC	·	·	·	·	~	·
PDU	·	·	·	·	·	·
Optic	·	·	·	·	·	~
Cooling	·	·	·	·	·	·
Scheduler	·	·	·	·	·	·
GPU	·	·	·	·	·	·


Diagram 3. Standards coverage matrix. ✓ covered; ~ optional field, rarely populated; · no coverage. Most cells are empty.

The standards that exist work. The rest of the rack isn't built out of them. Where the rule book stops, vendors arrive to sell solutions.

Who's selling timing

Physical clock and oscillator IP

SiTime is the loudest. NASDAQ-listed, it makes MEMS oscillators. In June 2025 it launched TimeFabric™, an IEEE 1588 software suite pitched with 24-hour holdover and "up to 9× more accurate" sync than quartz, for "AI data centers, SmartNICs, switches, routers, accelerators." Microchip, which absorbed Microsemi's timing business, sits adjacent with a longer history.

Grandmasters

Meinberg (founded 1979), Microchip TimeProvider, Oscilloquartz (now Adva / Adtran). Boxes in a closet: GNSS in, PTP out. Telecom and finance for twenty years. Now repositioning for AI clusters.

Switch and NIC silicon

NVIDIA Spectrum-X, Mellanox ConnectX, Marvell, Broadcom Thor, Intel E810. PTP boundary clocks and PHCs on silicon. NVIDIA's Spectrum PTP technical blog documents the boundary-clock and transparent-clock modes. Marvell holds a granted patent on time-aware link-level telemetry (US 12,375,256).

Pure-software timing

Clockwork.io, founded 2018, raised a $21M Series A around clock synchronization and latency visibility; funding databases list ~$41.5M total. Software-only timing stack.

Standards bodies and integrators

Syncworks operates in the precision-timing integrator space. IETF, ITU-T, and IEEE working groups maintain the underlying specifications.

Service-layer

Equinix operates a Precision Time service for terrestrial time-feed distribution to interconnected data centers (US application 20240214101A1).

Diagram 4. Six tiers of vendors, clustered by where in the time stack they sell.

Six tiers, clustered by where in the stack they touch.

Across these tiers, the pitch is some variant of "precise time improves AI infrastructure." What "improves" means varies by vendor and by quarter. The next post asks whether any of those mechanisms actually exist.

Nine clocks, one incident

Two of the rack's timebases agree to within hundreds of nanoseconds. Most of the others don't agree to anything in particular.

A NIC PHC inside the PTP tree is good to roughly a hundred nanoseconds in steady state. The host kernel clock, disciplined via phc2sys, is good to single-digit microseconds. A BMC clock set at boot and never disciplined typically drifts on the order of seconds per day, because commodity RTC oscillators carry tens of ppm of frequency error. A PDU's SNMP sysUpTime counts hundredths of a second since the last reboot, and the wall-clock reading of that count is whatever the polling manager thinks the time is.

Nanoseconds to seconds. Seven to nine orders of magnitude across the planes.
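The "seconds per day" and "seven to nine orders of magnitude" figures follow from simple arithmetic. A minimal Python sketch; the 10, 20, and 50 ppm values are illustrative stand-ins for "tens of ppm", not measurements.

```python
import math

SECONDS_PER_DAY = 86_400


def drift_per_day_s(ppm: float) -> float:
    """Seconds gained or lost per day by a free-running clock with this frequency error."""
    return ppm * 1e-6 * SECONDS_PER_DAY


def orders_of_magnitude(small_s: float, large_s: float) -> float:
    """Base-10 orders of magnitude between two error scales."""
    return math.log10(large_s / small_s)


for ppm in (10, 20, 50):  # illustrative "tens of ppm" values
    print(f"{ppm} ppm -> {drift_per_day_s(ppm):.2f} s/day")
# → 10 ppm -> 0.86 s/day
# → 20 ppm -> 1.73 s/day
# → 50 ppm -> 4.32 s/day

# One day of 20 ppm drift against a ~100 ns PHC.
print(f"{orders_of_magnitude(100e-9, drift_per_day_s(20)):.1f}")
# → 7.2
```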

The result is what an operator sees during a cross-plane incident. Here's an illustrative timeline. Nine streams from one rack: the BMC's System Event Log, the cooling controller, the NIC's PTP fabric reports, the Kubernetes scheduler, the switch's syslog, NCCL's own logs, GPU XID reports via DCGM, the optics network management system, and the PDU's SNMP traps. None of them agree on what happened first.

Stream	Recorded timestamp
NIC fabric	14:32:07.483
Switch syslog	14:32:07.612
BMC SEL	14:32:07.124
PDU SNMP	14:32:08.901
Optics NMS	14:32:07.890
Cooling	14:32:07.412
Scheduler	14:32:07.501
NCCL log	14:32:07.621
DCGM	14:32:07.745

Diagram 5. Same incident. Nine clocks. None agree.

Read them in the order the timestamps suggest and you get one story. Read them in the order the events physically occurred and you get a different one. From the timestamps alone, there's no way to tell which story is which.
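One way to make "no causal order from timestamps alone" concrete: give each stream an error bound and call two events ordered only when their uncertainty intervals don't overlap. A minimal Python sketch using Diagram 5's illustrative timestamps; every error bound below is a hypothetical placeholder, not a measurement, and the scheme uses nothing but timestamps (no causal metadata).

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    stream: str
    ts: float   # seconds after 14:32:00, as stamped by the stream's own clock
    err: float  # hypothetical bound on that clock's offset from true time, seconds


def definitely_before(a: Event, b: Event) -> bool:
    """True only if a precedes b under every offset the two bounds allow."""
    return a.ts + a.err < b.ts - b.err


def ordering_report(events: List[Event]) -> Tuple[list, list]:
    """Split all event pairs into provably ordered pairs and ambiguous ones."""
    ordered, ambiguous = [], []
    for a, b in combinations(events, 2):
        if definitely_before(a, b):
            ordered.append((a.stream, b.stream))
        elif definitely_before(b, a):
            ordered.append((b.stream, a.stream))
        else:
            ambiguous.append((a.stream, b.stream))
    return ordered, ambiguous


# Timestamps from Diagram 5; every err value is a hypothetical placeholder.
incident = [
    Event("NIC fabric", 7.483, 1e-6),
    Event("Switch syslog", 7.612, 1e-3),
    Event("BMC SEL", 7.124, 2.0),
    Event("PDU SNMP", 8.901, 1.0),
    Event("Optics NMS", 7.890, 1.0),
    Event("Cooling", 7.412, 5.0),
    Event("Scheduler", 7.501, 1e-3),
    Event("NCCL log", 7.621, 1e-3),
    Event("DCGM", 7.745, 0.1),
]
ordered, ambiguous = ordering_report(incident)
print(f"{len(ordered)} pairs ordered, {len(ambiguous)} ambiguous")
# → 15 pairs ordered, 21 ambiguous
```

Under these placeholder bounds, every ambiguous pair involves the BMC, cooling, or optics stream. Widen the bound on any free-running plane and more pairs fall into the ambiguous bucket.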


What comes next

A five-layer time stack that terminates at NICs and switches. Roughly a dozen distinct timebases in a frontier rack, two classes disciplined and the rest typically not. A standards rule book that covers the disciplined part comprehensively and the rest barely at all. Six tiers of vendors selling into the gap. And the operational consequence: nine streams that don't agree on what happened.

The next post asks the question this one didn't.

The vendors selling into the gap have a thesis: that precise time, pushed further into the rack, changes something material about how AI clusters are built and operated. The next post checks whether that thesis is true.

I read the source.

References

SiTime Corporation. "SiTime Enhances AI Data Center Performance and Utilization with TimeFabric Software Suite." Press release, June 30, 2025.
SiTime Corporation. TimeFabric™ Software Suite product page.
NVIDIA. "Calculating and Synchronizing Time with the Precision Timing Protocol on the NVIDIA Spectrum Switch." Developer blog (accessed 2026-05-21).
Mellanox. IEEE 1588 PTP design guide.
IEEE Standards Association. IEEE 1588-2019: Standard for a Precision Clock Synchronization Protocol. 2019.
IEEE Standards Association. IEEE 802.1AS-2020: Timing and Synchronization for Time-Sensitive Applications. 2020.
Open Compute Project, Time Appliances Project. Datacenter PTP Profile v1r1. 2023.
Open Compute Project, Time Appliances Project. Simple PTP (SPTP) specification. October 2023.
CERN. White Rabbit project documentation.
DMTF. Redfish Manager schema, DateTimeSource values.
IETF. RFC 3418, SNMPv2-MIB (sysUpTime).
OIF. Common Management Interface Specification (CMIS), Revision 5.x.
NVIDIA. CUDA event and CUPTI timestamp documentation.
Marvell Technology Group. US Patent 12,375,256, time-aware link-level telemetry.
Equinix Inc. US patent application 20240214101A1.
Clockwork Systems Inc. Funding history per PitchBook (accessed 2026-05-21). See also Clockwork's Series A announcement, March 16, 2022.

Post 2 asks: does any of this fragmentation matter for training performance? The answer involves reading some NCCL source code. It's going to be uncomfortable for vendors.
