Commit Graph

15 Commits

Author SHA1 Message Date
Vladimir Oltean
834f8933d5 net: dsa: sja1105: implement tc-gate using time-triggered virtual links
Restrict the TTEthernet hardware support on this switch to operate as
closely as possible to IEEE 802.1Qci as possible. This means that it can
perform PTP-time-based ingress admission control on streams identified
by {DMAC, VID, PCP}, which is useful when trying to ensure the
determinism of traffic scheduled via IEEE 802.1Qbv.

The oddity comes from the fact that in hardware (and in TTEthernet at
large), virtual links always need a full-blown action, including not
only the type of policing, but also the list of destination ports. So in
practice, a single tc-gate action will result in all packets getting
dropped. Additional actions (either "trap" or "redirect") need to be
specified in the same filter rule such that the conforming packets are
actually forwarded somewhere.

Apart from the VL Lookup, Policing and Forwarding tables which need to
be programmed for each flow (virtual link), the Schedule engine also
needs to be told to open/close the admission gates for each individual
virtual link. A fairly accurate (and detailed) description of how that
works is already present in sja1105_tas.c, since it is already used to
trigger the egress gates for the tc-taprio offload (IEEE 802.1Qbv). Key
point here, we remember that the schedule engine supports 8
"subschedules" (execution threads that iterate through the global
schedule in parallel, and that no 2 hardware threads must execute a
schedule entry at the same time). For tc-taprio, each egress port used
one of these 8 subschedules, leaving a total of 4 subschedules unused.
In principle we could have allocated 1 subschedule for the tc-gate
offload of each ingress port, but actually the schedules of all virtual
links installed on each ingress port would have needed to be merged
together, before they could have been programmed to hardware. So
simplify our life and just merge the entire tc-gate configuration, for
all virtual links on all ingress ports, into a single subschedule. Be
sure to check that against the usual hardware scheduling conflicts, and
program it to hardware alongside any tc-taprio subschedule that may be
present.

The following scenarios were tested:

1. Quantitative testing:

   tc qdisc add dev swp2 clsact
   tc filter add dev swp2 ingress flower skip_sw \
           dst_mac 42:be:24:9b:76:20 \
           action gate index 1 base-time 0 \
           sched-entry OPEN 1200 -1 -1 \
           sched-entry CLOSE 1200 -1 -1 \
           action trap

   ping 192.168.1.2 -f
   PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
   .............................
   --- 192.168.1.2 ping statistics ---
   948 packets transmitted, 467 received, 50.7384% packet loss, time 9671ms

2. Qualitative testing (with a phase-aligned schedule - the clocks are
   synchronized by ptp4l, not shown here):

   Receiver (sja1105):

   tc qdisc add dev swp2 clsact
   now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \
           sec=$(echo $now | awk -F. '{print $1}') && \
           base_time="$(((sec + 2) * 1000000000))" && \
           echo "base time ${base_time}"
   tc filter add dev swp2 ingress flower skip_sw \
           dst_mac 42:be:24:9b:76:20 \
           action gate base-time ${base_time} \
           sched-entry OPEN  60000 -1 -1 \
           sched-entry CLOSE 40000 -1 -1 \
           action trap

   Sender (enetc):
   now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \
           sec=$(echo $now | awk -F. '{print $1}') && \
           base_time="$(((sec + 2) * 1000000000))" && \
           echo "base time ${base_time}"
   tc qdisc add dev eno0 parent root taprio \
           num_tc 8 \
           map 0 1 2 3 4 5 6 7 \
           queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
           base-time ${base_time} \
           sched-entry S 01  50000 \
           sched-entry S 00  50000 \
           flags 2

   ping -A 192.168.1.1
   PING 192.168.1.1 (192.168.1.1): 56 data bytes
   ...
   ^C
   --- 192.168.1.1 ping statistics ---
   1425 packets transmitted, 1424 packets received, 0% packet loss
   round-trip min/avg/max = 0.322/0.361/0.990 ms

   And just for comparison, with the tc-taprio schedule deleted:

   ping -A 192.168.1.1
   PING 192.168.1.1 (192.168.1.1): 56 data bytes
   ...
   ^C
   --- 192.168.1.1 ping statistics ---
   33 packets transmitted, 19 packets received, 42% packet loss
   round-trip min/avg/max = 0.336/0.464/0.597 ms

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07 17:31:57 -07:00
Vladimir Oltean
747e5eb31d net: dsa: sja1105: configure the PTP_CLK pin as EXT_TS or PER_OUT
The SJA1105 switch family has a PTP_CLK pin which emits a signal with
fixed 50% duty cycle, but variable frequency and programmable start time.

On the second generation (P/Q/R/S) switches, this pin supports even more
functionality. The use case described by the hardware documents talks
about synchronization via oneshot pulses: given 2 sja1105 switches,
arbitrarily designated as a master and a slave, the master emits a
single pulse on PTP_CLK, while the slave is configured to timestamp this
pulse received on its PTP_CLK pin (which must obviously be configured as
input). The difference between the timestamps then exactly becomes the
slave offset to the master.

The only trouble with the above is that the hardware is very much tied
into this use case only, and not very generic beyond that:
 - When emitting a oneshot pulse, instead of being told when to emit it,
   the switch just does it "now" and tells you later what time it was,
   via the PTPSYNCTS register. [ Incidentally, this is the same register
   that the slave uses to collect the ext_ts timestamp from, too. ]
 - On the sync slave, there is no interrupt mechanism on reception of a
   new extts, and no FIFO to buffer them, because in the foreseen use
   case, software is in control of both the master and the slave pins,
   so it "knows" when there's something to collect.

These 2 problems mean that:
 - We don't support (at least yet) the quirky oneshot mode exposed by
   the hardware, just normal periodic output.
 - We abuse the hardware a little bit when we expose generic extts.
   Because there's no interrupt mechanism, we need to poll at double the
   frequency we expect to receive a pulse. Currently that means a
   non-configurable "twice a second".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23 22:15:07 -07:00
Vladimir Oltean
4bfa1616d9 net: dsa: sja1105: make future_base_time a common helper
Because the PTP_CLK pin starts toggling only at a time higher than the
current PTP clock, this helper from the time-aware shaper code comes in
handy here as well. We'll use it to transform generic user input for the
perout request into valid input for the sja1105 hardware.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23 22:15:07 -07:00
Vladimir Oltean
1e762bd278 net: dsa: sja1105: Use PTP core's dedicated kernel thread for RX timestamping
And move the queue of skb's waiting for RX timestamps into the ptp_data
structure, since it isn't needed if PTP is not compiled.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30 20:31:40 -08:00
Vladimir Oltean
86db36a347 net: dsa: sja1105: Implement state machine for TAS with PTP clock source
Tested using the following bash script and the tc from iproute2-next:

	#!/bin/bash

	set -e -u -o pipefail

	NSEC_PER_SEC="1000000000"

	gatemask() {
		local tc_list="$1"
		local mask=0

		for tc in ${tc_list}; do
			mask=$((${mask} | (1 << ${tc})))
		done

		printf "%02x" ${mask}
	}

	if ! systemctl is-active --quiet ptp4l; then
		echo "Please start the ptp4l service"
		exit
	fi

	now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }')
	# Phase-align the base time to the start of the next second.
	sec=$(echo "${now}" | gawk -F. '{ print $1; }')
	base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))"

	tc qdisc add dev swp5 parent root handle 100 taprio \
		num_tc 8 \
		map 0 1 2 3 5 6 7 \
		queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
		base-time ${base_time} \
		sched-entry S $(gatemask 7) 100000 \
		sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \
		clockid CLOCK_TAI flags 2

The "state machine" is a workqueue invoked after each manipulation
command on the PTP clock (reset, adjust time, set time, adjust
frequency) which checks over the state of the time-aware scheduler.
So it is not monitored periodically, only in reaction to a PTP command
typically triggered from a userspace daemon (linuxptp). Otherwise there
is no reason for things to go wrong.

Now that the timecounter/cyclecounter has been replaced with hardware
operations on the PTP clock, the TAS Kconfig now depends upon PTP and
the standalone clocksource operating mode has been removed.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 14:50:35 -08:00
Vladimir Oltean
41603d78b3 net: dsa: sja1105: Make the PTP command read-write
The PTPSTRTSCH and PTPSTOPSCH bits are actually readable and indicate
whether the time-aware scheduler is running or not. We will be using
that for monitoring the scheduler in the next patch, so refactor the PTP
command API in order to allow that.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 14:50:35 -08:00
Vladimir Oltean
6cf99c13ea net: dsa: sja1105: Restore PTP time after switch reset
The PTP time of the switch is not preserved when uploading a new static
configuration. Work around this hardware oddity by reading its PTP time
before a static config upload, and restoring it afterwards.

Static config changes are expected to occur at runtime even in scenarios
directly related to PTP, i.e. the Time-Aware Scheduler of the switch is
programmed in this way.

Perhaps the larger implication of this patch is that the PTP .gettimex64
and .settime functions need to be exposed to sja1105_main.c, where the
PTP lock needs to be held during this entire process. So their core
implementation needs to move to some common functions which get exposed
in sja1105_ptp.h.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-11 12:45:30 -08:00
David S. Miller
2f184393e0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Several cases of overlapping changes which were for the most
part trivially resolvable.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-20 10:43:00 -07:00
Vladimir Oltean
2fb079a28a net: dsa: sja1105: Switch to hardware operations for PTP
Adjusting the hardware clock (PTPCLKVAL, PTPCLKADD, PTPCLKRATE) is a
requirement for the auxiliary PTP functionality of the switch
(TTEthernet, PPS input, PPS output).

Therefore we need to switch to using these registers to keep a
synchronized time in hardware, instead of the timecounter/cyclecounter
implementation, which is reliant on the free-running PTPTSCLK.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-18 12:55:30 -04:00
Nishad Kamdar
b790b5549b net: dsa: sja1105: Use the correct style for SPDX License Identifier
This patch corrects the SPDX License Identifier style
in header files related to Distributed Switch Architecture
drivers for NXP SJA1105 series Ethernet switch support.
It uses an expilict block comment for the SPDX License
Identifier.

Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-15 20:16:26 -07:00
Vladimir Oltean
664277781c net: dsa: sja1105: Change the PTP command access pattern
The PTP command register contains enable bits for:
- Putting the 64-bit PTPCLKVAL register in add/subtract or write mode
- Taking timestamps off of the corrected vs free-running clock
- Starting/stopping the TTEthernet scheduling
- Starting/stopping PPS output
- Resetting the switch

When a command needs to be issued (e.g. "change the PTPCLKVAL from write
mode to add/subtract mode"), one cannot simply write to the command
register setting the PTPCLKADD bit to 1, because that would zeroize the
other settings. One also cannot do a read-modify-write (that would be
too easy for this hardware) because not all bits of the command register
are readable over SPI.

So this leaves us with the only option of keeping the value of the PTP
command register in the driver, and operating on that.

Actually there are 2 types of PTP operations now:
- Operations that modify the cached PTP command. These operate on
  ptp_data->cmd as a pointer.
- Operations that apply all previously cached PTP settings, but don't
  otherwise cache what they did themselves. The sja1105_ptp_reset
  function is such an example. It copies the ptp_data->cmd on stack
  before modifying and writing it to SPI.

This practically means that struct sja1105_ptp_cmd is no longer an
implementation detail, since it needs to be stored in full into struct
sja1105_ptp_data, and hence in struct sja1105_private. So the (*ptp_cmd)
function prototype can change and take struct sja1105_ptp_cmd as second
argument now.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14 16:45:40 -07:00
Vladimir Oltean
a9d6ed7a8b net: dsa: sja1105: Move PTP data to its own private structure
This is a non-functional change with 2 goals (both for the case when
CONFIG_NET_DSA_SJA1105_PTP is not enabled):

- Reduce the size of the sja1105_private structure.
- Make the PTP code more self-contained.

Leaving priv->ptp_data.lock to be initialized in sja1105_main.c is not a
leftover: it will be used in a future patch "net: dsa: sja1105: Restore
PTP time after switch reset".

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14 16:45:40 -07:00
Vladimir Oltean
61c7712627 net: dsa: sja1105: Make all public PTP functions take dsa_switch as argument
The new rule (as already started for sja1105_tas.h) is for functions of
optional driver components (ones which may be disabled via Kconfig - PTP
and TAS) to take struct dsa_switch *ds instead of struct sja1105_private
*priv as first argument.

This is so that forward-declarations of struct sja1105_private can be
avoided.

So make sja1105_ptp.h the second user of this rule.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14 16:45:40 -07:00
Vladimir Oltean
47ed985e97 net: dsa: sja1105: Add logic for TX timestamping
On TX, timestamping is performed synchronously from the
port_deferred_xmit worker thread.
In management routes, the switch is requested to take egress timestamps
(again partial), which are reconstructed and appended to a clone of the
skb that was just sent.  The cloning is done by DSA and we retrieve the
pointer from the structure that DSA keeps in skb->cb.
Then these clones are enqueued to the socket's error queue for
application-level processing.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-08 15:20:40 -07:00
Vladimir Oltean
bb77f36ac2 net: dsa: sja1105: Add support for the PTP clock
The design of this PHC driver is influenced by the switch's behavior
w.r.t. timestamping.  It exposes two PTP counters, one free-running
(PTPTSCLK) and the other offset- and frequency-corrected in hardware
through PTPCLKVAL, PTPCLKADD and PTPCLKRATE.  The MACs can sample either
of these for frame timestamps.

However, the user manual warns that taking timestamps based on the
corrected clock is less than useful, as the switch can deliver corrupted
timestamps in a variety of circumstances.

Therefore, this PHC uses the free-running PTPTSCLK together with a
timecounter/cyclecounter structure that translates it into a software
time domain.  Thus, the settime/adjtime and adjfine callbacks are
hardware no-ops.

The timestamps (introduced in a further patch) will also be translated
to the correct time domain before being handed over to the userspace PTP
stack.

The introduction of a second set of PHC operations that operate on the
hardware PTPCLKVAL/PTPCLKADD/PTPCLKRATE in the future is somewhat
unavoidable, as the TTEthernet core uses the corrected PTP time domain.
However, the free-running counter + timecounter structure combination
will suffice for now, as the resulting timestamps yield a sub-50 ns
synchronization offset in steady state using linuxptp.

For this patch, in absence of frame timestamping, the operations of the
switch PHC were tested by syncing it to the system time as a local slave
clock with:

phc2sys -s CLOCK_REALTIME -c swp2 -O 0 -m -S 0.01

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-08 15:20:40 -07:00