The failing scenario is:
1. Host receives ROAM_START from firmware
2. Host receives EAPOL M1 from the AP and forwards
it to the supplicant, which buffers it (with a
100 msec timer) to process after association
completion.
3. Host starts processing a CP stats request, which
holds the kernel RTNL lock.
4. Host starts a CP_STATS_WAIT_TIME_STAT (800 msec)
timer and sends WMI_REQUEST_STATS_CMDID to FW.
The CP stats request keeps holding the RTNL lock
while it waits.
5. While the request is waiting, host/FW complete
roaming within a few milliseconds. Host sends the
roamed indication to the kernel, which posts it
to a work queue for delivery to the supplicant.
The work queue needs the RTNL lock to send the
indication.
6. The kernel now waits for the RTNL lock held by
the CP stats request, which in turn waits for the
CP stats response (WMI_UPDATE_STATS_EVENTID).
7. Host receives the CP stats response but is
unable to handle it, for the reason given below.
8. The wait for WMI_UPDATE_STATS_EVENTID times out;
only then can the kernel take the RTNL lock to
indicate association/roam completion status to
the supplicant.
9. Because the kernel sends the association
indication only after CP_STATS_WAIT_TIME_STAT
(800 msec), the supplicant has already deleted
the buffered EAPOL M1 frame, so the DUT fails to
initiate the 4-way handshake.
10. Finally, the AP sends a Deauthentication frame
to the DUT.

Reason the CP stats response cannot be processed:
As per the current design, while processing roaming,
the host deletes the old peer and creates a new peer
for the roamed AP. If the host receives the CP stats
response after the peer has been deleted due to
roaming, it cannot stop the CP_STATS_WAIT_TIME_STAT
wait timer and does not release the kernel RTNL lock
until the timeout. Only after the timeout can the
kernel take the RTNL lock to indicate
association/roam completion status to the
supplicant.

Fix is to stop the CP_STATS_WAIT_TIME_STAT wait timer
and release the kernel RTNL lock even when the peer
for which the CP stats request was sent has been
deleted by the host.
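
The idea behind the fix can be pictured with the
minimal sketch below. It is not the driver code:
struct stats_request, struct peer, complete_request()
and handle_stats_response() are hypothetical names
used only for illustration, and the status value -1
is arbitrary. It shows the response handler
completing the pending request even when the peer is
gone, so the waiter can stop waiting and drop the
RTNL lock instead of running until the timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the driver's real types. */
struct stats_request {
    bool completed; /* true once the waiter may stop waiting */
    int status;     /* 0 on success, negative on failure */
};

struct peer {
    int id;
};

/*
 * Mark the request as finished. In the driver, the equivalent step
 * would end the CP_STATS_WAIT_TIME_STAT wait so that the caller can
 * release the RTNL lock.
 */
static void complete_request(struct stats_request *req, int status)
{
    assert(req != NULL);
    req->status = status;
    req->completed = true;
}

/*
 * Stats response handler sketch. A NULL peer models the case where
 * roaming deleted the peer before the response arrived. The request
 * is completed in both branches; only the status differs.
 */
static void handle_stats_response(struct stats_request *req,
                                  const struct peer *peer)
{
    if (peer == NULL) {
        /* Peer already deleted: still wake the waiter. */
        complete_request(req, -1);
        return;
    }

    /* Peer present: normal completion. */
    complete_request(req, 0);
}
```

With the pre-fix behaviour described above, the NULL
branch would return without completing the request,
leaving the waiter blocked for the full timeout.
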
Change-Id: Ie5b5275da10a06da50b2fbb8ab206b78f2c64d6a
CRs-Fixed: 3234063