Monday morning, they reported that their Lync clients were working but none of their phones could sign in. I immediately realized I had not replaced the certificate on their F5, and I did not have its management IP or credentials. The IT person who had this information was on a plane and would be unreachable for a couple of hours. To avoid an extended outage, I changed DNS for the VIP that I thought their phones (CX600) were using so that it pointed directly at one of the FEs and skipped the F5. After the change, they reported that all of their phones were still down. We could not get logs from the phones, so I assumed the problem was still somehow connected to the F5, but without logs I could not be certain.
Luckily, I was able to obtain the F5 information from a colleague and began the
certificate replacement process; however, the F5 would NOT accept the
certificate. I reached out to one of our MVPs, Jeff Guillet, and we were still
not able to get it to take the certificate. I then escalated to F5 support, at
which point we attempted to export and import the certificate in a multitude of
different ways. We tried the certificate by itself, with no extended properties,
importing it via text file, and importing it via CLI; nothing seemed to work.
When we pulled a packet capture, we saw the client hello but no server hello in
response:
Running via CLI:
tcpdump -n -i 0.0:nnn -s0 -w /var/tmp/1-1475239048.pcap host 192.168.2.22 or 192.168.2.47 -vvv
We then ran an OpenSSL command on the F5 that dumps the certificate
information when a connection to the VIP is attempted. This resulted in no
certificate being sent:
[admin@sac-f5-02:Active:Changes Pending] ~ # openssl s_client -connect 192.168.2.148:443
At this point, we swapped the old expired certificate back in and verified
that the VIP answered again, with a certificate warning as expected. Running
the same command now showed the old cert and chain:
[admin@sac-f5-02:Active:Changes Pending] ~ # openssl s_client -connect 192.168.2.148:443
We then attempted a couple other variations of importing and
exporting the certificate. We enabled debug logging on the SSL components, and
then dumped the SSL log to the CLI:
tmsh modify /sys db log.ssl.level value Debug
tailf /var/log/ltm |grep -i 'ssl'
However, nothing showed up. I verified that logging was working by hitting
another one of the VIPs, and that connection showed up in the logs. We then
rebooted the passive F5 and failed over to that unit once it came back online,
in an attempt to answer the age-old question “Did you reboot?”, but this also
made no difference. We once again tried a series of imports and exports on the
unit, just to make sure it wasn’t the combination of reboot, failover, and
import that was needed. No luck.
We tried one other command that essentially makes a
connection and then dumps the output of that connection:
curl -iv https://192.168.2.148
At this point, our client had been without phones for a little more than half
the day. We had already escalated with F5 and had three of their support
engineers on the call. They sent out an all-support announcement, as we had
stumped most of their support staff and engineering was also out of ideas.
Finally, someone got back to them and asked, “What signature algorithm was
being used?” We immediately pulled the certificate information from the F5:
openssl x509 -in /config/filestore/files_d/Common_d/certificate_d/\:Common\:Lync2013-Web-int-2015-V4.crt_51169_1 -noout -text
We responded to the individual who asked, who then pointed out that F5 does
not support the RSASSA-PSS signature algorithm. We were able to find a posting
on F5’s support forums in which another user described similar output when
using RSASSA-PSS.
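The field that answered the question is the "Signature Algorithm" line near the top of the -text dump. As a rough sketch only (the function name sig_alg_from_text is made up for illustration, and the exact algorithm label can differ between OpenSSL versions), a small filter can pull just that value out of whatever openssl x509 -noout -text prints:

```shell
# Hypothetical helper: read an `openssl x509 -noout -text` dump on stdin and
# print only the first "Signature Algorithm:" value.
sig_alg_from_text() {
    awk -F': *' '/Signature Algorithm:/ { print $2; exit }'
}

# Example with a hand-written fragment standing in for real openssl output:
printf '%s\n' \
    'Certificate:' \
    '    Data:' \
    '        Version: 3 (0x2)' \
    '    Signature Algorithm: rsassaPss' | sig_alg_from_text
```

On the F5 this would be fed from the command above, for example: openssl x509 -in <certificate file> -noout -text | sig_alg_from_text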
We wondered why this had suddenly started occurring. We had recently migrated
their PKI from a single root/issuing server to a two-tier PKI; however, it was
supposed to be a 1:1 migration, with no settings or configuration changed
beyond making it two tiers. We decided to check a certificate issued by their
old root/issuing CA:
Then looking at all the certificates issued by the new
intermediate/issuing CA:
A quick search on the internet also showed that Adobe, Citrix, Cisco, Firefox,
and VMware all either do not support this algorithm or have various issues with
its use. Various blog posts and forum entries suggested that you had to rebuild
your PKI if this was the case. At this point we thought we had two options:
purchase a third-party certificate for the F5 with just the pool name, or
bypass SSL on the F5. After we informed the client, they elected to go with a
GoDaddy certificate. After obtaining and installing it and verifying that it
worked, we asked the client to test a phone. They reported back that the phones
were still not able to sign in... so we pulled the DHCP options and, lo and
behold, they were pointing to the FE pool directly and not the F5. So all of
this work on the F5, while important, did not address the root cause of the
phone issue. I immediately thought: if all of these companies and their devices
don’t support RSASSA-PSS, then maybe the phones don’t either. Sure enough, the
Polycom CX line of phones does NOT support it!
EDIT: I have also been informed that the VVX line running at least 5.3.1 also does not support RSASSA-PSS.
We knew at this point that we were looking at having to change their PKI,
because we needed a .local name on their FE pool’s internal cert and could not
obtain that from GoDaddy. We opened a ticket with Microsoft PSS and, while
waiting on the call back, started looking deeper into how the PKI was set up.
We noticed that their root CA and intermediate/issuing CA certificates were
both using the sha1RSA signature algorithm and not RSASSA-PSS:
We thought this was strange: why was the intermediate CA issuing certs with a
different signature algorithm than the one its own cert was using? We tried
cloning the web server template and selecting different cryptography providers;
however, this also did not work. We did a bit more research and noticed that
one of the common “resolutions” when rebuilding the PKI was to disable
alternatesignaturealgorithm by setting its value in the registry to 0. Since
the root and intermediate CAs were using sha1RSA, we decided to just try
disabling it on the intermediate.
Solution:
We made the change using the following command followed by restarting the certificate service:
certutil -setreg csp\alternatesignaturealgorithm 0
net stop certsvc && net start certsvc
We then reissued the FE pool cert and lo and behold we
finally had a certificate using an acceptable signature algorithm:
We immediately assigned the Lync FE pool to use this new
cert and were able to confirm with the client that their phones were able to
sign in!
Lesson Learned: Check your signature algorithm when migrating PKIs, and check
whatever devices and software you use for compatibility with it!
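A check like this can be scripted before a certificate is ever assigned to a device. The sketch below is only an illustration: check_cert_sig_alg is a hypothetical name, it assumes openssl is on the PATH, and it treats any algorithm label containing "pss" as a warning, which is an assumption based on the devices discussed in this post rather than a definitive compatibility list.

```shell
# Hypothetical pre-deployment check: warn when a certificate file is signed
# with an RSASSA-PSS style algorithm. Assumes openssl is installed.
check_cert_sig_alg() {
    cert_file=$1
    # Take the first "Signature Algorithm:" value from the text dump.
    alg=$(openssl x509 -in "$cert_file" -noout -text 2>/dev/null |
        awk -F': *' '/Signature Algorithm:/ { print $2; exit }')
    case "$alg" in
        "")
            echo "ERROR $cert_file: could not read a signature algorithm"
            return 2 ;;
        *[Pp][Ss][Ss]*)
            echo "WARN $cert_file: $alg (check device support before deploying)"
            return 1 ;;
        *)
            echo "OK $cert_file: $alg"
            return 0 ;;
    esac
}
```

It could be run over a folder of exported certificates, for example: for f in *.crt; do check_cert_sig_alg "$f"; done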
A big thanks to Jeff Guillet, Rick Steele, and Scott Winslow
for assisting in this effort!
I had to add 'ca' to the command:
certutil -setreg ca\csp\alternatesignaturealgorithm 0