[nuttx] Detect TCP disconnection in the middle of the internet

Discussion:

[nuttx] Detect TCP disconnection in the middle of the internet - SO_KEEPALIVE

Sebastien Lorquet sebastien@lorquet.fr [nuttx]

2018-01-24 14:48:32 UTC

Hello,

We have a NuttX board connected via IP to a management station. The station
behaves as a client, connect()ing to the board.

The management station opens a connection with the board(s). The connection
stays open so the board can report real time events.

We want to manage the connection life cycle properly.

If we close the socket on the management station everything is of course ok.

If we disconnect the cable from the board, the PHY detects the link down and
closes the socket. This is also OK.

However, if we disconnect the network link by disconnecting a remote cable
(imagine a chain of switches), then the board does not detect the disconnection
and remains blocked in read() on the opened socket.

We have only allowed one simultaneous connection so we are stuck. Allowing more
connections, and killing the previous thread when accepting a second connection
is not a clean solution, even if it is a possible workaround.

So we are not able to detect a network disconnection properly.

We have found that we need SO_KEEPALIVE to have regular link-test packets
exchanged between the server and the board.

Does someone have plans to implement such a feature?

Sebastien.

spudarnia@yahoo.com [nuttx]

2018-01-24 15:11:52 UTC

There are no plans that I am aware of to implement SO_KEEPALIVE.

Couldn't you just ping the remote periodically to make sure that it is still responding? Like the logic in apps/system/ping and apps/system/ping6.

Greg

Sebastien Lorquet sebastien@lorquet.fr [nuttx]

2018-01-24 15:20:10 UTC

I have none either, since we are very short on time.

But it looks like a fundamental mechanism in TCP/IP.

ping is impractical since we would have to ping both ways.

Not a problem however, there is a workaround: we will implement a connection
recovery, by allowing a second connection to kill and replace the current one.
The implemented remote protocol allows that.

Sebastien

Bertold Van den Bergh vandenbergh@bertold.org [nuttx]

2018-01-24 15:28:10 UTC

Hello,

You can also implement this keepalive mechanism on the application layer.
Just make the board send a message every n seconds that the server responds
to. This allows both sides to cleanly close the connection when it is lost.
I find this much more reliable and portable than relying on OS features.

Sincerely,
Bertold
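
A minimal sketch of the application-layer keepalive suggested above, written as pure timing logic so it does not depend on any particular socket API. The names `heartbeat`, `hb_init`, `hb_on_reply` and `hb_poll`, and the use of a seconds counter supplied by the caller, are assumptions made for illustration; they are not part of NuttX or of the protocol discussed in this thread.

```c
#include <stdint.h>

/* What the caller should do after polling the heartbeat state. */

enum hb_action
{
  HB_WAIT,       /* Nothing to do yet */
  HB_SEND_PING,  /* Time to send the periodic message */
  HB_LINK_LOST   /* No reply for too long: close the connection */
};

struct heartbeat
{
  uint32_t interval_s;   /* Send a message every n seconds */
  uint32_t timeout_s;    /* Give up after this long without a reply */
  uint32_t last_ping_s;  /* Time the last message was sent */
  uint32_t last_reply_s; /* Time the last reply was received */
};

static void hb_init(struct heartbeat *hb, uint32_t now_s,
                    uint32_t interval_s, uint32_t timeout_s)
{
  hb->interval_s   = interval_s;
  hb->timeout_s    = timeout_s;
  hb->last_ping_s  = now_s;
  hb->last_reply_s = now_s;
}

/* Call whenever the peer answers (or sends any valid traffic). */

static void hb_on_reply(struct heartbeat *hb, uint32_t now_s)
{
  hb->last_reply_s = now_s;
}

/* Call periodically.  Unsigned subtraction keeps the comparisons
 * correct even if the seconds counter wraps around.
 */

static enum hb_action hb_poll(struct heartbeat *hb, uint32_t now_s)
{
  if (now_s - hb->last_reply_s >= hb->timeout_s)
    {
      return HB_LINK_LOST;
    }

  if (now_s - hb->last_ping_s >= hb->interval_s)
    {
      hb->last_ping_s = now_s;
      return HB_SEND_PING;
    }

  return HB_WAIT;
}
```

The same state machine could run on both ends of the connection, so that each side closes its socket on its own when the other stops answering. Choosing `timeout_s` as a few multiples of `interval_s` tolerates an occasional lost or delayed message.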

Sebastien Lorquet sebastien@lorquet.fr [nuttx]

2018-01-24 15:31:21 UTC

We can also do that, thanks for the suggestion.

Sebastien

Sebastien Lorquet sebastien@lorquet.fr [nuttx]

2018-01-24 15:49:22 UTC

Hello again,

We have decided to do that using SO_RCVTIMEO (the socket receive timeout).

After read() returns with EAGAIN, we send a "null byte" (which, thankfully, is
possible with our protocol) and close the connection if the write fails. If the
write succeeds, we just go back to read().

Thanks for the suggestion. I still think SO_KEEPALIVE would be a nice to have,
but this workaround is perfectly reasonable. This avoids the need for listening
for a second connection to kick the previous one, which was a security risk,
because it was possible to trigger a denial of service by repeatedly connecting
to the board.

Sebastien
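
The read-timeout-then-probe loop described above might look roughly like the following sketch. It assumes a POSIX-style socket API with SO_RCVTIMEO; the helper names `set_recv_timeout` and `read_or_probe`, the `CONN_IDLE` / `CONN_DEAD` return codes, and the choice of 0x00 as a harmless protocol byte are all hypothetical. Note that a successful send() only means the byte was queued locally, so how quickly a dead link is reported depends on the retransmission behaviour of the TCP stack.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define CONN_IDLE ((ssize_t)-2)  /* Nothing received; caller should retry */
#define CONN_DEAD ((ssize_t)-1)  /* Connection should be closed */

#ifdef MSG_NOSIGNAL
#  define PROBE_FLAGS MSG_NOSIGNAL  /* Avoid SIGPIPE where supported */
#else
#  define PROBE_FLAGS 0
#endif

/* Arm a receive timeout so that a blocked read() returns periodically. */

static int set_recv_timeout(int sockfd, int seconds)
{
  struct timeval tv;

  tv.tv_sec  = seconds;
  tv.tv_usec = 0;
  return setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* One read attempt.  Returns the number of bytes read (> 0), 0 if the
 * peer closed the connection, CONN_IDLE if the caller should simply
 * retry (a probe byte was queued after a timeout, or the read was
 * interrupted), or CONN_DEAD on any other error.
 */

static ssize_t read_or_probe(int sockfd, void *buf, size_t len)
{
  static const uint8_t probe = 0x00;  /* Hypothetical no-op byte */
  ssize_t n = read(sockfd, buf, len);

  if (n >= 0)
    {
      return n;
    }

  if (errno == EINTR)
    {
      return CONN_IDLE;
    }

  if (errno != EAGAIN && errno != EWOULDBLOCK)
    {
      return CONN_DEAD;  /* e.g. the stack gave up retransmitting */
    }

  /* Receive timeout expired: push one byte so that the stack has
   * something to retransmit if the peer is unreachable.
   */

  if (send(sockfd, &probe, 1, PROBE_FLAGS) != 1)
    {
      return CONN_DEAD;
    }

  return CONN_IDLE;
}
```

A caller would typically loop on `read_or_probe()`, process data on a positive return, and close the socket on 0 or `CONN_DEAD`.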

spudarnia@yahoo.com [nuttx]

2018-01-24 16:23:04 UTC

Post by Sebastien Lorquet ***@lorquet.fr [nuttx]
We have decided to do that using SO_RCVTIMEO (the socket receive timeout).
After read() returns with EAGAIN, we send a "null byte" (which, thankfully, is
possible with our protocol) and close the connection if the write fails. If the
write succeeds, we just go back to read().

Are you sure sending an empty TCP packet will fail? Certainly it would need to wait for an ACK if there were data. But what would make sending an empty packet fail?

Post by Sebastien Lorquet ***@lorquet.fr [nuttx]
Thanks for the suggestion. I still think SO_KEEPALIVE would be a nice to have,
but this workaround is perfectly reasonable. This avoids the need for listening
for a second connection to kick the previous one, which was a security risk,
because it was possible to trigger a denial of service by repeatedly connecting
to the board.

SO_KEEPALIVE would be nice to have. If someone wants to lead that effort, I will certainly give support and help out with review and testing.

Greg

Sebastien Lorquet sebastien@lorquet.fr [nuttx]

2018-01-24 17:18:15 UTC

Post by ***@yahoo.com [nuttx]
Are you sure sending an empty TCP packet will fail? Certainly it would need to
wait for an ACK if there were data. But what would make sending an empty
packet fail?

It was my naive belief, but after a real test, no it does not, because of
buffering and retries. But still, after two minutes the read fails (!) and the
connection can be closed, which is good for us in our context.

I am not sure that it would be a good idea to mess with TCP timeouts to make
this faster, or whether it is actually possible.

I have no idea if using SO_KEEPALIVE would be faster to detect the link loss. I
guess not, since basically it's just sending empty data packets and expecting
ACKs. But maybe there are shorter timeouts for expecting ACKs for pending
keepalive packets.

Sebastien

spudarnia@yahoo.com [nuttx]

2018-01-24 18:53:11 UTC

Post by Sebastien Lorquet ***@lorquet.fr [nuttx]
After read() returns with EAGAIN, we send a "null byte" (which, thankfully, is
possible with our protocol) and close the connection if the write fails. If the
write succeeds, we just go back to read().

Are you sure sending an empty TCP packet will fail? Certainly it would need to
wait for an ACK if there were data. But what would make sending an empty
packet fail?

It was my naive belief, but after a real test, no it does not, because of
buffering and retries. ...

RFC 1122 paragraph 4.2.3.6 talks about the keep-alive "probe" packet. It is not just an empty TCP packet but also has the ACK bit set and the sequence number outside of the window. Apparently, that is supposed to cause the TCP state machine to respond with an ACK.

I would need a proper understanding of the keep-alive requirements and a lot of time to experiment before I felt comfortable with the concepts.

Greg

Sebastien Lorquet sebastien@lorquet.fr [nuttx]

2018-01-24 15:26:54 UTC

Some context:

In fact the TCP/IP link in our product replaces a different mechanism that was
used previously.

We had a TCP/IP connection to a Moxa IP-RS485 converter, then an RS485 bus to
the board.

Now we have replaced the RS485 link with a direct TCP/IP connection to the
board.

But the Moxa module supported keepalive, so we had no problems.

Older Lantronix and Digi IP/Serial modules were also used in some (protocol
compatible) products, and they also support keepalive.

It does not look complex: it is about sending an empty packet with the ACK flag
set after a timer expires, but I have no idea where to start looking.

http://www.tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/

Sebastien
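
For comparison, on a stack that already implements keepalive, enabling it is a few setsockopt() calls. The sketch below assumes a Linux host such as the one covered by the HOWTO linked above (not NuttX, which per this thread did not implement SO_KEEPALIVE at the time); `enable_keepalive` is a hypothetical helper name, and TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are the Linux per-socket options.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable TCP keepalive on a connected or yet-to-connect TCP socket.
 *
 *   idle_s     - idle time before the first probe is sent
 *   interval_s - time between unanswered probes
 *   count      - unanswered probes before the connection is dropped
 *
 * Returns 0 on success, -1 on failure (errno set by setsockopt).
 */

static int enable_keepalive(int sockfd, int idle_s, int interval_s,
                            int count)
{
  int on = 1;

  if (setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE,
                 &on, sizeof(on)) < 0)
    {
      return -1;
    }

  if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPIDLE,
                 &idle_s, sizeof(idle_s)) < 0)
    {
      return -1;
    }

  if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPINTVL,
                 &interval_s, sizeof(interval_s)) < 0)
    {
      return -1;
    }

  if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPCNT,
                 &count, sizeof(count)) < 0)
    {
      return -1;
    }

  return 0;
}
```

With example values such as `enable_keepalive(fd, 60, 10, 3)`, an unreachable peer would be reported after roughly 60 + 10 * 3 = 90 seconds of silence, at which point a blocked read() on that socket returns an error.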