Discussion:
[nuttx] [PATCH] syslog: prevent syslog messages from parallel processes getting interleaved [1 Attachment]
Jussi Kivilinna jussi.kivilinna@haltian.com [nuttx]
2017-05-10 15:34:34 UTC
Permalink
Hello,

There's patch to prevent syslog messages from parallel running processes getting interleaved.

Patch adds semaphore to control syslog writing so that mangled messages such as these do not occur:

3377..118800000000]]gcaotnemwaany__mmaeimn_:d bpgo:l l ( ) t i m e d o u t
t[o t a l 3 7 . 1 8 0 0u0s0e]de v e n t _ s tfarteee_ c h a nlgaer:g eosltd
[s t a t e3:7 .01 8=0>0 0n0e]wg astteawtaey:_ m1e1m
_[d b g : 3M7e.m1:9 0 0 0 0 ] e v e3n1t2_3s3t6a t e _ c h a7n3g4e5:6 o l d s2t3a8t8e8:0 1 1 = >2 2n8e1w2 8s
t[a t e : 307

Approx. same messages after patch:
930000]gateway_mem_dbg: total used free largest
[ 38.930000]conman_main: poll() timed out
[ 38.930000]gateway_mem_dbg: Mem: 312304 73456 238848 228096
[ 38.930000]event_state_change: old state: 0 => new state: 11

-Jussi
spudarnia@yahoo.com [nuttx]
2017-05-10 17:28:19 UTC
Permalink
I have declined to implement that change several times in the past and I will also decline to do so now. I forget all of the arguments but basically it completely screws up anything close to realtime behavior.
spudarnia@yahoo.com [nuttx]
2017-05-10 17:30:34 UTC
Permalink
The root cause of the problem is that you have multiple tasks running at the same priority. So they swap in and out on each character. In a realtime system, there should never be two tasks with the same priority. If you adjust your priorities, the problem with go away without screwing up realtime behavior.
spudarnia@yahoo.com [nuttx]
2017-05-10 17:44:32 UTC
Permalink
Here is a more detailed description of how this interleaved output occurs:

1. Two tasks A and B priority attempt to output data to the the syslog at the same time.
2. Each call lib_vsprintf() which generates output one character at a time.

When used with printf, the single character output goes to a buffer and the buffer is not flushed until the a carriage return is full, then the entire buffer in flush atomically. That is why you don't see this with printf().

3. If Task A gets the serial driver semaphore, it will write its single character, then release the semaphore which is taken by Task B.
4. Then Task A tries to take the serial driver semaphore but has to wait for Task B to output its one character.

In that way, the output is interleaved. And, I was mistaken earliear, priority may have some effect but it is not the primary issue here. Sorry, sometimes I speak too quickly without thinking things through.

Greg
spudarnia@yahoo.com [nuttx]
2017-05-10 17:45:34 UTC
Permalink
And I don't want to have any task holding a semaphore that blocks forward progress of the system for the entire duration of the operation of the syslog output. That is the thing that kills real timer performance.


A buffering solution would not have this problem (basically buffering output data as for C standard I/O). But is not simple to implement.
spudarnia@yahoo.com [nuttx]
2017-05-10 17:52:35 UTC
Permalink
A solution that I do like, but have not through completely, would be to put an I/O buffer in the group structure so that each task would buffer syslog I/O just as with C buffered I/O. That it would flush the buffer in one atomic write with no nasty semaphores that would be held for perhaps 100s of milliseconds. That would be a realtime friendly solution.

I might add this to my list of things to do.

Greg
spudarnia@yahoo.com [nuttx]
2017-05-10 20:45:00 UTC
Permalink
I have a prototype of a version of syslog that does buffered I/O. It is push on the syslog branch.


I have a list of other things that need to be done before I begin testing (which will will have to wait... the housekeeping is kicking me out of my office for today). But this might give you some idea.


That involves tiny changes to many files so it is pretty high in complexity.
spudarnia@yahoo.com [nuttx]
2017-05-11 00:05:17 UTC
Permalink
Everything seems to check out so I merged the syslog branch back to master. If anyone sees any log output problems, please let me know.


The solutions is really a good efficiency improvement. Which this change, each character if buffered in memory until the full output line is ready: The driver is NOT invoked on a character-by-character buffer and there are no semaphores involved at all.


Once the full syslog output has been buffered, it is written to the driver in one atomic write. So I would not expect to see any interleaved output.


For buffering, it uses the common I/O buffer (IOB) infrastructure that I recently broke out from the network. This is a fast allocation since it keeps pre-allocated IO buffers in a free list. And adds very little additional overhead if you are already using IOBs for some other purpose like networking.


Greg
Jussi Kivilinna jussi.kivilinna@haltian.com [nuttx]
2017-05-11 07:47:26 UTC
Permalink
Thanks! I'll give it a go and report any problems if any.

-Jussi
Post by ***@yahoo.com [nuttx]
Everything seems to check out so I merged the syslog branch back to master. If anyone sees any log output problems, please let me know.
The solutions is really a good efficiency improvement. Which this change, each character if buffered in memory until the full output line is ready: The driver is NOT invoked on a character-by-character buffer and there are no semaphores involved at all.
Once the full syslog output has been buffered, it is written to the driver in one atomic write. So I would not expect to see any interleaved output.
For buffering, it uses the common I/O buffer (IOB) infrastructure that I recently broke out from the network. This is a fast allocation since it keeps pre-allocated IO buffers in a free list. And adds very little additional overhead if you are already using IOBs for some other purpose like networking.
Greg
Jussi Kivilinna jussi.kivilinna@haltian.com [nuttx]
2017-05-11 12:19:53 UTC
Permalink
Hello,

Here's patch to fix building with CONFIG_SYSLOG_TIMESTAMP=y.

Another problem I noticed happens when syslog message has addition characters after last new-line. With buffering those now get lost as vsyslog does not flush output after lib_sprintf. Additional trailing characters could be ANSI escape sequence to reset state that message setups. For example, macro here uses colors and resets state after actual message (including '\n'):

#define color_dbg(fc,bc,x, ...) dbg("%c[38;5;%d;48;5;%dm" x "%c[0m", ESC, fc, bc, ##__VA_ARGS__, ESC)

With flushing added to vsyslog, then there is problem that next syslog line might come from other task before reset sequence, causing wrong line getting color. This could be avoided by not flushing on '\n' but only if IOB is full and/or at end of vsyslog. Would this make sense? Change would be as follows:


diff --git a/drivers/syslog/syslog_stream.c b/drivers/syslog/syslog_stream.c
index 04bb072..ac7e144 100644
--- a/drivers/syslog/syslog_stream.c
+++ b/drivers/syslog/syslog_stream.c
@@ -82,9 +82,9 @@ static void syslogstream_putc(FAR struct lib_outstream_s *this, int ch)
iob->io_len++;
this->nput++;

- /* Is the buffer full? Did we encounter a new line? */
+ /* Is the buffer full? */

- if (iob->io_len >= CONFIG_IOB_BUFSIZE || ch == '\n')
+ if (iob->io_len >= CONFIG_IOB_BUFSIZE)
{
/* Yes.. then flush the buffer */

diff --git a/drivers/syslog/vsyslog.c b/drivers/syslog/vsyslog.c
index b87d9b9..0b993b7 100644
--- a/drivers/syslog/vsyslog.c
+++ b/drivers/syslog/vsyslog.c
@@ -131,6 +131,10 @@ int _vsyslog(int priority, FAR const IPTR char *fmt, FAR va_list *ap)

ret = lib_vsprintf(&stream.public, fmt, *ap);

+ /* Flush the output */
+
+ stream.public.flush(&stream.public);
+
#ifdef CONFIG_SYSLOG_BUFFER
/* Destroy the syslog stream buffer */


-Jussi
Post by Jussi Kivilinna ***@haltian.com [nuttx]
Thanks! I'll give it a go and report any problems if any.
-Jussi
Post by ***@yahoo.com [nuttx]
Everything seems to check out so I merged the syslog branch back to master. If anyone sees any log output problems, please let me know.
The solutions is really a good efficiency improvement. Which this change, each character if buffered in memory until the full output line is ready: The driver is NOT invoked on a character-by-character buffer and there are no semaphores involved at all.
Once the full syslog output has been buffered, it is written to the driver in one atomic write. So I would not expect to see any interleaved output.
For buffering, it uses the common I/O buffer (IOB) infrastructure that I recently broke out from the network. This is a fast allocation since it keeps pre-allocated IO buffers in a free list. And adds very little additional overhead if you are already using IOBs for some other purpose like networking.
Greg
spudarnia@yahoo.com [nuttx]
2017-05-11 13:28:51 UTC
Permalink
Thanks I have included both your attached patch and your recommended inline change.

I was actually under the impression that lib_vsprintf() would flush the stream before it returns, but apparently it does not. lib_vsprintf() might be a more general place to put the flush although I think that only the syslog stream suffers from this.

Removing the check few newline is correct since it is redundanct. lib_vsprintf() already does:

1188 /* Output the character */
1189
1190 obj->put(obj, FMT_CHAR);
1191
1192 /* Flush the buffer if a newline is encountered */
1193
1194 if (FMT_CHAR == '\n')
1195 {
1196 /* Should return an error on a failure to flush */
1197
1198 (void)obj->flush(obj);
1199 }

So the flush was occurring twice anyway.

Greg
spudarnia@yahoo.com [nuttx]
2017-05-11 14:19:57 UTC
Permalink
I was actually under the impression that lib_vsprintf() would flush the stream before it returns, but apparently it does not. lib_vsprintf() might be a more general place to put the flush although I think that only the syslog stream suffers from this.
I looked into removing the flush from vsyslog() and moving it to lib_vsprintf(). But that would break time stamping the current usage: vsyslog() calls lib_sprintf() in that case and that is just a wrapper for lib_vsprintf(). So if lib_vsprintf() did the flush, it would flush between the time stamp and the line.
1188 /* Output the character */
1189
1190 obj->put(obj, FMT_CHAR);
1191
1192 /* Flush the buffer if a newline is encountered */
1193
1194 if (FMT_CHAR == '\n')
1195 {
1196 /* Should return an error on a failure to flush */
1197
1198 (void)obj->flush(obj);
1199 }
So the flush was occurring twice anyway.
But that is a problem in your color debug scenario. We cannot remove this flush since the system depends on it; it is the only flush in most cases.

Obviously more thought is necessary

Greg
Jussi Kivilinna jussi.kivilinna@haltian.com [nuttx]
2017-05-12 07:47:53 UTC
Permalink
I was actually under the impression that lib_vsprintf() would flush the stream before it returns, but apparently it does not. lib_vsprintf() might be a more general place to put the flush although I think that only the syslog stream suffers from this.
I looked into removing the flush from vsyslog() and moving it to lib_vsprintf(). But that would break time stamping the current usage: vsyslog() calls lib_sprintf() in that case and that is just a wrapper for lib_vsprintf(). So if lib_vsprintf() did the flush, it would flush between the time stamp and the line.
1188 /* Output the character */
1189
1190 obj->put(obj, FMT_CHAR);
1191
1192 /* Flush the buffer if a newline is encountered */
1193
1194 if (FMT_CHAR == '\n')
1195 {
1196 /* Should return an error on a failure to flush */
1197
1198 (void)obj->flush(obj);
1199 }
So the flush was occurring twice anyway.
But that is a problem in your color debug scenario. We cannot remove this flush since the system depends on it; it is the only flush in most cases.
Obviously more thought is necessary
Maybe this could be solved at flush function level. Introduce flush function that detects and ignores the flush on new-line case, but handles IOB full flush. This flush function would be passed to lib_vsprintf and then have full flush at end of vsyslog to empty IOB.

-Jussi
spudarnia@yahoo.com [nuttx]
2017-05-12 13:39:45 UTC
Permalink
Maybe this could be solved at flush function level. Introduce a flush function that detects and ignores the flush on new-line case, but handles IOB full flush. This flush function would be passed to lib_vsprintf and then have full flush at end of vsyslog to empty IOB.
Like this? commit 0fc068cc9c5ab6832478561804b667f477a31cba

I did it a little differently. I broke out the flush logic so that it is only used internally. All flushes by the lib_vsprinf() will be ignored but the internal flush will be used by syslog_putc() when the buffer is full and also by syslogstream_destroy() before the IOB is deallocated.


There is yet another place where the output can get split. That is in syslog_dev_write(): It will break up the stream to insert a CR before the LF. I think that can be avoid be generating the CR-LF sequence in the buffer and then detecting and ignoring valid CR-LF sequences, rather than expecting syslog_dev_write() to insert the CR into the buffer.

Like this: commit 1c9859520fc3b6d6bf5ad425a546a93d94695ebb

I don't like the fact that syslog_dev_write() still scans the entire output buffer to expand CR-LF sequence. This seems really wasteful, especially in this case where we can be sure that the is no CR or LF without a matching LF or CR. But I think, the existing behavior in syslog_dev_write() must be retained because it is needed in other contexts.

Greg
Jussi Kivilinna jussi.kivilinna@haltian.com [nuttx]
2017-05-12 14:58:52 UTC
Permalink
Post by ***@yahoo.com [nuttx]
Maybe this could be solved at flush function level. Introduce a flush function that detects and ignores the flush on new-line case, but handles IOB full flush. This flush function would be passed to lib_vsprintf and then have full flush at end of vsyslog to empty IOB.
Like this? commit 0fc068cc9c5ab6832478561804b667f477a31cba
I did it a little differently. I broke out the flush logic so that it is only used internally. All flushes by the lib_vsprinf() will be ignored but the internal flush will be used by syslog_putc() when the buffer is full and also by syslogstream_destroy() before the IOB is deallocated.
There is yet another place where the output can get split. That is in syslog_dev_write(): It will break up the stream to insert a CR before the LF. I think that can be avoid be generating the CR-LF sequence in the buffer and then detecting and ignoring valid CR-LF sequences, rather than expecting syslog_dev_write() to insert the CR into the buffer.
Like this: commit 1c9859520fc3b6d6bf5ad425a546a93d94695ebb
I don't like the fact that syslog_dev_write() still scans the entire output buffer to expand CR-LF sequence. This seems really wasteful, especially in this case where we can be sure that the is no CR or LF without a matching LF or CR. But I think, the existing behavior in syslog_dev_write() must be retained because it is needed in other contexts.
Thanks. With these, color debug case works and interleaved output problem is solved. I'll let you know if run into any problems.

-Jussi
Jussi Kivilinna jussi.kivilinna@haltian.com [nuttx]
2017-05-16 14:07:56 UTC
Permalink
Hello,

I found another problem. Debug assert in iob_alloc fails at line 245. I had set number of IOB buffers to 2 and assert failure happened rarely, with two tasks asserting at same time:

[ 2863.650000]up_assert: Assertion failed at file:iob/iob_alloc.c line: 245 task: <pthread>
[ 2863.650000]up_dumpstate: sp: 20003560
[ 2863.650000]up_dumpstate: stack base: 20003708
[ 2863.650000]up_dumpstate: stack size: 00000ffc
[ 2863.650000]up_dumpstate: stack used: 00000acc
....
[ 2863.660000]up_assert: Assertion failed at file:iob/iob_alloc.c line: 245 task: main_app
[ 2863.660000]up_dumpstate: sp: 20000eb0
[ 2863.660000]up_dumpstate: stack base: 20000ff0
[ 2863.660000]up_dumpstate: stack size: 00000fe4
[ 2863.660000]up_dumpstate: stack used: 0000050c
....

With setting number of IOB buffers to just one, assert fail happens much quicker with only one task crashing.

Looks like 'g_iob_sem.semcount' can get below zero in iob_alloc, when there is another task waiting at iob_allocwait. In that case "DEBUGASSERT(g_iob_sem.semcount >= 0)" is wrong. I don't have CONFIG_IOB_THROTTLE enabled so I don't know if the second assert should be removed too. Here's proposed change:

diff --git a/mm/iob/iob_alloc.c b/mm/iob/iob_alloc.c
index 27de294..b188769 100644
--- a/mm/iob/iob_alloc.c
+++ b/mm/iob/iob_alloc.c
@@ -242,7 +242,6 @@ FAR struct iob_s *iob_tryalloc(bool throttled)
*/

g_iob_sem.semcount--;
- DEBUGASSERT(g_iob_sem.semcount >= 0);

#if CONFIG_IOB_THROTTLE > 0
/* The throttle semaphore is a little more complicated because
@@ -250,7 +249,7 @@ FAR struct iob_s *iob_tryalloc(bool throttled)
*/

g_throttle_sem.semcount--;
- DEBUGASSERT(g_throttle_sem.semcount >= -CONFIG_IOB_THROTTLE);
+ //DEBUGASSERT(g_throttle_sem.semcount >= -CONFIG_IOB_THROTTLE); // Should this assert be removed also?
#endif
leave_critical_section(flags);


-Jussi
spudarnia@yahoo.com [nuttx]
2017-05-16 14:32:43 UTC
Permalink
This looks bad. I think that the assertion is correct and is telling you that is a serious problem.


In the IOB code, the semaphore is a true counting semaphore. Its range is:


1 <= semcount <= CONFIG_IOB_NBUFFERS: There are IOBs available. In this case iob_tryalloc() should succeed, that is, g_iob_freelist should contain EXACTLY semcount IOBs.


semcount <= 0: There no IOBs available; g_iob_freelist should be NULL. iob_tryalloc() should fail. Values less than zero indicate the number of threads waiting for an IOB to become available.


In the case of the assertion at line 325. We have this condition:


1. g_iob_freelist is not NULL. Since a positive, non-zero semcount indicates the number of free IOBs in the list, this means that the semcount must also be positive, and non-zero.


2. iob_tryalloc() decrements the semcount and the result is negative. That is impossible and the semaphore is now corrupted since a negative value means that there is a task waiting for an IOB and that is incorrect in this case.


So there is an error or race condition in the logic somewhere. The nature of the race seems to be that either (1) some logic freed an IOB by adding it to g_iob_freelist, but did not increment the semcount, or more likely, was pre-empted and has not yet incremented the semcount. Or (2) perhaps some other task took the semcount, but has not yet removed the IOB from the free list?


Either case is deserving of an assertion. My assumption is that using the IOBs in this way has revealed a race condition that does not occur in other contexts.


Other than at initialization, the g_iob_freelist of only accessed in iob_alloc.s and iob_free.c


Do you have a simple way of testing this? Or does the assertion only fire under a complex test scenario?


Greg
Jussi Kivilinna jussi.kivilinna@haltian.com [nuttx]
2017-05-16 15:03:40 UTC
Permalink
Post by ***@yahoo.com [nuttx]
This looks bad. I think that the assertion is correct and is telling you that is a serious problem.
1 <= semcount <= CONFIG_IOB_NBUFFERS: There are IOBs available. In this case iob_tryalloc() should succeed, that is, g_iob_freelist should contain EXACTLY semcount IOBs.
semcount <= 0: There no IOBs available; g_iob_freelist should be NULL. iob_tryalloc() should fail. Values less than zero indicate the number of threads waiting for an IOB to become available.
1. g_iob_freelist is not NULL. Since a positive, non-zero semcount indicates the number of free IOBs in the list, this means that the semcount must also be positive, and non-zero.
2. iob_tryalloc() decrements the semcount and the result is negative. That is impossible and the semaphore is now corrupted since a negative value means that there is a task waiting for an IOB and that is incorrect in this case.
So there is an error or race condition in the logic somewhere. The nature of the race seems to be that either (1) some logic freed an IOB by adding it to g_iob_freelist, but did not increment the semcount, or more likely, was pre-empted and has not yet incremented the semcount. Or (2) perhaps some other task took the semcount, but has not yet removed the IOB from the free list?
Either case is deserving of an assertion. My assumption is that using the IOBs in this way has revealed a race condition that does not occur in other contexts.
Other than at initialization, the g_iob_freelist of only accessed in iob_alloc.s and iob_free.c
Do you have a simple way of testing this? Or does the assertion only fire under a complex test scenario?
I made test application that triggers the problem when CONFIG_IOB_NBUFFERS is 1 to 3. With 4 or above, assert did not trigger, but might just require increasing number of threads and/or syslog calls in the test.

static void *syslog_flood(void *p)
{
int n = (intptr_t)p;
while (1)
{
syslog(LOG_INFO, "[%2d] stress testing 1234567890abcdefghijlkmnopqrstuvwxyz\n", n);
usleep(1);
}
return NULL;
}

int stress_syslog_iob_main(int argc, char *argv[])
{
pthread_t threads[10] = {};
int i;

for (i = 0; i < sizeof(threads) / sizeof(threads[0]); i++)
{
(void)pthread_create(&threads[i], NULL, syslog_flood, (void *)(intptr_t)i);
}

for (i = 0; i < sizeof(threads) / sizeof(threads[0]); i++)
{
if (threads[i])
{
pthread_join(threads[i], NULL);
}
}

return 0;
}

The proposed change does not fix the problem and test application just freezes with it.

-Jussi
spudarnia@yahoo.com [nuttx]
2017-05-16 15:11:23 UTC
Permalink
I have a change in mind and will be testing it shortly.


Basically, I think we have to keep the semcount and the freelist in exact agreement at all times. So I am adding logic to do that.


Greg
spudarnia@yahoo.com [nuttx]
2017-05-16 16:09:17 UTC
Permalink
Jussi,


Can you please they the attached patch. I have verified that basic IOB functionality is still okay, but there is no sense in committing it if it does not address your issue.


The basic idea is to have to separate lists of IOBs. Currently there is only a free list of IOBs. The problem, I believe, is because of asynchronies due sem_post() post cause the semcount and the list content to become out of sync.


This change adds a committed list: When there is a task waiting for an IOB, it will go into the committed list rather than the free list before the semaphore is posted. On the waiting side, when awakened from the semaphore wait, it will expect to find its IOB in the committed list, rather than free list.


Let me know. If this does not solve this issue that it is "back to the drawing board." Otherwise, I will commit it.


Greg
Jussi Kivilinna jussi.kivilinna@haltian.com [nuttx]
2017-05-16 16:32:31 UTC
Permalink
Post by ***@yahoo.com [nuttx]
Jussi,
Can you please they the attached patch. I have verified that basic IOB functionality is still okay, but there is no sense in committing it if it does not address your issue.
The basic idea is to have to separate lists of IOBs. Currently there is only a free list of IOBs. The problem, I believe, is because of asynchronies due sem_post() post cause the semcount and the list content to become out of sync.
This change adds a committed list: When there is a task waiting for an IOB, it will go into the committed list rather than the free list before the semaphore is posted. On the waiting side, when awakened from the semaphore wait, it will expect to find its IOB in the committed list, rather than free list.
Let me know. If this does not solve this issue that it is "back to the drawing board." Otherwise, I will commit it.
Greg
No problems so far. New improved stress test have not crashed yet, after running 12 minutes. Before this change, assert crash happened before all test threads printed first syslog.

Here is new stress test application:

struct flood_priv_s {
int n;
WDOG_ID wdog;
};
static void syslog_flood_wdog(int argc, wdparm_t arg1, ...)
{
struct flood_priv_s *priv = (void*)arg1;
syslog(LOG_INFO, "[%2d][WDOG] stress testing 1234567890abcdefghijlkmnopqrstuvwxyz\n", priv->n);
wd_start(priv->wdog, priv->n+1, syslog_flood_wdog, 1, (wdparm_t)priv);
}

static void *syslog_flood(void *p)
{
struct flood_priv_s priv = {
.n = (intptr_t)p,
.wdog = wd_create(),
};
int s;
wd_start(priv.wdog, priv.n+1, syslog_flood_wdog, 1, (wdparm_t)&priv);
while (1)
{
syslog(LOG_INFO, "[%2d] stress testing 1234567890abcdefghijlkmnopqrstuvwxyz\n", priv.n);
getrandom(&s,sizeof(s));
if (s % 4 == 0)
usleep(1);
}
return NULL;
}

int stress_syslog_iob_main(int argc, char *argv[])
{
pthread_t threads[10] = {};
int i;

for (i = 0; i < sizeof(threads) / sizeof(threads[0]); i++)
{
(void)pthread_create(&threads[i], NULL, syslog_flood, (void *)(intptr_t)i);
}

for (i = 0; i < sizeof(threads) / sizeof(threads[0]); i++)
{
if (threads[i])
{
pthread_join(threads[i], NULL);
}
}

return 0;
}

I improved test application to include watchdog timers to get syslog calls from interrupt context. However I had to make following change to fix interrupt context syslog (INTBUFFER untested):

diff --git a/drivers/syslog/syslog_write.c b/drivers/syslog/syslog_write.c
index a310070..390f594 100644
--- a/drivers/syslog/syslog_write.c
+++ b/drivers/syslog/syslog_write.c
@@ -40,6 +40,7 @@
#include <nuttx/config.h>

#include <sys/types.h>
+#include <nuttx/sched.h>
#include <nuttx/syslog/syslog.h>

#include "syslog.h"
@@ -100,8 +101,21 @@ ssize_t syslog_default_write(FAR const char *buffer, size_t buflen)
ssize_t syslog_write(FAR const char *buffer, size_t buflen)
{
#ifdef CONFIG_SYSLOG_WRITE
- return g_syslog_channel->sc_write(buffer, buflen);
-#else
- return syslog_default_write(buffer, buflen);
+ if (!up_interrupt_context() && !sched_idletask())
+ {
+#ifdef CONFIG_SYSLOG_INTBUFFER
+ /* Flush any characters that may have been added to the interrupt
+ * buffer.
+ */
+
+ (void)syslog_flush_intbuffer(g_syslog_channel, false);
#endif
+
+ return g_syslog_channel->sc_write(buffer, buflen);
+ }
+ else
+#endif
+ {
+ return syslog_default_write(buffer, buflen);
+ }
}

---
-Jussi
spudarnia@yahoo.com [nuttx]
2017-05-16 17:12:47 UTC
Permalink
Okay. Good enough for me now. I think the change is on the right track.


I have committed both the IOB change and your fix for interrupt buffering. If you uncover anything in your testing, I make have to revisit the IOB change.


Greg

spudarnia@yahoo.com [nuttx]
2017-05-16 15:04:11 UTC
Permalink
Okay, I think I understand what is causing the problem. I believe that this is the failure scenario:


Task A holds an IOB. There are no further IOBs. The value of semcount is zero.


Task B calls iob_alloc(). Since there are not IOBs, it calls sem_wait(). The value of semcount is now -1.


Task A frees the IOB. iob_free() adds the IOB to the free list and calls sem_post() this makes Task B ready to run and sets semcount to zero NOT 1. There is one IOB in the free list and semcount is zero. When Task B wakes up it would increment the sem_count back to the correct value.


But an interrupt occurs before Task B wakes up. It takes the IOB off of the free list and decrements the semcount. But since semcount is then < 0, this causes the assertion because that is an invalid state in the interrupt handler.


So I think that the root cause is that there the asynchrony between incrementing the semcount. I would want to think about this. Do you have any suggestions?


Greg
Loading...