Software Reset Due to RESET_MEM_ALLOC_FAIL

kwilliams
Joined: 2020-09-23 20:34
I’m running into software resets due to RESET_MEM_ALLOC_FAIL on my peripheral. Let me start with a quick overview of what behavior I'm trying to achieve:

  1. Central connects to the peripheral (no pairing or bonding)
  2. Central subscribes to notifications for the “Button State” characteristic of custom service 1
  3. Central writes a non-zero value to the control point characteristic in custom service 1.
  4. On the peripheral, this will start what I refer to in the code as the “sampling loop” (code snippets below). The loop, which is really just an app_easy_timer that keeps resetting itself until a zero is written to the aforementioned control point, checks the current state of the button onboard the DA14531 TINY daughterboard (SW2) and sends it via notification to the central.
  5. Once the user has not pressed the button for 3 straight seconds, the “inactivity timer” should trigger. This will stop the sample loop (i.e. cancel its timer) and enable a button interrupt wakeup (on the same button, SW2) through wakeupct. At this point, the device should go to sleep and only wake up when the button is pressed or a connection event must happen (I set the latency to the max of 499, so these should be very spaced out).
  6. The next time the user presses the button, the device should wake up, send a button press notification to central (it’s crucial that this notification is sent), and resume the sampling loop.
  7. Steps 4 – 6 are repeated until a zero is written to the control point.

As you can tell, this is quite similar to ble_app_peripheral, which is what I used as my starting point.

The issue that I'm facing is that sometimes (not all the time!) a RESET_MEM_ALLOC_FAIL is triggered right as the inactivity timer is about to fire. The peripheral usually needs to go through a few cycles of switching between sampling and sleeping before the reset happens, but the number of cycles before the reset is inconsistent.

I did some research on here and it seems that the usual cause of a RESET_MEM_ALLOC_FAIL is message mishandling. The only messages that I'm directly allocating are the button state notifications (see custSrvSetButtonState() below), but I don't think I'm producing these faster than they can be consumed because the sampling timer delay is the same as the connection interval.
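One way to test the assumption that notifications are consumed as fast as they are produced is to cap how many are outstanding at once. The following is a minimal host-side sketch, not SDK code: MAX_NTF_IN_FLIGHT, ntf_try_reserve(), ntf_release() and ntf_pending() are hypothetical names. On the device, the reserve call would sit in front of the allocation in custSrvSetButtonState(), and the release call would be driven by whatever confirmation the profile sends back for each notification (which message that is should be checked against the SDK; the thread does not say).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap on notifications queued but not yet confirmed. */
#define MAX_NTF_IN_FLIGHT 4u

/* Outstanding notification count; on the device this would need to
 * live in retention memory so it survives sleep. */
static uint8_t ntf_in_flight;

/* Call before allocating a notification.
 * Returns false when the cap is reached and the sample should be skipped. */
bool ntf_try_reserve(void)
{
    if (ntf_in_flight >= MAX_NTF_IN_FLIGHT)
        return false;
    ntf_in_flight++;
    return true;
}

/* Call when the stack reports that one notification has been handled. */
void ntf_release(void)
{
    if (ntf_in_flight > 0)
        ntf_in_flight--;
}

/* Current number of outstanding notifications, useful for logging. */
uint8_t ntf_pending(void)
{
    return ntf_in_flight;
}
```

If the cap is ever reached during normal sampling, messages are being produced faster than they are drained, which would point to a backlog rather than a leak elsewhere.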

I used the heap logging to capture the state of the heap when the reset happens and this was the output:

>>> ENV HEAP <<<
Used size in this HEAP  :  496 (current) -  512 (maximum)
Used size in other HEAPs:  404 (current) -  420 (maximum)

>>> DB HEAP <<<
Used size in this HEAP  :  592 (current) -  592 (maximum)
Used size in other HEAPs:    0 (current) -    0 (maximum)

>>> MSG HEAP <<<
Used size in this HEAP  : 1336 (current) - 1356 (maximum)
Used size in other HEAPs: 1180 (current) - 1180 (maximum)

>>> Non-Ret HEAP <<<
Used size in this HEAP  :    0 (current) -  196 (maximum)
Used size in other HEAPs:    0 (current) -    0 (maximum)

After increasing MSG_HEAP_SZ to 6880, the reset still happened with the following heap log output:

>>> ENV HEAP <<<
Used size in this HEAP  :  496 (current) -  528 (maximum)
Used size in other HEAPs:  404 (current) -  404 (maximum)

>>> DB HEAP <<<
Used size in this HEAP  :  592 (current) -  592 (maximum)
Used size in other HEAPs:    0 (current) -    0 (maximum)

>>> MSG HEAP <<<
Used size in this HEAP  : 6856 (current) - 6856 (maximum)
Used size in other HEAPs: 1180 (current) - 1196 (maximum)

>>> Non-Ret HEAP <<<
Used size in this HEAP  :    0 (current) -  196 (maximum)
Used size in other HEAPs:    0 (current) -    0 (maximum)

I'm doubtful that the heap size is the problem anyway, since I'm not doing anything that intensive (I'm sending out notifications every 30 ms with a single-byte value). So it's looking like a memory leak to me, but I'm not sure what I'm doing wrong with regard to memory management. Any help would be hugely appreciated! Thank you all so much.
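A rough budget makes the "leak, not size" reasoning concrete. The sketch below estimates how long a message heap lasts if one message is allocated per sample and none are ever freed. The per-message footprint is an assumption (it is not given anywhere in this post), and both function names are hypothetical.

```c
#include <stdint.h>

/* Number of undelivered messages that fit in a heap of heap_bytes.
 * bytes_per_msg is an ASSUMED footprint (payload plus kernel header),
 * not a value measured on the DA14531. Returns 0 for a zero footprint. */
uint32_t msgs_until_full(uint32_t heap_bytes, uint32_t bytes_per_msg)
{
    return (bytes_per_msg == 0u) ? 0u : heap_bytes / bytes_per_msg;
}

/* Milliseconds until the heap is full when one message is allocated
 * every period_ms and nothing is freed in the meantime. */
uint32_t ms_until_full(uint32_t heap_bytes, uint32_t bytes_per_msg,
                       uint32_t period_ms)
{
    return msgs_until_full(heap_bytes, bytes_per_msg) * period_ms;
}
```

For example, with an assumed 32 bytes per message, ms_until_full(6880, 32, 30) gives 6450 ms: even the enlarged heap would absorb only about six seconds of undrained sampling, so a bigger heap postpones the reset rather than preventing it.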

Relevant code snippets:

/*
 * MACROS
 ****************************************************************************************
 */

#define CUSTS1_SAMPLE_TIMER_DELAY       MS_TO_TIMERUNITS(30)
#define CUSTS1_INACTIVITY_TIMEOUT       MS_TO_TIMERUNITS(3000)

/*
 * GLOBAL VARIABLE DEFINITIONS
 ****************************************************************************************
 */

ke_msg_id_t samplingTimer        __SECTION_ZERO("retention_mem_area0"); //@RETENTION MEMORY
ke_msg_id_t inactivityTimer      __SECTION_ZERO("retention_mem_area0"); //@RETENTION MEMORY

/*
 * FUNCTION DEFINITIONS
 ****************************************************************************************
 */
 
static void custSrvStartSampleLoop(void)
{
  samplingTimer = app_easy_timer(CUSTS1_SAMPLE_TIMER_DELAY, custSrvSampleLoopTimerCallback);
  inactivityTimer = app_easy_timer(CUSTS1_INACTIVITY_TIMEOUT, custSrvInactivityTimerCallback);
  
  arch_puts("Sample loop started\r\n");
}

static void custSrvStopSampleLoop(void)
{
  app_easy_timer_cancel(samplingTimer);
  app_easy_timer_cancel(inactivityTimer);
  samplingTimer = EASY_TIMER_INVALID_TIMER;
  inactivityTimer = EASY_TIMER_INVALID_TIMER;
  
  arch_puts("Sample loop stopped\r\n");
}

/**
 ****************************************************************************************
 * @brief Updates the state of the button so that the kernel can notify the subscriber.
 * @param[in] _pressed   Whether the button is pressed.
 ****************************************************************************************
*/
static void custSrvSetButtonState(const bool _pressed)
{
  struct custs1_val_ntf_ind_req *req = KE_MSG_ALLOC_DYN(CUSTS1_VAL_NTF_REQ,
                                                        prf_get_task_from_id(TASK_ID_CUSTS1),
                                                        TASK_APP,
                                                        custs1_val_ntf_ind_req,
                                                        DEF_SVC1_BUTTON_STATE_CHAR_LEN);
  
  if (req == NULL)
  {
    arch_puts("Failed to allocate notification\r\n");
    return;
  }

  uint8_t buttonState = _pressed ? CUSTS1_BUTTON_DOWN : CUSTS1_BUTTON_UP;

  req->conidx = app_env->conidx;
  req->handle = SVC1_IDX_BUTTON_STATE_VAL;
  req->length = DEF_SVC1_BUTTON_STATE_CHAR_LEN;
  req->notification = true;
  memcpy(req->value, &buttonState, DEF_SVC1_BUTTON_STATE_CHAR_LEN);

  ke_msg_send(req);
}

void custSrvControlPointWriteHandler(ke_msg_id_t const msgid,
                                     struct custs1_val_write_ind const *param,
                                     ke_task_id_t const dest_id,
                                     ke_task_id_t const src_id)
{
  uint8_t val = 0;
  memcpy(&val, &param->value[0], param->length);
 
  // note(KJW): we may want to use bools instead of using the timer variables to check state
  if ((val != CUSTS1_CONTROL_POINT_DISABLE) && (samplingTimer == EASY_TIMER_INVALID_TIMER))
    custSrvStartSampleLoop();
  else if ((val == CUSTS1_CONTROL_POINT_DISABLE) && (samplingTimer != EASY_TIMER_INVALID_TIMER))
    custSrvStopSampleLoop();
}

void custSrvSampleLoopTimerCallback(void)
{
  if (GPIO_GetPinStatus(BTN_PORT, BTN_PIN))
  {
    custSrvSetButtonState(false);
    if (inactivityTimer == EASY_TIMER_INVALID_TIMER)
      inactivityTimer = app_easy_timer(CUSTS1_INACTIVITY_TIMEOUT, custSrvInactivityTimerCallback);
  }
  else
  {
    custSrvSetButtonState(true);
    if (inactivityTimer != EASY_TIMER_INVALID_TIMER)
    {
      app_easy_timer_cancel(inactivityTimer);
      inactivityTimer = EASY_TIMER_INVALID_TIMER;
    }
  }

  if (ke_state_get(TASK_APP) == APP_CONNECTED)
  {
    // Set it once again until Stop command is received in Control Characteristic
    samplingTimer = app_easy_timer(CUSTS1_SAMPLE_TIMER_DELAY, custSrvSampleLoopTimerCallback);
  }
}

void custSrvInactivityTimerCallback(void)
{
  if (ke_state_get(TASK_APP) == APP_CONNECTED)
  {
    arch_puts("Inactivity limit reached, sleepytime...\r\n");
    app_easy_timer_cancel(samplingTimer);
    samplingTimer = EASY_TIMER_INVALID_TIMER;
    
    wkupct_register_callback(custSrvButtonWakeupCallback);
    wkupct_enable_irq(WKUPCT_PIN_SELECT(BTN_PORT, BTN_PIN),
                      WKUPCT_PIN_POLARITY(BTN_PORT, BTN_PIN, WKUPCT_PIN_POLARITY_LOW), // button is active LOW
                      1, // 1 event (press)
                      10); // debouncing time = 10 ms
    
    // TODO there are also "wkupct2" versions of the wakeupct functions, are they any better?
  }
}

void custSrvButtonWakeupCallback(void)
{
#if !defined (__DA14531__)
  if (GetBits16(SYS_STAT_REG, PER_IS_DOWN))
#endif
  {
    // not sure if this is necessary.
    // wkupct_quadec.h says that it is.
    // however, in the sample code (ble_app_sleepmode), it's only done when waking from sleep before a connection has been established.
    periph_init();
  }
  
  arch_puts("Button wake up\r\n");
  //wkupct_disable_irq(); not sure if this is necessary. only seems to be necessary if canceling irq before the callback is called.
  custSrvStartSampleLoop();
  custSrvSetButtonState(true);
}

My connection parameters:

static const struct connection_param_configuration user_connection_param_conf = {
    /// Connection interval minimum measured in ble double slots (1.25ms)
    /// use the macro MS_TO_DOUBLESLOTS to convert from milliseconds (ms) to double slots
    .intv_min = MS_TO_DOUBLESLOTS(30),

    /// Connection interval maximum measured in ble double slots (1.25ms)
    /// use the macro MS_TO_DOUBLESLOTS to convert from milliseconds (ms) to double slots
    .intv_max = MS_TO_DOUBLESLOTS(30),

    /// Latency measured in connection events
    .latency = 499,

    /// Supervision timeout measured in timer units (10 ms)
    /// use the macro MS_TO_TIMERUNITS to convert from milliseconds (ms) to timer units
    .time_out = MS_TO_TIMERUNITS(32000),

    /// Minimum Connection Event Duration measured in ble double slots (1.25ms)
    /// use the macro MS_TO_DOUBLESLOTS to convert from milliseconds (ms) to double slots
    .ce_len_min = MS_TO_DOUBLESLOTS(0),

    /// Maximum Connection Event Duration measured in ble double slots (1.25ms)
    /// use the macro MS_TO_DOUBLESLOTS to convert from milliseconds (ms) to double slots
    .ce_len_max = MS_TO_DOUBLESLOTS(0),
};
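The interaction between these parameters can be checked with a little arithmetic. The sketch below uses hypothetical helper names and plain milliseconds instead of the SDK macros. It computes the longest gap between connection events that the peripheral is permitted to use, how many 30 ms samples would be produced during such a gap, and whether the supervision timeout satisfies the Bluetooth rule that it must exceed (1 + latency) * interval * 2. A peripheral with data queued may listen at an earlier event instead of skipping it; whether that happens here is not established in this thread, so the gap should be read as a worst case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Longest gap between connection events the peripheral may use:
 * interval * (1 + latency). */
uint32_t effective_interval_ms(uint32_t interval_ms, uint32_t latency)
{
    return interval_ms * (latency + 1u);
}

/* Samples produced during one such gap at a fixed sample period.
 * Returns 0 for a zero period. */
uint32_t samples_per_gap(uint32_t interval_ms, uint32_t latency,
                         uint32_t sample_period_ms)
{
    if (sample_period_ms == 0u)
        return 0u;
    return effective_interval_ms(interval_ms, latency) / sample_period_ms;
}

/* Supervision timeout must be larger than (1 + latency) * interval * 2. */
bool timeout_is_valid(uint32_t timeout_ms, uint32_t interval_ms,
                      uint32_t latency)
{
    return timeout_ms > 2u * effective_interval_ms(interval_ms, latency);
}
```

With the values above (30 ms interval, latency 499, 32000 ms timeout), the worst-case gap is 15000 ms, up to 500 samples could be queued within it, and the timeout passes the check with a 2000 ms margin over the 30000 ms minimum.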

 

PM_Dialog
Staff
Joined: 2018-02-08 11:03
Hi kwilliams,

Thanks for your detailed message and for your interest in our TINY module BLE solution. This is a platform reset. The platform_reset_func() is invoked by platform_reset(), which is implemented in the ROM code. As you mentioned, the most likely reason for this assertion is insufficient memory, because you are allocating messages that are never consumed. For example, if you are allocating notification messages and you have a small connection interval, the messages pile up until a connection event arrives, but with a large connection interval you run out of memory before the connection event arrives. You could increase the connection interval. Since the error code is RESET_MEM_ALLOC_FAIL, there is probably some kind of memory leak in your application that piles up after each connection. To track it down, check whether there are any pending messages, make sure that each message you receive is consumed when it is handled, and make sure that any data you allocate is freed.

In the attached code, I saw that a small connection interval is used. Could you please try increasing the connection interval while keeping the increased heap size?

Thanks, PM_Dialog