Strategies for IoT Edge Devices in Noisy Channels

As Internet of Things (IoT) technologies mature, more attention is going to aggregating disparate information. Edge devices supply the rich data streams that aggregation depends on. While the low-level design of an edge device or its wireless protocol may not be foremost in a developer’s mind, a basic knowledge of the techniques involved can improve the success of the entire ecosystem.

We’ve all experienced it: trying to hold a conversation in a crowded, noisy room. Getting heard by the people around you is hard enough; communicating with someone on the other side of the room is nearly impossible.

Now imagine a tiny, battery-powered IoT edge device. It’s too far from its neighbors for them to relay its messages, and it shares a noisy channel with other active transmitters. The device includes sensors that monitor conveyor belt speed, ambient temperature, and conveyor bearing temperature. Keeping the conveyor running at the correct speed is critical to the manufacturing process, and ambient and bearing temperature measurements may signal impending failure.

Once every second, the device wakes up, takes measurements, and broadcasts them. Each update includes a device identifier and the time the sample was collected, as well as conveyor speed and temperatures.

Extreme conditions such as these are more common than you might expect. Manufacturing operations are vast and noisy places. They house a myriad of devices, all competing to be heard. The devices need long operating lives – sometimes years – without a battery change.

How can we improve the chances of getting heard? What is the risk to the operation if the conveyor slows or overheats?

The Basics – Use Our Tools

Wireless protocols include tools to conquer busy channels:

Try a different channel – protocols may offer multiple channels. Maybe there’s one that’s not as busy. If we share spectrum with WiFi, this might not be a viable option.
Wait for a quiet period – protocols may include Clear Channel Assessment. In effect, we listen before transmitting. If the channel is super-busy, there may never be a quiet period. Or, the quiet periods may be too small to get the whole message out. We can’t depend on the other devices in our channel to be polite.
Acknowledge every message – protocols may include guaranteed delivery, which usually involves retransmitting unacknowledged messages. If many sensors are affected at once, their retransmissions add traffic and can make the channel even noisier.
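Clear Channel Assessment is normally handled by the radio or protocol stack, but the listen-before-talk idea is easy to picture. Below is a minimal sketch, loosely in the spirit of the CSMA/CA backoff used by protocols such as IEEE 802.15.4: check the channel, and if it is busy, wait a random number of slots from a window that doubles on each attempt. The callbacks `channel_is_clear` and `transmit`, the slot length, and the attempt limit are all hypothetical placeholders, not a real driver API.

```python
import random
from typing import Callable, Optional, Tuple


def send_with_cca(
    channel_is_clear: Callable[[], bool],  # hypothetical radio CCA check
    transmit: Callable[[], None],          # hypothetical radio send call
    max_attempts: int = 4,                 # assumed limit, not from a spec
    slot_ms: int = 2,                      # assumed backoff slot length
    rng: Optional[random.Random] = None,
) -> Tuple[bool, int]:
    """Listen before talk with random, doubling backoff.

    Returns (sent, backoff_ms): whether the message went out and how long
    the device spent backing off.
    """
    if rng is None:
        rng = random.Random()
    backoff_ms = 0
    for attempt in range(max_attempts):
        if channel_is_clear():  # Clear Channel Assessment
            transmit()
            return True, backoff_ms
        # Busy: wait a random number of slots from a window that doubles
        # each attempt (2, 4, 8, ... slots). A real device would sleep here;
        # this sketch only adds up the time.
        window = 2 ** (attempt + 1)
        backoff_ms += rng.randrange(window) * slot_ms
    # Never found a gap: drop this sample rather than keep the radio on.
    return False, backoff_ms
```

Giving up after a few attempts, rather than retrying indefinitely, suits a sensor whose next sample is only a second away: a fresh reading soon replaces the dropped one, and the radio spends less time awake.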

Smaller is Better

Let’s take a look at a typical message packet.

Security and privacy of the payload are parallel concerns that also affect the size of the message packet. Securing messages, and the effect of doing so on packet size, are not addressed here.

There’s not much we can do about the packet framing (header, footer, and so on). But what if we can shrink the payload so the message fits into the tiny gaps in the channel traffic? A shorter message also keeps the radio on for less time, which extends battery life.

Sensor ID

Sensors may use a MAC address to identify themselves. While it is useful as a unique identifier, the entire 6-byte address is not required in each message. In systems with no more than 256 devices, each device’s MAC address can be mapped to a compact single-byte identifier. By using a single byte to identify the device, we’ve already reduced the payload by 5 bytes.
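One way to manage the mapping is to let the host hand out compact IDs as devices join, so each device only needs to remember its own byte. The sketch below assumes a first-come allocation policy, MAC addresses as strings, and IDs that are never recycled; the class and method names are hypothetical.

```python
from typing import Dict


class CompactIdRegistry:
    """Host side: hand out single-byte IDs in place of 6-byte MAC addresses."""

    MAX_DEVICES = 256  # IDs 0..255 fit in one byte

    def __init__(self) -> None:
        self._id_by_mac: Dict[str, int] = {}
        self._mac_by_id: Dict[int, str] = {}

    def assign(self, mac: str) -> int:
        """Called when a device joins; rejoining returns the same ID."""
        mac = mac.lower()
        if mac in self._id_by_mac:
            return self._id_by_mac[mac]
        if len(self._id_by_mac) >= self.MAX_DEVICES:
            raise RuntimeError("all 256 single-byte IDs are in use")
        new_id = len(self._id_by_mac)  # IDs are never reused in this sketch
        self._id_by_mac[mac] = new_id
        self._mac_by_id[new_id] = mac
        return new_id

    def mac_for(self, device_id: int) -> str:
        """Translate the 1-byte ID in a payload back to the full MAC."""
        return self._mac_by_id[device_id]
```

The full MAC address crosses the air once, at join time; every later message carries only the single byte.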

Timestamp

Each packet is tagged with a time code so the host can place samples in order; packets may travel different routes to the host server and arrive out of order. Still, a simpler timestamp needs less payload space.

When a device joins the network, it negotiates attributes with the host, one of which is the sample rate. If the host service knows the device transmits once per second, the device can send a sequence number instead of a complex time code, incrementing it with each transmission. The size of the sequence number depends on many factors, not least of which is rollover detection: when the sequence number reaches its maximum value, it rolls over to zero, and the host must notice. For sensors transmitting once per second, a 2-byte sequence number is usually adequate. Its 65,536 values roll over about once every 18.2 hours.
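A sketch of both sides, assuming the sequence starts at zero when the device joins and that the host recorded the join time. To cope with out-of-order arrival it swaps in a common serial-number comparison (in the style of RFC 1982): a forward jump of less than half the range counts as newer, anything else as a late packet. The trade-off is that an outage longer than about 9 hours (32,768 samples) can no longer be unwrapped correctly. Names such as `SequenceTracker` are hypothetical.

```python
SEQ_MOD = 1 << 16  # a 2-byte sequence number has 65,536 values

# At one sample per second: 65,536 s / 3,600 s per hour, about 18.2 hours
ROLLOVER_HOURS = SEQ_MOD / 3600


def next_seq(seq: int) -> int:
    """Device side: increment, wrapping from 65,535 back to 0."""
    return (seq + 1) % SEQ_MOD


class SequenceTracker:
    """Host side: turn wrapped sequence numbers into sample times."""

    def __init__(self, join_time_s: float, period_s: float = 1.0) -> None:
        self.join_time_s = join_time_s  # reference time recorded at join
        self.period_s = period_s        # negotiated sample period
        self.newest_seq = 0             # assumes the device starts at 0
        self.newest_count = 0           # unwrapped sample count of newest_seq

    def sample_time(self, seq: int) -> float:
        forward = (seq - self.newest_seq) % SEQ_MOD
        if forward < SEQ_MOD // 2:
            # Newer packet (possibly after a rollover): advance.
            self.newest_seq = seq
            self.newest_count += forward
            count = self.newest_count
        else:
            # Older packet that arrived late: count backwards.
            count = self.newest_count - (SEQ_MOD - forward)
        return self.join_time_s + count * self.period_s
```

For example, after seeing sequence 60,000, a packet numbered 2 is read as sample 65,538, not as a jump back to the start.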

Speed and Temperatures

Rate-based data (speed, rotation, flow, power consumption) present a challenge we’ll explore later. First, let’s investigate temperature data.

More Messages?

Notice that the temperature measurements use more than half of the payload. Moving temperature data into its own message has two benefits. First, each message is shorter. Second, we can apply different techniques to the two message types independently and further reduce overall channel traffic.
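A minimal sketch of the split, using Python’s `struct` module to show exact byte counts. The field widths are assumptions: a 1-byte device ID, a 2-byte sequence number, speed as an unsigned 16-bit integer in whatever unit the device and host agree on, and each temperature as one unsigned byte of whole degrees Fahrenheit (one of the options in the next section). It also assumes the protocol framing tells the host which message type it received; the type codes are hypothetical.

```python
import struct
from typing import Tuple

MSG_SPEED = 1  # hypothetical message-type codes, assumed to be carried
MSG_TEMP = 2   # by the protocol framing rather than the payload

# Little-endian, no padding:
#   speed:       device_id u8 | seq u16 | speed u16               = 5 bytes
#   temperature: device_id u8 | seq u16 | ambient u8 | bearing u8 = 5 bytes
FORMATS = {MSG_SPEED: "<BHH", MSG_TEMP: "<BHBB"}


def pack_speed(device_id: int, seq: int, speed: int) -> bytes:
    return struct.pack(FORMATS[MSG_SPEED], device_id, seq, speed)


def pack_temps(device_id: int, seq: int, ambient_f: int, bearing_f: int) -> bytes:
    return struct.pack(FORMATS[MSG_TEMP], device_id, seq, ambient_f, bearing_f)


def decode(msg_type: int, payload: bytes) -> Tuple[int, ...]:
    """Host side: unpack a payload once the frame has told us its type."""
    return struct.unpack(FORMATS[msg_type], payload)
```

Because each message type now has its own layout, the two can be sent at different rates, which the following sections take advantage of.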

Dynamic Range and Measurement Precision

A typical sensor may report temperature from 0°C to 100°C (32°F to 212°F). Readings are digitized using an analog-to-digital converter. These converters normally provide at least 10 bits of resolution, which oversampling can improve to 12 bits or more. Does a temperature reading need 10-bit resolution? Ten bits provides a precision of about 0.18°F (a 180°F range divided into 2^10 = 1024 steps). Do we need to report temperatures below 60°F or above 120°F? Assuming 1°F resolution over a range from 60°F to 120°F, we have reduced our data requirement to 6 bits (61 values from 60°F through 120°F, which fit within the 2^6 = 64 codes that 6 bits provide). By further reducing either dynamic range or precision, both temperature readings can fit in a single byte at 4 bits each.
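As one hypothetical way to fit both readings into a byte, keep the 60°F to 120°F range but coarsen the step to 4°F. That gives exactly 16 levels (60, 64, ..., 120), so each reading fits in 4 bits, at the cost of up to 2°F of rounding error inside the range. The range, step, and nibble order below are assumptions chosen for illustration.

```python
from typing import Tuple

LOW_F, HIGH_F, STEP_F = 60, 120, 4  # 16 levels: 60, 64, ..., 120


def quantize(temp_f: float) -> int:
    """Clamp to [LOW_F, HIGH_F] and map to a 4-bit code (0..15)."""
    clamped = min(max(temp_f, LOW_F), HIGH_F)
    return round((clamped - LOW_F) / STEP_F)


def pack_two_temps(ambient_f: float, bearing_f: float) -> int:
    """Ambient code in the high nibble, bearing code in the low nibble."""
    return (quantize(ambient_f) << 4) | quantize(bearing_f)


def unpack_two_temps(byte: int) -> Tuple[int, int]:
    """Host side: the (ambient, bearing) temperatures the codes stand for."""
    return (LOW_F + (byte >> 4) * STEP_F, LOW_F + (byte & 0x0F) * STEP_F)
```

For instance, an ambient reading of 72°F and a bearing reading of 101°F pack into the single byte 0x3A, which the host reads back as 72°F and 100°F. Readings outside the range are clamped to its ends, so a bearing far above 120°F still reports 120°F; that is where an alarm, described below, earns its keep.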

Some systems prefer “human readable” values from their devices. This improves interoperability of sensors and reduces the processing load on the host service. Even so, we can transmit each temperature as an 8-bit unsigned integer at 1°F resolution, covering 0°F to 255°F.

Measurement Period

Temperature changes slowly, so we can reduce channel traffic by sending those measurements less frequently. Reporting temperature twice, or even once, per minute is normally adequate. Reducing message frequency also has a beneficial effect on battery life.
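With separate message types, the device’s once-per-second wake-up can decide what to send. A minimal sketch, assuming temperature goes out on every 30th wake-up (twice per minute) while speed goes out every time; the constant and function names are hypothetical.

```python
from typing import List

TEMP_EVERY_N_WAKEUPS = 30  # 1 s wake-ups -> temperature twice per minute


def messages_for_wakeup(tick: int) -> List[str]:
    """Which message types to send on wake-up number `tick` (0, 1, 2, ...)."""
    msgs = ["speed"]  # speed is reported on every wake-up
    if tick % TEMP_EVERY_N_WAKEUPS == 0:
        msgs.append("temperature")
    return msgs
```

Over one minute that is 60 speed messages but only 2 temperature messages, instead of temperature data riding along in all 60.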

Sounding the Alarm

Sometimes temperature information is critical to the host server: overheating must be reported immediately. Using alarms for critical measurements gives us several advantages. First, alarms can be appended to the payloads of more frequent messages. Second, they can remain set until the host server acknowledges and clears them (a form of guaranteed delivery). Third, and not least, an alarm can be transmitted in a single bit.
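A sketch of latched alarm bits, assuming one spare byte is appended to each once-per-second speed payload and that the host sends back an acknowledgement naming the bits it has seen. The thresholds, bit assignments, and names are hypothetical.

```python
ALARM_OVERHEAT = 0x01  # bit 0: bearing temperature over limit
ALARM_STALL = 0x02     # bit 1: conveyor speed under limit


class AlarmLatch:
    """Device side: alarms stay set until the host acknowledges them."""

    def __init__(self, max_bearing_f: int = 110, min_speed: int = 50) -> None:
        self.max_bearing_f = max_bearing_f  # hypothetical thresholds
        self.min_speed = min_speed
        self.active = 0  # bit mask of latched alarms

    def update(self, bearing_f: float, speed: int) -> int:
        """Check the latest readings; returns the mask to send this second."""
        if bearing_f > self.max_bearing_f:
            self.active |= ALARM_OVERHEAT
        if speed < self.min_speed:
            self.active |= ALARM_STALL
        return self.active

    def acknowledge(self, mask: int) -> None:
        """Host cleared these alarms: drop only the acknowledged bits."""
        self.active &= ~mask


def append_alarms(payload: bytes, alarms: int) -> bytes:
    """Tack one alarm byte (room for eight alarm bits) onto a payload."""
    return payload + bytes([alarms])
```

Because the bits stay latched, an overheat that occurs during a burst of lost messages is still reported in the next speed message that gets through.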

Don’t we need guaranteed delivery?

We began exploring this topic in our discussion of temperature. If a few temperature measurement messages are missed, the next good message provides a full update. If readings become concerning, we can sound the alarm.

Reporting speed is a little more challenging. What if, for instance, the conveyor stalls? If those messages are missed, the host service will never know. Still, we can meet most of this requirement without guaranteed delivery.

Missed Messages and Average Rate

The chart below shows several samples of the conveyor speed and the average over those samples.
Chart: No data loss

Lose a few of those messages and our view of the process changes dramatically: if the missing samples are the ones taken while the conveyor was stopped, the host never knows that it stopped.
Chart: Some data loss

Enter the Totalizer

Instead of transmitting the speed for each sample, we can transmit a totalized speed. This can be either the sum of all speed readings so far or a running count from the speed sensor. (Summed human-readable speeds may be easier for the host server to process, since it needs no knowledge of the sensor architecture.)
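On the device, a totalizer is only a running sum in a fixed-width field. A minimal sketch, assuming the sum-of-readings variant and a hypothetical 16-bit field that wraps to zero (rollover is discussed below):

```python
TOTAL_MOD = 1 << 16  # hypothetical 16-bit totalizer field


class SpeedTotalizer:
    """Device side: running sum of speed readings, wrapping at 2^16."""

    def __init__(self) -> None:
        self.total = 0  # starts at zero when the device joins

    def add_sample(self, speed: int) -> int:
        """Add this second's reading; returns the value to transmit."""
        self.total = (self.total + speed) % TOTAL_MOD
        return self.total
```

Each transmitted total already contains every earlier reading, so a lost message costs the host only resolution in time, not data.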

With the totalizer, the data looks like this:
Chart: With totalizer

Notice that, even with lost messages, the average speed is accurate. We still missed the event that caused the conveyor to stop, but comparing the instantaneous speed to the average speed tells us something is amiss. We can always add an alarm to detect a stopped or slow conveyor.
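A small simulation makes the difference concrete. The trace is hypothetical: the conveyor runs at 10, stalls for three samples, then recovers, and exactly the three stall messages are lost. It assumes the sequence number and the totalizer both start at zero when the device joins, so the host can divide the latest total by the number of samples since joining.

```python
from typing import List, Set

SPEEDS = [10, 10, 10, 0, 0, 0, 10, 10]  # hypothetical per-second speeds
LOST = {3, 4, 5}                        # the stall samples never arrive


def running_totals(speeds: List[int]) -> List[int]:
    """What the device transmits: the running sum after each sample."""
    totals, running = [], 0
    for s in speeds:
        running += s
        totals.append(running)
    return totals


def naive_average(speeds: List[int], lost: Set[int]) -> float:
    """Average of whichever instantaneous speeds reached the host."""
    received = [s for seq, s in enumerate(speeds) if seq not in lost]
    return sum(received) / len(received)


def totalizer_average(totals: List[int], lost: Set[int]) -> float:
    """Latest received total divided by samples since join (seq + 1)."""
    last_seq = max(seq for seq in range(len(totals)) if seq not in lost)
    return totals[last_seq] / (last_seq + 1)
```

Averaging the received speeds gives 10.0, as if nothing happened, while the totalizer gives 6.25, which matches the true average (50 over 8 samples). The gap between the latest instantaneous speed of 10 and the 6.25 average is the hint that something went wrong.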

Managing Rollover

Unless our totalizer has unlimited bits (which would defeat our goal!), there will be times when it reaches its maximum value and rolls over to zero. As with the timestamp, it’s up to the host server to detect and compensate for rollovers, and we must design the data so that rollover is detectable and manageable. In particular, we must ensure that at most one rollover can occur during any outage. Ideally, the timestamp and totalizer are used together: the sequence gap tells the host how many samples were missed, which bounds how far the total could have advanced.
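A sketch of that combination on the host, assuming a 2-byte sequence number, a hypothetical 2-byte totalizer, and a hypothetical maximum speed reading per sample. Modular subtraction undoes exactly one wrap; the sequence gap multiplied by the maximum reading tells us whether more than one wrap was possible, in which case the delta cannot be trusted.

```python
SEQ_MOD = 1 << 16    # 2-byte sequence number
TOTAL_MOD = 1 << 16  # hypothetical 2-byte totalizer
MAX_SPEED = 1000     # hypothetical largest speed reading per sample


def total_delta(prev_seq: int, prev_total: int, new_seq: int, new_total: int) -> int:
    """Host side: speed sum accumulated between two received packets."""
    samples = (new_seq - prev_seq) % SEQ_MOD
    # The total can grow by at most samples * MAX_SPEED. If that could reach
    # TOTAL_MOD, two or more wraps are possible and the delta is ambiguous.
    if samples * MAX_SPEED >= TOTAL_MOD:
        raise ValueError("outage too long: totalizer may have wrapped more than once")
    return (new_total - prev_total) % TOTAL_MOD
```

With these numbers, the host can bridge an outage of at most 65 samples (65 × 1000 is still below 65,536). If longer outages are expected, that is a signal to widen the totalizer field or lower its resolution, trading a byte of payload for robustness.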

What’s it all mean?

The quality and reliability of data from edge devices are key to the success of any Internet of Things ecosystem, and design decisions affecting those devices may affect both. While it may not be practical to employ all of the techniques described here, each has merit. Optimizing around a select few will improve performance in your busy network and help make sure that everyone gets heard.