Technical


[Figure: SEAMONSTER instrument overlay map]



Introduction

This page sketches the technical motivation, concepts, and components of SEAMONSTER. This is to be contrasted with the scientific, social, and educational motives, which are equally important but described elsewhere.


Motivation, concepts, components overview

The motivation is pretty simple: technology advances in the consumer electronics world can be adapted to science research. These advances are often not adopted in science because of the work involved, so we'll see if we can push that adoption along a little. A good example is WiFi, which can in principle be used to communicate video images across tens of kilometers in something like real time.


The first technical concept behind SEAMONSTER is that for each task in a hypothetical automated system we fit just enough computing horsepower to accommodate that task. If we need a computer that runs a rich operating system like Linux, we'll use one. If a simple low-power microprocessor will do, we'll use that instead. The idea is to let the technology relax onto the system requirements so the system can be built inexpensively.


The second technical concept behind SEAMONSTER is that power, data, and money are the interconnected vertices of a triangle: if you spend more money you can have more power available in the field (because you bought more batteries or more solar panels, for example, and connected them to your sensor installation). If you spend more money you can also recover more data (because you can build many such installations). But if your money is limited, it pays to take a different approach to power consumption: have the sensor installation shut off its sensors much of the time, and likewise its communication gear.


The technical components of SEAMONSTER are broadly hardware and software. The hardware can be further broken down into sensors, platforms, infrastructure (including power), and servers. Sensors measure environmental parameters, platforms host sensors and convey sensor data back to the project server, and infrastructure is what holds everything together and drives it: batteries, solar panels, tripods, U-bolts, zip ties, antennas, cables, connectors, and so on.


Servers are destination points that hold data and provide it through an interface to the rest of the world. Our primary server lives at the Natural Sciences Research Lab (NSRL) near the UAS campus.


Software runs on platforms and servers and keeps the system operating.


Below we list some related technical concepts to flesh out this brief sketch. The Architecture page delineates components of the technical picture using a block diagram. Because sensor webs are complicated constructions we try to adhere to two related engineering principles: Work modularly, build-and-test incrementally.

Data Acquisition

Network Geometry

Choose sites of interest for sensors as driven by science objectives. Place network nodes at these locations and add additional nodes to establish wireless connectivity to a sink. Sinks are (in this context) internet-connected computers that can trivially move information to/from the Server at the Natural Sciences Research Laboratory (NSRL) at the University. The NSRL sink is the "sink sink"; it is the final destination for all data in the acquisition phase.

In Year 1 we have these sites of interest:

  • Lemon Creek Glacier Accumulation Area: Lynn & Linda supraglacial lakes.
Lemon Glacier - Supraglacial Lake Level Instrumentation Page
  • Cairn Peak: Glacier-constraining ridge line overlooking Lemon Creek Glacier with a line of sight view across the watershed to NSRL.
Lemon Glacier - Cairn Peak Instrumentation Page
  • West ridge Lemon Creek repeater station
  • Lemon Creek terminus
  • USGS Stream Gauge Station.
Lemon Creek USGS Station Instrumentation Page
  • Lower Lemon Creek Station (sink)
  • Buoy in Douglas Channel, Lemon Creek Estuary
  • Mendenhall Glacier terminus island / Mendenhall Lake.
Mendenhall Glacier - Terminus Instrumentation Page
  • Mendenhall Glacier visitor's center
  • Mendenhall Glacier ablation area e.g. 8 km up-glacier from terminus
  • Mendenhall Glacier flux gate "pinch point" closer to the equilibrium line
  • Auke Lake forested high-spot east
  • Auke Lake center of lake: Buoy and temperature string
  • UAS Campus - Auke Lake.
UAS Campus Instrumentation Page
  • Natural Sciences Research Laboratory (NSRL) located a few km ESE of the UAS campus: Final sink.
NSRL Campus Instrumentation Page
  • McGinnis Creek Watershed - Unglaciated Valley near Mendenhall Glacier.
McGinnis Creek Instrumentation Page
  • Eaglecrest Snow Study site, Douglas Island
Eaglecrest (Fish Creek) Instrumentation Page


Two-Tier

The remote node or microserver has enough horsepower (an ARM-processor SBC) to host a Mote-type subnet, called here "Tier 2". We plan to test this in Summer 2007, with a Mote base station connected via USB to the microserver SBC. Data is spooled from O(10) local motes via a multi-hop TinyOS-based communication protocol using Moteiv Tmote Sky devices. Typical data are temperature, light, relative humidity, stream turbidity, and water depth in the Lynn-Linda lake complex. It is likely that the base Tmote will be driven off the microserver power supply, implying a duty cycle that changes from summer to winter. Part of the local operability must therefore be rescheduling of duty cycles in the subnet based on information the base station gleans from the microserver.
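
As a rough illustration of that last point, here is a minimal sketch (in Python) of how the microserver might push a revised subnet duty cycle to the base Tmote over the USB serial link. The command format, file paths, and voltage thresholds are all assumptions for illustration, not a specified protocol:

```python
# Hypothetical sketch: the microserver tells the Base Tmote how often the
# Tier-2 subnet should wake up, based on the microserver's own power state.
# The serial command format and the battery-voltage file are assumptions.
import serial  # pyserial

DUTY_CYCLE = {"summer": 0.25, "winter": 0.05}  # fraction of time awake (placeholders)

def read_battery_voltage(path="/var/seamonster/battery_voltage"):
    """Read the most recent battery voltage logged by the power monitor."""
    with open(path) as f:
        return float(f.read().strip())

def choose_subnet_duty_cycle(voltage):
    """Pick a conservative duty cycle when the supply is low."""
    return DUTY_CYCLE["summer"] if voltage > 12.2 else DUTY_CYCLE["winter"]

def push_duty_cycle(port="/dev/ttyUSB0"):
    duty = choose_subnet_duty_cycle(read_battery_voltage())
    with serial.Serial(port, 57600, timeout=5) as base_mote:
        # One-line ASCII command; a real TinyOS base station would define its own framing.
        base_mote.write(f"SET_DUTY {duty:.2f}\n".encode())

if __name__ == "__main__":
    push_duty_cycle()
```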


Survivability

Typically, hardware--nodes and associated sensors--in Southeast Alaska will be exposed to extended sub-zero temperatures, heavy snow and rain, wind, occasional sunlight, and some immersion over the course of a year. Nodes (as is typical) communicate ad hoc with one another (not in server/client mode) to reduce the possibility of a single node failure causing network failure. This means there are ideally multiple possible data recovery paths, making the system more survivable, combined with NEMA enclosures and what we hope will be a smart power management plan.


One important detail in microserver implementation is a two-computer approach also found elsewhere in the sensor net community: a very low-power microcontroller unconditionally resets the system every R hours so that a main-computer hang cannot cripple the node indefinitely. The microcontroller also implements a watchdog timer to help avoid hanging itself. This failsafe mechanism constrains main-system operation in that it must periodically and gracefully cope with a system reset.
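
One way the main computer's software might cope gracefully with these scheduled resets is to checkpoint its state at short intervals and on shutdown signals, so a hard reset loses at most one interval of bookkeeping. A minimal sketch, with an illustrative checkpoint path and state contents:

```python
# Sketch: periodically checkpoint acquisition state so a hard reset every
# R hours (from the low-power microcontroller) loses at most one interval.
import json, signal, sys, time

CHECKPOINT = "/var/seamonster/acq_state.json"   # assumed location
CHECKPOINT_INTERVAL_S = 60

def load_state():
    try:
        with open(CHECKPOINT) as f:
            return json.load(f)
    except (FileNotFoundError, ValueError):
        return {"last_sample_id": 0}

def save_state(current):
    with open(CHECKPOINT, "w") as f:
        json.dump(current, f)

def handle_shutdown(signum, frame):
    save_state(state)          # flush state if we get any warning at all
    sys.exit(0)

state = load_state()
signal.signal(signal.SIGTERM, handle_shutdown)

while True:
    state["last_sample_id"] += 1       # stand-in for real data acquisition
    save_state(state)
    time.sleep(CHECKPOINT_INTERVAL_S)
```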


Energy Consumption

Network components use an energy budget scheduler that has an initial simple implementation and a to-be-developed smarter implementation. Both require the node to have self-awareness of energy consumption as a function of operational state, awareness of its power supply capacity, and an estimator (or direct measurement) of recharge, typically from photovoltaic panels. The latter two comprise an energy-over-time E(t) estimation function. This generalization applies to both Tier 1 and Tier 2 components, with the complication that (since Tier 2 rests on Tier 1) synchronized duty cycling is necessary to pass data.


The simple implementation is to base the duty cycle schedule D(t) on an a priori E(t). Since we are at relatively high northern latitude, we can assume, for example, that from November through February there is no available recharging. As the Earth moves into the better-north-illuminated parts of its orbit we get more recharging and can operate more often.
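
A minimal sketch of such an a priori schedule, keyed only to the calendar month; the duty-cycle fractions below are illustrative placeholders, not measured values for our panels:

```python
# Sketch: a priori duty cycle D(t) as a function of month, reflecting the
# assumption of essentially no photovoltaic recharge November through February.
# The fractions are placeholders, not measured numbers.
APRIORI_DUTY_CYCLE = {
    1: 0.02,  2: 0.02,  3: 0.05,  4: 0.10,
    5: 0.20,  6: 0.25,  7: 0.25,  8: 0.20,
    9: 0.10, 10: 0.05, 11: 0.02, 12: 0.02,
}

def duty_cycle(month: int) -> float:
    """Fraction of each hour the node spends awake, by calendar month."""
    return APRIORI_DUTY_CYCLE[month]
```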


The more complicated implementation integrates power usage and maintains a dynamic estimate of available power. This strategy should really be combined with measurement of available system power (prima facie: battery voltage). The benefit is simply better adaptation of data acquisition to available energy; the drawback is the considerable development effort required to do this successfully.
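
A sketch of this more complicated approach, integrating consumption against estimated recharge and deriving the duty cycle from the remaining budget. Battery capacity, loads, and the duty-cycle formula are hypothetical placeholders:

```python
# Sketch: maintain a running energy estimate E(t) in watt-hours and derive
# the duty cycle from it. All constants are placeholders.
BATTERY_CAPACITY_WH = 1200.0
AWAKE_LOAD_W, SLEEP_LOAD_W = 6.0, 0.2

class EnergyBudget:
    def __init__(self, initial_wh=BATTERY_CAPACITY_WH):
        self.e_wh = initial_wh

    def update(self, hours, duty, recharge_w):
        """Integrate one interval of consumption and (estimated) recharge."""
        load_w = duty * AWAKE_LOAD_W + (1 - duty) * SLEEP_LOAD_W
        self.e_wh = min(BATTERY_CAPACITY_WH,
                        max(0.0, self.e_wh + hours * (recharge_w - load_w)))

    def duty_cycle(self):
        """Spend proportionally less time awake as the battery runs down."""
        return 0.02 + 0.23 * (self.e_wh / BATTERY_CAPACITY_WH)
```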


Data

To state the obvious: The objective is to enable each sensing device to sample at the highest feasible (useful) rate and return this data to the NSRL sink as quickly as possible. There is considerable theoretical effort devoted to understanding this problem. We anticipate empirically discovering our sensor web's dynamic resident location in deployment parameter space.


We want data volume and precision to be constrained only by motivation and bandwidth. In most cases, between one and ten data points per minute is the desirable 'adequate' sampling rate range. Advanced operation will enable changing this dynamically.


An exception is of course high-frequency sensors, for example a seismic array sampled at 3000 samples per second. This case is treated as a separate 'embedded processing problem'.

Data and Storage

Databases

The databases we implement should have decent access speed and should support spatial queries. We have in mind two parallel systems, one running PostgreSQL (which has spatial search built in) and another running SQL Server (which supports spatial search by means of the Hierarchical Triangular Mesh algorithm). Our sensor web is initially small enough that we do not anticipate data volume problems in Year 1.
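
To illustrate the kind of spatial query we have in mind, here is a sketch against the PostgreSQL side. It assumes the PostGIS extension and a hypothetical observations table with a geometry column; the table, column, and database names are made up:

```python
# Sketch: find recent observations within a radius of a point of interest.
# Assumes PostGIS and an observations(site, value, obs_time, geom) table;
# all names here are illustrative only.
import psycopg2

QUERY = """
SELECT site, value, obs_time
FROM observations
WHERE ST_DWithin(geom::geography,
                 ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography,
                 %s)
ORDER BY obs_time DESC
LIMIT 100;
"""

def nearby_observations(lon, lat, radius_m=5000):
    with psycopg2.connect(dbname="seamonster") as conn:
        with conn.cursor() as cur:
            cur.execute(QUERY, (lon, lat, radius_m))
            return cur.fetchall()
```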


Information Access

Data Sharing

This project is open-data across the board. The only impediments we will put in place will be for the purpose of protecting system integrity from attack.


Visualization tools and APIs

Our development here is unfortunately constrained by time, people, and money, but we will implement (at least) Google Earth and Virtual Earth visualizations of SEAMONSTER.
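
A Google Earth view, for example, can be fed by generating KML on the server. A minimal sketch follows; the station names and coordinates are placeholders, not surveyed positions:

```python
# Sketch: emit a KML file of sensor stations for viewing in Google Earth.
# Names and coordinates below are placeholders.
STATIONS = [
    ("NSRL sink", -134.58, 58.38),
    ("Lemon Creek Glacier lakes", -134.35, 58.40),
]

def to_kml(stations):
    placemarks = "\n".join(
        f"  <Placemark><name>{name}</name>"
        f"<Point><coordinates>{lon},{lat},0</coordinates></Point></Placemark>"
        for name, lon, lat in stations
    )
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n'
            f"{placemarks}\n</Document>\n</kml>\n")

if __name__ == "__main__":
    with open("seamonster_stations.kml", "w") as f:
        f.write(to_kml(STATIONS))
```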


Near-Real-Time

Assume that people and applications will be reaching into the SEAMONSTER databases--updated in near-real-time--on some server. 'Near-real-time' means that if the last temperature sensor measurement was made ten seconds ago, that value will show up on my time/temperature plot when I click an <update> button in the graphical user interface. That is the ideal case; in reality we expect delays due to outages and so on. We can define a worst-case latency of 24 hours and design towards that, with the goal always being 'a few seconds'.


None-of-the-Above Categories and Remarks

The Working Dry Lab

We begin by placing a set of antennas on the roof of the Natural Sciences Research Laboratory (NSRL) on the UAS campus. These will wait for microservers to appear in the distance; when they do, their signals will drain into the NSRL servers. NSRL is also a dry lab for pre-deployment testing. Once the pieces of SEAMONSTER work reliably in the dry lab, we export them into the field one by one. Each step of the way we ensure connectivity so that we can unfold a functional network. In Biology this is called Gradualism and in Engineering it's called Good Practice.


Software Concept of Operation

Motes acquire data on their D(t) schedule and flow data to base station motes as noted. The microservers may collect more data, and all of this information flows along the microserver backbone to the NSRL servers for archival (possibly via Anchors). We would like redundancy in the communication scheme, so we will also consider alternative data recovery systems, for example long-range, lower-bandwidth radio modems (FreeWave, SLUGs) and/or a satellite modem such as Iridium.


That said, taking our easy-first approach we will implement on our Linux-based microservers:

  • cron jobs using...
  • rsync commands interfacing to a...
  • simple task manager which may also fire...
  • simple sensor interface programs as well as...
  • simple Tier-2 subnet interface programs

To elaborate these a little further:


cron and rsync are pre-existing solution tools intrinsic to UNIX (Linux). cron permits periodic task scheduling with a finest granularity of once per minute, including checking semaphores and launching rsync processes. rsync allows directory synchronization in a from-to sense between network-connected computers. That is, 'rsync From To' copies the contents of a directory tree on From into a location on To. It can be run a second time with the From and To order reversed to exchange new files in both directions.
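
Concretely, a cron entry on a microserver might invoke a small wrapper once a minute that checks a semaphore file before calling rsync. A sketch follows; the paths, server name, and semaphore convention are all assumptions:

```python
#!/usr/bin/env python
# Sketch invoked by cron, e.g. via the crontab line:
#   * * * * * /usr/local/bin/seamonster_sync.py
# Paths, the semaphore convention, and the server name are illustrative.
import os, subprocess

SEMAPHORE = "/var/seamonster/ok_to_sync"     # touched when new data is spooled
SPOOL_DIR = "/var/seamonster/spool/"
SERVER = "nsrl.example.edu:/data/seamonster/incoming/"

if os.path.exists(SEMAPHORE):
    # Push new files to the sink; --partial lets interrupted transfers resume.
    result = subprocess.call(["rsync", "-az", "--partial", SPOOL_DIR, SERVER])
    if result == 0:
        os.remove(SEMAPHORE)                 # clear the flag only on success
```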


The advantage of this approach is that it submerges details of network routing into the "someone else's problem" layer. This approach is not without drawbacks, however. The limitations of cron/rsync are highlighted by contrast with the UCLA-CENS "EmStar" delay-tolerant shell, which extends network functionality across many nodes simultaneously with a degree of disruption and delay tolerance (hence the shell's name). EmStar would be a good future project to implement since it solves deeper network-specific problems.


More complex logic must be implemented in software we write ourselves. Task manager, sensor interface, and subnet interface programs are typically written in C, Python, or Perl.
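
A sensor interface program can be as small as a loop that reads a serial-attached instrument and appends timestamped records to the spool directory that rsync later sweeps up. A sketch, with the device path, baud rate, record format, and spool location all made up for illustration:

```python
# Sketch of a minimal sensor interface: poll a serial-attached sensor and
# append timestamped readings to the spool directory picked up by rsync.
# Device path, baud rate, and record format are assumptions.
import time
import serial  # pyserial

PORT, BAUD = "/dev/ttyS0", 9600
SPOOL = "/var/seamonster/spool/water_level.csv"
SAMPLE_INTERVAL_S = 60          # roughly one data point per minute

with serial.Serial(PORT, BAUD, timeout=10) as dev:
    while True:
        line = dev.readline().decode(errors="replace").strip()
        if line:
            with open(SPOOL, "a") as f:
                f.write(f"{time.time():.0f},{line}\n")
        time.sleep(SAMPLE_INTERVAL_S)
```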


Communication Protocols

  • Tier 2: IEEE 802.15.4 (but not ZigBee at this time).
  • Tier 1: 802.11g by means of Wireless Ethernet Bridges.
  • Later: 920 MHz radio modem (lower bandwidth, longer range).
  • Later: SLUGs?
  • Later: Satellite modem (Iridium)


Layering and Modularity

Layers and modularity are deemed to be Good Things. Facilitated by this philosophy and intrinsic to the SEAMONSTER project is a friendly comparison between Linux and (for lack of a better term) related Open Community tools and corresponding tools provided by Microsoft.


Performance Metric

Imagine every sensor in the system as a line plotted in time. The color of the line changes based on latency, on a scale from 'no-latency' to 'sensor unavailable'. (Once latency reaches 24 hours the sensor is considered unavailable.) A latency of one minute or less is considered 'no-latency', close enough to real time. Each sensor measurement packet includes a time tag so that latency can be determined upon packet arrival at the server. Latency on the [unavailable, no-latency] range is converted into a number on [0, 1], which varies over time; this gives an average availability for that sensor over time. To aggregate this into a single system health number, each sensor can be assigned a relative weight or importance and the weighted mean calculated.
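
A sketch of that calculation, mapping a packet's latency onto [0, 1] and folding per-sensor weights into one system health number. The thresholds are the ones described above; the sensor names and weights in the example are illustrative:

```python
# Sketch: latency -> availability on [0, 1], then a weighted mean over sensors.
# One minute or less counts as 'no-latency'; 24 hours or more as 'unavailable'.
NO_LATENCY_S = 60.0
UNAVAILABLE_S = 24 * 3600.0

def availability(latency_s: float) -> float:
    """Linear map of latency onto [0, 1]: 1 = real time, 0 = unavailable."""
    if latency_s <= NO_LATENCY_S:
        return 1.0
    if latency_s >= UNAVAILABLE_S:
        return 0.0
    return 1.0 - (latency_s - NO_LATENCY_S) / (UNAVAILABLE_S - NO_LATENCY_S)

def system_health(latencies: dict, weights: dict) -> float:
    """Weighted mean availability; both dicts are keyed by sensor name."""
    total = sum(weights[s] for s in latencies)
    return sum(weights[s] * availability(latencies[s]) for s in latencies) / total

# Example with made-up sensor names, latencies (seconds), and weights:
print(system_health({"lake_level": 30, "web_cam": 7200},
                    {"lake_level": 2.0, "web_cam": 1.0}))
```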