BME280 BMP280 on Wemos Lolin32 with Mongoose OS

The weekend IoT warrior is back again!

I bought a pack of 5 BME280s from eBay. Unfortunately they sent BMP280s instead and wouldn’t offer a refund. A colleague at work had the same issue; it’s actually really hard to get the BME and not get fobbed off with the BMP. Another colleague recommended buying them from this seller on AliExpress as they’re legit (he had a pack of them in his hand, the right ones!). Anyway, I haven’t ordered them yet so I’ll play with the BMP280 for now: same interface, it just lacks the humidity sensor.

I ordered the BME280s but got sent the BMP280s, annoying!

Finding the I2C pins

Every board has the potential to use different SDA/SCL pins. Fortunately for me, they were printed on the reverse of the board, but you might have to hunt around in the specification sheet of your board to find them.

Wemos Lolin32 board

Finding the I2C device

You can have many devices on the I2C bus; think of it like a public highway with many cars (devices) travelling on it at any time. Each device has its own address, which you’ll need to know to communicate with it in your application. If the manufacturer doesn’t specify the address, fortunately there is an easy way to find it using the I2C scanner sketch that Arduino provide. Run this sketch and you should see something like the following.

Arduino code for the I2C Scanner

Note that I had to override the I2C pins on the Wire.begin(), you might need to do that too.

I2C device found at address 0x76  !

Writing some code to read data from the sensor

Now that I know the I2C pins and the device address, I can start to write some code. I started off by looking at the example from Mongoose OS, but it seems there’s an issue with the Arduino compat library. Fortunately there’s an answer on the forums.

Here’s where I ended up

#include <mgos.h>
#include <Adafruit_BME280.h>

#define SENSOR_ADDR 0x76

static Adafruit_BME280 *s_bme = nullptr;

// Couldn't get bme280 example app to work due to arduino compat
// here's a modified version 
// Credit to nliviu from MOS Forums for help with this 

void readTimerCB(void *arg) {
    printf("Temperature: %.2f *C\n", s_bme->readTemperature());
    // Humidity will only work on BME280, not BMP280
    printf("Humidity: %.2f %%RH\n", s_bme->readHumidity());
    // readPressure() returns pascals, so divide by 1000 for kPa
    printf("Pressure: %.2f kPa\n\n", s_bme->readPressure() / 1000.0);
    (void) arg;
}

enum mgos_app_init_result mgos_app_init(void) {
    s_bme = new Adafruit_BME280();

    // Initialize sensor
    if (!s_bme->begin(SENSOR_ADDR)) {
        printf("Can't find a sensor\n");
        return MGOS_APP_INIT_ERROR;
    }

    // Read the sensor every 2 seconds, repeating
    mgos_set_timer(2000, 1, readTimerCB, NULL);
    return MGOS_APP_INIT_SUCCESS;
}


Grab the full source here


Once again, Mongoose OS makes it REALLY easy to get started, the hardest part was figuring out the I2C connections but there’s plenty of resources online for helping with that.

My plan is to have a few of these set up: 2 inside my lizard enclosure (for the hot and cool sides), a few around the house (humidity would be great, as I’ve had a few issues with mould) and one for outside temperature readings. The next challenge is understanding how I can deploy the same application to multiple devices but with different config, so I can name the MQTT topics for each board. Time to get reading!

Building a water tank sensor using ESP32, JSN SR04T sensor, Mongoose OS and AWS IoT

There’ll be quite a lot going on in this post, I’ve had a crash course in Mongoose OS the past few weeks so this serves as a brain dump!

The goal

To create a low powered, wifi enabled device that I can install under the access hatch in my rainwater tank, that will periodically take readings of the water level via an ultrasonic sensor and report that data to AWS IoT.

With no IoT experience, I was inspired by a colleague at work who has also done this, I thought it’d be a good way to get stuck in and learn something! I wrote a previous post about getting started with Mongoose OS

Wiring things up

First things first, the JSN SR04T sensor that I’m using here is the waterproof equivalent of the HC SR04. I didn’t realise this at first, but knowing the interfaces are the same makes it much easier to find resources. One thing to note is that the HC SR04 can take measurements at closer distances than the JSN SR04T. This is because the HC SR04 has separate transmitter and receiver modules, whereas the JSN SR04T is a combined unit and there is a delay in switching from transmitting to receiving, so it can’t read below about 20cm.

The sensor operates on 5V, which is great as the ESP32 board has a 5V out, but we need to step the echo signal down to 3.3V before it reaches the ESP32 input, as the ESP32’s GPIOs operate at 3.3V.
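One common way to do that step-down is a two-resistor voltage divider. The sketch below assumes that approach, and the resistor values are hypothetical, picked only to show the arithmetic, so check them against whichever wiring diagram you follow.

```python
def divider_output(v_in: float, r_top: float, r_bottom: float) -> float:
    """Voltage at the midpoint of a two-resistor divider.

    r_top sits between the 5V echo signal and the midpoint,
    r_bottom sits between the midpoint and ground.
    """
    return v_in * r_bottom / (r_top + r_bottom)


# Hypothetical values: 1k on top, 2k on the bottom.
print(round(divider_output(5.0, 1000, 2000), 2))  # 3.33
```

Any pair of resistors with the same 1:2 ratio gives the same output voltage.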

Kolban’s ESP32 book has a wiring diagram for this, here’s a screenshot (also go check out the book and send him a few $, well worth it!)

HC SR04 wiring diagram from Kolban’s book

I’m using a breadboard and jumper cables to make it easier, it’s not pretty but it does the job!

ESP32 setup with JSN SR04 using a breadboard. The helping hand is just there to hold the sensor

Interacting with the sensor

There are plenty of examples of how to use this sensor using Arduino, but I couldn’t find any for using Mongoose OS. I initially tried to use the Mongoose OS Arduino compatibility library, but unfortunately it wasn’t just a case of dropping in some Arduino code and it working as others have suggested, because there are certain Arduino functions that need to be wrapped in Mongoose OS (like pulseIn).

Thankfully the kind souls over on the Mongoose OS forums were able to help me with a pulseIn implementation so I could read the sensor data!

Interacting with the sensor is simple, signal the trigger pin for at least 10 microseconds, then read the duration of a signal from the echo pin. The duration is the time it took for the ultrasound signal to return.

Casting our minds back to Maths class all those years ago, with time (the duration of the signal) and speed (the speed of sound, a constant) we can calculate distance. Remember the pyramid? Here’s an image from the BBC Maths website to jog your memory.

Speed is distance over time
unsigned long duration = pulseInLongLocal(ECHO_PIN, 1, TIMEOUT);
double distance = duration * 0.034 / 2;

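The 0.034 is the speed of sound in centimetres per microsecond (roughly 340 m/s), so the result is in centimetres. The divide by 2 is because the duration covers the trip there and back, and we only want one way.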
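To sanity check the numbers away from the board, here is the same calculation as a small Python sketch. The function name is made up for illustration. Treating a duration of 0 as "no reading" is my own assumption, based on the pulseIn helper further down returning 0 when it times out.

```python
from typing import Optional

SPEED_OF_SOUND_CM_PER_US = 0.034  # same constant as the C code


def echo_to_distance_cm(duration_us: int) -> Optional[float]:
    """Convert an echo pulse length in microseconds to a one-way distance in cm."""
    if duration_us == 0:
        # assumption: 0 means the pulse read timed out, so there is no reading
        return None
    return duration_us * SPEED_OF_SOUND_CM_PER_US / 2


print(round(echo_to_distance_cm(1000), 2))  # 17.0
print(echo_to_distance_cm(0))               # None
```

So an echo of 1000 microseconds corresponds to a surface about 17cm away, which is already below the roughly 20cm minimum this sensor can read.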

Here’s the full source code. Please don’t rely on this page though, as I’ve likely updated the code since posting this, so check out the project on Github.

#include <stdio.h>
#include <string.h>
#include "mgos.h"
#include "mgos_app.h"
#include "mgos_gpio.h"
#include "mgos_timers.h"
#include "mgos_mqtt.h"
#include "mgos_config.h"

// as pulseIn isn't supported in Mongoose Arduino compatibility library yet, here's a local
// implementation of that. Full credit to "nliviu" on Mongoose OS forums for that
static inline uint64_t uptime() {
    // mgos_uptime() returns seconds, convert to microseconds
    return (uint64_t)(1000000 * mgos_uptime());
}

uint32_t pulseInLongLocal(uint8_t pin, uint8_t state, uint32_t timeout) {
    uint64_t startMicros = uptime();

    // wait for any previous pulse to end
    while (state == mgos_gpio_read(pin)) {
        if ((uptime() - startMicros) > timeout) {
            return 0;
        }
    }

    // wait for the pulse to start
    while (state != mgos_gpio_read(pin)) {
        if ((uptime() - startMicros) > timeout) {
            return 0;
        }
    }

    uint64_t start = uptime();

    // wait for the pulse to stop
    while (state == mgos_gpio_read(pin)) {
        if ((uptime() - startMicros) > timeout) {
            return 0;
        }
    }
    return (uint32_t)(uptime() - start);
}

static void timer_cb(void *arg) {
    // send trigger
    mgos_gpio_write(mgos_sys_config_get_app_gpio_trigger_pin(), 1);
    // wait 10 microseconds
    mgos_usleep(10);
    // stop the trigger
    mgos_gpio_write(mgos_sys_config_get_app_gpio_trigger_pin(), 0);

    // wait for response and calculate distance (0 means the read timed out)
    unsigned long duration = pulseInLongLocal(mgos_sys_config_get_app_gpio_echo_pin(), 1, mgos_sys_config_get_app_pulse_in_timeout_usecs());
    double distance = duration * 0.034 / 2;
    char strBuffer[64];
    snprintf(strBuffer, sizeof(strBuffer), "{\"report\":{\"distance\":%.2f}}\n", distance);

    // publish with QoS 1, not retained
    mgos_mqtt_pub(mgos_sys_config_get_app_mqtt_tank_level_topic(), strBuffer, strlen(strBuffer), 1, 0);
    (void) arg;
}

enum mgos_app_init_result mgos_app_init(void) {
    // set the modes for the pins
    mgos_gpio_set_mode(mgos_sys_config_get_app_gpio_trigger_pin(), MGOS_GPIO_MODE_OUTPUT);
    mgos_gpio_set_mode(mgos_sys_config_get_app_gpio_echo_pin(), MGOS_GPIO_MODE_INPUT);
    mgos_gpio_set_pull(mgos_sys_config_get_app_gpio_echo_pin(), MGOS_GPIO_PULL_UP);

    // Every x milliseconds, invoke timer_cb. 2nd arg means repeat continuously
    mgos_set_timer(mgos_sys_config_get_app_sensor_read_interval_ms(), true, timer_cb, NULL);
    return MGOS_APP_INIT_SUCCESS;
}


Sending data to AWS

Sending the actual data to AWS was easy, it was just a case of calling the MQTT publish function like so

mgos_mqtt_pub(mgos_sys_config_get_app_mqtt_tank_level_topic(), strBuffer, strlen(strBuffer), 1, 0);

I’ve chosen to send the data as JSON as it’ll be easier for me to do something with it later on.
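As a rough idea of what "doing something with it later" could look like, here is a minimal Python sketch that unpacks one of these messages. The function name and the sample value are made up for illustration. The payload shape matches the snprintf format string in the device code.

```python
import json


def parse_report(payload: str) -> float:
    """Pull the distance (in cm) out of a tank level message."""
    message = json.loads(payload)
    return float(message["report"]["distance"])


# Sample payload in the same shape the device publishes (the value is invented)
sample = '{"report":{"distance":42.50}}\n'
print(parse_report(sample))  # 42.5
```

The trailing newline the device adds is harmless here, since json.loads ignores whitespace around the document.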

The hardest part was understanding how the config works on Mongoose OS, I’ll have to do another post about that, but just remember that after flashing the device, it loses any previous configuration such as AWS config. So if you flash, you need to run this again.

mos build && mos flash
mos aws-iot-setup

Then head over to the AWS console and test this by subscribing to the topic, you should see something like this

Subscribing to a topic via the AWS IoT console

If your device appears to be working, but nothing is making it to the AWS console, check you haven’t blown away the AWS keys after a flash: run “mos ls”, then “mos aws-iot-setup” if there are no AWS pem/key files on the device.


As stated by the Mongoose OS team, it’s really easy to get started with an ESP32 device and AWS IoT. It took me about 2-3 hours getting the AWS IoT working, but admittedly most of that was lost understanding config injection in Mongoose OS, and drinking beer (it is a Saturday night after all).

Leave a comment or find me on Twitter if you have any suggestions/cries for help.


Convert JSON to YAML in one command

I’ve been playing around with AWS CloudFormation recently, which supports both JSON and YAML. I prefer to use YAML, but a lot of the examples I was looking at were written in JSON. Fortunately there’s a quick way to convert a JSON file to its YAML equivalent.

ruby -ryaml -rjson -e 'puts YAML.dump(JSON.load(ARGF))' < linux-bastion.template > linux-bastion.yml

Can’t remember where I found it, likely StackOverflow, anyway it’s a useful command to have!

Getting started with the Mongoose OS and ESP32, an easy tutorial

This year, my goal is to learn as much as I can about IoT and AWS, they go well together and a $10 ESP32 board and a few dollars on the AWS account is a great way to get started.

The finished product!

I’ve spent a few weeknights playing around with Mongoose, most of my time was lost searching for documentation and guides, there’s not a lot out there so hopefully if you’re reading this, it’ll save you some time/pain.

Getting setup with Mongoose OS

You can use the Arduino IDE, but I’m going to jump straight in and use Mongoose instead, have a look at the Mongoose site for reasons why it’s a good choice.

Download and install the MOS tool (check their site for latest instructions). Once it’s installed, from the command line run the following to confirm it’s installed OK.

mos --version

The Mongoose OS command line tool
Version: 1.23
Build ID: 20171229-152500/1.23@2bfb56d3+
Update channel: release

If you run mos ui from the terminal you’ll get a browser window open where you can flash/code/view your ESP, I always prefer a CLI if there is one so I’ll skip over the UI stuff here.

In terms of IDE, the best I’ve found so far is Visual Studio Code, it’s free, and is pretty simple to use. Make sure you install the following plugins:

You can also open a terminal, I’ve got mine setup like this, works well for me so far.

My IDE setup for Mongoose OS development

Make sure you go into the settings for VS Code and change the save mode, I’ve been caught several times wondering why my code isn’t working, only to realise it doesn’t auto save by default and I’d just been deploying the old code over and over again!

"files.autoSave": "onFocusChange",

Deploying a hello world app

Now we’ve got our environment set up, let’s deploy an app. I’ve created a skeleton hello world app so grab that using

git clone
cd mongoose-os-apps-hello-world

Now build and deploy it using the following:

  1. mos build – Builds the application, brings in dependencies specified in mos.yml, creates artefact
  2. mos flash – deploys the artefact to the device and restarts it
  3. mos console – attaches a console to the device so you can see any output

Check the github, but notice in the code I’ve setup a loop to keep printing hello world, this demonstrates importing some libraries and using a callback. Also notice that there is a newline on the end of the print, I found that if you don’t do this, the print buffers output, so the app will appear like it’s doing nothing for around 5 seconds then you’ll get one line printed with Hello World 50 odd times. The newline forces the buffer to flush to console.

#include <stdio.h>

#include "mgos_app.h"
#include "mgos_timers.h"

static void timer_cb(void *arg) {
  // don't forget the \n to flush print buffer
  printf("Hello world!\n");
  (void) arg;
}

// Entry point to app
enum mgos_app_init_result mgos_app_init(void) {
  // Repeat the timer_cb every 1 second, second argument is boolean to repeat (1 == true)
  mgos_set_timer(1000 , 1 , timer_cb, NULL);
  return MGOS_APP_INIT_SUCCESS;
}


Get familiar with that, then move onto the next section where we make it more interesting.

Making the app a bit more interesting, adding some LEDs

OK, so we can control the GPIO pins on the board (those with numbers) to either turn them on or off (HI or LOW). What I’ve done here is attach a resistor and LED to 3 of the GPIO pins and we’ll have the code cycle through those every second.

I’ve annotated the code with comments, but one important thing to note is that the GPIO pins can work on either input or output, so we have to set the mode to output (we’re outputting voltage to power the LEDs, not reading input data).

#include <stdio.h>
#include "mgos_app.h"
#include "mgos_gpio.h"
#include "mgos_timers.h"

// Define the GPIO pins for the LEDs
#define RED_LED 16
#define YELLOW_LED 17
#define GREEN_LED 18

// Initialise states for LEDs, start with the red on and others off
bool redOn = 1;
bool yellowOn = 0;
bool greenOn = 0;

// This function gets invoked by the timer, every X seconds
static void timer_cb(void *arg) {
  // Check which LED is on, then flip the next one on
  if (redOn){
    redOn = 0;
    yellowOn = 1;
    greenOn = 0;
  } else if (yellowOn){
    redOn = 0;
    yellowOn = 0;
    greenOn = 1;
  } else if (greenOn){
    redOn = 1;
    yellowOn = 0;
    greenOn = 0;
  }

  // Update states of LEDs /GPIO pins
  mgos_gpio_write(RED_LED, redOn);
  mgos_gpio_write(YELLOW_LED, yellowOn);
  mgos_gpio_write(GREEN_LED, greenOn);

  (void) arg;
}

// Entry point to app
enum mgos_app_init_result mgos_app_init(void) {
  // GPIO pins can work on input or output, as we're lighting LEDs, they
  // are all set to output
  mgos_gpio_set_mode(RED_LED, MGOS_GPIO_MODE_OUTPUT);
  mgos_gpio_set_mode(YELLOW_LED, MGOS_GPIO_MODE_OUTPUT);
  mgos_gpio_set_mode(GREEN_LED, MGOS_GPIO_MODE_OUTPUT);
  // Every 1 second, invoke timer_cb. 2nd arg means repeat continuously
  mgos_set_timer(1000 , 1 , timer_cb, NULL);
  return MGOS_APP_INIT_SUCCESS;
}


Also note that some of the pins are reserved, here’s a quote from Kolban’s book (well worth checking out if you haven’t already)

There are 34 distinct GPIOs available on the ESP32.
They are identified as:
• GPIO_NUM_0 – GPIO_NUM_19
• GPIO_NUM_21 – GPIO_NUM_23
• GPIO_NUM_25 – GPIO_NUM_27
• GPIO_NUM_32 – GPIO_NUM_39
The ones that are omitted are 20, 24, 28, 29, 30 and 31.
Note that GPIO_NUM_34 – GPIO_NUM_39 are input mode only. You can not use these pins for signal output.
Also, pins 6, 7, 8, 9, 10 and 11 are used to interact with the SPI flash chip … you can not use those for other purposes.
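If you want to double check a pin before wiring anything to it, the rules in that quote are easy to encode. This is only a sketch of the quoted rules with names I’ve made up. It assumes pins are numbered 0 to 39 and doesn’t know about anything board specific, such as pins your particular board doesn’t break out.

```python
# Rules taken from the quote above; the names are hypothetical
VALID_GPIOS = set(range(0, 40)) - {20, 24, 28, 29, 30, 31}
INPUT_ONLY = set(range(34, 40))   # GPIO_NUM_34 to GPIO_NUM_39
FLASH_PINS = set(range(6, 12))    # used to talk to the SPI flash chip


def usable_as_output(pin: int) -> bool:
    """True if the pin exists, can drive output and isn't tied to the flash chip."""
    return pin in VALID_GPIOS and pin not in INPUT_ONLY and pin not in FLASH_PINS


print(len(VALID_GPIOS))                              # 34
print([usable_as_output(p) for p in (16, 17, 18)])   # [True, True, True]
print(usable_as_output(34))                          # False
```

The three LED pins used above (16, 17 and 18) all pass, and the count of valid pins matches the 34 mentioned in the quote.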

Build and deploy the LED blinker code using

mos build && mos flash && mos console

Now we need to hook up the components, but at least with the above running you can use a multimeter to check the pins are being triggered, put the black probe on GND then the red on one of the specified pins, every 3 seconds you should see it powered on for 1 second, we’re almost there!

Prototyping the electronics

The easiest way of connecting this all up is with a breadboard. Take a positive lead from each of the 3 pins, then pass each one through a resistor, the LED, and then back to ground, you’re setting this up in parallel and only one route will be powered at a time.

LEDs connected via the GPIO pins

Remember to have the longer legs on the LED on the positive side, the resistors can be either way around, it doesn’t matter. You can also route all 3 to the same ground, I’ve used the negative rail on the breadboard to make this a bit cleaner.


That’s it, pretty simple, but it’s covered a lot of groundwork, at least in getting the environment setup, deploying an app, plugging in some electronics and testing it out. Next up I’m planning to use the JSN-SR04 sensor, which is a waterproof ultrasonic sensor. I’m already most of the way there using the LED example, should just be a case of reading from an echo pin!

Feel free to comment or catch me on Twitter if this was helpful!


Listing devices on your local network

Plugged my ancient Raspberry Pi in to my router (yeah the original, that doesn’t have on board wifi) and wanted to SSH into it, found this command to easily show you what devices are on your network, listing the IP address and the hostname

nmap -sL 192.168.1.* | grep \(1
Nmap scan report for D-Link.Home (
Nmap scan report for envoy (
Nmap scan report for Jamess-MBP (
Nmap scan report for LGSmartTV (
Nmap scan report for MeaganAir (
Nmap scan report for LGwebOSTV (
Nmap scan report for Meagans-iPhone (
Nmap scan report for retropie (

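There we go, last one on the list. The -sL option just lists every address in the range along with any hostname it resolves to, and piping to grep keeps only the lines that have a hostname followed by an address in brackets. Without it you’d get a line for every address in the range.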
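If you’d rather have the result as data than eyeball it, the same filtering can be sketched in Python. The sample lines below are made up (including the addresses) but follow the format nmap prints for a list scan: a hostname followed by the address in brackets when the name resolves, and just the bare address when it doesn’t.

```python
import re

# hostname, then the IPv4 address in brackets
HOST_LINE = re.compile(r"^Nmap scan report for (\S+) \((\d+\.\d+\.\d+\.\d+)\)$")


def named_hosts(nmap_output: str) -> dict:
    """Map hostname to IP for every line that has a resolved hostname."""
    hosts = {}
    for line in nmap_output.splitlines():
        match = HOST_LINE.match(line.strip())
        if match:
            hosts[match.group(1)] = match.group(2)
    return hosts


# Invented sample output, the addresses are placeholders
sample = """\
Nmap scan report for 192.168.1.1
Nmap scan report for retropie (192.168.1.23)
Nmap scan report for 192.168.1.24
"""
print(named_hosts(sample))  # {'retropie': '192.168.1.23'}
```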

Exporting data from Enphase APIs

I bought an Enphase solar powered system in early 2017. One of the major appeals of the Enphase brand was that it has developer APIs, so I could track my system’s power generation and even household usage.

My aim is to get this data out of the Enphase APIs then try to make sense of it, possibly bringing in weather data and water usage from IoT systems on my ever increasing to-do list.

There doesn’t seem to be a way of bulk exporting data, and with rate limiting on the free tier I figured I could write a script that hits the stats API for each day, grabs a whole day’s data then persists it, waits a period of time, then hits the next day. By delaying, I don’t break the rate limit, so I can just kick the script off and have a coffee!

import pendulum
import time
import requests
import json
import os

userId = os.environ['ENPHASE_USER_ID']
key = os.environ['ENPHASE_KEY']
systemId = os.environ['ENPHASE_SYSTEM_ID']
# set tz code from
tzCode = os.environ['TIME_ZONE']
# free tier only allows so many requests per minute, space them out with delays
sleepBetweenRequests = int(os.environ['SLEEP_BETWEEN_REQUESTS'])
# Start/end dates to export
startDate = pendulum.parse(os.environ["START_DATE"], tzinfo=tzCode)
endDate = pendulum.parse(os.environ["END_DATE"], tzinfo=tzCode)
# Shouldn't need to modify this
url = '' % systemId

print('Starting report between %s and %s' % (startDate.to_date_string(), endDate.to_date_string()))

period = pendulum.period(startDate, endDate)

for dt in period.range('days'):

    print('date [%s] START [%s] END [%s]' % (dt, dt.start_of('day'), dt.end_of('day')))
    # HTTP Params
    params = {'user_id': userId,
              'key': key,
              'datetime_format': 'iso8601',
              'start_at': dt.start_of('day').int_timestamp,
              'end_at': dt.end_of('day').int_timestamp}

    r = requests.get(url=url, params=params)

    if r.status_code == 200:
        filename = "out/%s.json" % dt.to_date_string()
        os.makedirs(os.path.dirname(filename), exist_ok=True)
        with open(filename, 'w') as outfile:
            json.dump(r.json(), outfile, indent=2)
        print('Success %s' % dt.to_date_string())
    else:
        print('Failed to get data for %s' % dt.to_date_string())

    # pause (in seconds) before the next day so we stay under the rate limit
    time.sleep(sleepBetweenRequests)

Run that using python and it’ll dump out one json file per day in the out/ directory. I’ve only got the stats API set up so far, but feel free to pull request in the others!

Also keep an eye on my github project, I’ll be adding more to it. The plan is to get it running in AWS with scheduled lambdas to pull in data on an hourly basis, mash in data from weather APIs, and any IoT systems I build.

A quick delve into Docker

The problem

When you run an application such as tomcat, you need to make sure you have the correct version of Java installed and configured, and then download the version of tomcat that is compatible with that version of Java. If you upgrade the version of Java, you’ve then got to setup a new JDK and potentially setup a new version of tomcat.

This is quite annoying as you’re going to have multiple versions of Java and tomcat installed on your machine, and at some point you’re going to get confused and have the wrong version running, or an environment variable not set correctly.

When you have an entire dev team doing the same thing, you’ll end up with people on different versions, different environment configurations, and ultimately you’ll get those “but it works on my environment” bugs at some point.

The solution

Docker allows you to run applications in a container. It’s a bit like a VM, but without the OS. That doesn’t make much sense, and it didn’t to me either to start with. When you run a VM you’re running a full blown OS, and the hypervisor layer is bridging the kernel of the guest and the host.

If you’re just running apache on that VM it’s a bit of an overkill.

You can think of a container as being a stripped down linux OS. There’s barely anything there, just the bare bones: a filesystem and networking. There’s no gui and no pre-installed packages like apache; there’s literally just enough to start the container. This makes them very lightweight and fast. It’s then up to you to install your own applications in those containers.

Why is this good?

Don’t need a particular application anymore? Fine, delete the container. No need to hunt around your system manually uninstalling packages that you have scattered around.

Need to ship it to another machine? Publish the container and let others download it as an image ready to run.

When you start a new project, get your DevOps guy (or a dev) to build some containers for all of the dependencies of your project. You’ll probably need things like tomcat & mysql, which are easy because there are already official docker images for those, but you may also need to build your own custom containers to stub your integration points, or to install integration point software in a stub mode. Then, when your project kicks off and the devs are ready to get started, all they need to do is pull the images and run them, and they’ve got a full stack dev environment ready to use. Marvellous.

As I’m new to Docker, perhaps I’m not the best at explaining it. I’d highly recommend you watch this:

Let’s have a look at a few containers, tomcat and mysql.


You could build your own tomcat container, but there’s an official image that you can use. Start it up using:

docker run -it --rm -p 8888:8080 tomcat:8.0

The docker run command is going to start a container from an image, as you likely won’t have that image, it will realise this and then pull it. You could pull it separately using a docker pull command, but the run figures this out for us.

The -it flags run the container in interactive mode with a terminal attached, so you can see the output, in this case the output of catalina being executed. An alternative would be the -d flag, which runs it as a daemon (background task).

The --rm flag automatically removes the container when it exits. You don’t strictly need this, but the tomcat official image page suggests you include it.

The -p flag tells docker to forward port 8080 on the container to 8888 on the host, so we can access tomcat from outside the container; it’d be a bit pointless without this. An alternative would be -P, which publishes every port the image exposes to a randomly chosen port on the host.

The tomcat:8.0 is the image name along with its tag.

Run the docker run command and you should see the output of the catalina start process. You can open another tab and run docker ps to see its process state.

Now let’s try to access the tomcat manager page. To do so you need the IP of the boot2docker instance. Remember that boot2docker is the docker host, not the laptop, so you need to access containers via boot2docker’s VM. It took me a little while to realise this: I was running a docker inspect on the container, finding the network settings/IP and trying to access that, not realising that it’s actually the boot2docker VM you need to access.

You can easily do this by obtaining the ip using  boot2docker ip. Then you can access:

That’s it, you should be on the tomcat page now. I’ll leave it up to you to make use of it, perhaps extend the tomcat image and deploy your own applications?


As with tomcat, there is an officially supported MySQL image, download it using “docker pull” like so.

docker pull mysql:5.7
Pulling repository mysql
463d9ebad128: Download complete

Run up that mysql image using

docker run -d -p 3306:3306 mysql

The -d runs the container as a daemon (background task) and returns you the container id.

Next have a look at docker ps to confirm it’s running

docker ps
2eac3ea6b64e mysql:latest "/usr/bin/mysqld_saf 5 seconds ago Up 2 seconds;3306/tcp pensive_heisenberg

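Now let’s try to connect to it. Note that --link refers to the server container by name, so either start the server with --name myfirstmysql or substitute the name that docker ps reports (pensive_heisenberg above). The client also picks up the root password from the linked container’s MYSQL_ROOT_PASSWORD environment variable, so that needs to have been set with -e when the server was started.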

docker run -it --link myfirstmysql:mysql --rm mysql sh -c 'exec mysql -h"$MYSQL_PORT_3306_TCP_ADDR" -P"$MYSQL_PORT_3306_TCP_PORT" -uroot -p"$MYSQL_ENV_MYSQL_ROOT_PASSWORD"'

This is spawning another container which will run the mysql client. At first, I was trying to use the mysql client on my local machine to connect to it, but then I realised that I was missing the point of docker, why install mysql locally, just to use the client, when really I could be doing that via docker?


Round up

I’m yet to use this in anger, but I can already think of applications on previous projects that would have benefitted from this. I’ve been using virtualisation with vagrant and chef for a while, so I’m interested to see how different things will work out by using docker.

Backing up wordpress automatically

I’ve had some difficulties getting the BackWPup plugin to work, it seems that you can’t backup everything in one job as the script takes too long to run and the server will terminate it, causing a failed job.

The 2 errors I was seeing are

  1. WARNING: Job restart due to inactivity for more than 5 minutes.
  2. ERROR: Uploaded file size and local file size don’t match.

Which led me to this post, which recommends a better way to structure your backup jobs, basically just split them out into content, plugins, install etc so they finish within the timeout threshold.

Pig Latin parsing CSV files with quoted commas

In the not too distant past, I was working on a BigData engagement using Apache Pig. I took CSV parsing for granted and expected it to just work, however if you have quoted strings with commas, it won’t behave as you’d expect.


1,"This is a sample sentence, same sentence, just happens to include a few commas" 

When you use:

load 'input/oneLiner.txt' using PigStorage(',') 

It delimits on every comma, regardless of whether it sits inside a quoted string, so you end up with 4 fields:

1
This is a sample sentence
same sentence
just happens to include a few commas
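The same behaviour is easy to reproduce outside of Pig. The Python sketch below is not Pig code, just an illustration of the difference between splitting on every comma and using a parser that understands quoted fields, which is what a CSV aware loader does for you.

```python
import csv
import io

line = '1,"This is a sample sentence, same sentence, just happens to include a few commas"'

# Naive split, equivalent to delimiting on every comma
naive = line.split(",")

# Quote aware parsing, commas inside the quoted string are kept
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive))   # 4
print(len(parsed))  # 2
print(parsed[1])
```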

The solution to this is to use a custom loader, such as

To get started with this, I had to clone the piggybank repository (a collection of user defined functions, why this didn’t make it to the base release I’m not entirely sure) and build from source. Unfortunately I didn’t keep any notes for this, but it’s relatively straightforward, see the Apache Pig wiki page here

Getting your head around the Couchbase SyncGateway

I like Couchbase. One of the things that really appeals to me is the sync gateway. As a mobile developer I often find that the apps I’m developing are just interfaces into some backend service. Somewhere out there in the cloud I’ll have a web application that sits on top of some database (nodejs/mongoDB is a combo I’ve been using recently). Then there comes the mobile app, which will be consuming these services, which would be fine if 4G/wifi was everywhere (I can’t even get a cellular signal at my place, let alone dream of 4G).

We’re into the realms of apps working offline, you then have the pain of syncing data and dealing with conflicts. You can make your life easier by using a SyncAdapter on Android, or perhaps a framework like Restkit if you’re developing on iOS, heck, you can even implement the syncing yourself (don’t do that, that road leads to madness..speaking from experience)…OR…you can just use Couchbase and the SyncGateway.

In short, the SyncGateway is an application that sits between your Couchbase server, and your Couchbase Lite enabled mobile apps. This means you can access your data on your local CBLite database, and not have to worry (too much) about syncing this to the Couchbase server.

Getting setup

I have to admit, the documentation is a little confusing when it comes to explaining how the components hang together, but after attending Couchbase Live in London a month or so back I was able to track down those who are in the know, and put the missing piece into my puzzle of confusion; bucket syncing.

For the purpose of explaining how this works, I’ll use my “Coin Collector” android app as the example. The app needs to get its data on coins from a couchbase server. It should be able to work offline and sync periodically. I’m using bucket syncing so I can have a web page to administer coins, from adding new coins to altering market values.
The documentation is really missing a diagram like the following


Let me cover the 4 points in blue numbers:

  1. Regardless of which mobile platform you’re using, it’ll be connecting to the sync gateway via the REST apis, this is where “json over the wire” comes into play.
  2. As the mobile apps use their own bucket, you need to configure the gateway to tell it where to put documents. In my config below, this is done by the “aussie-coins-syncgw” configuration element; you can see that the bucket is set to “aussie-coins-bucket-sync-db” on the localhost couchbase server (sync and db are running on my local vm)
  3. This is where the magic happens. Bucket shadowing in the later releases of the Sync Gateway allows it to sync changes between your “mobile” bucket, and your “backend” bucket. You can see this configured by the “shadow” element in my config.json
  4. Your backend server apps can just connect to the “aussie-coins-bucket” and be totally oblivious to what is happening in the mobile side of your architecture.

{
    "interface": ":4984",
    "adminInterface": ":4985",
    "log": ["CRUD", "CRUD+", "HTTP", "HTTP+", "Access", "Cache", "Shadow", "Shadow+", "Changes", "Changes+"],
    "databases": {
        "aussie-coins-syncgw": {
            "server": "http://localhost:8091",
            "bucket": "aussie-coins-bucket-sync-db",
            "sync": `function(doc) {channel(doc.channels);}`,
            "users": {
                "GUEST": {
                    "disabled": false,
                    "admin_channels": ["*"]
                }
            },
            "shadow": {
                 "server": "http://localhost:8091",
                 "bucket": "aussie-coins-bucket"
            }
        }
    }
}

Some other points to notice in the configuration:

  • The interface port is the port the apps will connect on; the adminInterface is for administering the Sync Gateway, such as dynamically adding new databases or altering channels.
  • Logs: I’ve chosen to log everything. You can restrict these if you need to; check the Couchbase documentation for further info.
  • I’ve enabled guest user access on all channels for the purpose of evaluating this. Ideally we’d restrict the channels that users can use to stop any potential abuse.
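
If you want to double check which bucket pairs up with which, here’s a minimal Python sketch that reads a config like mine and prints the mobile bucket next to the backend bucket it shadows. Note that the sync function is wrapped in backticks, which plain JSON parsers reject, so the sketch re-quotes it first. The helper names (load_gateway_config, bucket_pairs) are my own invention and the config is trimmed for brevity, so treat this as an illustration rather than anything the Sync Gateway ships with.

```python
import json
import re

# Trimmed copy of the config.json above (the "log" list is left out for brevity).
CONFIG_TEXT = r'''
{
    "interface": ":4984",
    "adminInterface": ":4985",
    "databases": {
        "aussie-coins-syncgw": {
            "server": "http://localhost:8091",
            "bucket": "aussie-coins-bucket-sync-db",
            "sync": `function(doc) {channel(doc.channels);}`,
            "users": {"GUEST": {"disabled": false, "admin_channels": ["*"]}},
            "shadow": {
                "server": "http://localhost:8091",
                "bucket": "aussie-coins-bucket"
            }
        }
    }
}
'''


def load_gateway_config(text):
    """Parse a config whose sync function is wrapped in backticks."""
    # Backtick strings are not valid JSON, so re-encode each one as a JSON string.
    as_json = re.sub(r"`([^`]*)`", lambda m: json.dumps(m.group(1)), text)
    return json.loads(as_json)


def bucket_pairs(config):
    """Return {database name: (mobile bucket, shadowed backend bucket or None)}."""
    pairs = {}
    for name, db in config.get("databases", {}).items():
        shadow = db.get("shadow") or {}
        pairs[name] = (db.get("bucket"), shadow.get("bucket"))
    return pairs


config = load_gateway_config(CONFIG_TEXT)
pairs = bucket_pairs(config)
for name, (mobile, backend) in pairs.items():
    print(f"{name}: mobile bucket {mobile!r} shadows backend bucket {backend!r}")
```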

Testing it out

As I mentioned above, since the mobile apps will be connecting to the Sync Gateway via a REST API, we can take the mobile app out of the picture and test using a REST client (I’m using Postman for Google Chrome). Let’s cover 2 scenarios.
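
If you’d rather script these calls than click through Postman, the same requests are easy to build in code. Here’s a minimal Python sketch, assuming the gateway from my config is listening on localhost:4984. The helper names (doc_url, get_request, fetch_document) are just ones I’ve made up for illustration.

```python
import json
import urllib.request

GATEWAY = "http://localhost:4984"   # the "interface" port from config.json
DATABASE = "aussie-coins-syncgw"    # the database name from config.json


def doc_url(doc_id):
    """URL of a single document on the gateway's public REST interface."""
    return f"{GATEWAY}/{DATABASE}/{doc_id}"


def get_request(doc_id):
    """Build (but do not send) a GET for one document."""
    return urllib.request.Request(doc_url(doc_id), method="GET")


def fetch_document(doc_id):
    """Send the GET and decode the JSON body. Needs a running Sync Gateway."""
    with urllib.request.urlopen(get_request(doc_id)) as response:
        return json.load(response)


request = get_request("5")
print(request.get_method(), request.full_url)
```

fetch_document needs a running gateway, so the sketch only prints the request it would send.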

Server Producing

This scenario involves a new document being created on the server, and it being synced to the mobile bucket and available to view on the mobile apps.

Firstly, let me show you what I have in the “aussie-coins-bucket”.


Next, let’s create a new document with an ID of 5, for the Ten Cent coin. We should then see it listed in our “aussie-coins-bucket” like so:


Now let’s have a look at the log output from the Sync Gateway.

22:49:33.826838 Shadow+: Pulling "5", CAS=1e2dd7153a ... have UpstreamRev="", UpstreamCAS=0
22:49:33.826894 Shadow: Pulling "5", CAS=1e2dd7153a --> rev "1-1d7a1a352c0abb293fdd16883ef6985b"
22:49:33.826909 CRUD+: Invoking sync on doc "5" rev 1-1d7a1a352c0abb293fdd16883ef6985b
22:49:33.903707 Cache: SAVING #8
22:49:33.903984 CRUD: Stored doc "5" / "1-1d7a1a352c0abb293fdd16883ef6985b"
22:49:34.768280 Cache: Received #8 after 864ms ("5" / "1-1d7a1a352c0abb293fdd16883ef6985b")
22:49:34.768305 Cache:     #8 ==> channel "*"
22:49:34.768322 Changes+: Notifying that "aussie-coins-bucket-sync-db" changed (keys="{*}") count=3
22:49:59.849578 Shadow+: Pulling "5", CAS=2423d4b93a ... have UpstreamRev="1-1d7a1a352c0abb293fdd16883ef6985b", UpstreamCAS=c21019dd68
22:49:59.849623 Shadow: Pulling "5", CAS=2423d4b93a --> rev "2-971b4b3009127da5ed2a4770cb45cfe7"
22:49:59.849637 CRUD+: Invoking sync on doc "5" rev 2-971b4b3009127da5ed2a4770cb45cfe7
22:49:59.849749 CRUD+: Saving old revision "5" / "1-1d7a1a352c0abb293fdd16883ef6985b" (68 bytes)
22:49:59.849891 CRUD+: Backed up obsolete rev "5"/"1-1d7a1a352c0abb293fdd16883ef6985b"
22:49:59.850068 Cache: SAVING #9
22:49:59.850207 CRUD: Stored doc "5" / "2-971b4b3009127da5ed2a4770cb45cfe7"
22:50:00.790818 Cache: Received #9 after 940ms ("5" / "2-971b4b3009127da5ed2a4770cb45cfe7")
22:50:00.790838 Cache:     #9 ==> channel "*"
22:50:00.790868 Changes+: Notifying that "aussie-coins-bucket-sync-db" changed (keys="{*}") count=4

As we can see, the Sync Gateway has detected that there is a new document and that it needs to shadow it across, which it does successfully.
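
When the log gets noisy, it helps to pull out just the shadow events. Here’s a rough Python sketch that picks the “Shadow: Pulling” lines out of log text like the above and reports which document became which revision. The regex is based purely on the lines shown in this post, so it’s an assumption that other Sync Gateway versions format them the same way, and shadow_pulls is a name I’ve made up.

```python
import re

# A few lines copied from the log output above.
LOG = '''\
22:49:33.826838 Shadow+: Pulling "5", CAS=1e2dd7153a ... have UpstreamRev="", UpstreamCAS=0
22:49:33.826894 Shadow: Pulling "5", CAS=1e2dd7153a --> rev "1-1d7a1a352c0abb293fdd16883ef6985b"
22:49:33.903984 CRUD: Stored doc "5" / "1-1d7a1a352c0abb293fdd16883ef6985b"
22:49:59.849623 Shadow: Pulling "5", CAS=2423d4b93a --> rev "2-971b4b3009127da5ed2a4770cb45cfe7"
22:49:59.850207 CRUD: Stored doc "5" / "2-971b4b3009127da5ed2a4770cb45cfe7"
'''

# Matches only the "Shadow:" lines that report the revision created by a pull.
PULL = re.compile(r'^(\S+) Shadow: Pulling "([^"]+)", CAS=(\w+) --> rev "([^"]+)"$')


def shadow_pulls(log_text):
    """Return one dict per completed pull from the backend bucket."""
    pulls = []
    for line in log_text.splitlines():
        match = PULL.match(line.strip())
        if match:
            timestamp, doc_id, cas, rev = match.groups()
            pulls.append({"time": timestamp, "doc": doc_id, "cas": cas, "rev": rev})
    return pulls


pulls = shadow_pulls(LOG)
for pull in pulls:
    print(pull["time"], "doc", pull["doc"], "->", pull["rev"])
```

Run against the lines above, it finds two pulls of document “5”, one for each revision in the log.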

On the Couchbase server, we can view that document in the mobile bucket, “aussie-coins-bucket-sync-db”, like so:


Finally, just to prove the mobile clients can see that document via the API, do a GET on http://localhost:4984/aussie-coins-syncgw/5 and you’ll see the following:

{
    "_id": "5",
    "_rev": "2-971b4b3009127da5ed2a4770cb45cfe7",
    "coin": "Ten Cent"
}

Mobile Producer

Now we’ll try the opposite, producing documents from the mobile clients and seeing them synced across to the Couchbase server. From a REST client, do a PUT to http://localhost:4984/aussie-coins-syncgw/6 with a JSON body of:

{"coin": "Twenty Cent"}

You should see a response of:

{
    "id": "6",
    "ok": true,
    "rev": "1-e9c16d3887a3958314adff1e3cbd6097"
}

What we’ve done is to create a document with the ID of 6, for “Twenty Cent”.
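
The same PUT can be scripted instead of sent from Postman. Here’s a rough Python sketch, assuming the gateway is on localhost:4984 as in my config; put_request and create_document are hypothetical helper names of my own.

```python
import json
import urllib.request

GATEWAY = "http://localhost:4984"   # the "interface" port from config.json
DATABASE = "aussie-coins-syncgw"    # the database name from config.json


def put_request(doc_id, document):
    """Build (but do not send) a PUT that creates a document with the given ID."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        f"{GATEWAY}/{DATABASE}/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_document(doc_id, document):
    """Send the PUT and decode the JSON reply. Needs a running Sync Gateway."""
    with urllib.request.urlopen(put_request(doc_id, document)) as response:
        return json.load(response)


request = put_request("6", {"coin": "Twenty Cent"})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

As with any network call, create_document only works with the gateway running, so the sketch just prints the request it would send.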

Let’s have a look at the Sync Gateway logs:

23:19:56.860618 HTTP:  #003: PUT /aussie-coins-syncgw/6
23:19:56.971056 CRUD+: Invoking sync on doc "6" rev 1-e9c16d3887a3958314adff1e3cbd6097
23:19:57.023839 Cache: SAVING #10
23:19:57.024110 CRUD: Stored doc "6" / "1-e9c16d3887a3958314adff1e3cbd6097"
23:19:57.024161 HTTP+: #003:     --> 201   (0.0 ms)
23:19:57.616316 Cache: Received #10 after 592ms ("6" / "1-e9c16d3887a3958314adff1e3cbd6097")
23:19:57.616340 Cache:     #10 ==> channel "*"
23:19:57.616353 Shadow: Pushing "6", rev "1-e9c16d3887a3958314adff1e3cbd6097"
23:19:57.616367 Changes+: Notifying that "aussie-coins-bucket-sync-db" changed (keys="{*}") count=6
23:19:57.852304 Shadow+: Pulling "6", CAS=1c6f07c3ce2 ... have UpstreamRev="", UpstreamCAS=0
23:19:57.852327 Shadow+: Not pulling "6", CAS=1c6f07c3ce2 (echo of rev "1-e9c16d3887a3958314adff1e3cbd6097")
23:19:57.852337 CRUD+: Invoking sync on doc "6" rev 1-e9c16d3887a3958314adff1e3cbd6097
23:19:57.865669 CRUD+: updateDoc("6"): Rev "1-e9c16d3887a3958314adff1e3cbd6097" leaves "1-e9c16d3887a3958314adff1e3cbd6097" still current
23:19:57.865751 Cache: SAVING #11
23:19:57.866050 CRUD: Stored doc "6" / "1-e9c16d3887a3958314adff1e3cbd6097"
23:19:58.617446 Cache: Received #11 after 751ms ("6" / "1-e9c16d3887a3958314adff1e3cbd6097")
23:19:58.617463 Cache:     #11 ==> channel "*"
23:19:58.617482 Changes+: Notifying that "aussie-coins-bucket-sync-db" changed (keys="{*}") count=7
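
One thing worth noticing in that log is the “Not pulling ... (echo of rev ...)” line. My reading, based only on the wording of the log, is that the gateway pushes document “6” to the backend bucket, then sees its own write come back and skips it rather than pulling it in as a new change. Here’s a small Python sketch that checks every pushed revision has a matching echo line. The regexes are assumptions drawn from the lines above, and unechoed_pushes is a made-up name.

```python
import re

# The three shadow-related lines for doc "6", copied from the log above.
LOG = '''\
23:19:57.616353 Shadow: Pushing "6", rev "1-e9c16d3887a3958314adff1e3cbd6097"
23:19:57.852304 Shadow+: Pulling "6", CAS=1c6f07c3ce2 ... have UpstreamRev="", UpstreamCAS=0
23:19:57.852327 Shadow+: Not pulling "6", CAS=1c6f07c3ce2 (echo of rev "1-e9c16d3887a3958314adff1e3cbd6097")
'''

# Each pattern captures (document ID, revision ID).
PUSH = re.compile(r'Shadow: Pushing "([^"]+)", rev "([^"]+)"')
ECHO = re.compile(r'Shadow\+: Not pulling "([^"]+)", CAS=\w+ \(echo of rev "([^"]+)"\)')


def unechoed_pushes(log_text):
    """Return the (doc, rev) pairs that were pushed but have no echo line."""
    pushed = set(PUSH.findall(log_text))
    echoed = set(ECHO.findall(log_text))
    return pushed - echoed


pushed = set(PUSH.findall(LOG))
missing = unechoed_pushes(LOG)
print("pushed:", sorted(pushed))
print("pushed without an echo:", sorted(missing))
```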

We can then see the document in the “aussie-coins-bucket-sync-db”:


…and then in the “aussie-coins-bucket”:

The Sync Gateway is a useful application, and really does make the Couchbase offering even more appealing. Once you can get your head around the bucket shadowing (which you should if you’ve made it this far) then it can be easy to work with.

Comment or find me on Twitter (@jameselsey1986) if you have any questions!

Why the duplication?

Having 2 buckets for the same data had me raise an eyebrow initially, but after asking on Google Groups, it does make sense. You can’t expect the backend app servers to maintain sync metadata on the new documents they create. Perhaps Couchbase will alter this in the future.