[KOSD Series] Discussion about Cosmos DB Performance

KOSD, or Kopi-O Siew Dai, is a type of Singapore coffee that I enjoy. It is basically a cup of coffee with a little bit of sugar. This series is meant to blog about technical knowledge that I gained while having a small cup of Kopi-O Siew Dai.

kosd-cosmos-db.png

During a late dinner with my friend on 12 January last month, he commented that he encountered a very serious performance problem in retrieving data from Cosmos DB (pka DocumentDB). It’s quite strange because, in our IoT project which also stores millions of data in Cosmos DB, we never had this problem.

Two weeks later, on 27 January, he happily showed me his improved version of the code which could query the data in about one to two seconds.

Yesterday, after having a discussion, we further improved the code. Hence, I’d like to write down this learning experience here.

Preparation

Due to the fact that we couldn’t demonstrate using the real project code, I thus created a sample project getting data from database and collection on my personal Azure Cosmos DB account. The database contains one collection which has 23,967 records of Student data.

The Student class and the BaseEntity class that it inherits from are as follows.

public class Student : BaseEntity
{
    public string Name { get; set; }

    public int Age { get; set; }

    public string Description { get; set; }
}
public abstract class BaseEntity
{
    [JsonProperty(PropertyName = "id")]
    public string Id { get; set; }

    public string Type { get; set; }

    public DateTime CreatedAt { get; set; } = DateTime.Now;
}

You may wonder why I have Type defined.

Type and Cost Saving

The reason of having Type is that, before DocumentDB was rebranded as Cosmos DB in May 2017, the DocumentDB pricing is based on collections. Hence, the more collection we have in the database, the more we need to pay.

confused-about-documentdb-pricing.png

DocumentDB was billed per collection in the past. (Source: Stack Overflow)

To overcome that, we squeeze the different types of entities in the same collection. So, in the example above, let’s say we have three classes — Students, Classroom, Teacher that inherit from BaseEntity, then we will put the data of the three classes in the same collection.

Then here comes a problem: How do we know which document in the collection is Student, Classroom or Teacher? There is where the property Type will help us. So in our example above, the possible value for Type will be Student, Classroom, and Teacher.

Hence, when we add a new document through repository design pattern, we have the following method.

public async Task<T> AddAsync(T entity)
{
    ...

    entity.Type = typeof(T).Name;

    var resourceResponse = await _documentDbClient.CreateDocumentAsync(UriFactory.CreateDocumentCollectionUri(_databaseId, _collectionId), entity);

    return resourceResponse.StatusCode == HttpStatusCode.Created ? (dynamic)resourceResponse.Resource : null;
}

Original Version of Query

We used the following code to retrieve data of a class from the collection.

public async Task<IEnumerable<T>> GetAllAsync(Expression<Func<T, bool>> predicate = null)
{
    var query = _documentDbClient.CreateDocumentQuery<T>(UriFactory.CreateDocumentCollectionUri(_databaseId, _collectionId));

    var documentQuery = (predicate != null) ?
        (query.Where(predicate)).AsDocumentQuery():
        query.AsDocumentQuery();

    var results = new List<T>();
    while (documentQuery.HasMoreResults)
    {
        results.AddRange(await documentQuery.ExecuteNextAsync<T>());
    }

    return results.Where(x => x.Type == typeof(T).Name).ToList();
}

This query will run very slow because the line where it filters the class is after querying data from the collection. Hence, in the documentQuery, it may already contain data of three classes (Student, Classroom, and Teacher).

Improved Version of Query

So one obvious way is to move the line of filtering by Type above. The improved version of code now looks as such.

public async Task<IEnumerable<T>> GetAllAsync(Expression<Func<T, bool>> predicate = null)
{
    var query = _documentDbClient
        .CreateDocumentQuery<T>(UriFactory.CreateDocumentCollectionUri(_databaseId, _collectionId))
        .Where(x => x.Type == typeof(T).Name);

    var documentQuery = (predicate != null) ?
        (query.Where(predicate)).AsDocumentQuery():
        query.AsDocumentQuery();

    var results = new List<T>();
    while (documentQuery.HasMoreResults)
    {
        results.AddRange(await documentQuery.ExecuteNextAsync<T>());
    }

    return results;
}

By doing so, we managed to reduce the query time significantly because all the actual filtering will be done at Cosmos DB side. For example, there was one query I managed to reduce the query time of it from 1.38 minutes to 3.42 seconds using the 23,967 records of Student data.

Multiple Predicates

The code above however has a disadvantage. It cannot accept multiple predicates.

I thus changed it to be as follows so that it returns IQueryable.

public IQueryable<T> GetAll()
{
    return _documentDbClient
        .CreateDocumentQuery<T>(UriFactory.CreateDocumentCollectionUri(_databaseId, _collectionId))
        .Where(x => x.Type == typeof(T).Name);
}

This has another inconvenience is there whenever I call GetAll, I need to remember to load the data with HasMoreResults as shown in the code below.

var studentDocuments = _repoDocumentDb.GetAll()
    .Where(s => s.Age == 8)
    .Where(s => s.Name.Contains("Ahmad"))
    .AsDocumentQuery();

var results = new List<T>();
while (studentDocuments.HasMoreResults)
{
    results.AddRange(await studentDocuments.ExecuteNextAsync<T>());
}

Conclusion

This is just an after-dinner discussion about Cosmos DB between my friend and me. If you have any better idea of designing repository for Cosmos DB (pka DocumentDB), please let us know. =)

Advertisements

TCP Listener on Microsoft Azure for IoT Devices

cloud-service-worker-role-automation-runbook.png

After working on the beacon projects back half a year ago, I was given a new task which is building a dashboard for displaying data collected from IoT devices. The IoT devices basically are GPS tracker with a few other additional sensors such as temperature and shaking detection.

I’m new to IoT field, so I’m going to share in this article what I had learnt and challenges I faced in this project so that it would benefit to juniors who are going to do similar things.

Project Requirements

We plan to have the service to receive data from the IoT devices to be on Microsoft Azure. There will be thousands or even millions of the same devices deployed eventually, so choosing cloud platform to help us scaling up easily.

We also need to store the data in order to display it on dashboard and reports for business use cases.

Challenge 1: Azure IoT Hub and The Restriction of Device Firmware

In the documentation of the device protocol, there is a set of instructions as follows.

First when device connects to server, module sends its IMEI as login request. IMEI is sent the same way as encoding barcode. First comes short identifying number of bytes written and then goes IMEI as text (bytes).

After receiving IMEI, server should determine if it would accept data from this module. If yes server will reply to module 01 if not 00.

I am not sure who wrote the documentation but I am certain that his English is not that easy to comprehend in the first read.

Anyway, this is a good indication that Azure IoT Hub will be helpful because it provides secure and reliable C2D (Cloud-to-Device) and D2C communication with HTTP, AMQP, and MQTT support.

However, when I further read the device documentation, I realized that the device could only send TCP packets over in a protocol the device manufacturer defined. In addition, the device doesn’t allow us to update its firmware at this moment, making it to send data using protocols accepted by Azure IoT Hub is impossible.

There is a fierce discussion about this on Stack Overflow. Unfortunately, none of the respondents understood what the OP was trying to say.

So, I have to say bye-bye to Azure IoT Hub and move on to build TCP Listener myself on Azure.

Challenge 2: Hosting TCP Listener on Azure

There is a great code sample on how to build a TCP listener in C# to listen for connections from TCP network clients.

So, where could we put this code at?

Could we use Azure App Service, such as Functions or Web Apps? Unfortunately, no. This is because only 80/TCP and 443/TCP are exposed publicly and the only protocol that works is HTTP. In addition, App Service is all IIS, the web server provides the entire platform, there is no room for long running processes or threads that can sit and wait for communication on another port outside of IIS.

The only easy option we have now is to use Azure Cloud Service with Worker Role. Worker Role does not use IIS and it can run our app standalone.

creating-worker-role.png

Creating a new Cloud Service project with one Worker Role on Visual Studio 2017.

A default template of WorkerRole class will be provided.

public class WorkerRole : RoleEntryPoint
{
    private readonly CancellationTokenSource cancellationTokenSource = new CancellationTokenSource();
    private readonly ManualResetEvent runCompleteEvent = new ManualResetEvent(false);

    public override void Run()
    {
        Trace.TraceInformation("TrackerTcpListener is running");

        try
        {
            this.RunAsync(this.cancellationTokenSource.Token).Wait();
        }
        finally
        {
            this.runCompleteEvent.Set();
        }
    }

    public override bool OnStart()
    { 
        // Set the maximum number of concurrent connections
        ServicePointManager.DefaultConnectionLimit = 12;

        // For information on handling configuration changes
        // see the MSDN topic at https://go.microsoft.com/fwlink/?LinkId=166357.

        bool result = base.OnStart();

        Trace.TraceInformation("TrackerTcpListener has been started");

        return result;
    }

    public override void OnStop()
    {
        Trace.TraceInformation("TrackerTcpListener is stopping");

        this.cancellationTokenSource.Cancel();
        this.runCompleteEvent.WaitOne();

        base.OnStop();

        Trace.TraceInformation("TrackerTcpListener has stopped");
    }

    private async Task RunAsync(CancellationToken cancellationToken)
    {
        // TODO: Replace the following with your own logic.
        while (!cancellationToken.IsCancellationRequested)
        {
            Trace.TraceInformation("Working");
            await Task.Delay(1000);
        }
    }
}

It’s obvious that the first method we are going to work on is the RunAsync method with a “TODO” comment.

However, before that, we need to define an IP Endpoint for this TCP listener so that we can tell the IoT device to send the packets to the specified port on the IP address.

worker-role-endpoints.png

Configuring Endpoints of a Cloud Service.

With endpoints defined, we can then proceed to modify the code.

private async Task RunAsync(CancellationToken cancellationToken)
{
    try
    {
        TcpClient client;

        while (!cancellationToken.IsCancellationRequested)
        {
            var ipEndPoint = RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["TcpListeningEndpoint1"].IPEndpoint;
            
            var listener = new System.Net.Sockets.TcpListener(ipEndPoint) { ExclusiveAddressUse = false };
            listener.Start();

            // Perform a blocking call to accept requests.
            client = listener.AcceptTcpClient();

            // Get a stream object for reading and writing
            NetworkStream stream = null;

            try
            {
                stream = client.GetStream();

                await ProcessInputNetworkStreamAsync(stream);
            }
            catch (Exception ex)
            {
                // Log the exception
            }
            finally
            {
                // Shutdown and end connection
                if (stream != null)
                {
                    stream.Close();
                }

                client.Close();

                listener.Stop();
            }
        }
    }
    catch (Exception ex)
    {
        // Log the exception
    }
}

The code for the method ProcessInputNetworkStreamAsync above is as follows.

private async Task ProcessInputNetworkStreamAsync(string imei, NetworkStream stream)
{
    Byte[] bytes = new Byte[5120];
    int i = 0;
    byte[] b = null;
    var receivedData = new List<string>();

    while ((i = stream.Read(bytes, 0, bytes.Length)) != 0)
    {
        receivedData = new List<string>();

        for (int reading = 0; reading < i; reading++)
        {
            using (MemoryStream ms = new MemoryStream())
            {
                ms.Write(bytes, reading, 1);
                b = ms.ToArray();
            }
            
            receivedData.Add(ConvertHexadecimalByteArrayToString(b));
        }

        Trace.TraceInformation("Received Data: " + string.Join(",", receivedData.ToArray()));

        // Respond from the server to device
        byte[] serverResponse = ConvertStringToHexadecimalByteArray("<some text to send back to the device>");
        stream.Write(serverResponse, 0, serverResponse.Length);
    }
}

You may wonder what I am doing above with ConvertHexadecimalByteArrayToString and ConvertStringToHexadecimalByteArray methods. They are needed because the packets used in the TCP protocol of the device is in hexadecimal. There is a very interesting discussion about how to do the conversion on Stack Overflow, so I won’t repeat it here.

Challenge 3: Multiple Devices

The code above is only handling one port. Unfortunately, the IoT device doesn’t send over the IMEI number or any other identification number of the device when the actual data pack is sent to the server. Hence, that means if there is more than one IoT device sending data to the same port, we will have no way to identify who is sending the data at the server side.

Hence, we need to make our TCP Listener to listen on multiple ports. The way I chose is to use List<Task> in the Run method as shown in the code below.

public override void Run()
{
    try
    {
        // Reading a list of ports assigned for trackers use
        ...

        var tasks = new List<Task>();
        
        foreach (var port in trackerPorts)
        {
            tasks.Add(this.RunAsync(this.cancellationTokenSource.Token, port));
        }
 
        Task.WaitAll(tasks.ToArray());
    }
    finally
    {
       this.runCompleteEvent.Set();
    }
}

Challenge 4: Worker Role Not Responding Irregularly

This turns out to be the biggest challenge in using Worker Role. After receiving data from the IoT devices for one or two days, the server was not recording any further new data even though the devices are working fine. So far, I’m still not sure about the cause even though there are people encountering similar issues as well.

Hence, I have to find a way to automatically restart the Worker Role for me. Thus, I decided to use PowerShell script to reboot the instance. There is a sample code on Microsoft Technet Gallery – Script Center which does similar thing.

I proceed to use Azure Automation which provides Runbooks to help handling the creation, deployment, monitoring, and maintenance of Azure resources. The Powershell Workflow Runbook that I use for rebooting the worker role daily is as follows.

workflow Reboot-CloudService
{
    Write-Output "Started!"
    
    $azureSubscriptionId = Get-AutomationVariable -Name "AzureSubscriptionId"
    $cloudServiceName = Get-AutomationVariable -Name "CloudServiceName"
    $workerRoleInstanceName = Get-AutomationVariable -Name "WorkerRoleInstanceName" 
    
    $myCredential = Get-AutomationPSCredential -Name "Chun Lin"
    Add-AzureAccount -Credential $myCredential
    
    Select-AzureSubscription -SubscriptionId $AzureSubscriptionId

    Write-Output "Restarting for cloud service: $cloudServiceName."

    ReSet-AzureRoleInstance -ServiceName $cloudServiceName -Slot "Production" -InstanceName $workerRoleInstanceName -Reboot

    Write-Output "Restarted successfully!"
}

In case you wonder where I defined the values for variables such as AzureSubscriptionId, CloudServiceName, and WorkerRoleInstanceName, as well as automation PowerShell credential, there are all easily found in the Azure Portal under “Share Resources” section of Azure Automation Account.

variables-and-credentials-in-automation.png

Providing credentials and variables for the Runbook.

After setting up the Runbook, we need to define schedules in Automation Account and then link it to the Runbook.

setting-schedules-for-automation.png

Setting up schedule and linking it to the Runbook.

There is another tool in the Azure Portal that I find it to be very useful to debug my PowerShell script in the Runbook. It is called the “Test Pane”. By using it, we can easily find out if the PowerShell script is correctly written to generate desired outcome.

test-pane.png

Test Pane available in Runbook.

After that, we can easily get a summary of how the job runs on Azure Portal, as shown in the following screenshot.

azure-automation.png

Job Statistics of Azure Automation.

Yup, that’s all what I had learnt in the December while everyone was enjoying the winter festivals. Please comment if you find a better alternative to handle the challenges above. Thanks in advance and happy new year to you!

References

[KOSD Series] IP Addresses of Our Azure App Services that need to be Whitelisted by Our API Providers

KOSD, or Kopi-O Siew Dai, is a type of Singapore coffee that I enjoy. It is basically a cup of coffee with a little bit of sugar. This series is meant to blog about technical knowledge that I gained while having a small cup of Kopi-O Siew Dai.

kosd-azure_web_app-powershell

It is a common scenario for developers to integrate with different parties by using their APIs. Most of the time, the APIs are located in a locked-down network environment where only whitelisted IP addresses are allowed to access their APIs. We will then be asked to give the API providers the IP addresses of our servers.

If it’s our web back-end calling the APIs and we host our web applications on Microsoft Azure App Services, then how could we get the IP addresses?

As mentioned in a discussion about inbound IP address by Benjamin Perkins, the Escalation Engineer on the Azure team, there are about 4 outgoing IP addresses for an Azure Web Apps normally. To retrieve the outbound IP addresses of an Azure web app, we simply need to get it from the Properties of the web app on Azure Portal.

outbound-ip-addresses-in-azure-app-service.png

Locate the outbound IP addresses here.

We can also get the same result if we use the Azure Resource Explorer which is still in preview now.  Benjamin covered this in a video clip on his article too.

For PowerShell lovers, as pointed out by Adrian Calinescu, one of the commenters on Benjamin’s article, we can use PowerShell to find out the outbound IP addresses too. With the new Azure Cloud Shell, we can simply use the following command to retrieve directly the outbound IP addresses of an Azure web app on Azure Portal directly.

Get-AzureRmResource -ResourceGroupName  -ResourceType Microsoft.Web/sites -ResourceName  | select -expand Properties | Select-Object outboundIpAddresses
outbound-ip-addresses-in-azure-app-service-using-powershell.png

Managing Azure resources using shell directly on a browser.

For those who would like to have your own set of outbound IP addresses, please check out ASE (App Service Environment) which grants users control over inbound and outbound application network traffic.

Finally, we can also whitelist all the IP addresses of the Azure datacentres, which can be downloaded here.

azure-datacenter-ip-range-download.png

List of Microsoft Azure Datacentre IP addresses are available on Microsoft website.

References

Load Balancing Azure Web Apps with Nginx

nginx-ubuntu-azurevm.png

This morning, my friend messaged me a Chinese article about how to do clustering with Linux + .NET Core + Nginx. As we are geek first, we are going to try it out with different approaches. While my friend was going to set up on RaspberryPi, as a developer who loves playing with Microsoft Azure, I proceed to do load balancing of Azure Web Apps in different regions with Nginx.

Setup Two Azure Web Apps

Firstly, I deployed the same ASP .NET Core 2 web app to two different Azure App Services. One of them is deployed at Australia East; another one is deployed at South India (Huuray, Microsoft opens Azure India to the world in April 2017!).

The homepage of my web app, Index.cshtml, is as follows to display the information in Request.Headers.

 

Index.png

Since WordPress cannot show the HTML code properly, I show the code as an image here.

 

In the code above, Request.Headers[“X-Forwarded-For”] is used to get the actual visitor’s IP address instead of the IP address of the Nginx load balancer. To allow this to work, we need to have the following codes added in Startup.cs.

app.UseForwardedHeaders(new ForwardedHeadersOptions
{
    ForwardedHeaders = 
        ForwardedHeaders.XForwardedFor | ForwardedHeaders.XForwardedProto
});
azure-regions.png

In this article, we will set up load balancer in Singapore for websites hosting in India and Australia.

Configure Linux Virtual Machine on Azure

Secondly, as described in the Chinese article mentioned above, the Nginx needs to be set up on a Linux server. The OS used in my case is Ubuntu 17.04.

installing-ubuntu-server-17-on-azure.png

Creating a new Ubuntu server running on Microsoft Azure virtual machine.

The Authentication Type that was chosen is the SSH Public Key option. Hence, we need to create public and private keys using OpenSSL tool. There is a tutorial from Microsoft showing steps on how to generate the keys using Git Bash and Putty.

Installing Nginx

After that, I installed Nginx by using the following command.

sudo apt-get install nginx

After installing it, in order to test whether Nginx is installed properly, I visited the public IP address of the virtual machine. However, it turns out that I couldn’t visit the server because the port 80 by default is not opened on the virtual machine.

Hence, the next step I need to do is opening port using Azure Portal by adding a new inbound security rule for the port 80 and then associate it to the subnet of the virtual network of the virtual machine.

Then when I revisited the public IP of the server, I could finally see the “Welcome to Nginx” success page.

successfully-opened-port-and-installed-nginx.png

Nginx is now successfully running on our Ubuntu server!

Mission: Load Balancing Azure Web Apps with Nginx

As the success page mentioned, further configuration is required. So, we need to edit the configuration file by first opening it up with the following command.

sudo nano /etc/nginx/sites-available/default

The first section that I added is the Cache Configuration.

# Cache configuration
proxy_temp_path /var/www/proxy_tmp;
proxy_cache_path /var/www/proxy_cache levels=1:2 keys_zone=my_cache:20m inactive=60m max_size=500m;

The proxy_temp_path is the path to the directory where the temporary files should be stored at when the response from the upstream server cannot fit into the configured buffers.

The proxy_cache_path is about in which directory the cache should be stored at. The levels=1:2 means that the cache will be stored in a single-character directory with a two-character subdirectory. The keys_zone parameter defines a my_cache cache zone which can store 20MB of keys at most but with the maximum size of the actual data to be 500MB. The inactive=60m means the maximum inactive time cache can be stored, which is 60 minutes in this case.

Next, upstream needs to be defined as follows.

# Cluster sites configuration
upstream backend {
    server dotnetcore-clustering-web01.azurewebsites.net fail_timeout=30s;
    server dotnetcore-clustering-web02.azurewebsites.net fail_timeout=30s;
}

For the default server configuration, we need to make a few modifications to it.

# Default server configuration
# 
server {
    listen 80 default_server;
    listen [::]:80 default_server;
    server_name localhost;
    
    ...
    
    location / {
        proxy_pass http://backend;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        try_files $uri $uri/ =404;
    }
}

Now, we just need to restart the Nginx with the following command.

sudo service nginx restart

Then when we visit the Ubuntu server again, we will realize that we sort of able to reach Azure Web Apps but not really so because it says 404!

404-on-azure.png

Oops, the Nginx routes the visitor to 404 land.

Troubleshooting 404 Error

According to another article which is written by Issac Lázaro, he said this was due to the fact that Azure App Service uses cookies to do ARR (Application Request Routing), hence we need to have the Ubuntu server to pass the header to the web apps by modifying our Nginx configuration to the following.

# Cluster sites configuration
upstream backend {
    server localhost:8001 fail_timeout=30s;
    server localhost:8002 fail_timeout=30s;
}
...

server {
    listen 8001;
    server_name web01;

    location / {
        proxy_set_header Host dotnetcore-clustering-web01.azurewebsites.net;
        proxy_pass http://dotnetcore-clustering-web01.azurewebsites.net;
    }
}

server {
    listen 8002;
    server_name web02;
    
    location / {
        proxy_set_header Host dotnetcore-clustering-web02.azurewebsites.net;
        proxy_pass http://dotnetcore-clustering-web02.azurewebsites.net;
    }
}

Then when we refresh the page, we shall see the website is loaded correctly with the content will be delivered from either web01 or web02.

success.png

Yay, we make it!

Yup, that’s all about setting up a simple Nginx to load balance multiple Azure Web Apps. You can refer to the following articles for more information about Nginx and load balancing.

References

  1. How to open ports to a virtual machine with the Azure portal
  2. Can’t start Nginx – Job for nginx.service failed
  3. Linux+.NetCore+Nginx搭建集群
  4. Understanding Nginx HTTP Proxying, Load Balancing, Buffering, and Caching
  5. Module ngx_http_upstream_module
  6. How To Set Up Nginx Load Balancing with SSL Termination

 

IoT Hub First Peek

The Internet of Things (IoT) is here today, and it begins with the data, devices, and services already at work in your organization. When your “things” are connected to each other and to the cloud, you create new ways to improve efficiency, enable innovation, and transform your business.

This line is printed on the front page of a Microsoft booklet distributed during the lunchtime workshop “Connecting and Building the Internet of Things (IoT)” conducted by Gerald Goh, Microsoft Technical Evangelist. Gerald shared with us technologies such as AMQP, MQTT, Message Broker in Azure, Device Explorer, and so on.

gerald-goh.png

Gerald is sharing Azure IoT Hub during the lunchtime workshop.

IoT hasn’t gone totally mainstream, however, and we have yet to feel its impact. In many ways it is roughly where the big data movement was few years ago — consisting mainly of a buzzword that’s not yet widely understood.

Nevertheless, Gerald’s workshop does give me, a web developer who doesn’t know much about this field, a helpful quick start about IoT. After reading and experimenting, I learn more about the capability of Microsoft Azure in IoT and thus I’d like to share with you about what I’ve learnt so far about Azure IoT Hub.

Message Broker

I’m working in Changi Airport. In the airport, we have several shops serving the travelers and visitors. Most of the shops have a back-end system that integrates several systems such as the retail system, e-commerce website, payment system, Changi Rewards system, inventory management system, the finance system.

So there will be cases where, when a customer buys something at the shop, the retail system needs to send as request to the payment system. Then when the purchase is successful, another purchase request will be sent to the inventory management system and the finance system.

I’m not too sure how the shops link different systems, especially this kind of point-to-point integration will cause a large number of connections among the systems. Hence, the developers of their system may find Message Broker useful.

Message Broker is a physical component that handles the communication between systems. A system sends a message to the message broker, providing the logical name of the receiving systems. The message broker will then search for the receiving systems and then passes the message to them.

message-broker

A message broker mediating the communication between systems. (Image Credit: Message Broker – MSDN)

Messaging Protocols: AMQP and MQTT

Sending a message between systems seems to be an easy task, however, doing it in a reliable and secure manner can be a challenging work.

As shown in the article “Scalable Eventing over Mesos!”, Autodesk is using AMQP (Advanced Message Queuing Protocol) as messaging protocols between two parties with the following main characteristics as goals.

  • Security
  • Reliability
  • Interoperability
  • Standard
  • Open
autodesk-messaging-protocol

AMQP communication between two parties (Image Credit: Autodesk)

AMQP 1.0 is the current specification version. It is also the primary protocol of Azure Event Hubs and Azure Service Bus Messaging after the SBMP (Service Bus Messaging Protocol), the TCP-based protocol which is used inside of .NET client library, is phased out.

Besides AMQP, MQTT (Message Queue Telemetry Transport) is another open protocol based on TCP/IP for asynchronous message queuing which has been developed and matured over past few years.

ibm.png

Dr Andy Stanform-Clark from IBM invented the MQTT protocol. (Image Source: IBM – Wikipedia)

While AMQP is designed to provide the full vibrancy of messaging scenarios, MQTT is designed as an extremely lightweight publish/subscribe message transport for small and simple devices sending small messages on low-bandwidth networks. Hence, MQTT is said to be ideal for mobile applications because of its low power usage and minimized data packets.

MQTT is also simple because it just has five API methods:

  • Connect to an MQTT broker;
  • Disconnect from an MQTT broker;
  • Subscribe to an MQTT topic filter;
  • Unsubscribe from an MQTT topic filter;
  • Publish MQTT messages.

If you are interested to know more about the comparison of AMQP and MQTT, there is a detailed white paper from StormMQ discussing the difference between AMQP and MQTT.

Brokered Messaging – Service Bus Messaging

When two or more systems want to exchange information, they need a communication facilitator. This is where Microsoft Azure Service Bus comes into picture.

Azure Service Bus is a reliable information delivery service, which is similar to a postal service in the physical world.

One of the messaging patterns offered in Azure Service Bus is called Service Bus Messaging, or Brokered Messaging. By using it, both senders and receivers do not have to be available at the exact same time.

AMQP 1.0 support is available in the Service Bus SDK since its version 2.1. Since the Service Bus .NET client library by default using a dedicated SOAP-based protocol, to use AMQP 1.0, we need to specify in the Service Bus Connection String as highlighted below in bold.

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
    <appSettings>
        <add 
            key="Microsoft.ServiceBus.ConnectionString" 
            value="Endpoint=sb://[namespace].servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=[SAS key];TransportType=Amqp" /> 
    appSettings> 
configuration>

In AMQP transport mode, the client library of sender will serialize the brokered message into an AMQP message so that the message can be received and interpreted by a receiver running on a different platform.

Azure Event Hub

When our event-based messaging needs to be handled at a very huge scale, we can either continue to pay even more to use Azure Service Bus or we can switch to use Event Hub. Event Hub is a cheaper way for us to be able to deal with huge bursts of messages and retain messages for a longer period of time.

event-hub-is-cheaper.png

Event Hub is cheaper, reliable and also fully managed. (Full video: Azure Service Bus Event Hubs 101 with Dan Rosanova)

Although Event Hub does not support MQTT, it does support AMQP (and HTTP) where there could be at most 5,000 concurrent AMQP connections.

Event Hubs event and telemetry handling capabilities, such as ingesting millions of events per second, make it especially usefu for IoT scenarios. However, since it is ingestion only thus Event Hub has no facility for sending traffic, for example, from the cloud back to the devices (C2D).

Azure IoT Hub

Since Event Hubs only enable event ingress, i.e. C2D, Azure offers another service, IoT Hub, for both C2D and D2C (Device-to-Cloud) communications which are reliable and secure. Not only allowing bi-directional communication, IoT Hub also supports AMQP, HTTP, and MQTT.

IoT Hub has an identity registry storing information about devices which are given the permission to connect to the IoT Hub. Before a device can connect to an IoT Hub, there must be an entry for that device in the identity registry of the IoT Hub.

In a Hello World tutorial of connecting stimulated device to IoT Hub using C#, there is a way to add device and retrieve device identity programmatically as shown below.

private static async Task AddDeviceAsync()
{
    string deviceId = "gclRasPi2";
    Device device;

    try
    {
        device = await registryManager.AddDeviceAsync(new Device(deviceId));
    }
    catch (DeviceAlreadyExistsException)
    {
        device = await registryManager.GetDeviceAsync(deviceId);
    }

    Console.WriteLine("Generated device key: {0}", device.Authentication.SymmetricKey.PrimaryKey);
}

The Registry Manager, which is connecting to the IoT Hub using a Connection String with proper Policy, will add an device identity with the Device ID “gclRasPi2” to the Device Explorer in Azure.

azure-iot-hub-device-explorer.png

The device “gclRasPi2” is now in the Device Explorer.

After doing so, a message then can be sent from (stimulated) device to the IoT Hub. For example, the device wants to send data about the temperature and humidity at that moment using MQTT, we can use the following code.

var deviceClient = DeviceClient.Create(
    iotHubUri, 
    new DeviceAuthenticationWithRegistrySymmetricKey("gclRasPi2", deviceKey), 
    TransportType.Mqtt);

var telemetryDataPoint = new
{
    deviceId = "gclRasPi2",
    temperature = currentTemperature,
    humidity = currentHumidity
};

var messageString = JsonConvert.SerializeObject(telemetryDataPoint);

var message = new Message(Encoding.ASCII.GetBytes(messageString));
message.Properties.Add("temperatureAlert", (currentTemperature > 30) ? "true" : "false");

await deviceClient.SendEventAsync(message);

To read the message, please follow the steps shared by the tutorial on setting up to read data-point messages.

Message Routing

Besides reading normal data-point messages, what really interests me is another tutorial about message processing with Message Routing.

iot-hub-routing

Message Routing (Image Source: Microsoft Azure Blog)

According to the tutorial, we first need to setup a Service Bus queue in the same Azure subscription and region as our IoT Hub.

service-bus-queue.png

Created a Queue in the Service Bus.

We can then add an Endpoint in the IoT Hub for the queue we just created. As shown in the following screenshot, there is a message saying that “You may have up to 1 endpoint on the IoT hub.” This is because I am using the free IoT Hub. For its paid versions, only at most 10 custom endpoints are allowed.

Interestingly, each Azure subscription can only have at most 10 IoT Hubs, and only 1 free IoT Hub.

iot-hub-endpoint.png

Adding a new endpoint to the IoT Hub.

After adding endpoint, we need to setup the Message Routing. For free version, we can only have 5 routing rules.

iot-hub-route.png

Creating new route with query string following special syntax.

In the query string, I used temperatureAlert = “true” as the condition. Also, as shown on the screenshot above, there is a line saying “Messages which do not match any rules will be written to the ‘Events (messages/events)’ endpoint.” Hence, the following two console applications will show different results: The left one is connecting to the messages/events endpoint while the right one is showing messages that match the CustomizedMessageRoutingRule created above.

consoles-results.png

Only data with temperatureAlert = “true” will be sent to the “CustomizedMessageRoute”.

Now if we visit the Service Bus Queue page and IoT Hub page again, we will see some updates on the numbers.

queue-results.png

Usage statistics in Service Bus Queue.

iot-hub-usage.png

2% of 8k messages sent from the stimulated device console application.

Conclusion

That’s all about my first try of Azure IoT Hub after attending the workshop delivered by Gerald. It’s a great lunchtime workshop.

For those who are interested, there is an article on Microsoft sharing the benefits of using Azure IoT Hub service, you can read it to understand more.

This is just the beginning of my IoT learning journey. There are still more things for me to learn, such as Azure Stream Analysis and Microsoft Azure IoT Suite which is briefly brought up in the booklet mentioned above.

If you spot any mistake in this article or you have more to talk about IoT and in particular IoT in Azure ecosystem, please share with me. =)

Exploring Azure Functions for Scheduler

azure-function-documentdb.png

During my first job after finishing my undergraduate degree in NUS, I worked in a local startup which was then then the largest bus ticketing portal in Southeast Asia. In 2014, I worked with a senior to successfully migrate the whole system from on-premise to Microsoft Azure Virtual Machines, which is the IaaS option. Maintaining the virtual machines is a painful experience because we need to setup the load balancing with Traffic Manager, database mirroring, database failover, availability set, etc.

In 2015, when I first worked in Singapore Changi Airport, with the support of the team, we made use of PaaS technologies such as Azure Cloud Services, Azure Web Apps, and Azure SQL, we successfully expanded our online businesses to 7 countries in a short time. With the help of PaaS option in Microsoft Azure, we can finally have a more enjoyable working life.

Azure Functions

Now, in 2017, I decided to explore Azure Functions.

Azure Functions allows developers to focus on the code for only the problem they want to solve without worrying about the infrastructure like we do in Azure Virtual Machines or even the entire applications as we do in Azure Cloud Services.

There are two important benefits that I like in this new option. Firstly, our development can be more productive. Secondly, Azure Functions has two pricing models: Consumption Plan and App Service Plan, as shown in the screenshot below. The Consumption Plan lets us pay per execution and the first 1,000,000 executions are free!

Screen Shot 2017-02-01 at 2.22.01 PM.png

Two hosting plans in Azure Functions: Consumption Plan vs. App Service Plan

After setting up the Function App, we can choose “Quick Start” to have a simpler user interface to get started with Azure Function.

Under “Quick Start” section, there are three triggers available for us to choose, i.e. Timer, Data Processing, and Webhook + API. Today, I’ll only talk about Timer. We will see how we can achieve the scheduler functionality on Microsoft Azure.

Screen Shot 2017-02-05 at 11.16.40 PM.png

Quick Start page in Azure Function.

Timer Trigger

Timer Trigger will execute the function according to a schedule. The schedule is defined using CRON expression. Let’s say if we want our function to be executed every four hours, we can write the schedule as follows.

0 0 */4 * * *

This is similar to how we did in the cron job. The CRON expression consists of six fields. The first one is second (0-59), followed by minute (0 – 59), followed by hour (0 – 23), followed by day of month (1 – 31), followed by month (1 – 12) and day of week (0-6).

Similar to the usual Azure Web App, the default time zone used in Azure Functions is also UTC. Hence, if we would like to change it to use another timezone, what we need to do is just add the WEBSITE_TIME_ZONE application setting in the Function App.

Companion File: function.json

So, where do we set the schedule? The answer is in a special file called function.json.

In the Function App directory, there always needs a function.json file. The function.json file will contain the configuration metadata for the function. Normally, a function can only have a single trigger binding and can have none or more than one I/O bindings.

The trigger binding will be the place we set the schedule.

{
    "bindings": [
        {
            "name": "myTimer",
            "type": "timerTrigger",
            "direction": "in",
            "schedule": "0 0 */4 * * *"
        },
        ...
    ],
    ...
}

The name attribute is to specify the name of the parameter used in the C# function later. It is used for the bound data in the function.

The type attribute specifies the binding time. Our case here will be timerTrigger.

The direction attribute indicates whether the binding is for receiving data into the function (in) or sending data from the function (out). For scheduler, the direction will be “in” because later in our C# function, we can actually retrieve info from the myTimer parameter.

Finally, the schedule attribute will be where we put our schedule CRON expression at.

To know more about binding in Azure Function, please refer to the Azure Function Developer Guide.

Function File: run.csx

2nd file that we must have in the Function App directory is the function itself. For C# function, it will be a file called run.csx.

The .csx format allows developers to focus on just writing the C# function to solve the problem. Instead of wrapping everything in a namespace and class, we just need to define a Run method.

#r "Newtonsoft.Json"

using System;
using Newtonsoft.Json;
...

public static async Task Run(TimerInfo myTimer, TraceWriter log)
{
    ...
}

Assemblies in .csx File

Same as how we always did in C# project, when we need to import the namespaces, we just need to use the using clause. For example, in our case, we need to process the Json file, so we need to make use of the library Newtonsoft.Json.

using Newtonsoft.Json;

To reference external assemblies, for example in our case, Newtonsoft.Json, we just need to use the #r directive as follows.

#r "Newtonsoft.Json"

The reason why we are allowed to do so is because Newtonsoft.Json and a few more other assemblies are “special case”. They can be referenced by simplename. As of Jan 2017, the assemblies that are allowed to do so are as follows.

  • Newtonsoft.Json
  • Microsoft.WindowsAzure.Storage
  • Microsoft.ServiceBus
  • Microsoft.AspNet.WebHooks.Receivers
  • Microsoft.AspNet.WebHooks.Common
  • Microsoft.Azure.NotificationHubs

For other assemblies, we need to upload the assembly file, for example MyAssembly.dll, into a bin folder relative to the function first. Only then we can reference is as follows.

#r "MyAssembly.dll"

Async Method in .csx File

Asynchronous programming is recommended best practice. To make the Run method above asynchronous, we need to use the async keyword and return a Task object. However, developers are advised to always avoid referencing the Task.Result property because it will essentially do a busy-wait on a lock of another thread. Holding a lock creates the potential for deadlocks.

Inputs in .csx File and DocumentDB

latest-topics-on-dotnet-sg-facebook-group

This section will display the top four latest Facebook posts pulled by Azure Function.


For our case, the purpose of Azure Function is to process the Facebook Group feeds and then store the feeds somewhere for later use. The “somewhere” here is DocumentDB.

To gets the inputs from DocumentDB, we first need to have 2nd binding specified in the functions.json as follows.

{
    "bindings": [
        ...
        {
            "type": "documentDB",
            "name": "inputDocument",
            "databaseName": "feeds-database",
            "collectionName": "facebook-group-feeds",
            "id": "41f7adb1-cadf-491e-9973-28cc3fca57df",
            "connection": "dotnetsg_DOCUMENTDB",
            "direction": "in"
        }
    ],
    ...
}

In the DocumentDB input binding above, the name attribute is, same as previous example, used to specify the name of the parameter in the C# function.

The databaseName and collectionName attributes correspond to the names of the database and collection in our DocumentDB, respectively. The id attribute is the Document Id of the document that we want to retrieve. In our case, we store all the Facebook feeds in one document, so we specify the Document Id in the binding directly.

The connection attribute is the name of the Azure Function Application Setting storing the connection string of the DocumentDB account endpoint. Yes, Azure Function also has Application Settings available. =)

Finally, the direction attribute must be “in”.

We can then now enhance our Run method to include inputs from DocumentDB as follows. What it does is basically just reading existing feeds from the document and then update it with new feeds found in the Singapore .NET Facebook Group

#r "Newtonsoft.Json"

using System;
using Newtonsoft.Json;
...

private const string SG_DOT_NET_COMMUNITY_FB_GROUP_ID = "1504549153159226";

public static async Task Run(TimerInfo myTimer, dynamic inputDocument, TraceWriter log)
{
    string sgDotNetCommunityFacebookGroupFeedsJson = 
        await GetFacebookGroupFeedsAsJsonAsync(SG_DOT_NET_COMMUNITY_FB_GROUP_ID);
    
    ...

    var existingFeeds = JsonConvert.DeserializeObject(inputDocument.ToString());

    // Processing the Facebook Group feeds here...
    // Updating existingFeeds here...

    inputDocument.data = existingFeeds.Feeds;
}

Besides getting input from DocumentDB, we can also have DocumentDB output binding as follows to, for example, write a new document to DocumentDB database.

{
    "bindings": [
        ...
        {
            "type": "documentDB",
            "name": "outputDocument",
            "databaseName": "feeds-database",
            "collectionName": "facebook-group-feeds",
            "id": "41f7adb1-cadf-491e-9973-28cc3fca57df",
            "connection": "dotnetsg_DOCUMENTDB",
            "createIfNotExists": true,
            "direction": "out"
        }
    ],
    ...
}

We don’t really use this in our dotnet.sg case. However, as we can see, there are only two major differences between DocumentDB input and output bindings.

Firstly, we have a new createIfNotExists attribute which specify whether to create the DocumentDB database and collection if they don’t exist or not.

Secondly, we will have to set the direction attribute to be “out”.

Then in our function code, we just need to have a new parameter with “out object outputDocument” instead of “in dynamic inputDocument”.

You can read more at the Azure Functions DocumentDB bindings documentation to understand more about how they work together.

Application Settings in Azure Functions

Yes, there are our familiar features such as Application Settings, Continuous Integration, Kudu, etc. in Azure Functions as well. All of them can be found under “Function App Settings” section.

Screen Shot 2017-02-18 at 4.40.24 PM.png

Azure Function App Settings

As what we have been doing in Azure Web Apps, we can also set the timezone, store the App Secrets in the Function App Settings.

Deployment of Azure Functions with Github

We are allowed to link the Azure Function with variety of Deployment Options, such as Github, to enable the continuous deployment option too.

There is one thing that I’d like to highlight here is that if you are also starting from setting up your new Azure Function via Azure Portal, then when in the future you setup the continuous deployment for the function, please make sure that you first create a folder having the same name as the name of your Azure Function. Then all the files related to the function needs to be put in the folder.

For example, in dotnet.sg case, we have the Azure Function called “TimerTriggerCSharp1”. we will have the following folder structure.

Screen Shot 2017-02-18 at 4.49.11 PM.png

Folder structure of the TimerTriggerCsharp1.

When I first started, I made a mistake when I linked Github with Azure Function. I didn’t create the folder with the name “TimerTriggerCSharp1”, which is the name of my Azure Function. So, when I deploy the code via Github, the code in the Azure Function on the Azure Portal is not updated at all.

In fact, once the Continuous Deployment is setup, we are no longer able to edit the code directly on the Azure Portal. Hence, setting up the correct folder structure is important.

Screen Shot 2017-02-18 at 4.52.17 PM.png

Read only once we setup the Continuous Deployment in Azure Function.

If you would like to add in more functions, simply create new folders at the same level.

Conclusion

Azure Function and the whole concept of Serverless Architecture are still very new to me. However, what I like about it is the fact that Azure Function allows us to care about the codes to solve a problem without worrying about the whole application and infrastructure.

In addition, we are also allowed to solve the different problems using the programming language that best suits the problem.

Finally, Azure Function is cost-saving because we can choose to pay only for the time our code is being executed.

If you would like to learn more about Azure Functions, here is the list of references I use in this learning journey.

You can check out my code for TimerTriggerCSharp1 above at our Github repository: https://github.com/sg-dotnet/FacebookGroupFeedsProcessor.

Never Share Your Secrets (Secret Manager and Azure Application Settings)

secret-manager-tool-azure-app-service-2

It’s important to keep app secrets out of our codes. Most of the app secrets are however still found in .config files. This way of handling app secrets becomes very risky when the codes are on public repository.

Thus, they are people put some dummy text in the .config files and inform the teammates to enter their respective app secrets. Things go ugly when this kind of “common understanding” among the teammates is messed up.

i-made-a-mistake-cannot-be-reversed

The moment when your app secrets are published on Github public repo. (Image from “Kono Aozora ni Yakusoku o”)

Secret Manager Tool

So when I am working on the dotnet.sg website, which is an ASP .NET Core project, I use the Secret Manager tool.It offers a way to store sensitive data such as app secrets in our local development machine.

To use the tool, firstly, I need to add it in project.json as follows.

{
    "userSecretsId": "aspnet-CommunityWeb-...",
    ...
    "tools": {
        ...
        "Microsoft.Extensions.SecretManager.Tools": "1.0.0-preview2-final"
    }
}

Due to the fact that the Secret Manager tool makes use of project specific configuration settings kept in user profile, we need to specify a userSecretsId value in the project.json as well.

After that, I can start storing the app secrets in the Secret Manager tool by entering the following command in the project directory.

$ dotnet user-secrets set AppSettings:MeetupWebApiKey ""

Take note that currently (Jan 2017) the values stored in the Secret Manager tool are not encrypted. So, it is just for development only.

As shown in the example above, the name of the secret is “AppSettings:MeetupWebApiKey”. This is because in the appsettings.json, I have the following.

{
    "AppSettings": {
        "MeetupWebApiKey": ""
    },
    ...
}

Alright, now the API key is stored in the Secret Manager tool, how is it accessed from the code?

By default, appsettings.json is already loaded in startup.cs. However, we still need to add the following bolded lines in startup.js to enable User Secrets as part of our configuration in the Startup constructor.

public class Startup
{
    public Startup(IHostingEnvironment env)
    {
        var builder = new ConfigurationBuilder()
            .SetBasePath(env.ContentRootPath)
            .AddJsonFile("appsettings.json", optional: true, reloadOnChange: true)
            .AddJsonFile($"appsettings.{env.EnvironmentName}.json", optional: true);
            
        if (env.IsDevelopment())
        {
            builder.AddUserSecrets();
        }

        builder.AddEnvironmentVariables();

        Configuration = builder.Build();
    }
    ...
}

Then in the Models folder, I create a new class called AppSettings which will be used later when we load the app secrets:

public class AppSettings
{
    public string MeetupWebApiKey { get; set; }

    ...
}

So, let’s say I want to use the key in the HomeController, I just need to do the following.

public class HomeController : Controller
{
    private readonly AppSettings _appSettings;

    public HomeController(IOptions appSettings appSettings)
    {
        _appSettings = appSettings.Value;
    }

    public async Task Index()
    {
        string meetupWebApiKey = _appSettings.MeetupWebApiKey;
        ...
    }
    
    ...
}

Azure Application Settings

Just now Secret Manager tool has helped us on managing the app secrets in local development environment. How about when we deploy our web app to Microsoft Azure?

For dotnet.sg, I am hosting the website with Azure App Service. What so great about Azure App Service is that there is one thing called Application Settings.

Screen Shot 2017-01-29 at 11.19.42 PM.png

Application Settings option is available in Azure App Service.

For .NET applications, the settings in the “App Settings” will be injected into the AppSettings at runtime and override existing settings. Thus, even though I have empty strings in appsettings.json file in the project, as long as the correct values are stored in App Settings, there is no need to worry.

Thus, when we deploy web app to Azure App Service, we should never put our app secrets, connection strings in our .config and .json files or even worse, hardcode them.

Application Settings and Timezone

Oh ya, one more cool feature in App Settings that was introduced in 2015 is that we can change the server time zone for web app hosted on Azure App Service easily by just having a new entry as follows in the App Settings.

WEBSITE_TIME_ZONE            Singapore Standard Time

The setting above will change the server time zone to use Singapore local time. So DateTime.Now will return the current local time in Singapore.

References

If you would like to read more about the topics above, please refer to following websites.