Handwritten Text Recognition, OCR, and Key Vault

Recently, I was glad to have Marvin Heng, a Microsoft MVP in the Artificial Intelligence category, working with me on building an experimental tool, FutureNow, to recognize handwritten text and to apply OCR technology to automate form processing.

In January 2019, we successfully presented our solution during the Singapore .NET Developers Community meetup. Taking the opportunity, I also presented how Azure Key Vault is used in our project to centralize our key and secret management.

Marvin sharing about Custom Vision with the audience during the meetup.

Hence, in this article, I'd like to share how we use Cognitive Services and Key Vault in this project.

Code Repository

The code of our project is available on both Azure DevOps and GitHub. I will update both places to keep the code in sync.

The reason I have the code in both places is that the project was originally collaborated on in Azure DevOps. However, during the meetup, I realized that the majority of the audience still prefer to find our code on GitHub. Well…

Azure DevOps: https://dev.azure.com/gohchunlin/JobCreationAutomation
Github: https://github.com/sg-dotnet/text-recognition-ocr

Our “FutureNow” tool, where users can analyze text in images.

Custom Vision

Marvin's contribution is the function that detects and identifies the handwritten text in the uploaded image.

To do so, he first created a project in Custom Vision to train the model. In the project, he uploaded many images of paper documents and then labelled the handwritten text found on them.

The part where the system analyzes the uploaded image and finds the handwriting part is in the TagAndAnalyzeService.cs.

In the AnalyzeImageAsync method, we first use the Custom Vision API, which is linked to Marvin's project, to identify which parts of the image are "probably" handwritten.

At this point, the system still cannot be one-hundred-percent sure that the parts it identifies as handwritten text really contain handwritten text. Hence, the result returned from the API contains a probability value. That's why we have a percentage bar on our front end to control the threshold for this probability value, so that only results with a higher probability value are accepted.
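
To illustrate the idea, here is a minimal sketch, not the exact TagAndAnalyzeService code, of filtering the Custom Vision predictions with the threshold chosen on the front end. The RegionPrediction class is just an assumed shape of the prediction result.

using System.Collections.Generic;
using System.Linq;

public class RegionPrediction
{
    public double Probability { get; set; }

    // Normalized (0..1) bounding box of the suspected handwritten region.
    public double Left { get; set; }
    public double Top { get; set; }
    public double Width { get; set; }
    public double Height { get; set; }
}

public static class PredictionFilter
{
    // Keep only the regions that Custom Vision is sufficiently confident about.
    public static IEnumerable<RegionPrediction> FilterByThreshold(
        IEnumerable<RegionPrediction> predictions, double threshold)
    {
        return predictions.Where(p => p.Probability >= threshold);
    }
}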

Handwritten Text Extraction with Computer Vision

After the previous step is done, we crop the filtered regions out of the uploaded image and then send each of the smaller images to the text recognition API in Cognitive Services to process it and extract the text.
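
The cropping itself can be done with System.Drawing. Below is a minimal sketch, an illustration rather than the exact project code, which assumes the bounding box values are the normalized coordinates returned by Custom Vision.

using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

public static class RegionCropper
{
    // Cuts one predicted region out of the uploaded image and returns it as JPEG bytes.
    public static byte[] Crop(Bitmap source, double left, double top, double width, double height)
    {
        var rect = new Rectangle(
            (int)(left * source.Width),
            (int)(top * source.Height),
            (int)(width * source.Width),
            (int)(height * source.Height));

        using (var cropped = source.Clone(rect, source.PixelFormat))
        using (var stream = new MemoryStream())
        {
            cropped.Save(stream, ImageFormat.Jpeg);
            return stream.ToArray();
        }
    }
}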

Hence in the code, the HandwrittenRecognitionService will be called to perform the image processing with the Computer Vision API version 1.0 recognizeText method.

There is an interesting do…while loop in the method. The loop is basically used to wait for the API to return the image processing results. It turns out that, most of the time, the API will not directly return the result. Instead, it will return a JSON object telling us that it's still processing the image. Only when it returns the JSON object with the status set to "Succeeded" do we know that the analysis result is included in the JSON object.

do
{
    // The Operation-Location header tells us where to poll for the analysis result.
    var textOperation = response.Headers.GetValues("Operation-Location").FirstOrDefault();

    var result = await client.GetAsync(textOperation);

    string jsonResponse = await result.Content.ReadAsStringAsync();

    // Deserialize into the result model (e.g. a HandwrittenAnalyzeResult class defined in the project).
    var handwrittenAnalyzeResult = JsonConvert.DeserializeObject<HandwrittenAnalyzeResult>(jsonResponse);

    // Keep polling until the API reports that the analysis has succeeded.
    isAnalyzing = handwrittenAnalyzeResult.Status != "Succeeded";

    if (!isAnalyzing)
    {
        return handwrittenAnalyzeResult;
    }
} while (isAnalyzing);

In order to display the results to the user on the front end, we store the cropped images in Azure Blob Storage and then display both the images and their corresponding extracted text on the web page.

Unfortunately, reading handwritten text from images is a technology which is still in preview and is only available for English text. Hence, we need to wait a while before we can deploy it for production use.

OCR with Computer Vision

Using Computer Vision to perform OCR can better detect and extract text in an image, especially when the image is a screenshot of a computer-generated PDF file.

In OpticalCharacterRecognitionService, we simply call the Computer Vision API OCR method with the uploaded image and the language set to English by default. We can then easily get the result of the OCR back in JSON format.
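
As an illustration, a minimal call to the OCR endpoint could look like the sketch below. The endpoint region and the way the subscription key is passed in are assumptions here; this is not the exact OpticalCharacterRecognitionService code.

using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class OcrClient
{
    public static async Task<string> RecognizePrintedTextAsync(byte[] imageBytes, string subscriptionKey)
    {
        using (var client = new HttpClient())
        using (var content = new ByteArrayContent(imageBytes))
        {
            client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
            content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

            // Computer Vision API v1.0 OCR method, with the language defaulting to English.
            var response = await client.PostAsync(
                "https://southeastasia.api.cognitive.microsoft.com/vision/v1.0/ocr?language=en&detectOrientation=true",
                content);

            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync(); // OCR result in JSON format
        }
    }
}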

Key Vault

Key Vault in this project is mainly used for managing the keys and the connection string of the Azure Blob Storage.

Secrets of the FutureNow project in the Azure Key Vault.

To retrieve any of the secrets, we simply make use of the Microsoft.Azure.KeyVault NuGet package, as shown below.

var azureServiceTokenProvider = new AzureServiceTokenProvider();

var keyVaultClient = new KeyVaultClient(new KeyVaultClient.AuthenticationCallback(azureServiceTokenProvider.KeyVaultTokenCallback));

var secret = await keyVaultClient.GetSecretAsync($"https://futurenow.vault.azure.net/secrets/{ secretName }").ConfigureAwait(false);

According to the Microsoft Azure documentation, there are service limits in Key Vault to ensure the quality of service provided. Hence, when a service threshold is exceeded, any further requests from the client will not get a successful response from Key Vault. Instead, HTTP status code 429 (Too many requests) will be returned.

There is official guidance on handling Key Vault throttling. Currently, the code sample provided in the guidance is not correct because the retry and waitTime variables are not used.

Incorrect sample code provided in Microsoft Docs.

Regarding this problem, I have raised issues (#22859 and #22860) and submitted a pull request to Microsoft on GitHub. Currently the PR is not yet approved, but both Bryan Lamos and Prashanth Yerramilli have agreed that the code is indeed incorrect. Anyway, in our KeyVaultService class, the code has already been corrected.
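
The corrected logic is roughly the following: retry on HTTP status code 429 with an exponential back-off, so that the retry and waitTime variables are actually used. This is a simplified sketch, not the exact KeyVaultService code.

using System;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.KeyVault;
using Microsoft.Azure.KeyVault.Models;

public static class KeyVaultRetryHelper
{
    public static async Task<SecretBundle> GetSecretWithRetryAsync(IKeyVaultClient keyVaultClient, string secretUrl)
    {
        var retry = 0;
        var maxRetries = 5;
        var waitTime = TimeSpan.FromSeconds(1);

        while (true)
        {
            try
            {
                return await keyVaultClient.GetSecretAsync(secretUrl).ConfigureAwait(false);
            }
            catch (KeyVaultErrorException ex) when (ex.Response.StatusCode == (HttpStatusCode)429 && retry < maxRetries)
            {
                // Throttled by Key Vault: wait, then double the wait time before the next attempt.
                await Task.Delay(waitTime);
                waitTime = TimeSpan.FromTicks(waitTime.Ticks * 2);
                retry++;
            }
        }
    }
}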

EDIT (26 January 2019): The pull request has been approved. =)

Conclusion

Even though this is just an experimental project for us to understand more about the power of Custom Vision and Computer Vision, I am glad that, through this project, I managed to learn more about Blob Storage, Azure DevOps, Key Vault, and so on, and later share it with the Singapore .NET Developers Community members.

Special thanks to Marvin for helping me in this project.

First Step into Orchard Core

This afternoon, I decided to take a look at Orchard Core, an open-source CMS (Content Management System) built on top of an ASP .NET Core application framework.

Since it is open-source, I easily forked its repository from Github and then checked out its dev branch.

After waiting for less than one minute to get all the NuGet packages restored in the project, I set OrchardCore.Cms.Web as the default project. Then I tried to run it, but it failed with tons of errors. One of the major errors is "Assembly location for Razor SDK Tasks was not specified". According to online discussion, it turns out that the .NET Core 2.2 SDK is needed.

After downloading the correct SDK, the projects now build successfully, with the following web page popping out as a result.

Take note that, as shown in the screenshot above, when I fill in the Table Prefix, it throws an exception saying "SqlException: Invalid object name 'OrchardroadDocument'" during the setup stage, as shown in the following screenshot.

Hence, the best way to proceed is to not enter anything in the Table Prefix textbox. Then we will be able to set up our CMS successfully. Once we log in to the system as the Super User, we can proceed to configure the CMS.

Yup, this concludes my first attempt with the new Orchard Core CMS. =)

#cms, #open-source, #orchard, #technology

Connecting Android App with IdentityServer4

For ASP .NET web developers, Identity Server should be quite familiar, especially to those looking for an SSO solution.

After successfully integrating Identity Server in our ASP .NET Core MVC web applications, it is now time for us to research how our mobile app can integrate with IdentityServer4 too.

Background

We have two types of users. The admin will be logging in to the system via our web application. The normal staff will log in to the system via mobile app. Different sets of features are provided for both web and mobile apps.

Setting up Client on Identity Server

To begin, we need to add a new client to the in-memory clients of Identity Server.

Following the sample code done by Hadi Dbouk, we set up the new client as shown in the following code.

using IdentityServer4.Models;
...
public class ClientStore : IClientStore {
    ...
    var availableClients = new List<Client>();
    ...
    availableClients.Add(new Client 
    {
        ClientId = "my-awesome-app",
        ClientName = "My Awesome App",
        AllowedGrantTypes = GrantTypes.Code,
        RequirePkce = true,
        RequireConsent = false,
        ClientSecrets = 
        {
            new Secret("my-secret".Sha256())
        },
        RefreshTokenUsage = TokenUsage.ReUse,
        RedirectUris = { "gclprojects.chunlin.myapp:/oauth2callback" },
        AllowedScopes = 
        {
            StandardScopes.OpenId,
            StandardScopes.Profile,
            StandardScopes.Email,
            StandardScopes.OfflineAccess
        },
        AllowOfflineAccess = true
    }
    );
}

For mobile apps, there are two recommended grant types, i.e. Authorization Code and Hybrid. However, as of the time this post was written, support for Hybrid is still not mature in AppAuth for Android, so we decided to use GrantTypes.Code instead.

However, OAuth 2.0 clients using authorization codes can be attacked. In the attack, the authorization code returned from an authorization endpoint is intercepted within a communication path that is not protected by TLS. To mitigate the attack, PKCE (Proof Key for Code Exchange) is required.

We don’t have consent screen for our apps, so we set RequireConsent to false.

For RefreshTokenUsage, there are two possible values, i.e. ReUse and OneTime. The only difference is that ReUse will make the refresh token handle stay the same when refreshing tokens, while OneTime will update the refresh token handle once the tokens are refreshed.

Once the authorization flow is completed, users will be redirected to a URI. As documented in the AppAuth for Android Readme, a custom-scheme-based redirect URI (i.e. one of the form "my.scheme:/path") should be used for the authorization redirect because it is the most widely supported across Android versions.

By setting AllowOfflineAccess to true and giving the client access to the offline_access scope, we allow it to request refresh tokens for long-lived API access.

Android Setup: Installation of AppAuth

AppAuth for Android is at v0.7.0 at the time this post is written. To install it for our app, we first need to set it up in build.gradle (Module: app).

apply plugin: 'com.android.application'

android {
    ...
    defaultConfig {
        ...
        minSdkVersion 21
        targetSdkVersion 26
        ...
        manifestPlaceholders = [
            'appAuthRedirectScheme': 'gclprojects.chunlin.myapp'
        ]
    }
    ...
}

dependencies {
    ...
    compile 'com.android.support:appcompat-v7:26.+'
    compile 'com.android.support:design:26.+'
    compile "com.android.support:customtabs:26.0.0-alpha1"
    compile 'net.openid:appauth:0.7.0'
    ...
}

 

AppAuth for Android authorization code flow. (Reference: The proper way to use OAuth in a native app.)

Android Setup: Updating Manifest

In the AndroidManifest.xml, we need to add the redirect URI to the RedirectUriReceiverActivity, as shown in the following code.

<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="gclprojects.chunlin.myapp">
    ...
    <application...>
        <activity
            android:name="net.openid.appauth.RedirectUriReceiverActivity"
            android:theme="@style/Theme.AppCompat.NoActionBar">
        <intent-filter>
            <action android:name="android.intent.action.VIEW"/>

            <category android:name="android.intent.category.DEFAULT"/>
            <category android:name="android.intent.category.BROWSABLE"/>

            <data android:scheme="gclprojects.chunlin.myapp"/>
        </intent-filter>
        </activity>
    </application>
    ...
</manifest>

Android Setup: Authorizing Users

On the Android app, we will have one “Login” button.

<Button
    android:onClick="Login"
    android:layout_width="wrap_content"
    android:layout_height="wrap_content"
    android:text="Login"
    android:layout_centerInParent="true"/>

By clicking on it, the authorization steps will begin.

public void Login(View view) {
    AuthManager authManager = AuthManager.getInstance(this);
    AuthorizationService authService = authManager.getAuthService();

    AuthorizationRequest.Builder authRequestBuilder = new AuthorizationRequest
            .Builder(
            authManager.getAuthConfig(),
            "my-awesome-app",
            "code",
            Uri.parse("gclprojects.chunlin.myapp:/oauth2callback"))
            .setScope("openid profile email offline_access");

    String codeVerifier = CodeVerifierUtil.generateRandomCodeVerifier();
    SharedPreferencesRepository sharedPreferencesRepository = new SharedPreferencesRepository(this);
    sharedPreferencesRepository.saveCodeVerifier(codeVerifier);

    authRequestBuilder.setCodeVerifier(codeVerifier);

    AuthorizationRequest authRequest = authRequestBuilder.build();

    Intent authIntent = new Intent(this, LoginAuthActivity.class);
    PendingIntent pendingIntent = PendingIntent.getActivity(this, authRequest.hashCode(), authIntent, 0);

    authService.performAuthorizationRequest(
            authRequest,
            pendingIntent);
}

The code above uses some other classes and interacts with another activity. I won't talk about them here because the code can be found in my GitHub repository, which is forked from Hadi Dbouk's.

Android Setup: Post Authorization and Refresh Token

According to the code in LoginAuthActivity.java, if the login fails, the user will be brought back to the Login activity. However, if it succeeds, the user can then reach other activities in the app which require the user to log in first. We can also then get the Access Token, Refresh Token, and ID Token from authManager. With the Access Token, we can then access our backend APIs.

Since access tokens have finite lifetimes, refresh tokens allow requesting new access tokens without user interaction. In order to allow the client to request a Refresh Token, we need to authorize it by setting AllowOfflineAccess to true. When we make a request to our APIs, we need to check whether the Access Token has expired; if so, we need to make a new request with the Refresh Token to the IdentityServer to get a new Access Token.

The way how we can retrieve new Access Token with a Refresh Token in AppAuth is shown in the TokenTimer class in TokenService.java using createTokenRefreshRequest.

private class TokenTimer extends TimerTask {
    ...

    @Override
    public void run() {

        if(MyApp.Token == null)
            return;

        final AuthManager authManager = AuthManager.getInstance(TokenService.this);

        final AuthState authState = authManager.getAuthState();


        if(authState.getNeedsTokenRefresh()) {
            //Get New Token

            ClientSecretPost clientSecretPost = new ClientSecretPost("my-secret"); // must match the client secret configured on the IdentityServer client
            final TokenRequest request = authState.createTokenRefreshRequest();
            final AuthorizationService authService = authManager.getAuthService();

            authService.performTokenRequest(request, clientSecretPost, new AuthorizationService.TokenResponseCallback() {
                @Override
                public void onTokenRequestCompleted(@Nullable TokenResponse response, @Nullable AuthorizationException ex) {
                    if(ex != null){
                        ex.printStackTrace();
                        return;
                    }
                    authManager.updateAuthState(response,ex);
                    MyApp.Token = authState.getIdToken();
                }
            });

        }

    }
}

Conclusion

Yup, that's all for integrating IdentityServer with an Android app to provide a seamless login experience to our users. If you find any mistake in this article, kindly let me know in the comment section. Thanks in advance!

References

 

#android, #c, #java, #microsoft, #openid, #technology

[KOSD Series] Certificate for Signing JWT on IdentityServer

KOSD, or Kopi-O Siew Dai, is a type of Singapore coffee that I enjoy. It is basically a cup of coffee with a little bit of sugar. This series is meant to blog about technical knowledge that I gained while having a small cup of Kopi-O Siew Dai.

Last year, Riza shared a very interesting topic twice during the Singapore .NET Developers Community meetups at the Microsoft office. For those who attended the meetups, do you still remember? Yes, it's about IdentityServer.

Riza delivered a free hands-on training about IdentityServer 4 in August 2017.

IdentityServer 4 is a middleware, an OpenID Connect provider built to spec, which provides user identity and access control in ASP .NET Core applications.

In my example, I will start with the simplest setup, where there will be one Authentication Server and one Application Server. Both of them in my example will be using ASP .NET Core.

How an application uses JWT to authenticate a user.

In the Authentication Server, I register the minimum required dependencies in the ConfigureServices method of its Startup.cs as follows.

services.AddIdentityServer()
     .AddDeveloperSigningCredential()
     .AddInMemoryIdentityResources(...)
     .AddInMemoryApiResources(...)
     .AddInMemoryClients(...)
     .AddAspNetIdentity();

I won’t be talking about how IdentityServer works here. Instead, I will be focusing on the “AddDeveloperSigningCredential” method here.

JSON Web Token (JWT)

By default, IdentityServer issues access tokens in the JWT format. According to the abstract definition in RFC 7519 from the Internet Engineering Task Force (IETF), JWT is a compact, URL-safe means of representing claims between two parties, where claims are encoded as JSON objects which can be digitally signed or encrypted.

In the diagram above, the Application Server receives the secret key used in signing the JWT from the Authentication Server when the app sets up its authentication process. Hence, the app can verify whether the JWT comes from an authentic source using the secret key.
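
As an illustration of that verification step, the sketch below validates a token's signature against the signing certificate's public key. This is only to show the idea; my actual project relies on the JWT bearer middleware instead, and the issuer and audience checks are omitted here.

using System.IdentityModel.Tokens.Jwt;
using System.Security.Cryptography.X509Certificates;
using Microsoft.IdentityModel.Tokens;

public static class JwtSignatureChecker
{
    public static bool IsFromAuthenticSource(string jwt, X509Certificate2 signingCert)
    {
        var parameters = new TokenValidationParameters
        {
            IssuerSigningKey = new X509SecurityKey(signingCert),
            ValidateIssuer = false,   // issuer and audience validation skipped in this sketch
            ValidateAudience = false
        };

        try
        {
            new JwtSecurityTokenHandler().ValidateToken(jwt, parameters, out _);
            return true;
        }
        catch (SecurityTokenException)
        {
            return false;
        }
    }
}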

AddDeveloperSigningCredential

IdentityServer uses an asymmetric key pair to sign and validate the JWT. We can use AddDeveloperSigningCredential to do so. In the previous version of IdentityServer, this method was actually called AddTemporarySigningCredential.

During development, we normally don't have a cert prepared yet. Hence, AddTemporarySigningCredential can be used to auto-generate a certificate to sign the JWT. However, this method has a disadvantage: every time the IdentityServer is restarted, the certificate will change. Hence, all tokens that have been signed with the previous certificate will fail to validate.

This situation was fixed when AddDeveloperSigningCredential was introduced to replace the AddTemporarySigningCredential method. This new method will still create a temporary certificate at startup time. However, it is now able to persist the key to the file system so that it stays stable between IdentityServer restarts.

Anyway, as documented, we are only allowed to use AddDeveloperSigningCredential in development environments. In addition, AddDeveloperSigningCredential can only be used when we host IdentityServer on a single machine. What should we do when we are going to deploy our code to the production environment? We need a signing key service that will provide the specified certificate to the various token creation and validation services. Thus, we now need to change to the AddSigningCredential method.

Production Code

For production, we need to change the code earlier to be as follows.

X509Certificate2 cert = null;
using (X509Store certStore = new X509Store(StoreName.My, StoreLocation.CurrentUser))
{
    certStore.Open(OpenFlags.ReadOnly);
    var certCollection = certStore.Certificates.Find(
        X509FindType.FindByThumbprint,
        Configuration["AppSettings:IdentityServerCertificateThumbprint"],
        false);
 
    // Get the first cert with the thumbprint
    if (certCollection.Count > 0)
    {
        cert = certCollection[0];
    }
}

services.AddIdentityServer()
     .AddSigningCredential(cert)
     .AddInMemoryIdentityResources(...)
     .AddInMemoryApiResources(...)
     .AddInMemoryClients(...)
     .AddAspNetIdentity();

We use AddSigningCredential to replace the AddDeveloperSigningCredential method. Now, AddSigningCredential requires an X509Certificate2 cert as a parameter.

Creation of Certificate with OpenSSL on Windows

It's quite challenging to install OpenSSL on Windows. Luckily, Ben Cull, a solution architect from Belgium, has shared a tutorial on how to do this easily with a tool called Win32 OpenSSL.

His tutorial can be summarized into 5 steps as follows.

  1. Install the Win32 OpenSSL and add its binaries to PATH;
  2. Create a new certificate and private key;
    openssl req -x509 -newkey rsa:4096 -sha256 -nodes -keyout cuteprogramming.key -out cuteprogramming.crt -subj "/CN=cuteprogramming.com" -days 3650
  3. Convert the certificate and private key into .pfx;
    openssl pkcs12 -export -out cuteprogramming.pfx -inkey cuteprogramming.key -in cuteprogramming.crt -certfile cuteprogramming.crt
  4. Key-in and remember the password for the private key;
  5. Import the certificate to the Current User Certificate Store on developer’s local machine by double-clicking on the newly generated .pfx file. We will be asked to key in the password used in Step 4 above again.

Importing certificate.

Now, we need to find out its Thumbprint. This is because, in our production code above, we are using the Thumbprint to look for the cert.

Thumbprint and Microsoft Management Console (MMC)

To retrieve the Thumbprint of a certificate, we need help from a tool called MMC.

Using MMC to view certificates in the local machine store for current user account.

We will then be able to find the new certificate that we have just created and imported. To retrieve its Thumbprint, we first need to open it, as shown in the screenshot below.

Open the new cert in MMC.

A popup window called Certificate will appear. Simply copy the value of the Thumbprint under the Details tab.

Thumbprint!

After keeping the value of the cert thumbprint in the appsettings.Development.json of the IdentityServer project, we can now build and run the project on localhost without any problem.

Deployment to Microsoft Azure Web App

Before we talk about how to deploy the IdentityServer project to a Microsoft Azure Web App, did you notice that, in the code above, we look for the cert only in the My/Personal store of the CurrentUser location, i.e. "StoreName.My, StoreLocation.CurrentUser"? This is because this is the place where Azure will load the certificate from.

So now, we will first proceed to upload the self-signed certificate generated above as a Private Certificate to the Azure Web App. After selecting the .pfx file generated above and keying in the password, the cert will appear as one of the Private Certificates of the Web App.

To upload the cert, we can do it in “SSL certificates” settings of our Web App on Azure Portal.

Last but not least, in order to make the cert available to the app, we need to have the following setting added under "Application settings" of the Web App.

The WEBSITE_LOAD_CERTIFICATES setting is needed to make the cert available to the app.

As shown in the screenshot above, we set WEBSITE_LOAD_CERTIFICATES to have * as its value. This will make all the certificates in the Web App be loaded into the personal certificate store of the app. Alternatively, we can also let it load selective certificates by keying in comma-separated thumbprints of the certificates.

Two Certificates

There is an interesting discussion on IdentityServer3 Issues about the certificates used in IdentityServer project. IdentityServer requires two certificates: one for SSL and another for signing JWT.

In the discussion, according to Brock Allen, the co-author of the IdentityServer framework, we should never use the same cert for both purposes, and it is okay to use a self-signed cert as the signing cert.

Brock also provided a link in the discussion to his blog post on how to create a signing cert using makecert instead of OpenSSL as discussed earlier. In fact, during Riza's presentation, he was using makecert to self-sign his cert too. Hence, if you are interested in how to use makecert to do that, please read his post here: https://brockallen.com/2015/06/01/makecert-and-creating-ssl-or-signing-certificates/.

Conclusion

This episode of the KOSD series is a bit long, such that drinking a large cup of hot KOSD while reading it seems like a better idea. Anyway, I think this post will help me and other beginners who are using IdentityServer in their projects to understand the framework bit by bit.

There are so many things that we can learn from the IdentityServer project, and I hope to share what I've learnt about this fantastic framework in my future posts. Stay tuned.

References

[KOSD Series] Discussion about Cosmos DB Performance

KOSD, or Kopi-O Siew Dai, is a type of Singapore coffee that I enjoy. It is basically a cup of coffee with a little bit of sugar. This series is meant to blog about technical knowledge that I gained while having a small cup of Kopi-O Siew Dai.

During a late dinner with my friend on 12 January last month, he commented that he had encountered a very serious performance problem in retrieving data from Cosmos DB (pka DocumentDB). It's quite strange because, in our IoT project which also stores millions of records in Cosmos DB, we never had this problem.

Two weeks later, on 27 January, he happily showed me his improved version of the code which could query the data in about one to two seconds.

Yesterday, after having a discussion, we further improved the code. Hence, I’d like to write down this learning experience here.

Preparation

Since we couldn't demonstrate using the real project code, I created a sample project that gets data from a database and collection on my personal Azure Cosmos DB account. The database contains one collection which has 23,967 records of Student data.

The Student class and the BaseEntity class that it inherits from are as follows.

public class Student : BaseEntity
{
    public string Name { get; set; }

    public int Age { get; set; }

    public string Description { get; set; }
}

public abstract class BaseEntity
{
    [JsonProperty(PropertyName = "id")]
    public string Id { get; set; }

    public string Type { get; set; }

    public DateTime CreatedAt { get; set; } = DateTime.Now;
}

You may wonder why I have Type defined.

Type and Cost Saving

The reason for having Type is that, before DocumentDB was rebranded as Cosmos DB in May 2017, the DocumentDB pricing was based on collections. Hence, the more collections we have in the database, the more we need to pay.

DocumentDB was billed per collection in the past. (Source: Stack Overflow)

To overcome that, we squeeze the different types of entities into the same collection. So, in the example above, let's say we have three classes that inherit from BaseEntity, namely Student, Classroom, and Teacher; then we will put the data of the three classes in the same collection.

Then here comes a problem: how do we know which document in the collection is a Student, a Classroom, or a Teacher? That is where the property Type will help us. So, in our example above, the possible values for Type will be Student, Classroom, and Teacher.

Hence, when we add a new document through the repository design pattern, we have the following method.

public async Task<T> AddAsync(T entity)
{
    ...

    entity.Type = typeof(T).Name;

    var resourceResponse = await _documentDbClient.CreateDocumentAsync(UriFactory.CreateDocumentCollectionUri(_databaseId, _collectionId), entity);

    return resourceResponse.StatusCode == HttpStatusCode.Created ? (dynamic)resourceResponse.Resource : null;
}

Original Version of Query

We used the following code to retrieve data of a class from the collection.

public async Task<IEnumerable<T>> GetAllAsync(Expression<Func<T, bool>> predicate = null)
{
    var query = _documentDbClient.CreateDocumentQuery<T>(UriFactory.CreateDocumentCollectionUri(_databaseId, _collectionId));

    var documentQuery = (predicate != null) ?
        (query.Where(predicate)).AsDocumentQuery():
        query.AsDocumentQuery();

    var results = new List<T>();
    while (documentQuery.HasMoreResults)
    {
        results.AddRange(await documentQuery.ExecuteNextAsync<T>());
    }

    return results.Where(x => x.Type == typeof(T).Name).ToList();
}

This query runs very slowly because the line where it filters by class comes after the data has been queried from the collection. Hence, the documentQuery may already contain data of all three classes (Student, Classroom, and Teacher).

Improved Version of Query

So one obvious improvement is to move the line that filters by Type earlier. The improved version of the code now looks like this.

public async Task<IEnumerable<T>> GetAllAsync(Expression<Func<T, bool>> predicate = null)
{
    var query = _documentDbClient
        .CreateDocumentQuery<T>(UriFactory.CreateDocumentCollectionUri(_databaseId, _collectionId))
        .Where(x => x.Type == typeof(T).Name);

    var documentQuery = (predicate != null) ?
        (query.Where(predicate)).AsDocumentQuery():
        query.AsDocumentQuery();

    var results = new List<T>();
    while (documentQuery.HasMoreResults)
    {
        results.AddRange(await documentQuery.ExecuteNextAsync<T>());
    }

    return results;
}

By doing so, we managed to reduce the query time significantly because all the actual filtering is now done on the Cosmos DB side. For example, there was one query whose time I managed to reduce from 1.38 minutes to 3.42 seconds using the 23,967 records of Student data.

Multiple Predicates

The code above, however, has a disadvantage: it cannot accept multiple predicates.

I thus changed it to be as follows so that it returns IQueryable.

public IQueryable<T> GetAll()
{
    return _documentDbClient
        .CreateDocumentQuery<T>(UriFactory.CreateDocumentCollectionUri(_databaseId, _collectionId))
        .Where(x => x.Type == typeof(T).Name);
}

This has another inconvenience: whenever I call GetAll, I need to remember to load the data with HasMoreResults, as shown in the code below.

var studentDocuments = _repoDocumentDb.GetAll()
    .Where(s => s.Age == 8)
    .Where(s => s.Name.Contains("Ahmad"))
    .AsDocumentQuery();

var results = new List<Student>();
while (studentDocuments.HasMoreResults)
{
    results.AddRange(await studentDocuments.ExecuteNextAsync<Student>());
}
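
To avoid repeating this loop at every call site, one small helper, just a sketch of a possible approach rather than something from the original discussion, is to wrap the loop in an extension method.

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Linq;

public static class DocumentQueryExtensions
{
    // Drains the document query page by page, exactly like the loop above.
    public static async Task<List<T>> ToListAsync<T>(this IQueryable<T> query)
    {
        var documentQuery = query.AsDocumentQuery();
        var results = new List<T>();

        while (documentQuery.HasMoreResults)
        {
            results.AddRange(await documentQuery.ExecuteNextAsync<T>());
        }

        return results;
    }
}

With this helper, the earlier example becomes a single await on ToListAsync after the two Where calls.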

Conclusion

This is just an after-dinner discussion about Cosmos DB between my friend and me. If you have any better ideas on designing a repository for Cosmos DB (pka DocumentDB), please let us know. =)

#azure-cosmos-db, #c, #kopi-o-siew-dai, #linq, #programming, #technology

Create a Docker Image from CentOS Minimal ISO

When we are dockerizing an ASP .NET Core application, there will be a file called Dockerfile. For example, the Dockerfile in my previous project, Changshi, has the following content.

FROM microsoft/aspnetcore:2.0
ARG source
WORKDIR /app
EXPOSE 80
COPY ${source:-obj/Docker/publish} .
ENTRYPOINT ["dotnet", "changshi.dll"]

The Dockerfile basically is a set of instructions for Docker to build images automatically. The FROM instruction in the first line initializes a new build stage and sets the Parent Image for subsequent instructions. In the Dockerfile above, it is using microsoft/aspnetcore, the official image for running compiled ASP .NET Core apps, as the Parent Image.

If we need to control the contents of the image, then one way that we can do is to create a Base Image. So, in this post, I’m going to share about my journey of creating a Docker image from CentOS Minimal ISO.

Step 1: Setting up Virtual Machine on VirtualBox

We can easily get the minimal ISO of CentOS on their official website.

Minimal ISO is available on CentOS Download Page.

After successfully downloading the minimal ISO, we need to proceed to launch the Oracle VM VirtualBox (Download here if you don’t have one).

Switching off Hyper-V.

For Windows users who have Hyper-V enabled because of Docker for Windows, please disable it first; otherwise, you will either not be able to start a VM with a 64-bit guest OS, even though your host OS is 64-bit Windows 10, or simply encounter a BSOD.

Please switch off Hyper-V before running CentOS 64-bit OS on VirtualBox.

The funny thing is that, after switching off Hyper-V, Docker for Windows will make noise saying that it needs Hyper-V to be enabled to work properly. So currently I have to keep switching the Hyper-V feature on and off, depending on which tool I'm going to use.

VirtualBox vs. Docker for Windows. Pick one.

There is one important step in running CentOS on the VM. We need to remember to configure the Network of the VM to use a network adapter attached to "Bridged Adapter". This is to connect the VM through the host to whatever is our default network device that allocates IP addresses for our physical network. Doing so will help us to retrieve the Docker image tar file via SCP later.

Then in the Network & Host Name section of the installation, we shall see the IP address allocated to the VM.

The IP Address should be available when Ethernet is connected.

To verify whether it works, we simply need to use the following command to check whether an IP address has been successfully allocated to the VM. In the minimal installation of CentOS 7, the command ifconfig is no longer available.

# ip a

We then can get the IP Address which is allocated to the VM. Sometimes, I need to wait for about 5 minutes before it can display the IP address successfully.

The IP address!

Step 2: Installing Docker on VM

After we get the IP address of the VM, we then can SSH into it. On Windows, I use PuTTY, a free SSH client for Windows, to easily SSH to the VM.

SSH to the VM with the IP address using PuTTY.

We proceed to install the EPEL repository before we can install Docker on the VM.

Since we are going to use wget to retrieve EPEL, we first need to install wget as follows.

# yum install wget

Then we can use the wget command to download the EPEL repository on the VM.

# wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

The file will be downloaded to the temp folder. So, to install it we will do the following.

# cd /tmp
# sudo yum install epel-release-latest-7.noarch.rpm

After the installation is done, there should be a success message like the following shown on the console.

Installed:
    epel-release.noarch 0:7-11
Complete!

Now if we head to /etc/yum.repos.d, we will see the following files.

CentOS-Base.repo        CentOS-fasttrack.repo       CentOS-Vault.repo
CentOS-CR.repo          CentOS-Media.repo           epel.repo
CentOS-Debuginfo.repo   CentOS-Sources.repo         epel-testing.repo

In the CentOS-Base.repo, we need to enable the CentOS Plus repository which is by default disabled. To do so, we simply change the value of enabled to 1 under [centosplus] section.

Then we can proceed to install docker on the VM using yum.

# yum install docker

Step 3: Start Docker

Once docker is installed, we can then start the docker service with the following command.

# service docker start

So now, if we list the images and containers inside Docker, the results should be 0 images and 0 containers, as shown in the screenshot below.

No image and no container.

Step 4: Building First Docker Image

Thanks to the people in Moby Project, a collaborative project for the container ecosystem to assemble container-based systems, we have a script to create a base CentOS Docker image using yum.

The script is now available on Moby Project Github repository.

We now need to create a folder called scripts in the root and then create a file called createimage.sh in the folder. This step can be summarized as the following commands.

# mkdir scripts
# cd scripts
# vim createimage.sh

We then need to copy-and-paste the script from Moby Project to createimage.sh.

After that, we need to make createimage.sh executable with the following command.

# chmod +x createimage.sh

To run this script now, we need to do as follows, where centos7base is the name of the image file.

# ./createimage.sh centos7base

After it is done, we will see the centos7base image added in docker. The image is very, very small with only 271MB as its size.

First docker image!

Step 5: Add Something (.NET Core SDK) to Container

Now that we have our first Docker image, we can proceed to create a container with the following command.

# docker run -i -t centos7base /bin/bash

We will be brought into the container. So now we can simply add something, such as the .NET Core SDK, to the container by following the .NET Core installation steps for CentOS 7.1 (64-bit), which can be summarized as the following commands.

# sudo rpm --import https://packages.microsoft.com/keys/microsoft.asc

# sudo sh -c 'echo -e "[packages-microsoft-com-prod]\nname=packages-microsoft-com-prod \nbaseurl=https://packages.microsoft.com/yumrepos/microsoft-rhel7.3-prod\nenabled=1\ngpgcheck=1\ngpgkey=https://packages.microsoft.com/keys/microsoft.asc" > /etc/yum.repos.d/dotnetdev.repo'

# sudo yum update
# sudo yum install libunwind libicu
# sudo yum install dotnet-sdk-2.0.0

# export PATH=$PATH:$HOME/dotnet

We can then create a new image from the changes we have made to the container using the following command, where centos_netcore is the repository name and 1.0 is its tag.

docker commit <container-id> centos_netcore:1.0

We will then realize that the new image will be quite big, with 1.7GB as its size, thanks to the .NET Core SDK.

Step 6: Moving the New Image to PC

The next step that we are going to do is exporting the new image as a .tar file using the following command.

docker save centos_netcore:1.0 > /tmp/centos_netcore.tar

Now, we need to launch WinSCP to retrieve the .tar file via SCP (Secure Copy Protocol) to local host.

Ready to access the VM via SCP.

Step 7: Load Docker Image

So now we can shut down the VM and re-enable Hyper-V because the subsequent steps will need Docker for Windows to work.

After restarting our local computer with Hyper-V enabled, we can launch Docker for Windows. After that, we load the image to the Docker using the following command in the directory where we keep the .tar file in local host.

docker load < centos_netcore.tar

Step 8: Running ASP .NET Core Web App on the Docker Image

Now, we can change the Dockerfile to use the new image we created.

FROM centos_netcore:1.0
ARG source
WORKDIR /app
EXPOSE 80
COPY ${source:-obj/Docker/publish} .
ENTRYPOINT ["dotnet", "changshi.dll"]

When we hit F5 to make it run in Docker, yup, we will get back the website.

No, just kidding. We will actually get an error message that says localhost didn't send any data.

Localhost did not send any data. Why?

So if we read the messages in Visual Studio Output Window, we will see one line of message saying that it’s unable to bind to http://localhost:5000 on the IPv6 loopback interface.

Error -99 EADDRNOTAVAIL

According to Cesar Blum Silveira, a Software Engineer from the Microsoft ASP .NET Core Team, this problem is because "localhost will attempt to bind to both the IPv4 and IPv6 loopback interfaces. If IPv6 is not available or fails to bind for some reason, you will see that warning."

Explanation of Error -99 EADDRNOTAVAIL by Microsoft engineer. (Link)

Then I switched to view the output from Docker in the Output Window.

Output from Docker

It turns out that the port exposed on Docker is port 80. So I tried to add the following line in Program.cs.

public static IWebHost BuildWebHost(string[] args) =>
    WebHost.CreateDefaultBuilder(args)
    .UseUrls("http://0.0.0.0:80") // Added this line
    .UseStartup<Startup>()
    .Build();

Now, it works again with the beautiful web page.

Success!

Containers, Containers Everywhere

The whole concept of Docker images, containers, micro-services are still very new to me. Hence, if you spot any problem in my post, feel free to point out. Thanks in advance!

References

#centos, #docker, #microservices, #technology, #virtualbox

[KOSD Series] First Attempt of Deploying ASP .NET Core to Azure Container Service

KOSD, or Kopi-O Siew Dai, is a type of Singapore coffee that I enjoy. It is basically a cup of coffee with a little bit of sugar. This series is meant to blog about technical knowledge that I gained while having a small cup of Kopi-O Siew Dai.

Last month, after sharing the concepts and use cases of Domain-Driven Design, Riza moved on to talk about Containers in the sharing session of the Singapore .NET Developers Community.

Riza’s talking about Containers. Yes, microservices are not containers!

Learning Motivation

At the beginning of Riza's talk, he mentioned GO-JEK, an Indonesian ride-hailing service. Due to their rapid growth, the traditional monolithic architecture could no longer support their business. Hence, they switched to a modern approach which includes moving apps to containers.

Hence, after the meetup, I was very excited to find out more about micro-services and Docker containers. With the ability of .NET Core to be cross-platform, and as an Azure lover, I am interested in finding out how I can deploy an ASP .NET Core web app to a container in Azure. So, I decided to write this short article to share it with my teammates, so that they can learn about it while drinking a cup of coffee.

Creating New Project with Docker Support

Since I am trying it out as a personal project, I chose to start with a new ASP .NET Core project. Then, in Visual Studio, I can easily turn it into a Docker-supported app by checking the "Enable Docker Support" option.

Enable Docker Support

For existing web application projects, we will not have the screen above. Luckily, it is still easy to add Docker Support to an existing ASP .NET Core project on Visual Studio.

Enabling Docker Support in existing projects.

Then, by clicking on the "F5" button to run the project, I managed to get the following screen (the background is customized by me). The message is displayed using the following line.

System.Runtime.InteropServices.RuntimeInformation.OSDescription;

Yay, we managed to run the web app inside a Linux container locally.

Publishing to Microsoft Azure with Continuous Delivery

Even without Continuous Delivery, we can easily right-click the web application to publish it to a Container Registry on Azure.

Creating a new Azure Container Registry to which the Docker image will be published.

Then, on Azure Portal, we will see three new resources added. Firstly, we will have the Container Registry.

Then, we will also have an app service site which is running the image downloaded from the Container Registry. Finally, we have an App Service Plan which needs to be at least B1 because free and shared SKUs are not available for apps running on Linux (The official Microsoft documentation says we should have the VM size of the App Service Plan to be S1 or larger though).

Container Registry for my new web app, Changshi.

To enable Continuous Delivery, I chose to use GitHub + Visual Studio Team Services (VSTS). By doing so, a build and a release will be started automatically whenever I check code in to GitHub.

Build history and details on VSTS.

Yup, this is so far what I have tried out in my first step of playing with containers. If you are interested, please check out the references listed below.

References

#net-core, #azure, #docker, #kopi-o-siew-dai, #microsoft