Azure Custom Vision – Technology deep-dives Part 2

[Reading Time: 14 minutes]

Introduction – I know what a post bus is!

Why is this post about a post bus? What is Custom Vision? Long story short: I want to know when the postman is coming or has already been here, without having to look out of the window all the time. A machine should do that for me! There are several reasons for this. Here are two you might recognize 🙂

You have small children who are supposed to take an afternoon nap and had better not be disturbed, because otherwise the peace at home is in danger. … The postman doesn't care about your suffering and rings the bell anyway.
Tracking indicates a delivery, but your package is nowhere to be found.


Beyond that, I can think of many other good reasons to scan your home environment automatically (here it's only about postmen).
If you have some cool ideas, please leave a comment below!

To build a system that recognizes the postman or the post bus, it takes more than just a video camera. I wanted to replace the human in front of the camera, and that calls for AI – or better, for one of its sub-disciplines: machine learning. And that's where Custom Vision comes in.

Azure Cognitive Services

Custom Vision is a service of Azure Cognitive Services and makes machine learning (ML) models usable via an API. There are different services for different domains, some of which simply need to be consumed to achieve respectable results. Azure Cognitive Services currently consists of five segments (Decision, Speech, Language, Vision, Search). For this project, however, the category “Vision” is the most interesting for me.

One service I used experimentally for an older postman project is the Computer Vision API. It accesses general models, i.e. ready-trained models of no specific domain. A domain is a kind of “department” or discipline (otherwise Wikipedia will certainly help you 😉 ). In combination with the tool ISpy, this was a very good setup (check out the link here! If you know C#, that's for you). Unfortunately, the Computer Vision API doesn’t know what a postman is. That’s why I took a closer look at the Custom Vision API.

Custom Vision API

Like the Computer Vision API, the Custom Vision API can detect and process general images, but additionally, as the name suggests, its ML models can be specialized. The models of the Custom Vision API are pre-trained and can learn your own classes or objects with comparatively little effort. So I can teach the machine what a postman, or a post bus, is. Here we go! Let’s build an ML model for post buses!

Create a service

In the Azure Portal, you create a Cognitive Service (“Custom Vision API”).

In the portal, search for Custom Vision and confirm with “Create” in the mask that appears.

Then fill out the form.

The F0 plan is sufficient for initial experiments.

After the service has been created, you can jump into it by clicking on the link at point 2 “Custom Vision Portal”.

Alternatively, you can enter the URL https://customvision.ai/ in your browser and sign in with your Azure account.

In the Custom Vision Portal, you now create a new project.

For our project, we create an object detection project, because we want to recognize one or more specific objects in an image.

As a domain, we take a generally pre-trained model.

You create the project with “Create Project”.

Preparation is the be-all and end-all …

Now we can start to teach the machine our post bus. What do we need for this? Pictures! Normally you need a lot of pictures for a really good result… a LOT of pictures. Surprisingly, the Custom Vision Service is satisfied with relatively few images and still delivers passable results.

For a good result, you should pay attention to a few things in advance:

Get lots of pictures anyway
Make sure you have varied images
You need pictures of your post bus and its surroundings from different perspectives
Different weather conditions should also be represented
The quality of the images should also vary
You can get a lot out of it with a little experimentation

Here is a screenshot of my service with the uploaded pictures. You will see that I also include other objects in my model. This way I can identify more than just the post bus in my environment, and the model gets a better training result. However, it is important that you have roughly the same number of pictures in each category, otherwise a good result will not come out in the end.

As a base setup, we need at least two tags (a tag denotes a category of objects and is often called a label).
Postcar
NoPost

This allows the machine to distinguish between two different objects. Alternatively, like me, you can assign several other tags (which you then, of course, have to fill with corresponding pictures). But the learning effort is higher and you have to invest more time in your project!

Here’s how to proceed:
Upload one or more images from your collection of images.

Tip: prepare your folder structure so that the tags are shown here.

Then you are presented with an interface showing a loaded image. In it, you mark the corresponding object and assign the tag. Repeat this for each image you’ve uploaded.

Marking an object (bounding box)

When you’re done with that, you understand what the main work of a data scientist is.

A data scientist deals with the processing of data, which serves as the basis for ML, for example. AI developers/engineers as well as data scientists create machine learning models from this data using various ML algorithms (the topic is far too big to go into here).


Our Custom Vision Service abstracts the work steps to a large extent in such a way that a software developer can concentrate almost alone on consuming the model. This is the true power of cognitive services.

Success through training … not only in Sports

Now we move on. After organizing our pictures and tags, we leave everything else to the service. To do this, switch to the performance view in the portal (at the top of the menu).

What you can still influence here is the probability (in the picture above “Probability Threshold”) above which the machine accepts a detection in the training result as valid. In my example, it is 70%.

The slider below – “Overlap Threshold” in my case – determines how much deviation is allowed between the defined and the recognized bounding box – after all, this is about object detection in a scene. Visually, the bounding box is the rectangle that marks the object in the image (see screenshot above). The selection of objects you made in the previous steps is validated against the rectangle recognized by the machine.
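To make the overlap idea concrete, here is a minimal Python sketch of the intersection-over-union (IoU) between two bounding boxes. The function name and the (left, top, width, height) tuple format are my own choices for illustration, not anything prescribed by the service:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (left, top, width, height)."""
    ax1, ay1 = box_a[0], box_a[1]
    ax2, ay2 = ax1 + box_a[2], ay1 + box_a[3]
    bx1, by1 = box_b[0], box_b[1]
    bx2, by2 = bx1 + box_b[2], by1 + box_b[3]
    # width/height of the intersection rectangle (zero if the boxes don't touch)
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union else 0.0

# identical boxes overlap completely
print(iou((0.1, 0.1, 0.5, 0.5), (0.1, 0.1, 0.5, 0.5)))  # → 1.0
```

An overlap threshold of, say, 0.3 would then mean: a detected box counts as a hit only if its IoU with the box you drew is at least 0.3.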

Training an ML model (with supervised learning approaches) always works – roughly speaking – according to the same principle. Data sets (here images) are divided into training data and test data. The percentage distribution of the data is often in the order of 70:30. After the training, an evaluation is carried out to assess how good the result is. The test data, which the system does not yet know, is checked against the generated model. Applied to my example: if the result on my test data has a 70% hit rate for “Postauto”, it is a valid result – that is, a result that the machine accepts as a post bus.
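The 70:30 split described above can be sketched in a few lines of Python. This is only an illustration of the principle, not what the service does internally:

```python
import random

def split_dataset(items, train_ratio=0.7, seed=42):
    """Shuffle a dataset and split it into training and test parts (70:30 by default)."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

images = [f"img_{i}.jpg" for i in range(10)]
train, test = split_dataset(images)
print(len(train), len(test))  # → 7 3
```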

Now that you have set your values (I recommend leaving the defaults as they are), you can click the “Train” button.

In the following dialog, you configure the training resources. Since training is a computation-intensive mathematical process, you usually need a lot of power, i.e. good hardware, if you don’t want to wait forever for your finished model while still expecting good quality. The dialog lets you decide whether to go fast (less good results) or to provide more computing power and thus expect better results (takes longer). Since we’re in the cloud, the resources are provided in the form of VMs. The advantage of Platform/Software as a Service (PaaS/SaaS), however, is that you don’t notice much of this and get a few sliders instead. Creating a model is an iterative process consisting of many repetitions with varying parameters. For the first run, fast training is enough. This way you get an indication and, with some experience, can improve on various parameters (duration, number of images, quality, …).

After the training is before the training

Evaluation of the prediction quality of the ML model

Maybe you get a result like the one presented above; your numbers may differ, depending on your selected images, the training duration and much more. But what you can take away is a statement about how good your result (your trained ML model) is. So, briefly, what the individual statistics say:

  • Precision
    This metric tells you how accurate your model is when it reports objects in one of the test images: of all detections reported as a post bus, how many really were one? E.g.: if 8 objects were recognized as a post bus, of which 5 were real post buses and 3 were trucks wrongly recognized as a post bus, then the precision is 5 / (5+3) = 62.5%.
  • Recall
    The recall tells you how many of the real post buses in your picture collection your model actually finds (simply put: how complete your model is). Using the example above: 5 real post buses found out of a total of 10 real post buses gives 5/10 = 50%.
  • mAP (mean Average Precision)
    Simply put – and you can also read this when hovering over the info icon in the portal – this value reflects the performance of the object detector. If you want to deal with it in more detail, you can read here, for example. Basically, it is a question of how exactly the detected bounding box matches the specified box (the one you “painted” in the portal). This is evaluated several times under different parameters and averaged.
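As a small illustration of the first two metrics (the function names are my own, not part of the service), the examples above can be computed like this:

```python
def precision(tp, fp):
    """Of all reported detections, the fraction that were correct."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Of all real objects, the fraction that were found."""
    return tp / (tp + fn) if tp + fn else 0.0

# 5 post buses found correctly, 3 trucks wrongly flagged, 5 real post buses missed
tp, fp, fn = 5, 3, 5
print(round(precision(tp, fp) * 100, 1))  # → 62.5
print(round(recall(tp, fn) * 100, 1))     # → 50.0
```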

I had used several tags for my project, and this was the result:

There are German tags, but this makes no difference to the numbers 😉 You can see that “Postauto” is among them.

What you can see is the individual evaluation per tag. Contrary to my own recommendation, I did not use the same number of images for each tag. In one iteration (training run) I only trained post buses and did not consider all the other objects.

You can and should repeat the training process with different variations of images and parameters until you have a good result. You can also use the “Advanced Training” option.

Caution! This can get expensive.

Before you can really use your model, you have to publish it. With this step, your model becomes accessible even without the Custom Vision Portal.

To do this, click “Publish” at the top left of the menu and, in the following dialog, select the Custom Vision Service you created at the very beginning (“PostCarService”). This gives your iteration the “Published” badge.

Next, click on “Predictions” in the menu above. Here you will be offered two possibilities for further work. On the one hand, you can display the public endpoints for your post-bus service, and on the other hand you can run a “Quick Test” via the interface. In this test, you have the option to test images that are not part of your training and test set against your model via the UI. Try it out!

Applying the trained model

With that, we are finally done with a usable model. What remains is addressing and querying it. In this post, I’ll explain how to make general queries to your model. In another, I’ll show you how to use the ML model in combination with ISpy to build a postman recognition system (or something similar). You can look forward to it!

But now, first the addressing. If you click on “View Endpoint”, a dialog opens that offers you two URLs. This is the API/interface for detection. You can send an image to the API via a request, or alternatively a link to an image.

The dialog also tells you how to formulate your request to address the respective API.

I assume you know how to place a request to an API. If not, I recommend taking a look at the topic of REST and the tools Postman or PowerShell. Alternatively, visit Microsoft’s Docs – there are tons of instructions there.
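As a minimal sketch in Python using only the standard library – the URL and key below are placeholders that you must copy from the “View Endpoint” dialog of your own service:

```python
import json
import urllib.request

# Placeholders -- copy the real values from "View Endpoint" in the Custom Vision portal.
PREDICTION_URL = ("https://<region>.api.cognitive.microsoft.com/customvision/"
                  "v3.0/Prediction/<project-id>/detect/iterations/<iteration-name>/image")
PREDICTION_KEY = "<your-prediction-key>"

def build_request(image_bytes):
    """Build the POST request: raw image bytes in the body, key in the header."""
    return urllib.request.Request(
        PREDICTION_URL,
        data=image_bytes,
        headers={
            "Prediction-Key": PREDICTION_KEY,
            "Content-Type": "application/octet-stream",
        },
    )

def detect_objects(image_path):
    """Send a local image to the published endpoint and return the parsed JSON."""
    with open(image_path, "rb") as f:
        req = build_request(f.read())
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The second endpoint shown in the dialog accepts a JSON body with a link to an image instead of raw bytes; the principle is the same.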

If you have successfully sent a request, you will receive a response from the API similar to the following:

{
    "id": "0fbda4ee-8956-4979-bf57-a252441af98d",
    "project": "9ca4032b-beeb-40ad-9396-1c3fcfd9ba89",
    "iteration": "27c85265-a158-4fc4-b22a-d535dd758d80",
    "created": "2018-06-11T09:34:29.9496528Z",
    "predictions": [
        {
            "probability": 0.7702891214,
            "tagId": "677afcf8-bc4a-493f-b588-707663286125",
            "tagName": "Postauto",
            "boundingBox": {
                "left": 0.2889924,
                "top": 0.0169312358,
                "width": 0.7007024,
                "height": 0.8284572
            }
        },
        {
            "probability": 0.012788726,
            "tagId": "ca844f08-b6c0-4d9a-9010-73945d442708",
            "tagName": "Volvo V40",
            "boundingBox": {
                "left": 0.304018974,
                "top": 0.413163722,
                "width": 0.299461246,
                "height": 0.436399817
            }
        },
...]
}

You can then do anything with the JSON. On the one hand, you get the information about what was recognized in your image and how probable it is that the detection is correct. In addition, you also get the positions of the detected objects. This means that, in any programming language, you can for example mark the object with a rectangle or keep an “inventory” of the objects recognized in the image. There are no limits to your creativity.
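As a hedged sketch of what working with the JSON might look like (the helper names are my own, and the response dictionary is abbreviated from the example above):

```python
# "response" stands in for the parsed JSON from the prediction API (abbreviated)
response = {
    "predictions": [
        {"probability": 0.7702891214, "tagName": "Postauto",
         "boundingBox": {"left": 0.2889924, "top": 0.0169312358,
                         "width": 0.7007024, "height": 0.8284572}},
        {"probability": 0.012788726, "tagName": "Volvo V40",
         "boundingBox": {"left": 0.304018974, "top": 0.413163722,
                         "width": 0.299461246, "height": 0.436399817}},
    ]
}

def confident_detections(response, threshold=0.7):
    """Keep only predictions above the probability threshold."""
    return [p for p in response["predictions"] if p["probability"] >= threshold]

def to_pixels(bounding_box, image_width, image_height):
    """The bounding box values are relative (0..1); scale them to pixel coordinates."""
    return (
        int(bounding_box["left"] * image_width),
        int(bounding_box["top"] * image_height),
        int(bounding_box["width"] * image_width),
        int(bounding_box["height"] * image_height),
    )

hits = confident_detections(response)
print([p["tagName"] for p in hits])  # → ['Postauto']
```

With the pixel coordinates you can then draw the rectangle with whatever imaging library you prefer.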

In the next post I would like to introduce you to my IoT (Internet of Things) variant of a postman recognition system. And thus further deepen the deep-dives already announced in Part 1.

The procedure presented here will be taken up again later, when it will be about getting our AI model running as an offline version on a device, so that we don’t always have to work in the cloud.

Stay tuned!

By Thomas Tomow

As a Managing Consultant, I am working at Alegri in Stuttgart/Germany. There, with a team specialized in IoT, UX/Design and DevOps, I focus on preparing customers for the next digital future. I have been working as an IT consultant, IT architect and lead developer with skills in the .NET Framework, agile methodologies like Scrum and much more for nearly two decades. In the last few years I have focused on IoT & digitalization strategies in enterprise scenarios. Mainly I have used Microsoft's Azure Cloud to support customers in making their visions come true. Sharing knowledge and practicing principles is something I like very much. Therefore I am also co-administrator of the Azure Meetup group in Stuttgart, where I like to share my experiences. For this I was awarded MVP (Most Valuable Professional) status by Microsoft in November.
