The XLcloud Project Blog and News


Thursday 27 February 2014

Running an OpenGL application on a GPU-accelerated Nova Instance (Part 2)

I ended part 1 with the promise that I would show you a real 3D application running in a GPU-accelerated instance.

And so, here it is: screencast.

In this screencast you will see the bootstrap of an instance in devstack that has a GPU attached to it. From within the instance, I run an OpenGL benchmark called Heaven from Unigine. The screencast shows, thanks to a Ganglia monitoring dashboard, that the GPU load is real. The software at work in this demo comprises the NVIDIA GRID GPU driver, TurboVNC, VirtualGL, the Unigine benchmark and Ganglia to visualize the ongoing workload.

As a cherry on the cake, the VM is deployed with Heat and Chef Solo using a template you can download from [1].

I hope you will enjoy the Unigine video. It's pretty cool.

Tuesday 25 February 2014

How we plan to manage autoscaling using the new notification alarming service of Ceilometer

In this post, I'd like to describe how we plan to use the new alarming capabilities offered in Heat and Ceilometer to be notified of stack state changes resulting from an autoscaling operation. Indeed, with Icehouse, it will be possible to specify a new type of alarm whereby you can associate a user-defined webhook with an autoscaling notification.

There are three different types of autoscaling notifications you will be able to subscribe to.

  • orchestration.autoscaling.start
  • orchestration.autoscaling.error
  • orchestration.autoscaling.end

The first two notifications are self-explanatory. The third one, orchestration.autoscaling.end, is sent by Heat when an auto-scaling-group resize has completed successfully; more specifically, when the state of the (hidden) stack associated with an autoscaling group has effectively transitioned from UPDATE_IN_PROGRESS to UPDATE_COMPLETE.

The Ceilometer blueprint which introduces the feature in Icehouse is here.

We tested it, and it seems to work fine, as shown in the console captures below.

The CLI looks like this:

ceilometer --debug alarm-notification-create  --name foo --enabled True --alarm-action "http://localhost:9998?action=UP" --notification-type  "orchestration.autoscaling.end" -q "capacity>0"

Then the curl equivalent:

curl -i -X POST -H 'X-Auth-Token: a-very-long-string' -H 'Content-Type: application/json' -H 'Accept: application/json' -H 'User-Agent: python-ceilometerclient' -d '{"alarm_actions": ["http://localhost:9998?action=UP"], "name": "foo", "notification_rule": {"query": [{"field": "capacity", "type": "", "value": "0", "op": "gt"}], "period": 0, "notification_type": "orchestration.autoscaling.end"}, "enabled": true, "repeat_actions": false, "type": "notification"}' http://localhost:8777/v2/alarms

And the callback handling:

nc -l 9998

POST /?action=UP HTTP/1.1
Host: localhost:9998
Content-Length: 1650
Accept-Encoding: gzip, deflate, compress
Accept: */*
User-Agent: python-requests/2.2.1 CPython/2.7.3 Linux/3.2.0-48-virtual

{"current": "alarm", "alarm_id": "e7dafd2d-18a3-4c9d-a4af-efe927007ae6", "reason": "Transition to alarm from insufficient data due to notification matching the defined condition for alarm  foo.end with type orchestration.autoscaling.start and period 0", "reason_data": {"_context_request_id": "req-480768ed-c5a2-46f6-b720-8ac2542e3eb8", "event_type": "orchestration.autoscaling.start", "_context_auth_token": null, "_context_user_id": null, "payload": {"state_reason": "Stack create completed successfully", "adjustment": 1, "user_id": "admin", "stack_identity": "arn:openstack:heat::6db81240677b4326b94a595c0159baa5:stacks/AS4/5eef9488-5274-4305-bedb-91f5ed45cdd6", "stack_name": "AS4", "tenant_id": "6db81240677b4326b94a595c0159baa5", "adjustment_type": "ChangeInCapacity", "create_at": "2014-02-19T10:35:59Z", "groupname": "AS4-ASGroup-nyywzf4x5hif", "state": "CREATE_COMPLETE", "capacity": 1, "message": "Start resizing the group AS4-ASGroup-nyywzf4x5hif", "project_id": null}, "_context_username": "admin", "_context_show_deleted": false, "_context_trust_id": null, "priority": "INFO", "_context_is_admin": false, "_context_user": "admin", "publisher_id": "orchestration.ds-swann-precise-node-s3fbwjntypxv", "message_id": "738be905-1ec3-47e3-811a-ab7975426567", "_context_roles": [], "_context_auth_url": "", "timestamp": "2014-02-19 10:42:23.960329", "_unique_id": "a32ff8a1a8144532b6312ef36790acec", "_context_tenant_id": "6db81240677b4326b94a595c0159baa5", "_context_password": "password", "_context_trustor_user_id": null, "_context_aws_creds": null, "_context_tenant": "demo"}, "previous": "insufficient data"}
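Instead of nc, a real callback handler would parse that JSON body. Here is a minimal Python sketch of what such a receiver might extract, assuming the payload shape captured above (parse_autoscaling_alarm is a hypothetical helper, not part of any OpenStack client):

```python
import json

def parse_autoscaling_alarm(body):
    """Pull the interesting fields out of a Ceilometer alarm POST body.

    Field names follow the sample capture above; anything missing
    simply comes back as None.
    """
    data = json.loads(body) if isinstance(body, str) else body
    reason_data = data.get("reason_data", {})
    payload = reason_data.get("payload", {})
    return {
        "state": data.get("current"),             # e.g. "alarm"
        "event": reason_data.get("event_type"),   # e.g. "orchestration.autoscaling.end"
        "group": payload.get("groupname"),        # the auto-scaling-group name
        "capacity": payload.get("capacity"),      # group size after the resize
        "adjustment": payload.get("adjustment"),  # positive on scale-up, negative on scale-down
    }
```

From there, the receiver can decide what to do next, for instance by looking at the sign of the adjustment.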

At first glance, it may seem like a minor feature, but it's not; hence this post. For us, it is a significant stride toward closing the implementation gap we used to have with the integrated lifecycle management operations we want to support for the clusters we deploy on our platform. To help with the explanation, I sketched a diagram that shows the deployment orchestration and configuration management automation workflow (which I will call contextualization for short) that takes place when an autoscaling condition occurs. The use case of choice is the remote rendering cluster that we already used in some cool cloud gaming demos.

Figure 1: Remote Rendering Cluster contextualization workflow upon autoscaling

RRVC Auto-Scaling Workflow

The XLcloud Management Service (XMS) sits on top of OpenStack. It is responsible for the seamless integration between resource deployment orchestration and configuration management automation. Autoscaling is just one example of a state-changing condition that may occur in the platform. There are others, such as deploying a new application onto the cluster or upgrading its software, that we handle through the same contextualization mechanism. Note that a cluster, as we call it, is nothing more than a relatively complex multi-tiered Heat stack whose lifecycle management operations are handled by XMS throughout its lifespan.

In (1) the deployment of the cluster is initiated by XMS which in turn delegates to Heat for the deployment orchestration. The cluster is created by submitting a master template which itself references embedded templates we call layers. Also, not shown here, a layer can benefit from interesting capabilities such as being attached to a specific subnet. Layers can be chosen from a catalog. They are used as blueprints of purpose-built instances to compose a given stack. The remote rendering cluster is therefore a stack composed of layers including in particular an auto-scaling-group layer composed of GPU-accelerated rendering node instances. They are all created and configured using the same parameters and set of Chef recipes. There are two types of alarm resources we specify in the rendering nodes layer template.

  • The OS::Ceilometer::Alarm resource type, introduced in Havana, which makes it possible to associate an alarm with an auto-scaling-group policy
  • The OS::Ceilometer::Notification resource type, which will make it possible to associate an alarm with a notification.

Note that OS::Ceilometer::Notification is a new resource type proposal. It doesn't exist yet. It is intended to declaratively represent an alarm that is triggered by Ceilometer when a notification matching certain criteria is received; in our particular use case, when Ceilometer receives the orchestration.autoscaling.end notification sent by Heat after an auto-scaling-group resize has completed successfully. The alarm specification makes it possible to distinguish between scale-up (capacity > 0) and scale-down (capacity < 0).

Here is an example of how it would be used:

     autoscaling-alarm-up:
       Type: OS::Ceilometer::Notification
       Properties:
         description: Send an alarm when Ceilometer receives a scale up notification
         notification_type: orchestration.autoscaling.end
         capacity: '0'
         comparison_operator: gt
         alarm_actions:
           - {  a user-land webhook URL... }
         matching_metadata: {'metadata.user_metadata.groupName': {'Ref': 'compute-nodes-layer'}}
     autoscaling-alarm-down:
       Type: OS::Ceilometer::Notification
       Properties:
         description: Send an alarm when Ceilometer receives a scale down notification
         notification_type: orchestration.autoscaling.end
         capacity: '0'
         comparison_operator: lt
         alarm_actions:
           - {  a user-land webhook URL... }
         matching_metadata: {'metadata.user_metadata.groupName': {'Ref': 'compute-nodes-layer'}}

In (2) Heat creates these two alarms through the Ceilometer Alarming Service API.

In (3) all the instances of the cluster execute their initial setup recipes. The role of the initial setup is to bring the cluster into a state where it can be remotely managed by XMS. That is, download all the cookbooks from their respective repositories and resolve their dependencies, then install the MCollective agent, Chef Solo and the rendering engine middleware. During the setup phase, the metadata associated with the stack is exposed as ohai facts through the MCollective agent. Certain ohai facts, such as the stack id, will be used as MCollective filters to selectively reach a particular instance, a layer or the entire cluster.

In (4) a workload is generated against the cluster by gamers who want to play. A cloud gaming session load balancer running in the Virtual Cluster Agent takes those requests and dispatches them across the auto-scaling-group of the cluster. Gaming sessions are dispatched according to their processing requirements, which may vary quite a lot depending on the game being played and the viewing resolution.

In (5) a gmond daemon, which runs on every GPU-accelerated instance, uses a specific Ganglia GPU module to monitor the GPU(s) attached to the rendering instances via PCI passthrough. In another layer of the cluster, a Ganglia Collector, which runs gmetad, collects the GPU usage metrics and passes them to a Ganglia pollster we developed for that cluster, which in turn pushes them (after some local processing) as Ceilometer samples. You can observe that we have chosen not to use the cfn-push-stats helper within the monitored instances and to rely instead on the Ganglia monitoring framework and the Ceilometer API. A direct benefit of this is that we get a nice Ganglia monitoring dashboard.
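The pollster's core transformation, turning Ganglia's XML tree into Ceilometer-style samples, can be pictured roughly as follows. This is a simplified Python sketch, not our actual pollster: the sample dict layout and the "gpu" metric name prefix are assumptions for illustration, while the HOST/METRIC elements and NAME/VAL/UNITS attributes follow the Ganglia XML schema.

```python
import xml.etree.ElementTree as ET

def gpu_samples(gmetad_xml, metric_prefix="gpu"):
    """Extract GPU metrics from a gmetad XML dump as sample dicts.

    Walks every HOST element and keeps only the metrics whose name
    starts with `metric_prefix`, tagging each sample with the host
    name so Ceilometer can attribute it to a resource.
    """
    samples = []
    root = ET.fromstring(gmetad_xml)
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            name = metric.get("NAME", "")
            if not name.startswith(metric_prefix):
                continue
            samples.append({
                "resource_id": host.get("NAME"),
                "counter_name": name,
                "counter_volume": float(metric.get("VAL")),
                "counter_unit": metric.get("UNITS", ""),
            })
    return samples
```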

In (6) the Alarm Service of Ceilometer detects that a resource usage alarm condition caused by the current workload is met. We found, for example, that an increase of the GPU temperature is very representative of the ongoing GPU load. As a result, Ceilometer calls the webhook in the Auto Scaling Service of Heat, that was defined in the OS::Ceilometer::Alarm resource, which in turn will initiate a scale-up operation.

In (7) Heat spawns one or several new instances in the auto-scaling-group of the cluster.

In (8) the new instance(s) execute the same initial setup as above. Once the setup is complete, the auto-scaling-group enters the UPDATE_COMPLETE state, which makes the Auto Scaling Service of Heat emit an orchestration.autoscaling.end notification.

In (9) the Alarm Service of Ceilometer detects that an autoscaling alarm condition is met. Ceilometer calls the webhook in XMS that was defined in the autoscaling-alarm-up resource of the template.

In (10) XMS makes an MCollective RPC call directing the instances of the cluster (except those that are not concerned by the contextualization) to execute the recipes associated with an autoscaling event, which we refer to in the template as the 'configure' recipes.
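The fact-based targeting could be sketched like this in Python. This is hypothetical code standing in for MCollective's filter semantics, not XMS internals; the fact names (stack_id, layer) are illustrative:

```python
def match_filters(node_facts, filters):
    """Return True when every MCollective-style fact filter matches.

    `node_facts` is the dict of ohai facts exposed on an instance;
    `filters` maps fact names to the values the RPC call requires.
    """
    return all(node_facts.get(key) == value for key, value in filters.items())

def select_targets(nodes, filters, exclude=()):
    """Pick the instances a contextualization RPC should reach,
    skipping any listed in `exclude` (e.g. instances not concerned
    by this contextualization round)."""
    return [name for name, facts in nodes.items()
            if name not in exclude and match_filters(facts, filters)]
```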

In (11) the load balancer can now dispatch the new incoming gaming sessions to the newly provisioned instance(s).

Note that the same workflow would roughly apply for a scale-down notification.

Do not hesitate to leave a note if you have a comment or suggestion to make.

Monday 10 February 2014

A GPU in your instance with Xen hypervisor

This is just a link to Running an OpenGL application on a GPU-accelerated Nova Instance (Part 1)

NOTE: We do this indirection because I didn't see the dot at the end of the original URL. So to be able to reach the URL without a dot at the end without breaking everything I added this ugly entry.

Monday 3 February 2014

Running an OpenGL application on a GPU-accelerated Nova Instance (Part 1)

Have you ever dreamed of running graphic apps in OpenStack? Now, it's possible!

About a month ago we published a blueprint [1] to enable the support of PCI-passthrough in the XenAPI driver of Nova [2]. Our primary objective was to enable GPU-accelerated instances but we nonetheless scoped the blueprint with the intent to support "any" kind of PCI device. Since then, we published two patches [3] [4] that you can readily try using the trunk version of Nova. I would like to say that this work couldn't have been done without the help of the OpenStack and Xen communities.

In part 1 of this post, I will go through step-by-step instructions that show how to boot a Nova instance that has direct access to a GPU under Xen virtualization. In our particular setup, we used an Nvidia K2 graphics card, but it should work equally well for other Nvidia GPUs such as the Nvidia K520 or M2070Q, which we also booted successfully in our lab.

First you need a working devstack in a domU. To do this, install XenServer 6.2 on the machine that has the GPU installed, then boot the domU with an Ubuntu Saucy (other distributions should work as well) and install an all-in-one devstack in it. When you boot the dom0, you need to prepare the device for PCI passthrough. You do this by adding "pciback.hide=(87:00.0)(88:00.0)" to the dom0 Linux kernel command line. This will assign the pciback driver to the devices with BDF 87:00.0 and 88:00.0. Information about PCI passthrough with Xen is available on the Xen wiki [5].

The next step is to download the code for the PCI passthrough.

  # cd /opt/stack/nova
  # git review -d 67125

This will download the two patches that are needed and switch to the correct git branch. Before restarting the Nova services, you need to configure the Nova scheduler and the compute node to be able to use PCI passthrough. For further information, check the wiki [6].

On the compute node you need to select which devices are eligible for passthrough. In our case we added the K2 cards. You do this by adding those devices into a list in /etc/nova/nova.conf

  # cat /etc/nova/nova.conf
  pci_passthrough_whitelist = [{"vendor_id":"10de","product_id":"11bf"}]
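Conceptually, the whitelist check boils down to matching each discovered device's attributes against the entries of that list. A rough Python sketch of the idea (not Nova's actual code):

```python
def whitelisted(device, whitelist):
    """Decide whether a host PCI device is eligible for passthrough.

    A device matches a whitelist entry when all of the entry's keys
    (vendor_id, product_id, ...) equal the device's values; matching
    any entry makes the device assignable.
    """
    return any(all(device.get(key) == value for key, value in entry.items())
               for entry in whitelist)
```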

The vendor ID and the product ID of the K2 GPU are respectively 10de and 11bf. We then need to configure the scheduler and define a PCI alias in /etc/nova/nova.conf as follows:

  # cat /etc/nova/nova.conf
  scheduler_driver = nova.scheduler.filter_scheduler.FilterScheduler
  pci_alias = {"vendor_id":"10de","product_id":"11bf","name":"k2"}

The pci_alias is used to match the extra parameters of a flavor with the selected PCI device. Hence, you need to create a flavor that will be associated with the PCI devices that you want to attach:

  # nova flavor-key  m1.small set "pci_passthrough:alias"="k2:1"
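The value "k2:1" encodes an alias name and a requested device count. A small sketch of how such a spec can be interpreted (a hypothetical helper for illustration, not Nova code):

```python
def parse_alias_spec(spec):
    """Split a 'pci_passthrough:alias' flavor value like 'k2:1' into
    (alias_name, requested_count); the count defaults to 1 when absent."""
    name, _, count = spec.partition(":")
    return name, int(count or 1)
```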

Last but not least, you need to copy the plugin files from /opt/stack/nova/plugins/xenserver/xenapi/etc/xapi.d/ of your devstack installation into the /etc/xapi.d/plugins/ directory of dom0. Overlooking this step would most probably result in plugin errors.

Restart the Nova services. On your n-cpu screen you should see your PCI resources listed as available, as shown below:

    2014-01-30 19:20:48.340 DEBUG nova.compute.resource_tracker [-] Hypervisor: assignable PCI devices: [{"status": "available", "dev_id": "pci_87:00.0", "product_id": "11bf", "dev_type": "type-PCI", "vendor_id": "10de", "label": "label_10de_11bf", "address": "87:00.0"}, {"status": "available", "dev_id": "pci_88:00.0", "product_id": "11bf", "dev_type": "type-PCI", "vendor_id": "10de", "label": "label_10de_11bf", "address": "88:00.0"}] from (pid=10444) _report_hypervisor_resource_view /opt/stack/nova/nova/compute/

If that is not the case, check that your nova.conf file is correctly configured as described above.

Now, when you boot an instance using the flavor m1.small, one K2 will be attached to it. Note that the resource tracker keeps track of the PCI devices attached to your instances, so creating a new GPU-accelerated instance will return an error once those resources are exhausted on all the compute nodes.
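The resource tracker's bookkeeping can be pictured as claiming devices from a per-host pool until none are left. A simplified Python sketch of that idea (not Nova's implementation):

```python
def claim_device(pool, vendor_id, product_id):
    """Take one free matching device from the host's pool.

    Marks the first available device with the requested vendor and
    product IDs as claimed; raises when none remain, which is when
    the scheduler reports that PCI resources are exhausted.
    """
    for dev in pool:
        if (dev["status"] == "available"
                and dev["vendor_id"] == vendor_id
                and dev["product_id"] == product_id):
            dev["status"] = "claimed"
            return dev
    raise RuntimeError("no free PCI device matching %s:%s" % (vendor_id, product_id))
```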

Now, everything should be ready to boot a GPU-accelerated instance:

  # nova boot --flavor m1.small --image centos6 --key-name mykey testvm1
  xlcloud@devstackvm1:~$ nova list
  | ID           | Name    | Status | Task State | Power State | Networks           |
  | 92f4...f081a | testvm1 | ACTIVE | -          | Running     | private= |

Log into your instance to check which PCI devices are available:

  xlcloud@devstackvm1:~$ ssh  -l cloud-user
  Last login: Thu Jan 30 18:26:32 2014 from
  [cloud-user@testvm1 ~]$ lspci 
  00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
  00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
  00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
  00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
  00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
  00:02.0 VGA compatible controller: Cirrus Logic GD 5446
  00:03.0 SCSI storage controller: XenSource, Inc. Xen Platform Device (rev 01)
  00:05.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K2] (rev a1)

As you can see in the output above, my instance is attached to the K2 GPU. The next step, running an actual graphics application, since in the end that's what we want to do, requires installing your graphics card manufacturer's drivers in the GPU-accelerated instance (in this case, the Nvidia driver for the K2).

  [cloud-user@testvm1 ~]$ nvidia-smi 
  Mon Feb  3 08:51:42 2014       
  | NVIDIA-SMI 331.38     Driver Version: 331.38         |                       
  | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
  | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
  |   0  GRID K2             Off  | 0000:00:05.0     Off |                  Off |
  | N/A   32C    P0    37W / 117W |      9MiB /  4095MiB |      0%      Default |
  | Compute processes:                                               GPU Memory |
  |  GPU       PID  Process name                                     Usage      |
  |  No running compute processes found                                         |

Okay, that's probably enough for today. In Part 2 of this post, I will show you how to setup a GPU-accelerated Nova instance to run an OpenGL application like the Unigine benchmark [7].

Guillaume Thouvenin XLcloud R&D

Wednesday 11 September 2013

The Baremetal Driver and Devstack

You may know that the Baremetal driver is quite experimental and planned to be replaced by the Ironic project. That said, recent improvements from the community have made the Baremetal driver still very interesting to test. So as to get the latest updates, I tried to configure a Devstack for provisioning real baremetal hosts. Here are my notes from the install, which I hope can help some of you.

Continue reading...

XLcloud Blog and News has changed home

This is the new home of the XLcloud Project Blog and News. We had to change home because our former blog server had RSS feed problems, limitations with regard to spam detection, and it wasn't possible to post comments either. Now those impediments are fixed. We are looking forward to reading your comments and suggestions. Our previous OpenStack-related posts are still accessible on