harbar.net component based software & platform hygiene

Workflow Manager Farms for SharePoint 2013 Part One: Core Concepts, High Availability, Certificate and SharePoint considerations

Posted on Friday, July 26, 2013 5:57 AM

Introduction

There’s not a lot of high quality documentation for Workflow Manager 1.0. What exists is generally accurate, however it’s the key missing information and lack of detail which presents challenges in the field. Additionally a couple of the critical details are either hidden away in the murky depths of MSDN, or are incorrect due to copy and paste errors. Much of the TechNet (SharePoint) documentation is either wrong, or will only ever work with a single server farm. During the initial content development work for the MCSM: SharePoint it became clear there is a very large gap with respect to actually implementing the high level deployment guidance provided by the vendor. Following recent discussions in the MCSM: SharePoint community more generally, the topic again raised its head and led to the publication of this article.

This guide is an attempt to help address that gap, and if the wider community feels it is of value, will be the first of a number of Workflow Manager related posts in the context of SharePoint 2013. The objective is not to cover background conceptual or architectural aspects of Workflow Manager or Service Bus specifically, as these are well covered already elsewhere. Rather the objective is to provide clear and accurate explanation along with repeatable deployment and configuration details for common scenarios faced by the SharePoint practitioner.

This article will focus on the deployment and configuration of a highly available, SSL, Workflow Manager Farm for use with SharePoint 2013. In most test and lab environments a single server Workflow Manager farm is deployed and is sufficient. Pretty much everything you will find will talk to this scenario with scant mention of doing it “properly in production”. Pretty much all of the deployment guidance on TechNet and elsewhere actually uses steps that simply will not work in anything other than a single server Workflow Manager farm, from certificates to user accounts. When it comes to production environments, high availability, performance and throughput (and security!) are paramount. Furthermore it is clear that the SSL aspects of a “real” Workflow Manager farm are poorly documented and confusing.

This article is in four parts, due to its length, and also to avoid switching between exposition and configuration:

  1. Core Concepts, High Availability, Certificate and SharePoint considerations
  2. End to End Configuration using auto generated certificates and NLB
  3. Switching an existing farm to use Domain CA issued certificates
  4. End to End Configuration using Domain CA issued certificates

For those comfortable with the technology this first part will be enough. The other three parts will include lots of nice Windows PowerShell, and some not so nice Windows PowerShell!

Let’s get started.

 

High Availability versus Scalability

As hopefully you are all aware a Workflow Manager farm sits on top of a Service Bus farm, and can either be a single server, or three servers. There are no other supported topologies. Service Bus reaches a quorum with three nodes. Whilst the background to that is academically interesting, you don’t really need to know the “why” here, although understanding the concept of quorum in general should be considered a software architecture fundamental. Aside from that it just doesn’t matter, it’s one or three. Again, nothing else is supported, and thus nothing else matters.
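For anyone who wants the general idea of quorum in a few lines, here is a minimal sketch of plain majority arithmetic. It is a generic illustration only, and makes no claim about how Service Bus implements quorum internally; the function names are made up for this example.

```python
def majority(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can be lost while a majority still remains."""
    return nodes - majority(nodes)


for n in (1, 2, 3):
    print(f"{n} node(s): majority={majority(n)}, tolerated failures={tolerated_failures(n)}")
```

Under majority rules a second node buys nothing over a single node (neither can lose a member), whereas three nodes can lose one. That is the general intuition behind "one or three".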

In order to deploy a highly available Workflow Manager farm, that of course means we need three machines in our farm. Because Workflow Manager sits on top of Service Bus, we “get high availability for free” via its scale out model. Service Bus takes care of business behind the scenes for us and the correct node will handle tasks. This is automatic and also provides the performance and throughput benefits of a farm deployment.

Whilst this article is not focused on topology, with SharePoint 2013 the external Workflow Manager Farm deployment should be considered and treated as an “appliance”, albeit a reasonably complicated one. In other words we should accept the required topology design for vendor supportability, set it up and leave it be. That means three servers, each of them running Service Bus and each of them running Workflow Manager. This is what everyone should be doing for SharePoint 2013 On-premises deployments.

However this isn’t high availability, it’s horizontal scalability. It’s pretty sweet. The trouble is the perception that “Service Bus takes care of everything”. Which is not the case…

When we create a connection to a Workflow Manager farm from a SharePoint farm with the Register-SPWorkflowService cmdlet we pass in a WorkflowHostUri parameter. This typically is the host name of a Workflow Manager host. If we have three Workflow Manager hosts, which host name should we use? Well we can use any one we like, as long as it’s valid. This will work. But it’s not highly available. If that particular host is down for whatever reason, our Workflow Connection – which is a Service Application Proxy – will be broken and we cannot configure or execute any SharePoint 2013 workflows.

Despite being a Service Application Proxy, the external service (Workflow Manager) does not interact with SharePoint’s Application Addresses Refresh capability to cache a list of endpoints for Workflow Manager.

The solution here is simple. We need to implement some form of load balancing for the Workflow Manager farm so that the consuming SharePoint farms speak to the Workflow Manager farm via a virtual name. We can use Network Load Balancing, Application Request Routing, or if we just want to spend more money than necessary for the sake of it, a “hardware” solution.
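The difference between registering a single host name and registering a virtual name can be sketched as a toy model. This is purely illustrative: the host names fabwfm2 and fabwfm3 are assumed for the example, and the functions model nothing more than "is there something answering at the address SharePoint was given".

```python
from typing import Iterable, Set

# Hypothetical three server farm; only FABWFM1 appears in the article.
FARM_HOSTS = [
    "fabwfm1.fabrikam.com",
    "fabwfm2.fabrikam.com",
    "fabwfm3.fabrikam.com",
]


def reachable_direct(registered_host: str, down: Set[str]) -> bool:
    """SharePoint was registered against one specific host name."""
    return registered_host not in down


def reachable_via_virtual_name(hosts: Iterable[str], down: Set[str]) -> bool:
    """SharePoint was registered against a load balanced virtual name,
    which can be answered by any host that is still up."""
    return any(host not in down for host in hosts)


down_hosts = {"fabwfm1.fabrikam.com"}
print(reachable_direct("fabwfm1.fabrikam.com", down_hosts))    # False
print(reachable_via_virtual_name(FARM_HOSTS, down_hosts))      # True
```

The farm itself is perfectly healthy in both cases. It is only the connection from SharePoint that breaks when the one registered host is the one that is down.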

The Scaling out Workflow Manager 1.0 documentation does a particularly poor job of detailing this:

“The servers should be configured with a software or hardware load balancer for proper load balancing, or can be accessed directly”

That’s all very well and good and to be fair implementing load balancing for Workflow Manager, as we will see in part two, is very easy indeed. However instructions are always useful :) and when we add SSL into the mix it gets a little more interesting.

 

Secure Sockets Layer

Workflow Manager should use Secure Sockets Layer (SSL). Period.

Sure there’s an option in its configuration to allow HTTP connections. Sure there’s an option to make a HTTP connection from SharePoint. Sure there’s an option to configure SharePoint’s STS to allow HTTP. However none of those options should be used, and they certainly shouldn’t be used in production deployments.

Workflow Manager should use SSL. Period. :)

Why? Because it’s a Server to Server (S2S) trust, and Server to Server trusts leverage OAuth2. OAuth2 is an authorization framework, which can at best be described as an insecure authorization framework! Without going into that whole saga, a key consideration is that it presents tokens over the wire in plain text. Thus we leverage the Universal Firewall Bypass Protocol (UFBP) – otherwise known as Secure Sockets Layer (SSL) - to protect those tokens over the wire. Using SSL doesn’t make OAuth2 secure, but it does protect those tokens. And that’s critical for production environments.

SSL certificates are commonly avoided for no good reason, often viewed as “scary”, “expensive” or involving another team (one that manages the Certificate Authority). Lots of people think that their “internal network is secure”. These things combined lead many to avoid the use of SSL certificates. Now the network isn’t secure, not by a long shot, and the use of OAuth2 across the board in SharePoint 2013 effectively means you are now in the business of managing certificates, whether you like it or not.

Workflow Manager’s Configuration Wizard does all the right things and makes it a snap to use SSL, with the correct defaults and by auto generating the required certificates. That’s good because there are a variety of SSL certificates in play and they are at first glance pretty confusing. Let’s take a look at each of the “five” certificates in turn using the common terms for them in the documentation and Windows PowerShell cmdlets.

This list is repeated all over the place on web sites, slide decks and books by “experts”. The trouble is that the details are incorrect in the source it was copied from and a simple list doesn’t actually explain anything. The below version of the list addresses those problems (hopefully).

On the first server in the farm, all of the following certificates are stored in the Local Computer Personal Store.

 

Service Bus Certificates

1. Farm Certificate.
This is used for communication between Service Bus and Workflow Manager.

  • This is a Server Communications Certificate. In other words it’s a regular SSL cert like we’d use in IIS for a web site.
  • By default, its Common Name will be the FQDN of the first server in the farm (e.g. CN=FABWFM1.fabrikam.com).
  • This certificate includes a Subject Alternative Name for the DNS Domain (e.g. DNS Name=*.fabrikam.com).
  • By default, this certificate is issued by AppServerGeneratedSBCA.

2. Encryption Certificate.
This is used for encryption of connection strings stored in the Service Bus Management Database.

  • This is a Server Communications Certificate.
  • By default, its Common Name will be the FQDN of the first server in the farm.
  • This certificate includes a Subject Alternative Name for the DNS domain.
  • By default, this certificate is issued by AppServerGeneratedSBCA.

As you may have figured out these are exactly the same certificate. They have the same thumbprint. It’s just being used for different purposes. Whilst you can configure the system to use a different certificate for each function there is no good reason to do so. And thus when auto generating certificates, the Workflow Manager Configuration Wizard only creates one.

During the initial configuration of the Service Bus Farm, the AppServerGeneratedSBCA certificate is also created and this is used as the Root Certificate for the above Server Certificate. SBCA stands for Service Bus Certificate Authority. This chap never appears in any of the published lists, even though it’s reasonably important in the grand scheme of things (nothing will work without it)! When additional servers are joined to the farm, this certificate is copied to the Trusted Root Certificate Authorities store.

So the reality is that by default there are two certificates related to Service Bus, but one is the Root Certificate and the other is the Service Bus Certificate, which is used for both farm communication and encryption of connection strings.

 

Workflow Manager Certificates

1. Services SSL Certificate.
This is used for communication between Workflow Manager and its clients (for example SharePoint). When we register a Workflow Service with a SharePoint farm or otherwise interact with Workflow Manager, this is the bad boy that counts.

  • This is a Server Communications Certificate.
  • By default, its common name will be the FQDN of the first server in the farm (e.g. CN=FABWFM1.fabrikam.com).
  • This certificate includes a Subject Alternative Name for the DNS domain (e.g. DNS Name=*.fabrikam.com).
  • By default, this certificate is issued by the same FQDN. This is effectively a Self-signed Certificate.
  • When additional servers are joined to the farm, this certificate is copied to the Trusted Root Certificate Authorities store.

2. Encryption Certificate.
This is used for encryption of connection strings stored in the Workflow Manager Management Database.

  • This is a Server Communications Certificate.
  • By default, its common name will be the FQDN of the first server in the farm.
  • This certificate includes a Subject Alternative Name.
  • By default, this certificate is issued by the same FQDN.
  • When additional servers are joined to the farm, this certificate is copied to the Trusted Root Certificate Authorities store.

3. Outbound Signing Certificate.
This is used for securing communications between workflows and their clients and between different workflows (not between Workflow Manager and its clients as documented). This certificate is used to sign the security token portion of a HTTP activity, which includes the claims of the user that instantiates the workflow.

  • This is a Certificate Signing Certificate. 
  • By default, its Common Name will be CN=WorkflowOutbound.
  • By default this certificate is a CA Root Certificate so it’s issued by itself.
  • This certificate is created when the first host is added to the Workflow Manager farm.

Yup. You guessed it. The Services SSL Certificate and the Encryption Certificate are the same certificate! Again you can separate them out, but again there is no good reason to do so.

For Workflow Manager, there is no AppServerGeneratedWFCA (Workflow Manager Certificate Authority) Certificate generated with default settings, although we can choose to use one if needed. Again in most scenarios this is not necessary. Certainly for a supported “appliance” style Workflow Manager Farm deployment we don’t need this guy.

 

Clear as mud, right?

Phew! Initially, that list of five certificates is a bit scary. It’s actually six! Count ‘em! But the reality is there are actually only four created in a default configuration:

  • AppServerGeneratedSBCA – Root Certificate for Service Bus
  • FABWFM1.fabrikam.com – Service Bus Farm and Encryption Certificate
  • FABWFM1.fabrikam.com – Workflow Manager Services and Encryption Certificate
  • WorkflowOutbound – Workflow Manager Outbound Signing Certificate

What it does provide us with is the ultimate in deployment flexibility. Service Bus and Workflow Manager have been implemented in such a way to enable us to use as few certificates as we can (one) or as many as we want (seven). This is just the way it should be.

When we add another two servers to the Workflow Manager farm, the exact same four certificates are deployed to the two additional servers. The only difference is that the AppServerGeneratedSBCA is stored in the Trusted Root Certificates store. In other words they are exported from the first server and imported on the second and third servers. This is handled automatically for us by the Workflow Manager Configuration tooling.

This default configuration, using auto generated certificates, will work just fine. Service Bus and Workflow Manager can speak to each other and each host can speak to each other. Even though the common name of the Server Communications certificates is the FQDN of the first server in the farm, it all just works because the certificates also include a Subject Alternative Name of DNS Name=*.fabrikam.com. This is the “magic” that makes it all hang together. This is what avoids certificate validation failures when the hosts attempt to speak to each other.
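To see why the wildcard Subject Alternative Name does the trick, here is a minimal sketch of left-most label wildcard matching. It is a simplified model of what a TLS client does during host name validation, not the actual Windows implementation, and the host names other than FABWFM1 are assumed for the example.

```python
from typing import Iterable


def san_matches(hostname: str, pattern: str) -> bool:
    """Match a host name against one DNS SAN entry.

    A '*' is only honoured as the entire left-most label and
    covers exactly one label.
    """
    host_labels = hostname.lower().split(".")
    pattern_labels = pattern.lower().split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        return host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


def cert_covers(hostname: str, sans: Iterable[str]) -> bool:
    """True if any SAN entry on the certificate matches the host name."""
    return any(san_matches(hostname, san) for san in sans)


auto_generated_sans = ["*.fabrikam.com"]

for name in ("fabwfm1.fabrikam.com", "fabwfm2.fabrikam.com", "wfm.fabrikam.com"):
    print(name, cert_covers(name, auto_generated_sans))
```

Every host in the farm, and any load balanced name we later place in the same DNS domain, matches the single wildcard entry. A name in a different domain, or one with an extra label such as wfm.corp.fabrikam.com, does not.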

One other important point to note about the auto-generated certificates used by Service Bus and Workflow Manager: they all have an expiry date of five years from the point they are created. That means the shelf life before you need to do some SSL operational service management is 5 years.
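Keeping an eye on that five year window is simple arithmetic. The sketch below assumes we already have a certificate's "not after" value in the text form Python's ssl module uses; the dates and the 90 day threshold are made up for illustration and are not taken from a real farm.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between 'now' and the certificate's notAfter time."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def needs_renewal(not_after: str, now: datetime, warn_days: int = 90) -> bool:
    """Flag a certificate once it is inside the warning window."""
    return days_until_expiry(not_after, now) <= warn_days


# Hypothetical certificate created on the day of this post, valid for five years.
created = datetime(2013, 7, 26, 5, 57, tzinfo=timezone.utc)
not_after = "Jul 26 05:57:00 2018 GMT"

print(days_until_expiry(not_after, created))
print(needs_renewal(not_after, created))
```

Whatever tooling is used, the point is to have something checking this long before year five, because nobody will remember.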

 

 

High Availability

Enough talk of those pesky certificates already! Let’s get busy connecting SharePoint and Workflow Manager together. As already mentioned Workflow Manager (and Service Bus) take care of resource governance inside the farm. But what about the connections to the farm? For that we need an external (to Workflow Manager) load balancing solution.

Instead of connecting to https://fabwfm1.fabrikam.com:12990 we should connect to something like https://wfm.fabrikam.com:12990. This is trivial to implement using Network Load Balancing (NLB), Application Request Routing (ARR) or slightly less trivial with another load balancing solution. Once we have load balancing in place, and have a DNS record for our new address, we are good to go.

Yup, you read that part correctly, we have absolutely nothing else to configure at this point. Using the default Workflow Manager auto generated certificates and a simple Network Load Balancing cluster, just works. So does ARR, so does something like F5 Local Traffic Manager.

It works regardless of the type of load balancing because of the Subject Alternative Name attribute of the certificates. As long as our address is in the fabrikam.com DNS domain we are all set. You do not need to create your own certificates to deploy a highly available SSL Workflow Manager Farm for SharePoint 2013.

After all that talk of certificates, for 80% of all deployments we don’t even need to care. We just need to configure High Availability.

High Availability must be considered holistically however. A load balancer generally does load balancing. Whether it’s NLB or ARR or an “intelligent”, “hardware” device from the usual suspects, it doesn’t make any difference. They all require configuration and scripting to truly function for HA. If all you need is a solution for when you reboot a host due to Windows Updates or similar scenarios you are good. But for everything else you need to tell the load balancer how to be “intelligent”, it will not happen out of the box. Does your load balancer vendor ship a script that allows it to interrogate Service Bus and Workflow Manager farm status and take the correct actions? Yeah, I didn’t think so. A HTTPS GET is not all it takes here.
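As a rough sketch of what "intelligent" might mean, the decision a probe script has to make looks something like the following. It assumes we have already collected per host service status rows by some means (for example from the farm status cmdlets); the row shape, the service names and the host names are all assumptions for illustration, so check them against what your own farm reports.

```python
from typing import Dict, Iterable, List, Set

# Illustrative service names only; verify against your farm's status output.
REQUIRED_SERVICES = {
    "Service Bus Gateway",
    "Service Bus Message Broker",
    "WorkflowServiceBackend",
    "WorkflowServiceFrontEnd",
}


def hosts_in_rotation(
    status_rows: Iterable[Dict[str, str]], required: Set[str]
) -> List[str]:
    """Return the hosts on which every required service is Running."""
    rows = list(status_rows)
    running: Dict[str, Set[str]] = {}
    for row in rows:
        if row["status"] == "Running":
            running.setdefault(row["host"], set()).add(row["service"])
    all_hosts = {row["host"] for row in rows}
    return sorted(h for h in all_hosts if required <= running.get(h, set()))


def sample_rows() -> List[Dict[str, str]]:
    """Hypothetical status data: fabwfm2 has a stopped backend service."""
    rows = []
    for host in ("fabwfm1.fabrikam.com", "fabwfm2.fabrikam.com", "fabwfm3.fabrikam.com"):
        for service in sorted(REQUIRED_SERVICES):
            stopped = host.startswith("fabwfm2") and service == "WorkflowServiceBackend"
            rows.append(
                {"host": host, "service": service,
                 "status": "Stopped" if stopped else "Running"}
            )
    return rows


print(hosts_in_rotation(sample_rows(), REQUIRED_SERVICES))
```

A plain HTTPS GET against fabwfm2 might well still succeed in this situation, which is exactly the problem. The load balancer only gets smart if something feeds it a decision like this one.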

Be sensible about this stuff and put in place reasonable Service Level Agreements (SLAs) which can be met without extensive additional operational service management burden and cost. The more complicated your HA, the more likely it is that it will be the thing that breaks and knocks your service out of commission.

If you have the tooling (for example System Center, or Tivoli) leverage it smartly to hook into the very high quality Windows PowerShell module for Workflow Manager.

 

 

But it can’t be that easy, we need our own certificates from our CA

You just can’t get enough certificate talk can you?

If you want to, you can deploy your own certificates. For the vast majority of deployments I see no good reason whatsoever to do this. It obviously becomes a little more complicated, we have to decide to which degree we wish to take things:

  • five different certificates (we don’t need any Root CA certificates as we do when auto generating)
  • three different certificates
  • one certificate for everything except the Outbound Signing certificate
  • one CA certificate for Workflow Manager and leave the rest auto generated

Remember you do not need a Domain or Public CA based certificate to have a highly available SSL Workflow Manager farm. Only use other certificates if you have a good reason to do so, and that ideally should be a business reason not a technical one.

From a SharePoint perspective there is one reason to not use auto generated certificates and rather use Domain Certificates, and that is that we won’t have to import them into the SharePoint Certificate Store. But what we gain in ease of configuration on the SharePoint side we lose with complexity on the Workflow Manager side. Everything is a trade-off.

The main reason you may need to change up the certificates is neither a technical nor a business reason. It’s a political one. In many organisations there will be a policy in place which prohibits the use of self signed or auto generated certificates, and that all certificates should be those issued by the organisation’s CA. Generally it’s not the place of a SharePoint practitioner to challenge such a policy and so this is a very common scenario. Interestingly many of these organisations with such a policy are perfectly happy to run their SharePoint web applications over HTTP, but that’s a conversation for another time!

Let’s assume you have a valid reason for going with Domain CA certificates. We can of course leverage the Workflow Manager Configuration to use them. However we have to take care of exporting and importing them across the three machines. The Workflow Manager wizard and/or Windows PowerShell will not take care of that for us.

But this, frankly, is the key area where the documentation is just plain hocus pocus.

Multi-Node Farm Certificate Requirements

If you select an existing certificate, you must make sure that the certificate is a domain certificate. A domain validated SSL is a digital certificate in which the validated identifying information of the certificate is limited to the domain name and works across any machine in the domain. For example, the subject name of the certificate has a value of *.domain.

That friends, is complete and utter claptrap. “Domain Certificate” – what does that even mean?! Most people who know about certificates take that to mean one issued by a Domain CA. We can create a valid domain certificate with a common name of claptrap.fabrikam.com. But it won’t work in any of the situations we need with Workflow Manager.

It gets worse when it starts talking about “domain validated SSL”. Domain validated certificates are a way certificate vendors can provide you a certificate more quickly – i.e. there is no need for articles of incorporation to be supplied. All that is needed is to respond to an email or phone call. Not secure. And more importantly, absolutely nothing whatsoever to do with what we need here. The writers have basically got their terms confused and their Bing foo let them down. Sure a domain validated public SSL certificate would work, but this is not what we should be doing. And a domain validated certificate isn’t issued by a Domain CA anyway so this contradicts the first sentence!

The last part (the example) is a bit closer to the reality.

Because the certificate needs to support both the load balanced address of the farm and the individual host names of the farm members, we can use one of three approaches for the certificate:

  • A wildcard certificate (e.g. *.fabrikam.com)
  • A Subject Alternative Name (SAN) certificate which includes *.fabrikam.com
  • *** A Subject Alternative Name (SAN) certificate which includes the host names for every server in our Workflow Manager Farm ***

The second option is what the product itself uses for its auto generated certificates, and it is easily creatable in a Windows Domain Certificate Authority.

*** The third option could be considered preferable, and again is easily creatable in a Windows Domain Certificate Authority. However this will NOT work with Workflow Manager and SharePoint. We will be able to create and publish workflows but they will not be able to execute due to a misconfiguration.

I hope that someday soon the nasty snippet above is updated.

 

 

Connecting Workflow Manager and SharePoint 2013 Farms

Assuming we have all our ducks in a row with respect to certificates and load balancing (and HA!) all that is left is to create a Workflow Service Connection in our SharePoint farms. Well not quite…!

First up our SharePoint Content Web Applications should be SSL. Yup I know you don’t want to hear it, but there’s no point in securing one end of the S2S trust if the other end is wide open. SSL everywhere is the name of the game here if you are serious about security. Remember the Universal Firewall Bypass Protocol is the key technology enabling “the Cloud”!

We also need to install the Workflow Manager Client and we must import the root of the Workflow Manager SSL Services Certificate into the SharePoint Certificate Store either via New-SPTrustedRootAuthority or Manage Trusts if the certificate is not Domain CA issued.

Once that’s in place we then need to run the Register-SPWorkflowService cmdlet and pass in some SharePoint site and a WorkflowHostUri. There is more to it for multi tenant environments and/or environments with multiple proxy groups, but that is a topic for another day.

What is important to know here is that in true SharePoint fashion, the cmdlet is pretty broken. When we run the command, the end result will be a Workflow Service Application Proxy. The problem is that even if the Register-SPWorkflowService cmdlet fails to complete (for example SSL Certificate Validation failure, or otherwise invalid WorkflowHostUri) the Service Application Proxy will still be created.

Yup! A broken Service Application Proxy. So we need to be on top of this and make sure we validate success ourselves. Clearly this was not Snover’s intent with Windows PowerShell, but it is what it is. We can click the broken proxy in Manage Service Applications to view a status page, or better yet open a site in SharePoint Designer to see if SharePoint 2013 Workflows are available to be created.

 

User and Service Accounts

Much misinformation and guidance exists around the user and service accounts, and groups required for Workflow Manager. Commonly two accounts are detailed, the Configuration user and the RunAs User.

The Configuration User (also known as the Setup User) is simply the account used to install and configure the Workflow Manager farm. There is absolutely no requirement whatsoever to create a dedicated user account for this. Obviously one can do so if needed, think of this like the SharePoint “setup” user, which is often promoted as a “best practice” but is nothing of the sort. Regardless of whether you are using a dedicated account here, or making use of an existing account, for a three server farm it must be a domain account, which is a local machine administrator of the three servers on which you will install and configure Workflow Manager. This account must also be a Security Administrator and Database Creator on the SQL Server instance you will be using to host the Workflow Manager related databases.

The RunAs User is the service account identity of the Workflow Manager services. For a three server farm again this must be a domain account. The necessary permissions for this account are applied when the Workflow Manager farm is created. Other than creating the service account and specifying it during the farm creation, nothing else is needed. If your organisation implements password policy for service accounts, there is a specific procedure using Windows PowerShell to update credentials within the farm.

The Admin Group is a group containing the principals who will administer the Workflow Manager farm (for example by running Windows PowerShell cmdlets). By default this will be the built in Administrators group on each server in the farm. That works just great as long as you administer the farm using an administrator account! If you wish to have more control across a three server farm, this should be (not must be) a Domain Group and obviously contain the user accounts needed. Under no circumstances whatsoever is it necessary to add SharePoint Service Accounts into this group. That is just bogus guidance.
 

Conclusion

So there you have it, the core concepts along with considerations for high availability, certificates and even a little bit of SharePoint. The key take away here is that it is not necessary to use Domain CA certificates (or any other form of external certificates) to deploy a highly available SSL Workflow Manager Farm. In the next parts, we will walk through end to end configuration examples.

Until then, happy workflowing!

 

s.