Scalability and Throughput Features
Scalability and throughput requirements of services hosted
on a client machine, versus those deployed to server environments, are not equal.
Services hosted in-process are initialized and invoked on demand; those hosted
on client machines at best may be consumed by multiple client threads. Services
deployed to server machines (either Web servers exposed to the Internet or
servers behind the firewall that satisfy intranet clients) can expect to serve
a significantly higher number of concurrent requests. The number of requests
may be predictable if the number of clients is controlled, or may increase in
exponential proportions due to a much wider client-base with potential for
continued growth.
Ideally, your services will always be ready to process
incoming requests and juggle the expected load, while not maxing out host
machine resources and crippling the system. WCF features that support this need
include instancing mode, concurrency mode, and throttling behaviors. As I
discussed in WCF Service Instancing, instancing mode controls the lifetime of
each service instance, letting you allocate an instance per call, per session,
or a single instance for all clients. Concurrency mode controls how and if each
individual service instance allows concurrent calls, which can affect
throughput. Throttling behaviors allow you to control the request load to each
service, restricting the number of concurrent calls, the number of sessions
allocated, and the number of service instances.
Concurrency Mode
Concurrency issues arise when multiple threads attempt to
access the same resources at run time. When requests arrive to a service, the
service model dispatches the message on a thread from the thread pool.
Certainly, if multiple clients call the same service, multiple concurrent
request threads can arrive for a service. The particular service object
handling each request is based on the instancing mode for the service. For
PerCall services, a new service object is allocated for each request. For
PerSession services, the same service object receives requests from the same
client (or proxy). For Single instancing mode, all client requests are sent to
the same singleton service object. Based on this alone, PerSession services are
at risk of concurrent access when the client is multithreaded, and Single
services are perpetually at risk.
The concurrency setting for a service is controlled by the
ConcurrencyMode property of the ServiceBehaviorAttribute. By default, only one
request thread is granted access to any service object, regardless of the
instancing mode; this is because the default setting for ConcurrencyMode is
Single, as shown here:
[ServiceBehavior(ConcurrencyMode=ConcurrencyMode.Single)]
public class MessagingService : IMessagingService
This property can be set to any of the following
ConcurrencyMode enumeration values:
- Single. A single request thread has access to the service object at a given time.
- Reentrant. A single request thread has access to the service object, but the thread can exit the service to call another service (or client callback) and reenter without deadlock.
- Multiple. Multiple request threads have access to the service object and shared resources must be manually protected from concurrent access.
The following sections briefly describe each mode, and
discuss their relevance to Web service deployments.
Single Concurrency Mode
By default, services are configured for Single concurrency
mode. This means that a lock is acquired for the service object while a request
is being processed by that object. Other calls to the same object are queued in
order of receipt at the service, subject to the client's send timeout or the
service's session timeout, if applicable. When the request that owns the lock
has completed, and thus released the lock, the next request in the queue can
acquire the lock and begin processing. This configuration reduces the potential
throughput at the service, when sessions or singletons are involved, but it
also yields the least risk for concurrency issues.
Configuring services for Single access doesn't impact
PerCall services because a new service instance is allocated for each request,
as shown in Figure 1.
Figure 1: PerCall instancing mode with Single concurrency.
For PerSession services, Single concurrency disallows
multiple concurrent calls from the same (multithreaded) client, while not
impacting throughput of multiple clients (see Figure 2); for Single instancing
mode, only one request can be processed across all clients (see Figure 3).
Figure 2: PerSession instancing mode with Single concurrency.
Figure 3: Single instancing mode with Single concurrency.
As I've said, when you expose WCF services over HTTP as Web
services, chances are you'll be using PerCall configuration. Sessions for WCF Web
services are usually better facilitated by persisting data between calls to a
database, rather than using an application session (which is not durable). That
means the default concurrency mode setting of Single will not reduce the
potential throughput of requests to your application.
Reentrant Concurrency Mode
Reentrant mode is necessary when a service issues
callbacks to clients, unless the callback is a one-way operation. That's
because the outgoing call from service to client would not be able to return to
the service instance without causing a deadlock. This mode is also necessary
when services call out to downstream services that call back into the
same service instance before the original call has returned.
Services configured for Reentrant concurrency mode behave
similarly to Single mode, in that concurrent calls are not supported from
clients; however, if an outgoing call is made to a downstream service or to a
client callback, the lock on the service instance is released so that another
call is allowed to acquire it. When the outgoing call returns, it is queued to
acquire the lock to complete its work. Figure 4 illustrates how PerCall
services would behave with and without reentrancy for non-one-way callbacks. In
this case, the only thread that might need to reenter the service is likely an
outgoing callback. Likewise, if the service were to call services downstream
that later attempted to call back into the top-level service, reentrancy would allow
it (however, it is poor design to have circular service references).
Figure 4: Comparing PerCall instancing mode with Single or Reentrant concurrency on non-one-way calls.
Because each request thread gets its own service instance,
callbacks are the primary scenario that applies to your PerCall Web services.
Thus, if you are using WSDualHttpBinding and your callbacks aren't one-way, you'll
set the concurrency mode to Reentrant. You should also pay close attention to
calls to downstream services that may need to call back to upstream services.
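The callback scenario can be sketched as follows. This is a minimal illustration, not code from the original article: the contract and type names (IOrderService, IOrderCallback, OrderService) are hypothetical, and the sketch assumes a reference to System.ServiceModel and a duplex-capable binding such as WSDualHttpBinding.

```csharp
using System.ServiceModel;

// Hypothetical callback contract, implemented by the client.
public interface IOrderCallback
{
    // Request-reply (not one-way): the service waits for the client's answer.
    [OperationContract]
    bool ConfirmOrder(int orderId);
}

// Hypothetical service contract; CallbackContract makes it a duplex contract.
[ServiceContract(CallbackContract = typeof(IOrderCallback))]
public interface IOrderService
{
    [OperationContract]
    void PlaceOrder(int orderId);
}

// Reentrant lets the callback's reply come back to this instance
// while PlaceOrder is still in progress.
[ServiceBehavior(
    InstanceContextMode = InstanceContextMode.PerCall,
    ConcurrencyMode = ConcurrencyMode.Reentrant)]
public class OrderService : IOrderService
{
    public void PlaceOrder(int orderId)
    {
        IOrderCallback callback =
            OperationContext.Current.GetCallbackChannel<IOrderCallback>();

        // The instance lock is released for the duration of this outgoing call
        // and must be reacquired when the call returns.
        bool confirmed = callback.ConfirmOrder(orderId);

        if (!confirmed)
        {
            throw new FaultException("The client declined order " + orderId + ".");
        }
    }
}
```

If ConfirmOrder were declared with IsOneWay = true, the service would not wait for a reply and the default Single concurrency mode would be enough.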
Multiple Concurrency Mode
Services configured for Multiple concurrency mode allow
multiple threads to access the same service instance. In this case, no locks
are acquired on the service instance and all shared state and resources must be
protected with manual synchronization techniques. This setting is useful for
increasing throughput to services configured for PerSession and Single
instancing mode.
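As a rough sketch of what manual synchronization means here, consider a singleton that keeps a shared counter. The contract and class names (IHitCounter, HitCounter) are hypothetical and exist only for illustration; the sketch assumes a reference to System.ServiceModel.

```csharp
using System.ServiceModel;

// Hypothetical contract used only for this illustration.
[ServiceContract]
public interface IHitCounter
{
    [OperationContract]
    int Increment();
}

// One instance serves all clients, and WCF takes no lock on it,
// so every access to shared state must be protected by the service itself.
[ServiceBehavior(
    InstanceContextMode = InstanceContextMode.Single,
    ConcurrencyMode = ConcurrencyMode.Multiple)]
public class HitCounter : IHitCounter
{
    private readonly object _sync = new object();
    private int _count;

    public int Increment()
    {
        // Without this lock, concurrent request threads could interleave
        // the read-modify-write on _count and lose updates.
        lock (_sync)
        {
            _count++;
            return _count;
        }
    }
}
```

The lock is held only around the shared state, not around the whole operation, so unrelated work in the same request can still run concurrently.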
Instance Throttling
To increase throughput at the service, multiple concurrent
calls must be allowed to process. PerCall services can support multiple
concurrent calls by default because each call is allocated its own service
instance. PerSession and Single mode services can allow multiple concurrent
requests when configured for Multiple concurrency mode. However, regardless of
the concurrency mode, server resources are not generally capable of servicing
an unlimited number of concurrent requests. Each request may require a certain
amount of processing, memory allocation, hard disk access, network access, and
other overhead.
WCF provides a throttling behavior to manage server load
and resource consumption (with the following properties):
- MaxConcurrentCalls. Limits the number of concurrent requests that can be processed by all service instances. The default value is 16.
- MaxConcurrentInstances. Limits the number of service instances that can be allocated at a given time. For PerCall services, this setting matches the number of concurrent calls. For PerSession services, this setting matches the number of active session instances. This setting doesn't matter for Single instancing mode, because only one instance is ever created. The default value for this setting is 2,147,483,647.
- MaxConcurrentSessions. Limits the number of active sessions allowed for the service. This includes application sessions, transport sessions (for TCP and named pipes, for example), reliable sessions, and secure sessions. The default value is 10.
Each of these settings is applied to a particular service
configured through its ServiceHost instance (associated to the .svc file when
hosting with IIS or WAS). To set these values declaratively you associate a
service behavior and add the <serviceThrottling> section. Figure 5 shows
a service behavior with the default throttling values.
<system.serviceModel>
  <services>
    <service
      name="Counters.CounterService"
      behaviorConfiguration="serviceBehavior">
      <endpoint
        address="CounterService" binding="basicHttpBinding"
        contract="Counters.ICounterService" />
    </service>
  </services>
  <behaviors>
    <serviceBehaviors>
      <behavior
        name="serviceBehavior">
        <serviceThrottling maxConcurrentCalls="16"
          maxConcurrentInstances="2147483647"
          maxConcurrentSessions="10" />
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>
Figure 5: Default service throttling values.
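The same throttle can also be applied in code when you self-host. The following is a sketch under assumptions: ThrottlingSetup and ApplyThrottle are hypothetical helper names, the values passed in are up to you, and the helper must run before the host is opened. It requires a reference to System.ServiceModel.

```csharp
using System.ServiceModel;
using System.ServiceModel.Description;

// Hypothetical helper for illustration; not part of WCF.
public static class ThrottlingSetup
{
    // Call this before host.Open(); behaviors cannot be changed once the host is open.
    public static void ApplyThrottle(
        ServiceHostBase host, int maxCalls, int maxInstances, int maxSessions)
    {
        // Reuse a throttle that configuration may already have added.
        ServiceThrottlingBehavior throttle =
            host.Description.Behaviors.Find<ServiceThrottlingBehavior>();

        if (throttle == null)
        {
            throttle = new ServiceThrottlingBehavior();
            host.Description.Behaviors.Add(throttle);
        }

        throttle.MaxConcurrentCalls = maxCalls;
        throttle.MaxConcurrentInstances = maxInstances;
        throttle.MaxConcurrentSessions = maxSessions;
    }
}
```

For example, ThrottlingSetup.ApplyThrottle(host, 16, int.MaxValue, 10) reproduces the values shown in Figure 5.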
The appropriate settings for throttling behavior depend on
a number of factors, including the instancing mode for the service, the number
of services exposed by the application, and the desired outcome of throttling.
In the next sections I'll discuss throttling in the context of these different
factors.
MaxConcurrentCalls
The throttle for MaxConcurrentCalls affects the number of
concurrent request threads the service can process to any of its exposed
endpoints. Regardless of whether the instancing mode is PerCall, PerSession, or Single,
this setting should be approached with the idea of limiting the number of active
threads to a particular service, which allows you to do the math and estimate
the number of requests that can be processed per second. For example, if a
PerCall service with one or more endpoints allows 30 concurrent requests, and
each request averages 0.2 seconds, roughly 150 requests per second can be
processed by a particular worker process (assuming IIS hosting over HTTP).
Multiply the number of worker processes and that number increases for a single
machine in your Web server tier.
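The arithmetic above can be written down as a small helper. This is only a back-of-the-envelope estimate; the ThroughputEstimate class and its parameter names are hypothetical, and the formula assumes every concurrent slot stays busy and that worker processes scale linearly.

```csharp
using System;

// Hypothetical helper for illustration; a rough steady-state estimate only.
public static class ThroughputEstimate
{
    // Each of the maxConcurrentCalls slots completes (1 / averageSeconds)
    // requests per second, so the total is slots divided by average duration,
    // multiplied by the number of worker processes.
    public static double RequestsPerSecond(
        int maxConcurrentCalls, double averageSeconds, int workerProcesses = 1)
    {
        if (maxConcurrentCalls <= 0)
            throw new ArgumentOutOfRangeException("maxConcurrentCalls");
        if (averageSeconds <= 0)
            throw new ArgumentOutOfRangeException("averageSeconds");
        if (workerProcesses <= 0)
            throw new ArgumentOutOfRangeException("workerProcesses");

        return workerProcesses * (maxConcurrentCalls / averageSeconds);
    }
}
```

With the figures from the text, RequestsPerSecond(30, 0.2) gives roughly 150 requests per second for one worker process, and passing 2 as the third argument doubles that estimate.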
If you host two services in the same application, each
allowing 30 concurrent requests, at full capacity 60 concurrent requests can
execute. As you increase the number of services, this can eventually have a
negative effect on throughput, as an increasing number of threads increases the
context switching required to execute them concurrently. For this reason you'll
want to consider the potential use of each service alongside the total number
of concurrent threads that are optimal. By the same token, you don't want to
limit the number of concurrent requests to a particular service so tightly that
queued requests begin to time out.
Now, what I just said about the increased number of
concurrent requests as you add services to the application applies only to WCF
services that are NOT hosted by IIS or WAS over HTTP. With IIS and WAS hosting,
ASP.NET is engaged in the processing of requests, at least to forward the
request to the WCF thread from the ASP.NET request thread. If the call is
one-way, the ASP.NET thread is released and the WCF threads will be allocated
according to the throttle setting. If the call is request-reply, WCF blocks the
ASP.NET thread while processing the request on the WCF thread. That means that
the ASP.NET processing model is responsible for request throttling for
non-one-way calls.
Ideally, you want to reach somewhere between 350 and 500
requests per second on a single CPU. You should be able to achieve this by
allocating 30 request threads across all services, but this is not a guarantee,
as many factors can influence this outcome, including request-processing
overhead and server-machine horsepower.
MaxConcurrentSessions
Some creativity may be involved in setting the correct
throttle value for MaxConcurrentSessions. That's because sessions live longer
than requests, yet they consume more resources, so they have conflicting
requirements. On the one hand, a session lives longer than a request; thus, you
don't want to prevent users from connecting to the system if you can afford to
accommodate them. On the other hand, if the nature of the session is allocating
a large amount of memory (or other resources), the server may only be able to
accommodate so many. The number of active application sessions is traditionally
low compared to the number of users in the system, but if you have one million
users, at 5 percent online, that still means 50,000 sessions might be requested
at a given time.
For BasicHttpBinding and WSHttpBinding without reliable
sessions or secure sessions, this is a non-issue because sessions are not
supported for these configurations. Thus, the setting for concurrent sessions
has no impact. In the case of outward-facing PerCall services that also support
reliable sessions or secure sessions (via WSHttpBinding), the overhead of the
session is minimal compared to application sessions that could maintain significant
state. These sessions default to a 10-minute expiry, and if your service
receives close to 300 requests per second, that could mean up to 180,000
requests in 10 minutes (some percentage of which are in the same session). Even
at 5 percent, that's 9,000 concurrent sessions that might need to be supported
to allow unique clients to get in the door. The bottom line is that you must be
well aware of the usage patterns of your clients, and make sure you have the
right balance to prevent request timeouts (waiting for a new session), while
also preventing excessive use of server resources.
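The session estimates in this section follow from two multiplications, sketched here. The SessionEstimate class and its method names are hypothetical, and the 5 percent figure is the illustrative fraction used in the text, not a measured value.

```csharp
using System;

// Hypothetical helper for illustration; mirrors the arithmetic in the text.
public static class SessionEstimate
{
    // Total requests received during one session-expiry window.
    public static long RequestsInWindow(int requestsPerSecond, int windowSeconds)
    {
        return (long)requestsPerSecond * windowSeconds;
    }

    // Portion of a population (users or requests) assumed to need its own session.
    public static long ConcurrentSessions(long population, double sessionFraction)
    {
        if (sessionFraction < 0 || sessionFraction > 1)
            throw new ArgumentOutOfRangeException("sessionFraction");

        return (long)Math.Round(population * sessionFraction);
    }
}
```

RequestsInWindow(300, 600) yields the 180,000 requests in 10 minutes, ConcurrentSessions(180000, 0.05) yields the 9,000 sessions, and ConcurrentSessions(1000000, 0.05) yields the 50,000 sessions from the one-million-user example.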
For application sessions or transport sessions used in a
traditional client-server scenario, the number of active sessions allowed
should be weighed against the amount of resources consumed by each session.
Ultimately, the purpose of the throttle in this case is to prevent the server
from maxing out its memory usage, or that of other limited resources consumed
by each session. Similarly, downstream services exposed over NetNamedPipeBinding
or NetTcpBinding require a transport session that is another resource that has
configurable limits on Windows systems.
MaxConcurrentInstances
The appropriate setting for MaxConcurrentInstances varies
based on the instancing mode for the service. For PerCall services it should be
equal to or greater than MaxConcurrentCalls. For PerSession services,
MaxConcurrentInstances should meet or exceed MaxConcurrentSessions where
application sessions are involved. That's because the value actually limits the
number of concurrent service instances that can be kept active to support
application sessions, which is very different from the number of concurrent,
short-lived requests. For singleton services, MaxConcurrentInstances is
irrelevant, because only one instance of the singleton is ever created.
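These rules of thumb can be captured in a small consistency check. This is a sketch, not a WCF feature: ThrottleCheck and its Check method are hypothetical names, and the only WCF type used is the InstanceContextMode enumeration from System.ServiceModel.

```csharp
using System.ServiceModel;

// Hypothetical helper for illustration; encodes the guidance from the text.
public static class ThrottleCheck
{
    // Returns null when the instance throttle is consistent with the other
    // throttles for the given instancing mode, otherwise a short explanation.
    public static string Check(
        InstanceContextMode mode, int maxCalls, int maxSessions, int maxInstances)
    {
        switch (mode)
        {
            case InstanceContextMode.PerCall:
                // Each call needs its own instance.
                return maxInstances >= maxCalls
                    ? null
                    : "MaxConcurrentInstances is lower than MaxConcurrentCalls.";

            case InstanceContextMode.PerSession:
                // Each active application session needs its own instance.
                return maxInstances >= maxSessions
                    ? null
                    : "MaxConcurrentInstances is lower than MaxConcurrentSessions.";

            default:
                // Single: only one instance is ever created, so the value is irrelevant.
                return null;
        }
    }
}
```

For example, Check(InstanceContextMode.PerCall, 30, 10, 16) returns a warning because 16 instances cannot serve 30 concurrent calls, while the same call with 30 instances returns null.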
Conclusion
Because your Web services are typically configured as
PerCall services over HTTP bindings, you should take from this discussion that
the default concurrency mode (Single) is acceptable unless callbacks are
involved. You should also have some idea how to assess the appropriate
throttling behaviors for your Web services exposed over HTTP: for concurrent
requests, by assessing expected load across all services; for concurrent
sessions, based on use of reliable or secure sessions; and for concurrent
instances, based on the setting for concurrent requests. In the rare case you
employ application sessions for services, you must also consider resource
allocation for those sessions. In addition, you should be mindful of
appropriate configurations for downstream services invoked by your Web
services.