Piotr

Jun 11, 2026

Unlock Experimentation with Content Variations in CMS 13

Part 1 argued that Content Variations is the CMS 13 feature that didn't get the keynote but should have. This is the follow-up: wiring those variations to Optimizely Feature Experimentation, measuring whether the traffic split actually holds, and serving the same experiment from two heads (classic MVC and headless Next.js) without the experiment noticing. Everything below ran on a real CMS 13.0.2 instance with FX SDK 4.3.0; the numbers are from those runs, not from a slide.

Part 1 ended with variations sitting in the CMS, published, versioned, queryable. Nice. Also useless, commercially, until something decides who sees which arm and whether it moved a number. The CMS deliberately has no opinion about that. There is no traffic-splitting UI anywhere near the Variations dropdown, and the absence is a design decision, not a gap. Delivery belongs to an experimentation engine.

The entire integration between the two products is one string.

FX VariationKey = CMS Content Variation name

That's the entire contract. FX's Decide() returns a variation key; you load the CMS variation with the same name; if the names drift apart, nothing throws and visitors just quietly get the master page. Everything else in this article is plumbing around that one equality, and most of the sharp edges live exactly where you'd expect: in what happens when the equality silently fails.

Who does what

Three roles, three tools. Settle the division of labor before the code, because it's the actual selling point:

Role	Tool	Job
Editor	CMS Variations dropdown	Creates and publishes the content arms. Never sees the FX dashboard.
Marketer / analyst	FX dashboard	Owns the rule: traffic split, ramp, pause, Results. Never touches the CMS.
Developer	This article	Wires the plumbing. Once.

After the wiring ships, the next experiment needs no developer at all: an editor publishes new arms, a marketer points a rule at them, and the string contract does the rest. Stopping a test is one click in the dashboard, no deploy, no content rollback - the arms stay in the CMS, versioned, ready for the next round. If you have to sell this internally, that's the sentence to take into the budget meeting.

The shape of the experiment

One flag, demo_ab_test. Three FX variations: original, variant_a, variant_b. Two CMS Content Variations: variant_a and variant_b.

Two, not three. The control arm gets no CMS variation. When FX returns original, the loader finds no variation by that name and falls back to the master page. The master is the control. Creating a third CMS variation called original is the first mistake I see teams make: it duplicates the master's content, and the two copies drift apart the first time an editor fixes a typo in one of them. Your control arm is then no longer your canonical page, and every lift number you report afterwards is quietly wrong. Let the fallback do its job.

From the editor's chair, the whole feature is part 1's workflow with one naming rule attached: Variations dropdown, New Variation, name it exactly variant_a - the key is case-sensitive, and Variant_A buys you the silent-master failure described below - then change the content and publish. Each arm publishes independently, so an unpublished draft of variant_b simply means that arm's visitors keep seeing master until it ships. No FX login, no flag keys, no deploys. The naming rule is the only place an editor can break the experiment, which makes it the one thing worth putting in their runbook.
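
Because the naming rule is the only thing an editor can break, it may be worth checking mechanically. A minimal TypeScript sketch, assuming you can already obtain both name lists yourself; the helper findContractDrift and its input shape are hypothetical and not part of either product's API:

```typescript
// Hypothetical helper: compares FX variation keys with CMS variation names.
// How the two lists are fetched (dashboard export, Graph query, config) is out of scope.
interface ContractDrift {
  missingInCms: string[];                  // FX arms with no CMS variation at all
  caseMismatches: Array<[string, string]>; // [FX key, CMS name] differing only by case
  unusedInCms: string[];                   // CMS variations that no FX arm points at
}

function findContractDrift(
  fxKeys: string[],
  cmsNames: string[],
  controlKey = "original", // the control arm intentionally has no CMS variation
): ContractDrift {
  const drift: ContractDrift = { missingInCms: [], caseMismatches: [], unusedInCms: [] };
  const exactNames = new Set(cmsNames);
  const matched = new Set<string>();

  for (const key of fxKeys) {
    if (key === controlKey) continue; // served by the master fallback
    if (exactNames.has(key)) {
      matched.add(key);
      continue;
    }
    // Same letters, different casing: the silent-master failure in waiting.
    const nearMiss = cmsNames.find((n) => n.toLowerCase() === key.toLowerCase());
    if (nearMiss !== undefined) {
      drift.caseMismatches.push([key, nearMiss]);
      matched.add(nearMiss);
    } else {
      drift.missingInCms.push(key);
    }
  }
  drift.unusedInCms = cmsNames.filter((n) => !matched.has(n));
  return drift;
}

const driftReport = findContractDrift(
  ["original", "variant_a", "variant_b"],
  ["Variant_A", "variant_b"],
);
console.log(driftReport);
```

A CMS variation named original would show up under unusedInCms, which is a cheap way to catch the duplicated-control mistake described above.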

The demo_ab_test flag in the FX dashboard.

The variation keys are defined on the flag, and each arm has its own flag on/off toggle:

Variation keys on the flag - these names must match the CMS variation names exactly.

That per-variation toggle is the second quiet trap. The serving code treats Enabled == false as "no decision, serve master." If variant_a has its toggle off, a third of your traffic is bucketed into an arm that renders the control while the dashboard reports them as exposed to the variant. All three arms stay on; the rule, not the variation toggle, is your kill switch.

An A/B Test rule won't save without a metric, so an event has to exist first, even if you wire conversions later:

The conversion event has to exist before the rule will save.

Then the rule itself: audience Everyone, traffic allocation 100%, distribution mode Manual at 33.33/33.33/33.34, baseline original:

The A/B Test rule: Everyone, 100% allocation, manual 33.33/33.33/33.34 distribution.

Most write-ups skip one field on the ruleset entirely: "Then, for everyone else." It's the fallthrough for visitors who don't match the rule, either because of the audience or because they fall outside the traffic allocation. At Everyone/100% it never fires, but it documents intent for the day someone ramps the test down to 50%. Set it to Off. Off means Enabled = false means master means control. Setting it to original works (the loader falls back identically), but those visitors get reported as if the experiment decided for them, which pollutes the boundary between "in the test" and "not in the test." Off keeps the boundary honest.

The "Then, for everyone else" fallthrough - set it to Off to keep the in-test boundary honest.

Serving it: one decision, one load, one fallback

The controller-side code is small, and it should be. Controllers depend on an SDK-agnostic abstraction, one Decide per flag, one user context per request:

public IActionResult Index(DemoPage currentPage)
{
    var decision = _decisions.Decide("demo_ab_test");
    var variationKey = decision.Enabled ? decision.VariationKey : null;

    var pageToRender = string.IsNullOrEmpty(variationKey)
        ? currentPage
        : _loader.Load<DemoPage>(currentPage.ContentLink, variationKey) ?? currentPage;
    // ...
}

The loader is the same VariationLoaderOption mechanism from part 1, wrapped with a fallback:

var options = new LoaderOptions { VariationLoaderOption.With(variationKey) };
if (contentLoader.TryGet<T>(link, options, out var variant) && variant is not null)
    return variant;
return contentLoader.TryGet<T>(link, out var master) ? master : null;

The decision service: the only class that knows the SDK

_decisions above is deliberately not the Optimizely client. It's a thin abstraction, and the SDK types live in exactly one implementation behind it. That's less about architectural piety than about two practical facts: controllers become trivially testable (fake a FlagDecision, done), and when SDK 5.x changes a signature you edit one file.

The implementation is where the docs' most important serving-side rule lands: one user context per request, shared by every decision in that request. The SDK's OptimizelyUserContext carries the visitor ID and attributes; creating it once per request (the service is DI-scoped) means the banner flag, the experiment flag, and the conversion all agree on who the visitor is:

public class FeatureDecisionService(
    OptimizelyClient optimizely,
    IVisitorIdProvider visitorIdProvider,
    IVisitorAttributesProvider attributesProvider) : IFeatureDecisionService
{
    private OptimizelyUserContext? _userContext;

    public FlagDecision Decide(string flagKey)
    {
        var userContext = EnsureUserContext();
        if (userContext is null)
        {
            return new FlagDecision(false, null, null,
                ["User context could not be created - SDK client unusable (e.g. missing datafile)."]);
        }

        var decision = userContext.Decide(flagKey, [OptimizelyDecideOption.INCLUDE_REASONS]);
        return new FlagDecision(
            decision.Enabled,
            string.IsNullOrEmpty(decision.VariationKey) ? null : decision.VariationKey,
            decision.Variables?.ToDictionary(),
            decision.Reasons);
    }

    public void Track(string eventKey) => EnsureUserContext()?.TrackEvent(eventKey);

    private OptimizelyUserContext? EnsureUserContext()
        => _userContext ??= optimizely.IsValid
            ? optimizely.CreateUserContext(visitorIdProvider.GetOrCreateVisitorId(), BuildAttributes())
            : null;
}

A few lines in there do more work than they look like. The IsValid guard: an SDK client without a datafile still hands out user contexts whose every Decide is an error decision; checking validity up front turns "mysteriously always master" into an explicit fallback with a reason string. INCLUDE_REASONS is one of five documented decide options (DISABLE_DECISION_EVENT, ENABLED_FLAGS_ONLY, IGNORE_USER_PROFILE_SERVICE, EXCLUDE_VARIABLES are the others) and the only one this demo needs. DISABLE_DECISION_EVENT is the other one I reach for in real projects: it asks for a flag's value without recording an impression, for the places where rendering logic needs the flag but the visitor shouldn't count as exposed. And TrackEvent rides the same context, which is the entire attribution story: no variation key travels with the conversion, ever. The stats engine joins exposure to conversion purely on the visitor ID.

Audience attributes enter at context creation too. The demo derives them from the request instead of hardcoding values - device class from the User-Agent, country from CDN geo headers with an Accept-Language fallback:

return new Dictionary<string, object?>
{
    ["device"]   = ResolveDevice(request),   // "mobile" | "tablet" | "desktop"
    ["location"] = ResolveLocation(request)  // "PL", "SE", ... or "unknown"
};

With those flowing, an FX Audience like device = mobile targets a rule with zero code changes; the rule evaluates against whatever attributes the context carried. One honesty note the docs won't volunteer: geo headers like CF-IPCountry are only trustworthy when your edge injects them and strips client-supplied values. Exposed directly, they're attacker-supplied input, and anyone can self-select into your geo audience with a curl flag. Fine for a demo; gate it behind your CDN in production.
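
For readers on the Next.js head, the same derivation might look roughly like this in TypeScript. It is a sketch under stated assumptions, not the demo's code: the function names mirror the C# ResolveDevice/ResolveLocation above, the User-Agent heuristics are deliberately crude, and the edgeIsTrusted switch is a hypothetical flag standing in for "our CDN overwrites CF-IPCountry":

```typescript
type Device = "mobile" | "tablet" | "desktop";
// Assumption: header names have already been lower-cased by the caller.
type HeaderBag = Record<string, string | undefined>;

function resolveDevice(headers: HeaderBag): Device {
  const ua = (headers["user-agent"] ?? "").toLowerCase();
  // Crude heuristic: iPads and Android devices without "mobi" count as tablets.
  if (/ipad|tablet/.test(ua) || (ua.includes("android") && !ua.includes("mobi"))) {
    return "tablet";
  }
  if (/mobi|iphone/.test(ua)) return "mobile";
  return "desktop";
}

function resolveLocation(headers: HeaderBag, edgeIsTrusted: boolean): string {
  // Read the geo header only when the edge is known to strip client-supplied values.
  const geo = edgeIsTrusted ? headers["cf-ipcountry"] : undefined;
  if (geo !== undefined && /^[A-Za-z]{2}$/.test(geo)) return geo.toUpperCase();

  // Fallback: region subtag of the first Accept-Language entry, e.g. "pl-PL" -> "PL".
  const first = ((headers["accept-language"] ?? "").split(",")[0] ?? "").trim();
  const region = /^[a-z]{2,3}-([a-z]{2})\b/i.exec(first);
  return region?.[1]?.toUpperCase() ?? "unknown";
}

const demoHeaders: HeaderBag = {
  "user-agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Mobile/15E148",
  "cf-ipcountry": "SE",
  "accept-language": "pl-PL,pl;q=0.9",
};
console.log(resolveDevice(demoHeaders), resolveLocation(demoHeaders, true));
```

With edgeIsTrusted set to false the spoofable geo header is ignored and the Accept-Language fallback decides instead.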

Three paths funnel into the same rendered page, and production gives you nothing to tell them apart:

What happened	What FX reports	What the visitor sees
Flag off / rule paused / no datafile	no decision	master
FX returned original (control)	exposed to original	master - by design
FX returned variant_a but the CMS variation was renamed or deleted	exposed to variant_a	master - silently wrong

The third row is the one that should worry you. The experiment keeps collecting data, the dashboard keeps attributing conversions to variant_a, and every visitor in that arm is actually looking at the control. Nothing throws. No log line by default (more on the SDK's default logger below). The only place the truth surfaces is Decide's INCLUDE_REASONS output, which is why the demo renders a development-only diagnostics panel with the visitor ID, the resolved attributes, and the SDK's own reasoning: which rule matched, how the visitor was bucketed, or why no decision was made. Build that panel before you need it. It converts "the page always shows master and nobody knows why" from an afternoon of debugging into a single glance.
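
One way to make the third row visible is to classify every request by comparing what FX decided with what the CMS actually delivered. A minimal TypeScript sketch; the type names and classifyOutcome are hypothetical, and it assumes your loader can report the name of the variation it really returned (null meaning master):

```typescript
type Outcome = "no-decision" | "control" | "variant-served" | "variant-missing";

// Hypothetical minimal shape; the real decision object carries more fields.
interface DecisionLike {
  enabled: boolean;
  variationKey: string | null;
}

function classifyOutcome(
  decision: DecisionLike,
  loadedVariation: string | null, // CMS variation actually rendered, null = master
  controlKey = "original",
): Outcome {
  if (!decision.enabled || !decision.variationKey) return "no-decision"; // row 1
  if (decision.variationKey === controlKey) return "control";            // row 2
  return loadedVariation === decision.variationKey
    ? "variant-served"
    : "variant-missing";                                                 // row 3
}

// FX says variant_a, but the loader fell back to master.
const outcome = classifyOutcome({ enabled: true, variationKey: "variant_a" }, null);
if (outcome === "variant-missing") {
  console.warn("FX assigned an arm the CMS could not deliver; the visitor saw master.");
}
```

Logging or counting the variant-missing outcome gives you a production signal for the case that otherwise only the diagnostics panel reveals.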

Bucketing stickiness rides on a first-party cookie (CMS_VisitorId, one year, HttpOnly). One non-obvious attribute: SameSite=None; Secure. The CMS editor previews pages in a cross-origin iframe, and without SameSite=None the browser drops the cookie there. Every preview request then mints a new visitor and a possibly different arm, and your editors file a bug that "the page keeps flickering between variants." The experiment is fine; the cookie policy isn't.

Two server hygiene points that the demo enforces and most tutorials skip: bucketed responses carry Cache-Control: no-store (a shared cache serving one visitor's arm to everyone collapses the split), and this head's conversion endpoint validates an antiforgery token - a need specific to the MVC head, because its SameSite=None cookie rides along on cross-site POSTs. An experimentation demo whose results can be inflated by a hidden form on someone else's page is not a demo you want to give. The headless head solves the same problem differently; details where they belong, in the Next.js section.

From prose to proof, again: does 33/33/34 actually hold?

Part 1 measured storage and cache behavior. The follow-up question here is more basic and more embarrassing if you skip it: does the traffic actually split the way the rule says? "Trust the hash function" is a fine answer until you're presenting and someone asks how you know.

The check is an xUnit test that fires N independent requests at the running page without cookies (the server mints a fresh visitor ID per request, so every request is a fresh bucketing), parses which arm each response declares, and runs a chi-square goodness-of-fit test against 33.33/33.33/33.34.

Three runs at n = 300 against the MVC head:

Run	original	variant_a	variant_b	χ² (df = 2)
1	90 (30.0%)	106 (35.3%)	104 (34.7%)	1.518
2	104 (34.7%)	82 (27.3%)	114 (38.0%)	5.352
3	105 (35.0%)	105 (35.0%)	90 (30.0%)	1.506

All pass. Run 2 is the instructive one: a 27.3% arm against an expected 33.3%, six points off, and the test is right not to flag it. At n = 300 one standard deviation per arm is ≈ 2.7 percentage points, so a six-point swing is about 2.2 standard deviations in one of three arms, and the overall χ² of 5.352 corresponds to p ≈ 0.07 at df = 2: unusual, not alarming. If your gut said "27% means the split is broken," your gut would have failed you in front of an audience. That's what the statistics are for.

Steal one calibration decision from this test: the pass threshold is α = 0.001 (critical value 13.816), not the textbook 0.05. At α = 0.05 a perfectly configured rule fails the test one run in twenty by construction, which is terrible odds for a check you might run live. The power loss is irrelevant at this effect size: a genuinely broken split like 50/25/25 produces χ² ≈ 37.5 at n = 300 and sails past either threshold. Strict alpha costs nothing and buys you a test that never cries wolf on stage.
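
The arithmetic is small enough to check by hand. The demo's test is xUnit; the sketch below redoes the same goodness-of-fit calculation in TypeScript on the three runs from the table, so the function names are mine rather than the repo's. For df = 2 the p-value has the closed form exp(-χ²/2), which spares a statistics library:

```typescript
// Pearson chi-square statistic for observed counts against expected shares.
function chiSquare(observed: number[], expectedShare: number[]): number {
  const n = observed.reduce((a, b) => a + b, 0);
  return observed.reduce((sum, o, i) => {
    const expected = n * expectedShare[i];
    return sum + (o - expected) ** 2 / expected;
  }, 0);
}

// Survival function of the chi-square distribution, valid for df = 2 only.
const pValueDf2 = (x: number): number => Math.exp(-x / 2);

const CRITICAL_DF2_ALPHA_001 = 13.816; // alpha = 0.001, df = 2
const shares = [0.3333, 0.3333, 0.3334];
const runs = [
  [90, 106, 104],
  [104, 82, 114],
  [105, 105, 90],
];

for (const run of runs) {
  const x = chiSquare(run, shares);
  const verdict = x < CRITICAL_DF2_ALPHA_001 ? "pass" : "FAIL";
  console.log(`chi2=${x.toFixed(3)} p=${pValueDf2(x).toFixed(3)} ${verdict}`);
}

// A genuinely broken 50/25/25 split at n = 300, for comparison.
const broken = chiSquare([150, 75, 75], shares);
console.log(`broken split chi2=${broken.toFixed(1)}`);
```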

Threats to validity, disclosed as ever: the test verifies the FX bucketing distribution, not content delivery. The page reports the arm FX assigned; if the CMS variation failed to load (the silently-wrong row above), the test still passes while every visitor sees master. It's a distribution check, not an end-to-end content assertion; the diagnostics panel covers the other half.

Conversions close the loop: a CTA posts to an endpoint that calls TrackEvent("demo_conversion") for the same visitor context, and the stats engine attributes it to whatever arm that visitor occupies. No variation key is sent; attribution is entirely on the visitor ID, which is exactly why the cookie's stability matters more than any other moving part here.

The experiment collecting decisions and conversions in the FX Results view.

Plumbing the docs assume you'll remember

Three pieces of lifecycle wiring make the difference between a demo that survives a fortnight and one that fails the morning of the presentation. None of it is exotic and all of it is documented, but each item is easy to skip because the happy path works without it.

The datafile is a living document. Every decision the SDK makes is computed locally against the datafile, a JSON snapshot of your flags, rules, and traffic allocations served from the CDN. OptimizelyFactory.NewDefaultInstance(sdkKey) sets up background polling (five-minute default) so dashboard changes propagate without a deploy; for latency-sensitive setups the documented upgrade is a webhook that pings your app the moment the datafile changes. The corollary people miss: pausing an experiment in the dashboard is itself a datafile change. Whatever your refresh path is, that's also how fast your kill switch is.

Events are batched, and shutdown is part of the contract. TrackEvent doesn't call home synchronously. The C# SDK queues impressions and conversions through a batch processor (ten events or thirty seconds, by default) to keep request latency flat. The documented flip side is that you must dispose the client on shutdown so the tail of the queue flushes. In ASP.NET Core that costs nothing if you let it: register the client as a DI singleton and the container disposes it, queue flush included, when the host stops:

services.AddSingleton<OptimizelyClient>(provider =>
{
    var fxLogger = provider.GetRequiredService<ILoggerFactory>().CreateLogger("OptimizelyFX");
    var sdkLogger = new OptimizelySdkLogger(fxLogger);

    if (string.IsNullOrWhiteSpace(sdkKey))
    {
        fxLogger.LogWarning("Optimizely:SdkKey is not configured - fallback mode, all flags report disabled.");
        return new OptimizelyClient(datafile: null, logger: sdkLogger); // invalid on purpose: no polling, no blocking
    }

    OptimizelyFactory.SetLogger(sdkLogger);
    return OptimizelyFactory.NewDefaultInstance(sdkKey);
});

Logs don't exist until you wire them. That SetLogger call looks decorative. It is the opposite. The C# SDK's shipped default logger formats every message (datafile fetch failures, unknown event keys, audience evaluation warnings) and throws the result away. The adapter is ten lines, and it's the difference between reading the cause in your console and debugging blind:

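// Both libraries define ILogger and LogLevel, so the adapter leans on aliases.
using Microsoft.Extensions.Logging;
using MelLogLevel = Microsoft.Extensions.Logging.LogLevel;
using SdkLogLevel = OptimizelySDK.Logger.LogLevel;

public class OptimizelySdkLogger(ILogger logger) : OptimizelySDK.Logger.ILogger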
{
    public void Log(SdkLogLevel level, string message)
        => logger.Log(level switch
        {
            SdkLogLevel.DEBUG => MelLogLevel.Debug,
            SdkLogLevel.INFO  => MelLogLevel.Information,
            SdkLogLevel.WARN  => MelLogLevel.Warning,
            SdkLogLevel.ERROR => MelLogLevel.Error,
            _ => MelLogLevel.Information
        }, "{Message}", message);
}

The same experiment, headless

Part 1 called the Graph angle "where it gets interesting." Here's the cash value. The repo carries a second head, Next.js on the App Router, serving the same experiment from the same flag against the same content, and the part that surprised people I showed it to is what I didn't have to do: nothing. No shared session service, no API between the heads, no coordination.

It works because two independent guarantees compose:

FX bucketing is a pure function. Every SDK - C#, JS, all of them - runs the same MurmurHash over visitorId + experiment. Same visitor ID in, same arm out, on any stack.
Graph serves variations by name (part 1's contract: opt-in variation argument, arms addressed via _metadata.variation).
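
The first guarantee can be illustrated in a few lines. The sketch below is not the SDK's algorithm: it swaps in FNV-1a as a stand-in hash (the real SDKs use MurmurHash, as noted above), and the 10,000-slot bucket space and range boundaries are assumptions chosen to mirror the 33.33/33.33/33.34 rule. What it shows is the shape of the idea: hash the visitor ID plus experiment ID, scale to a bucket, look up the arm.

```typescript
// FNV-1a (32-bit) over UTF-16 code units: an illustrative stand-in, NOT MurmurHash.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

const MAX_BUCKET = 10_000; // assumed bucket space for this sketch

interface Arm {
  key: string;
  endOfRange: number; // exclusive upper bound within [0, MAX_BUCKET)
}

const arms: Arm[] = [
  { key: "original", endOfRange: 3333 },
  { key: "variant_a", endOfRange: 6666 },
  { key: "variant_b", endOfRange: 10_000 },
];

function bucketOf(visitorId: string, experimentId: string): number {
  return Math.floor((fnv1a32(visitorId + experimentId) / 2 ** 32) * MAX_BUCKET);
}

function armForBucket(bucket: number): string | null {
  return arms.find((a) => bucket < a.endOfRange)?.key ?? null;
}

function assign(visitorId: string, experimentId: string): string | null {
  return armForBucket(bucketOf(visitorId, experimentId));
}

// Same inputs, same arm, no shared state between callers.
console.log(assign("visitor-123", "demo_ab_test") === assign("visitor-123", "demo_ab_test"));
```

Nothing in assign reads a session, a database, or a clock, which is why two heads that agree on the visitor ID agree on the arm.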

The Next.js side is small enough to show whole. Middleware mints or forwards the same CMS_VisitorId cookie and hands the fresh ID to the current render via a request header, because the response cookie isn't visible to the request that set it:

import { NextResponse, type NextRequest } from 'next/server';

export function middleware(request: NextRequest) {
  const existing = request.cookies.get('CMS_VisitorId')?.value;
  if (existing) return NextResponse.next();

  const visitorId = crypto.randomUUID();
  const headers = new Headers(request.headers);
  headers.set('x-visitor-id', visitorId);

  const response = NextResponse.next({ request: { headers } });
  response.cookies.set('CMS_VisitorId', visitorId, {
    httpOnly: true, sameSite: 'lax',
    secure: process.env.NODE_ENV === 'production',
    maxAge: 60 * 60 * 24 * 365, path: '/',
  });
  return response;
}

One deliberate difference from the MVC head, before someone files it as a bug: this cookie is SameSite=Lax, not None. The .NET head needs None because the CMS editor renders it inside a cross-origin preview iframe; this head doesn't do editor preview (a scope cut, not an accident), and for a top-level site Lax is the safer default - one more wall against the cross-site POST problem from earlier. The day you wire this head into the editor's preview, flip it to sameSite: 'none', secure: true and run local dev over HTTPS (next dev --experimental-https or a proxy), because browsers refuse SameSite=None without Secure.

CSRF splits along the same line. The conversion endpoint here is a plain Route Handler, and Route Handlers get no CSRF protection for free - that's a Server Actions perk (Next.js compares Origin against Host for those automatically). So the handler validates Sec-Fetch-Site/Origin itself before calling trackEvent, with the Lax cookie as the first wall and the header check as the second. Where the MVC head needed an antiforgery token because of its None cookie, this head needs an origin check because of its handler type. Same threat, two idiomatic answers.
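
The header check itself can stay small. A sketch of the idea in TypeScript, assuming lower-cased header names and a deliberately strict policy (only same-origin is accepted, so same-site subdomains are rejected too); isSameOriginRequest is a hypothetical name, not the repo's:

```typescript
// Assumption: the caller passes header names already lower-cased.
type HeaderBag = Record<string, string | undefined>;

function isSameOriginRequest(headers: HeaderBag): boolean {
  // Modern browsers send Sec-Fetch-Site; when present it is the clearest signal.
  const fetchSite = headers["sec-fetch-site"];
  if (fetchSite !== undefined) return fetchSite === "same-origin";

  // Fallback: compare the Origin header's host[:port] with the Host header.
  const requestOrigin = headers["origin"];
  const host = headers["host"];
  if (!requestOrigin || !host) return false; // fail closed when we cannot tell

  const originHost = requestOrigin.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "");
  return originHost.toLowerCase() === host.toLowerCase();
}

console.log(isSameOriginRequest({ "sec-fetch-site": "same-origin" }));
console.log(isSameOriginRequest({ origin: "https://evil.example", host: "shop.example" }));
```

In a Route Handler this would run before trackEvent, returning a 403 when it yields false.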

A Server Component then calls the JS SDK: same flag, same INCLUDE_REASONS, plus one option the C# factory gives you for free and the JS SDK does not - datafileOptions.autoUpdate. Without it, createInstance fetches the datafile exactly once and a long-running server keeps serving stale flag state until the next restart:

import { createInstance, OptimizelyDecideOption } from '@optimizely/optimizely-sdk';

const client = createInstance({
  sdkKey,
  datafileOptions: { autoUpdate: true, updateInterval: 60_000 },
});

const decision = client
  .createUserContext(visitorId)
  .decide(flagKey, [OptimizelyDecideOption.INCLUDE_REASONS]);

One scoping caveat on autoUpdate: it's a background timer, so it assumes a long-running Node.js process - a container, next start on a VM. On ephemeral serverless functions the timer dies with the invocation and every cold start fetches its own datafile; in that world the right delivery path is the webhook-into-Edge-Config pattern described under "Edge decisions" below, not polling.

And the Graph query asks for the named arm with the disciplined form - include: SOME, the original kept in the result as the fallback:

query DemoPage($variation: [String!]) {
  _Content(
    where: { _metadata: { types: { in: ["DemoPage"] } } }
    variation: { include: SOME, value: $variation, includeOriginal: true }
  ) {
    items {
      _metadata { key displayName variation }
      ... on DemoPage { DemoTitle }
    }
  }
}

When the requested arm is missing from the index, the original arm in the same response is the fallback - the headless mirror of the MVC loader's TryGet-then-master. The selection itself is two lines, and the arm is identified strictly by _metadata.variation, never by dissecting an item ID:

const requested = items.find((i) => i._metadata?.variation === variationKey);
const original  = items.find((i) => !i._metadata?.variation);
const item = requested ?? original;

Same failure semantics on both stacks, which is the property you actually want, because your experiment analysis shouldn't have to care which head served the impression.

Verifying that the variation arms come back by name in GraphiQL.

The same chi-square test against the Next.js head (n = 300) came back original 106, variant_a 90, variant_b 104, χ² = 1.518. Pass. Two runtimes, two languages, two delivery mechanisms, one distribution, because it's one hash.

Authentication, since this is the part audits ask about: the Next.js server is a Backend-for-Frontend. Graph queries carry the read-only single key in the Authorization: epi-single header (not in the URL, where it lands in every access log), and the key lives in a server-only environment variable that never reaches the client bundle. The HMAC app key and secret exist solely in the CMS's user-secrets for the sync job. The FX SDK key likewise never ships to the browser; decisions happen server-side. Single key for delivery, HMAC for management, neither in the frontend: that's the documented split, and it survives a security review.
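
The key-in-header rule is easy to pin down in code. A sketch that only builds the request, assuming the endpoint URL is supplied by your own configuration (the value below is a placeholder, not a real Graph address) and that buildGraphRequest is a hypothetical helper rather than the repo's function:

```typescript
// The query from the section above, reused verbatim.
const DEMO_PAGE_QUERY = `
  query DemoPage($variation: [String!]) {
    _Content(
      where: { _metadata: { types: { in: ["DemoPage"] } } }
      variation: { include: SOME, value: $variation, includeOriginal: true }
    ) {
      items {
        _metadata { key displayName variation }
        ... on DemoPage { DemoTitle }
      }
    }
  }`;

// Builds fetch arguments; the single key travels in the header, never in the URL.
function buildGraphRequest(endpoint: string, singleKey: string, variationKey: string) {
  return {
    url: endpoint,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `epi-single ${singleKey}`,
      },
      body: JSON.stringify({
        query: DEMO_PAGE_QUERY,
        variables: { variation: [variationKey] },
      }),
    },
  };
}

// Placeholder endpoint and key; real values belong in server-only environment variables.
const graphRequest = buildGraphRequest("https://graph.example.invalid/content", "test-key", "variant_a");
console.log(graphRequest.url, Object.keys(graphRequest.init.headers));
```

On the server you would pass graphRequest.url and graphRequest.init to fetch; keeping construction separate makes the no-key-in-URL property testable.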

Sharp edges, part 2: all of these drew blood

Same tradition as part 1: none of these are hypotheticals. Each cost me real time on a real instance; they're ordered by how much.

The CMS 12 Graph package kills CMS 13 at startup. Optimizely.ContentGraph.Cms 4.4.1 restores fine against CMS 13 (NuGet only warns, NU1608), then dies on boot with a TypeLoadException on PropertyContentArea. CMS 13 wants the version-matched Optimizely.Graph.Cms package (13.0.2 against CMS 13.0.2; the dependency pins are exact, so the package version tracks the CMS version). The registration is services.AddContentGraph() from Optimizely.Graph.DependencyInjection. Treat NU1608 on an Optimizely package as an error, not a warning.

The C# SDK's default logger is a no-op. Not "logs to a place you forgot to check": the shipped DefaultLogger.Log formats the message and discards it. A wrong SDK key, a blocked CDN, an unknown event key in TrackEvent: all invisible. Ten lines of adapter forwarding OptimizelySDK.Logger.ILogger into Microsoft.Extensions.Logging, registered via OptimizelyFactory.SetLogger() before creating the client, and the SDK starts telling you things. Do this on day one.

The JS SDK fetches the datafile once. The C# factory polls in the background by default; the JS createInstance does a one-time fetch unless you pass datafileOptions: { autoUpdate: true }. Without it, your Next.js server keeps serving yesterday's flag state until someone restarts it, and "pause the experiment" in the dashboard pauses nothing.

The empty-SDK-key fallback is a trap with the factory. Feed OptimizelyFactory.NewDefaultInstance a placeholder key and you get a real polling manager hammering the CDN for a datafile that doesn't exist (HTTP 403, every interval, forever), and the first Decide blocks the full default fifteen-second readiness timeout before giving up. First page view after deploy hangs fifteen seconds; everything after is fine; nobody can reproduce it. Construct an intentionally invalid client when the key is absent and Decide fails fast instead.

The sync job and the disabled scheduler. Dev environments routinely run SchedulerOptions.Enabled = false. Content publishes index into Graph incrementally via events, but the initial full sync is a scheduled job, which never fires on a disabled scheduler. Symptom: Graph authenticates, the schema is live, _Content returns total: 0, and every layer of your stack is working correctly. Run the synchronization job manually once from Admin → Scheduled Jobs.

Don't parse the item ID. The pre-release docs show variation items with IDs shaped like Guid_Status_Language_VariantKey. That's an example, not a contract. The supported address is _metadata.variation; the original arm is the item where it's null. String-splitting the ID is the kind of thing that works for eleven months and then doesn't.

What I deliberately left out

Scope is a feature. Here's what this build skips and why, so you can disagree on purpose rather than by accident.

User Profile Service. Out of the box, bucketing is sticky because the hash is deterministic: same visitor ID, same arm. But determinism has limits the docs are upfront about: change the traffic allocation mid-flight and some visitors re-bucket. UPS is the documented fix, a key-value store the SDK consults before hashing, pinning each visitor's first assignment forever. It's the right call for long-running production experiments; for a demo it's a database for a problem you don't have yet.

Allowlists and forced decisions. The rule editor lets you pin specific user IDs to specific arms (up to fifty), and the SDK has forced-decision APIs on top. Invaluable for QA ("make me always see variant_b"), and I'd wire it into any real project's test plan. Here, the New visitor button (delete the cookie, re-roll the dice) covers the demo's need with zero configuration.

Edge decisions. The architecture above decides on the server, per request. The documented next step for latency-obsessed setups is pushing the datafile to the edge (Vercel's reference implementation pipes it through a webhook into Edge Config) and bucketing in middleware, before the request touches your origin. Same SDK, same hash, same arms, just earlier. It composes cleanly with everything in this article precisely because the decision logic has no server-side state to migrate.

The take

The pitch for this pairing is genuinely short. The CMS owns the what: versioned, published, auditable content arms that editors manage with the tools they already know. FX owns the who and whether: deterministic assignment, exposure counting, a stats engine that survives scrutiny. The integration surface between two entire products is a string equality, and the same experiment served an MVC head and a headless Next.js head with zero coordination code, holding 33/33/34 within sampling noise on both.

For the business reader who skimmed to the end: this means an A/B test on real CMS content no longer requires a front-end rebuild, a personalization middleware, or a tag-manager hack that your security team hates. Editors make the arms; the dashboard runs the test; either rendering stack, today's or the one you migrate to next year, serves it unchanged. The experiment outlives your architecture decisions. That's the durable part.

The fragile part is everything in the sharp-edges list, and one sentence summarizes all of it: the integration fails silent, never loud. Wrong package, wrong key, renamed variation, stale datafile, unsynced index: every failure mode degrades to "everyone sees master" with green status everywhere. Wrap it in the diagnostics panel, the startup warnings, the distribution test. The contract is one string; the engineering is making sure you notice when the string stops matching.

Quiet hero, part 2: now with a stats engine. Still quiet. Still worth it.

The GitHub repo: cms13-fx-content-variations.
