import React from 'react';

/**
 * FAQ panel for the planning calculator.
 */
class PlanningFAQ extends React.Component {
  /**
   * Render the planning FAQ
   *
   * @return {JSX} The planning FAQ
   */
  render() {
    return (
      <div
        style={{
          width: 600,
        }}
      >
        <h2>FAQ</h2>
        <b>What is this?</b>
        <div>
          <p>
            This is a calculator for planning simple A/B tests (or A/B/C tests, or...you
            get the idea). If you want to analyze a test you have already run, click on
            the &lsquo;Analysis&rsquo; link at the top. When planning a test, the most
            important question is: how long do we need to run it, or more specifically,
            how large a sample do we need?
          </p>

          <p>
            Or, if you have already decided on a sample size, what kind of effects will
            we be able to detect?
          </p>

          <p>
            Often, planning a test is about finding the right balance between a
            meaningful sensitivity and a reasonable sample size.
          </p>
        </div>
        <b>What do you mean by &lsquo;sensitivity&rsquo;?</b>
        <div>
          <p>
            Any kind of experiment is trying to glean information from observations. If
            we want more information, we need more observations. We can actually
            quantify how many observations we need to get a certain amount of
            information, or how much information we can glean from a particular number
            of observations. We can think of the amount of information we can get from a
            particular experiment as the sensitivity of that experiment.
          </p>

          <p>
            If you have experience with statistics or A/B testing, you may be used to
            hearing this described as the <i>alternative hypothesis</i>, but I think
            sensitivity is a much more intuitive description!
          </p>
        </div>
        <b>Can I see an example?</b>
        <div>
          <p>
            Let&apos;s say we&apos;re doing a test with email subject lines. We have two
            candidate subject lines and we want to find out which one leads to more
            opens. Maybe we have been using one of these for a while, so we know it
            usually has about a 10% open rate. Or maybe they are two completely new
            subject lines, but we know that most of our emails usually have about a 10%
            open rate. One way or another, we need to speculate about what the open rate
            for one of the subject lines will be.
          </p>

          <p>
            Enter 10% as the baseline success rate in the calculator.&nbsp;
            <i>
              Note that you enter &lsquo;10&rsquo;, not &lsquo;0.1&rsquo; as people
              who are used to Excel might.
            </i>
          </p>

          <p>
            Then we need to think about how sensitive we want the experiment to be. Of
            course, we want it to be as sensitive as possible! But as a starting point,
            let&apos;s say we want a 1% sensitivity. That would correspond to a
            situation where the second subject line has an open rate that is 1% higher
            (in relative terms) than the first subject line. Since 10.1% is 1% higher
            than 10% (the baseline success rate), if the second subject line has an open
            rate higher than 10.1%, we want to be able to detect it.
          </p>

          <p>
            Enter 1% as the desired sensitivity in the calculator.&nbsp;
            <i>Note that you enter &lsquo;1&rsquo;, not &lsquo;0.01&rsquo;.</i> Hit the
            calculate button.
          </p>

          <p>
            We see that we need a sample size of about 2.9 million. That&apos;s a lot!
            That number is spread evenly across both groups, so we need about 1.45
            million people in each experiment group. Now try increasing the desired
            sensitivity to 10%. How large a sample size do we need then?
          </p>

          <p>
            Suppose we decide we can only use a sample size of 100,000, spread evenly
            across both groups. In the drop-down menu, select &lsquo;Sensitivity&rsquo;
            as the thing to calculate. Enter 10% as the baseline success rate and
            100,000 as the sample size. Hit the calculate button. We see that the
            minimum detectable lift is about 5.5% and the minimum detectable drop is
            about 5.2%. Here&apos;s what that means: if the new subject line is in fact
            at least 5.5% better than the baseline, the experiment will be sensitive
            enough to detect it. Or, if the new subject line is at least 5.2% worse than
            the baseline, the experiment will also be able to detect it. If both subject
            lines have about the same performance, it is unlikely we will be able to
            tell which one is better.
          </p>

          <p>
            Of course, we do not know how much better or worse one option is than the
            other (if we did, we wouldn&apos;t need to do the experiment), but we can
            size the test to be able to detect a difference that would be meaningful for
            our use case.
          </p>
        </div>
        <b>What if we are way off on the baseline?</b>
        <div>
          <p>
            The main point of the planning process is to decide on a sample size. Once
            we have run the test and reached the desired sample size, we stop the test.
            At that point, it does not matter how or why we selected a particular sample
            size: the analysis conclusions will still be valid. If the observed success
            rates are very different from those used to plan the test, the test might
            not be as sensitive as we hoped, but the conclusions are still valid.
          </p>
        </div>
        <b>Why can&apos;t I do uneven splits with more than 2 experiment groups?</b>
        <div>
          <p>
            Because it&apos;s hard to do the UI for that. There&apos;s no technical
            reason. Maybe someday I will think of a good UI for it. You would have to
            specify the percentage of traffic in each individual experiment group, and
            make sure they add up to 100%, and it&apos;s just really annoying. Three
            groups probably isn&apos;t that bad, but ten would be obnoxious.
          </p>
        </div>
        <b>Where can I read more about A/B testing?</b>
        <div>
          <ul>
            <li>
              <a href="https://www.adventuresinwhy.com/post/ab-testing-random-sampling/">
                Adventures in Why
              </a>
              &nbsp;(my blog)
            </li>
            <li>
              <a href="https://www.evanmiller.org/how-not-to-run-an-ab-test.html">
                How Not to Run an A/B Test
              </a>
              &nbsp;(Evan Miller&apos;s blog)
            </li>
            <li>
              <a href="https://www.amazon.com/All-Statistics-Statistical-Inference-Springer/dp/1441923225">
                All of Statistics
              </a>
              &nbsp;by Larry Wasserman. An accessible statistics textbook that discusses
              causal inference and experiment design.
            </li>
            <li>
              <a href="https://www.wiley.com/en-us/Categorical+Data+Analysis%2C+3rd+Edition-p-9780470463635">
                Categorical Data Analysis
              </a>
              &nbsp;by Alan Agresti. An excellent discussion of the analysis of
              contingency tables.
            </li>
          </ul>
        </div>
      </div>
    );
  }
}
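
/*
A minimal sketch of the arithmetic behind the FAQ's worked example, kept in a
comment so it does not affect the component. It is not the calculator's actual
method: it assumes a two-sided test at a 5% significance level with 80% power
and the textbook normal-approximation formula for comparing two proportions,
so its output should be close to, but not necessarily equal to, the figures
quoted in the FAQ. The function names and the hard-coded z-values are
illustrative assumptions.

```javascript
// Assumed defaults (not taken from the calculator): two-sided alpha = 0.05
// and power = 0.80, expressed as standard normal quantiles.
const Z_ALPHA = 1.959964; // quantile for 1 - 0.05 / 2
const Z_POWER = 0.841621; // quantile for 0.80

// Convert a baseline rate and a relative lift (both in percent, as entered
// in the calculator) into the rate the experiment should be able to detect.
function targetRate(baselinePct, liftPct) {
  return baselinePct * (1 + liftPct / 100);
}

// Approximate per-group sample size for detecting the difference between
// two success rates, with traffic split evenly across two groups.
function sampleSizePerGroup(baselinePct, liftPct) {
  const p1 = baselinePct / 100;
  const p2 = targetRate(baselinePct, liftPct) / 100;
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  const delta = p2 - p1;
  return Math.ceil(((Z_ALPHA + Z_POWER) ** 2 * variance) / delta ** 2);
}

const perGroup = sampleSizePerGroup(10, 1);
console.log(targetRate(10, 1)); // about 10.1 (percent)
console.log(perGroup, perGroup * 2); // per-group and total sample sizes
```

Under these assumptions, a 10% baseline with a 1% relative sensitivity needs
on the order of 1.4 million people per group, in the same range as the figure
in the FAQ. Raising the sensitivity to 10% shrinks the requirement sharply,
because the sample size scales with the inverse square of the difference.
*/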

export default PlanningFAQ;
