Wednesday, February 1, 2017

Pursuit of Happiness Maximization

Here's a theoretical construct I have, very loosely, in my head. It starts with functions to define happiness, builds a mechanism on top of them, and then tackles the fundamental issue of individual decision making vs. central planning. This has very likely been done before, but I find it interesting to work it out myself.

Define a lifetime happiness function. It's the average of the satisfaction resulting from every choice you make in your life span. Higher overall satisfaction, higher happiness.

Function: H = Σ h(i) / n, summed over i = 1..n

h(i) is the satisfaction from a given choice, i, and n is the total number of choices you make in your lifetime.
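As a quick sketch, that top-level function in Python (the function name and the sample scores are just my illustration):

def lifetime_happiness(satisfactions):
    # H: the average of the per-choice satisfaction scores h(i).
    return sum(satisfactions) / len(satisfactions)

print(lifetime_happiness([10, 20, 30]))  # -> 20.0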

Pursuit of happiness as a general goal is pretty standard, so maximizing this utility function is a safe bet. Note that this is more general than profit maximization, since satisfaction with a choice doesn't necessarily entail material gain. This accounts for things like self-sacrifice.

That's the top level. For scale, h(i) varies from 0 to 42, where 42 is maximum happiness and 0 is total misery. A neutral mood is 21. (42, being the meaning of life, is clearly the best choice for number here.)

What determines the value of h(i)? Resolution of tension, defined here as the difference between the expected outcome, E, and the observed outcome, O (straight out of estimation theory). Assuming every choice is made, at least nominally, with the intent of being satisfied with the outcome (an assumption of rationality), you'd get:

0 = What was observed fell well short of what was expected
21 = What was observed matched expectations
42 = What was observed exceeded expectations
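One way to sketch that mapping in code. The linear ramp and the sensitivity knob are my own assumptions; only the three anchor points above are actually pinned down:

def satisfaction(expected, observed, sensitivity=1.0):
    # Map the tension between observed and expected outcomes onto the
    # 0..42 scale: 21 when they match, sliding toward 0 or 42 as the
    # observed outcome falls short of or exceeds expectations.
    raw = 21 + sensitivity * (observed - expected)
    return max(0.0, min(42.0, raw))

print(satisfaction(expected=10, observed=10))   # -> 21.0 (matched)
print(satisfaction(expected=10, observed=40))   # -> 42.0 (far exceeded)
print(satisfaction(expected=10, observed=-15))  # -> 0.0  (fell well short)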

This works for tallying up someone's existing happiness based on the past. Most people strive for H = 21, since we're smart enough to know that it won't be sunshine and lollipops every day. We know that some choices will land at 0, so we maximize individual choices where possible to compensate and keep the average at 21 or higher. So we have local maximization with the goal of pushing a moving average upwards.
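A toy sketch of that local maximization; the greedy rule, the option names, and the numbers are my own illustration:

def choose(options, predict):
    # Greedy step: take the option with the highest predicted h(i).
    return max(options, key=predict)

def running_H(history):
    # The moving average the chooser tries to keep at 21 or above.
    return sum(history) / len(history)

# After a choice that fell well short (h = 0), the chooser leans on its
# best available option to pull the average back up.
predicted = {"safe bet": 22, "long shot": 8, "solid plan": 35}
pick = choose(predicted, predict=predicted.get)  # -> "solid plan"
history = [0, predicted[pick], 30]               # one miss, then compensating
print(f"H so far = {running_H(history):.1f}")    # -> 21.7, back above neutral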

What about predictions? If the goal is to make each choice so that it maximizes happiness, we need a way of predicting what will do so.

Choices, by their nature, are games of incomplete information. There will always be things the person making the choice does not know and which could, potentially, result in an h(i)=0 situation. The good news is that for every choice similar to ones made previously, the unknowns will tend to decrease (via experience). We also know that a person who strives for the best each time they play a game of incomplete information will trend toward the maximum value over time.
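A quick sketch of that shrinking-unknowns effect, under my own simplifying assumption that repeated similar choices give noisy observations of a fixed true outcome:

import random

random.seed(1)
true_outcome = 30.0    # what the choice actually delivers
estimate, n = 21.0, 0  # start from a neutral prior

for trial in range(1, 101):
    observed = true_outcome + random.gauss(0, 5)  # unknowns show up as noise
    n += 1
    estimate += (observed - estimate) / n         # incremental sample mean
    if trial in (1, 10, 100):
        gap = abs(estimate - true_outcome)
        print(f"trial {trial:3d}: expectation gap = {gap:.2f}")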

Let's take this construct and see how altering the decision structure influences it. Up to now, the person whose happiness is on the line makes every choice that affects h(i). Let's turn that over to an external party. Assume a third party now controls the choices of another. We'll call the third party G, measuring its overall happiness by G and its per-choice satisfaction by g(i); the person G makes choices for is H, with per-choice satisfaction h(i).

Note that G does not directly set the value of h(i): G makes the choice, and H's reaction determines h(i). The expectations, E, that determine the final happiness h(i) are set by H, not G. G, however, controls its own g(i) expectation values.

Let's set G's goals simply: it wants to make H happy. When h(i) is 21 or higher, g(i) is 21 or higher. What happens? G will try to make choices that it predicts will result in a strong h(i) value, likely basing this on communication with H about preferences and expectations. This process is guaranteed to be inefficient, however: there will be latent variables G cannot anticipate but H could, since only H is completely aware of its own history and mind.

Still, over time, G will be able to approximate a good h(i) value, since it will learn H's preferences through testing and become better at predicting. This will, however, take longer than it would for H acting alone. H only needs to deal with one set of unknowns: the uncertainty in the circumstances of the choice. G has to deal with those as well as the unknowns about H's expectations.
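Here's a toy simulation of that two-layers-of-unknowns argument. Both learners face the same noisy choices, but G additionally observes H's satisfaction only through a noisy channel, standing in for its imperfect knowledge of H's expectations. The bandit framing and all the noise levels are my assumptions, not something derived above:

import random

random.seed(2)
N_OPTIONS = 5
payoffs = [random.uniform(0, 42) for _ in range(N_OPTIONS)]  # true h per option

def average_h(expectation_noise, trials=300):
    # Greedy learner with optimistic initial estimates. expectation_noise
    # models uncertainty about H's inner expectations: 0 when H chooses
    # for itself, positive when G chooses on H's behalf.
    estimates = [42.0] * N_OPTIONS
    counts = [0] * N_OPTIONS
    total = 0.0
    for _ in range(trials):
        k = max(range(N_OPTIONS), key=lambda j: estimates[j])
        true_h = payoffs[k] + random.gauss(0, 2)            # unknowns of the choice
        seen = true_h + random.gauss(0, expectation_noise)  # what the learner observes
        counts[k] += 1
        estimates[k] += (seen - estimates[k]) / counts[k]
        total += true_h
    return total / trials

print(f"H choosing for itself: avg h = {average_h(0):.1f}")
print(f"G choosing for H:      avg h = {average_h(10):.1f}")

The noisier the feedback channel, the slower the estimates converge, which is the intuition in the paragraph above played out numerically.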

Complicate things further: now G has to manage the choices of not one H, but 100 H's (H1...H100), all of them unique. G's happiness, g(i), is now based on the aggregate happiness of those it makes decisions for. Even assuming every H makes the same choices in the same order as the others (a simplification that does not hold in the real world), G has to deal with not just the unknowns of each choice, but 100 sets of unknown behavioral preference variables. Every single set has to be learned individually over time through testing, consuming more bandwidth.

Now increase this to a thousand H's. A million. More.
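Back-of-the-envelope, assuming (my simplification) that pinning down one H's preferences takes a fixed number of test choices and nothing learned about one H transfers to another:

TRIALS_PER_H = 50  # hypothetical tests to learn one H's preferences

for n_people in (1, 100, 1_000, 1_000_000):
    print(f"{n_people:>9} H's -> ~{n_people * TRIALS_PER_H:,} test choices for G")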

Economy of scale requires G to make approximations. Instead of trying to learn each H perfectly, it goes for averages. After all, its own g(i) is satisfied by the overall score: hit a bell curve with an average of 21 and G is happy. Never mind that this means 50% of the H's could very well end up with final happiness tallies of less than 21. Even if G is smart and mixes up who gets which payoffs so there isn't one subset that always gets less than 21, there will still be H's who end up with H < 21 and some who are very near 0. This could, interestingly, lower G's own happiness to less than 21 as well.
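A quick numerical sketch of that averaging problem, assuming (my choice, purely for illustration) a bell curve of outcomes centered on 21 and clipped to the scale:

import random

random.seed(3)
# G tunes choices so the *population* averages 21; individual outcomes
# spread out around it (clipped to the 0..42 scale).
population = [min(42.0, max(0.0, random.gauss(21, 8))) for _ in range(100_000)]

avg = sum(population) / len(population)
below_neutral = sum(h < 21 for h in population) / len(population)
near_misery = sum(h < 5 for h in population) / len(population)

print(f"average H      = {avg:.1f}")            # ~21, so G is satisfied
print(f"share below 21 = {below_neutral:.0%}")  # ~half the H's
print(f"share below 5  = {near_misery:.0%}")    # the tail near total misery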

Compare this to a model where every H makes their own calls. Without the extra layer of unknowns, each individual is able to trend toward 21 faster than with interference from G. This shorter time frame increases the likelihood of H=21 being the norm. At the very least, it should be sufficient (and there's hand waving here) to make it so the likelihood is greater than when G makes choices for H. This should also hold (more hand waving) when G only makes some choices for H.

Questions and additions for later:
1. What if G has more knowledge about choice outcomes than H? How much would they need to justify interference? And wouldn't communication be more ethical?
2. Ethical and moral constraints, such as disallowing harming or stealing from others to increase h(i).
3. Role of information sharing between H's and increasing the speed of optimizing h(i).
4. Could low levels of happiness within a subset indicate bad choice methods brought on by misinformation?
5. Tyranny. What happens when G's happiness is maximized for things other than H's well-being?
6. Regrets. When h(i) is maximized locally in time but drops in value outside that time frame, as satisfaction criteria change with time.

There's obviously a lot more work needed to make this conclusive, but it's a start. A mathematical/game-theoretic way to prove that central planning will always be less efficient at making people satisfied with their lives than letting them make their own choices would be wonderful. I think this may already exist, but it's fun to create my own system.
