Information theory
November 29, 2023
These are my notes from reading the book Elements of Information Theory.
Entropy #
Entropy is a measure of the uncertainty of a probability distribution.
If you have a fair 2-sided coin, your uncertainty is $H = \log_2 2 = 1$ bit.
If you have a fair 4-sided coin, your uncertainty is $H = \log_2 4 = 2$ bits.
And your uncertainty is measured by how many bits (the base can be changed) you need on average to represent an outcome of the probability distribution:
$$H(X) = -\sum_x p(x) \log_2 p(x)$$
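The definition above can be sketched in a few lines. This is a minimal illustration, not from the book; the example distributions are made up:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: H = -sum p * log2(p). Terms with p = 0 contribute 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

# A fair 2-sided coin needs 1 bit on average.
print(entropy([0.5, 0.5]))   # 1.0
# A fair 4-sided coin needs 2 bits.
print(entropy([0.25] * 4))   # 2.0
# A biased coin is less uncertain than a fair one.
print(entropy([0.9, 0.1]))   # ~0.469
```

Note how the biased coin has lower entropy: the more predictable the outcome, the fewer bits you need on average.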
Conditional expectation #
Example: $Y$ = weight, $X$ = height.
For a given $X = x$, find the expected value of $Y$:
$$E[Y \mid X = x] = \sum_y y \, p(y \mid x)$$
For example, when height is 150, what's the expected weight? The answer depends on $x$. So it is a function of $x$ (possibly a constant function).
If we let $x$ be random, the answer depends on $X$ and it is a random variable which depends on $X$: $E[Y \mid X]$.
You purposefully leave $X$ as a random variable, and if you want to consider all possibilities of $X$ by averaging over $p(x)$, you get the tower property:
$$E[E[Y \mid X]] = E[Y]$$
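Here is a small numeric sketch of the height/weight example. The joint distribution is entirely hypothetical; it only illustrates that $E[Y \mid X = x]$ is a number per $x$, and that averaging those numbers over $p(x)$ recovers $E[Y]$:

```python
# Hypothetical joint pmf p(x, y): x = height (cm), y = weight (kg).
joint = {
    (150, 45): 0.2, (150, 55): 0.1,
    (170, 60): 0.3, (170, 75): 0.4,
}

def p_x(x):
    """Marginal p(x)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def cond_exp_y(x):
    """E[Y | X = x]: average y weighted by p(y|x) = p(x,y)/p(x)."""
    return sum(y * p / p_x(x) for (xi, y), p in joint.items() if xi == x)

print(cond_exp_y(150))  # expected weight given height 150: a single number

# Tower property: averaging E[Y|X=x] over p(x) gives E[Y].
tower = sum(cond_exp_y(x) * p_x(x) for x in (150, 170))
e_y = sum(y * p for (_, y), p in joint.items())
print(tower, e_y)  # equal
```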
Conditional Entropy #
Conditional entropy measures the remaining length you need after knowing something. For every case of $X$, you look at the average length of $-\log p(y \mid x)$:
$$H(Y \mid X) = \sum_x p(x) \, H(Y \mid X = x) = -\sum_{x,y} p(x,y) \log p(y \mid x)$$
I.e. if I know $X$, how uncertain am I about $Y$?
This is different from conditional expectation in that
- we use the joint probability $p(x,y)$, not $p(y \mid x)$ alone, as the pmf for the average
- both $X$ and $Y$ are summed over their possibilities, so the result is a number, not a function of $x$
Compare with conditional expectation #
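Side by side (my own restatement of the contrast above, not copied from the book):

```latex
E[Y \mid X = x] = \sum_y y \, p(y \mid x)
\qquad \text{vs.} \qquad
H(Y \mid X) = -\sum_{x,y} p(x,y) \log p(y \mid x)
```

In the conditional expectation, $x$ is fixed and only $y$ is summed; in the conditional entropy, both are summed out using the joint $p(x,y)$.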
Joint Entropy #
Look at two events as a single event, and measure the length of the joint probability:
$$H(X,Y) = -\sum_{x,y} p(x,y) \log p(x,y)$$
Chain rule of entropy #
$$H(X,Y) = H(X) + H(Y \mid X)$$
The length I need to know both is the length I need for a single one plus the remaining.
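As a sanity check, the chain rule can be verified numerically on a toy joint distribution (the pmf below is made up for illustration):

```python
from math import log2

# Hypothetical joint pmf over (x, y).
joint = {('a', 0): 0.5, ('a', 1): 0.25, ('b', 0): 0.125, ('b', 1): 0.125}

def H(pmf):
    """Entropy in bits of a dict-valued pmf."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

# Marginal p(x).
px = {}
for (x, _), p in joint.items():
    px[x] = px.get(x, 0.0) + p

# H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x)
h_y_given_x = -sum(p * log2(p / px[x]) for (x, _), p in joint.items())

print(H(joint), H(px) + h_y_given_x)  # chain rule: both sides agree
```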
Relation to probability #
$H(X,Y)$, $H(X)$, and $H(Y \mid X)$ have similar properties as $p(x,y)$, $p(x)$, and $p(y \mid x)$.
Multiplication becomes addition under log #
$$p(x,y) = p(x) \, p(y \mid x)$$
If we take $\log$ on both sides:
$$\log p(x,y) = \log p(x) + \log p(y \mid x)$$
If we take the negative expectation over $p(x,y)$ on both sides:
$$H(X,Y) = H(X) + H(Y \mid X)$$
We can take the conditional on both sides (tautology) #
If we can do
$$H(X,Y) = H(X) + H(Y \mid X)$$
then conditioning everything on a third variable $Z$, it becomes:
$$H(X,Y \mid Z) = H(X \mid Z) + H(Y \mid X, Z)$$
Relative Entropy #
Suppose there's a distribution $p$; $H(p)$ is the minimum average length we can achieve to send samples of $p$.
If we use lengths (a coding scheme) based on another distribution $q$, it will be less efficient: the average length will be longer, and relative entropy measures that extra length:
$$D(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$$
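A minimal sketch of the "extra length" interpretation, with made-up distributions:

```python
from math import log2

def kl(p, q):
    """D(p || q) in bits: the penalty for coding samples of p with q's code."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]

print(kl(p, q))  # > 0: coding p with q's scheme wastes bits
print(kl(p, p))  # 0.0: no penalty when the model matches the source
```

Note that $D(p \| q)$ is not symmetric: `kl(p, q)` and `kl(q, p)` generally differ, so it is not a distance in the metric sense.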
Chain rule for relative entropy #
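The equation for this section was lost in my note; restating the standard chain rule from the book:

```latex
D(p(x,y) \,\|\, q(x,y)) = D(p(x) \,\|\, q(x)) + D(p(y \mid x) \,\|\, q(y \mid x))
```

where the conditional relative entropy averages the per-$x$ divergence over $p(x)$:
$D(p(y \mid x) \,\|\, q(y \mid x)) = \sum_x p(x) \sum_y p(y \mid x) \log \frac{p(y \mid x)}{q(y \mid x)}$.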
Mutual Information #
It is the information (length) that the other random variable provides me.
If I know $Y$, how much do I know about $X$ because of the knowledge of $Y$?
$$I(X;Y) = H(X) - H(X \mid Y)$$
QUESTION
Another interpretation? #
The sum expression
$$I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x) \, p(y)}$$
seems to suggest another interpretation is possible. By deconstructing it: mutual information is the relative entropy between the joint distribution and the product of the marginals, $I(X;Y) = D(p(x,y) \,\|\, p(x)\,p(y))$.
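The two views can be checked against each other numerically. The joint pmf is a toy example I made up; the point is that both formulas give the same number:

```python
from math import log2

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
px = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
py = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}

# View 1: I(X;Y) = D( p(x,y) || p(x) p(y) )
i_kl = sum(p * log2(p / (px[x] * py[y])) for (x, y), p in joint.items())

# View 2: I(X;Y) = H(X) - H(X|Y)
h_x = -sum(p * log2(p) for p in px.values())
h_x_given_y = -sum(p * log2(p / py[y]) for (x, y), p in joint.items())

print(i_kl, h_x - h_x_given_y)  # both definitions agree
```

Since this joint is not the product of its marginals (the variables are dependent), the mutual information comes out strictly positive.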
Chain rule for information #
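The equation for this section was also lost; the standard statement is:

```latex
I(X_1, X_2, \dots, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y \mid X_{i-1}, X_{i-2}, \dots, X_1)
```

Each term is the information $X_i$ gives about $Y$ beyond what the earlier $X_j$ already provided.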
Self Information #
$$I(X;X) = H(X) - H(X \mid X) = H(X)$$
$I(X;X)$: how much info you can get by knowing about yourself. $H(X)$: how much you are uncertain about yourself.
The amount you gain by knowing yourself is the same as the amount you need to know about yourself.
Jensen's inequality #
Convex function #
The line segment over any two points of the graph lies above the function.
Convexity #
If the 2nd derivative is non-negative over an interval, the function is convex over that interval.
The proof uses the Taylor series expansion.
QUESTION
Taylor series? #
A Taylor series is an approximation of the function. Why are we using an approximation to prove that the function has a property?
Are we assuming the approximation is good enough over the region?
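A possible answer (my own note, not from the book): the proof uses Taylor's theorem with the Lagrange remainder, which is an exact identity, not a truncation:

```latex
f(x) = f(a) + f'(a)(x - a) + \tfrac{1}{2} f''(\xi)(x - a)^2
\quad \text{for some } \xi \text{ between } a \text{ and } x
```

Since $f''(\xi) \ge 0$ on the interval, the remainder term is non-negative, so $f(x) \ge f(a) + f'(a)(x - a)$: the function lies above every tangent line, with no approximation error involved.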
Jensen’s inequality #
If $f$ is a convex function,
$$E[f(X)] \ge f(E[X])$$
QUESTION
What does it mean? #
Could it mean something like the following? I haven’t proved it.
If we find the mean of $X$ and draw the tangent line of $f$ at $E[X]$:
- $E[f(X)]$: the area below the curved line (weighted by the pmf)
- $f(E[X])$: the area below the tangent line (weighted the same way, since the tangent is linear)
A convex function lies above its tangent everywhere, so the first quantity is at least the second.
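A quick numeric check of the inequality with an arbitrary convex function and a made-up pmf:

```python
# Convex f(x) = x^2 and a hypothetical pmf over a few points.
xs = [1, 2, 3, 4]
ps = [0.1, 0.2, 0.3, 0.4]

def f(x):
    return x * x  # f''(x) = 2 >= 0, so f is convex

e_x = sum(p * x for p, x in zip(ps, xs))      # E[X]
e_fx = sum(p * f(x) for p, x in zip(ps, xs))  # E[f(X)]

print(e_fx, f(e_x))  # Jensen: E[f(X)] >= f(E[X])
```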