Information Theory: Mutual Information
Entropy in Neuroimaging
Entropy has three interpretations (three are identical, but in different expressions & relations):
1. Amount of information in an event (N of possible outcomes, or grey value in images)
- The larger the number of possible outcomes, the larger the information gain
- Ex. Information gain from a sentence would exponentially increase with length of sentence
- If outcome is 1, information gain is 0 (i.e. log1 = 0)
- Ex. Information gain from a sentence would exponentially increase with length of sentence
2. Uncertainty of outcome in an event
- Amount of information gain with probability is inversely related to the probability that the event will take place. The information per event is weighted by the probability of the event
- The rarer an event, the more significance the event has
- When all events are likely to occur, uncertainty or entropy is maximum (ie. more possible outcomes)
- Most common entropy form is the Shannon’s entropy: \begin{equation} H = \sum_{i} p_i log(p_i) \end{equation}
Entropy Example
> In a fair coin toss, entropy is maximum. Vice versa, the more unfair the coint toss is, the more definitive the outcome is (which means lower entropy)
> Fair coin toss: P(head) = 0.5, P(tail) = 0.5
> Entropy = -0.5log0.5 - 0.5log0.5 = 0.150 + 0.150 = 0.300 >
> Unfair coin toss: P(head) = 0.8, P(tail) = 0.2
> Entropy = -0.8log0.8 - 0.2log0.2 = 0.077 + 0.140 = 0.217
3. Dispersion of probability distribution
- Shannon’s entropy can be used as a measure of dispersion of a probability distribution. It can be computed on images by looking at their dispersion of grey values
- Image with single intensity will have low entropy value as it contains little information. Conversely, if image with varying intesity will have higher entropy value with more information
- Ex. image with single sharp peak (ie. grey value condensed in small area) will have low entropy value
Mutual Information
The goal of registration is to maximize mutual information or the overlaps. There are three ways of interpreting MI, in which they are identical but in different forms of expression and relation of variables.
I(X, Y) = H(Y) - H(Y | X)
This is the most closest form of mutual information. ie. MI of X and Y is subtracting entropy of H(Y) from the conditional entropy H(Y|X) (or p(Y) given p(X): chance of grey value in B given that corresponding image in A has grey value).
In expression of uncertainty, MI is the amount by which uncertainty about Y changes when the amount of X containing Y is given
I(X, Y) = H(X) + H(Y) - H(X, Y)
- It is the most closest form to joint entropy. H(X | Y) tells us that mutual information is greater when the joint entropy is lower. Small entropy or less dispersion would mean that information overlaps more.
- It is the most closest form to joint entropy. H(X | Y) tells us that mutual information is greater when the joint entropy is lower. Small entropy or less dispersion would mean that information overlaps more.
I(X, Y) = Sum of [p(x,y) log (p(x,y) / p(x)p(y))]
- This formula is analogous to Kullback-Leibler distance, which measures the distance between two distributions. It measures the dependence of the two images by calculating the distance between the joint distribution of the image’s grey values p(x,y) and the joint distribution in case of independence of the two images p(x)p(y)
- We will use this formula to measure MI later in the code
Code Implementation
Now let’s try using Python to measure mutual information of given images. We will be mainly comparing in two ways: comparing the identical images, and two different images.
1. Let’s begin with a setup, and direct the image files
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
import os
import nibabel as nib
# set gray colormap and nearest neighbor interpolation by default
plt.rcParams['image.cmap'] = 'gray'
plt.rcParams['image.interpolation'] = 'nearest'
# set the images
os.chdir('/Users/Jin/Documents/MATLAB')
img1 = '4_23_Drake.nii'
img2 = '4_24_Denzel_Washington.nii'
2. Let’s slice the image and set side by side to display
def get_img_slice(img, size=50):
"""
load the image using nibabel and slice the image by given size
Parameters
----------
img: nii image data read via nibabel
Returns
-------
numpy memmap: ndarray of image slice
"""
img_data = nib.load(img)
img_data = img_data.get_data()
img_slice = img_data[:, :, size] # 50 is arbitrary here
# convert any nans to 0s
img_nans = np.isnan(img_slice)
img_slice[img_nans] = 0
return img_slice
img1_slice = get_img_slice(img1)
img2_slice = get_img_slice(img2)
# display images left and right
def plot_raw(img1, img2):
plt.imshow(np.hstack((img1, img2)))
plt.show()
plot_raw(img1_slice, img1_slice)
3. Let’s plot a 1d histogram for each of the two iamges
def plot_hist1d(img1, img2, bins=20):
"""
one dimensional histogram of the slices
Parameters
----------
img1: nii image data read via nibabel
img2: nii image data read via nibabel
bins: optional (default=20)
bin size of the histogram
Returns
-------
histogram
comparing two images side by side
"""
fig, axes = plt.subplots(1, 2)
axes[0].hist(img1.ravel(), bins)
axes[0].set_title('Img1 histogram')
axes[1].hist(img2.ravel(), bins)
axes[1].set_title('Img2 histogram')
plt.show()
plot_hist1d(img1_slice, img2_slice)
4. Let’s plot the two images against each other on a scatter plot, and calculate correlation coefficient
def plot_scatter2d(img1, img2):
"""
plot the two image's histogram against each other
Parameters
----------
img1: nii image data read via nibabel
img2: nii image data read via nibabel
Returns
-------
2d plotting of the two images and correlation coeeficient
"""
corr = np.corrcoef(img1.ravel(), img2.ravel())[0, 1]
plt.plot(img1.ravel(), img2.ravel(), '.')
plt.xlabel('Img1 signal')
plt.ylabel('Img2 signal')
plt.title('Img1 vs Img2 signal cc=' + str(corr))
plt.show()
# this one is comparing two identical images so it should equal 1
plot_scatter2d(img1_slice, img1_slice)
# image 1 vs image 2
plot_scatter2d(img1_slice, img2_slice)
Feature Space
Using MI as a registration measure, we plot a feature space (or joint histogram), a two-dimensional plot showing the combinations of grey values in each of the two images for all corresponding points. For example, for each corresponding point (x, y), in which x and y are coordinates of first and second images respectively,
As the alignment of the two images change, the feature space changes. The more correctly registered the two images are, the more anatomical structures will overlap, showing clusters for the grey values. When the images are misaligned, the intensity of the clusters for certain structures will decrease, and a new pair of (x, y) will be matched as the image gets incorrectly aligned with other nearby structures of the other image. This is be shown as the dispersion of the clustering.
5. Let’s plot a joint histogram now
def plot_joint_histogram(img1, img2, bins=20, log=True):
"""
plot feature space. Given two images, the feature space is constructed by counting the number of times a combination of grey values occur
Parameters
----------
img1: nii image data read via nibabel
img2: nii image data read via nibabel
bins: optional (default=20)
bin size of the histogram
log: boolean (default=True)
keeping it true will show a better contrasted image
Returns
-------
joint histogram
feature space of the two images in graph
"""
hist_2d, x_edges, y_edges = np.histogram2d(img1.ravel(), img2.ravel(), bins)
# transpose to put the T1 bins on the horizontal axis and use 'lower' to put 0, 0 at the bottom of the plot
if not log:
plt.imshow(hist_2d.T, origin='lower')
plt.xlabel('Img1 signal bin')
plt.ylabel('Img2 signal bin')
# log the values to reduce the bins with large values
hist_2d_log = np.zeros(hist_2d.shape)
non_zeros = hist_2d != 0
hist_2d_log[non_zeros] = np.log(hist_2d[non_zeros])
plt.imshow(hist_2d_log.T, origin='lower')
plt.xlabel('Img1 signal bin')
plt.ylabel('Img2 signal bin')
# this should print a linear graph as it's comparing it to itself
print(plot_joint_histogram(img1_slice, img1_slice))
None
# compare images 1 and 2
print(plot_joint_histogram(img1_slice, img2_slice))
None
6. Let’s calculate the mutual information of the two images now.
We use the third formula stated above to measure the overlaps of the two images. The goal of registration is to maximize mutual information or the overlaps.
def mutual_information(img1, img2, bins=20):
"""
measure the mutual information of the given two images
Parameters
----------
img1: nii image data read via nibabel
img2: nii image data read via nibabel
bins: optional (default=20)
bin size of the histogram
Returns
-------
calculated mutual information: float
"""
hist_2d, x_edges, y_edges = np.histogram2d(img1.ravel(), img2.ravel(), bins)
# convert bins counts to probability values
pxy = hist_2d / float(np.sum(hist_2d))
px = np.sum(pxy, axis=1) # marginal x over y
py = np.sum(pxy, axis=0) # marginal y over x
px_py = px[:, None] * py[None, :] # broadcast to multiply marginals
# now we can do the calculation using the pxy, px_py 2D arrays
nonzeros = pxy > 0 # filer out the zero values
return np.sum(pxy[nonzeros] * np.log(pxy[nonzeros] / px_py[nonzeros]))
# the MI value of the first should be greater than the second as the first is comparing the image to itself
print(mutual_information(img1_slice, img1_slice))
print(mutual_information(img1_slice, img2_slice))
1.1967155090861803
0.20578049748917815
Calculating the Grand Average of Mutual Information
The codes above are detailed step-by-step processes. We can condense the codes above into one useful function that will allow us to examine the overall average of mutual information of each every scan image to the register image.
For example, given a register type (‘rtf’ for register to first image, or ‘rtm’ for register to mean), it will calculate each scan to the register image, calculate the MI, and return the grand average MI value.
Note: The code is currently based on ACCRE system so the paths directing to the images must be changed for flexible use. The path should direct to the folder which holds all the scan images, and since it requires gigabytes of data, I had to implement the code on ACCRE-use only.
def get_avg_mi(subjID, type='rtf', saveText=False, verbose=False, nScans=187):
"""
calculates the correlation coefficient and mutual information of the registered image to the rest of image files, and returns the grand average and trial averages
Parameters
----------
subjID : str
string of subject data (ie. 'cdcatmr011')
type : str {'rtf', 'rtm'}
registration type that would either register to first scan image or the mean image (default: 'rtf')
saveText : boolean
if True, will save the table report to a separate text file (default: False)
verbose : boolean
if True, will print out which scan file is being worked on (default: False)
nScans : int
integer of how many scans to expect per each run (default: 187)
Returns
-------
cc_all: list
list of all correlation coefficient values
mi_all: list
list of all mutual information values
"""
saveText = False
subj = str(subjID)
baseDir = '/scratch/polynlab/fmri/cdcatmr/'
funcDir = baseDir + subj + '/images/func'
tag = 'func'
if type == 'rtf': # register to first
sourceImg = '/func1/func1_00001.nii' # [rmeanfunc1_00001.nii, meanfunc1_00001.nii]
# tag = 'func'
elif type == 'rtm': # register to mean
sourceImg = '/func1/meanfunc1_00001.nii'
# tag = 'rfunc'
resDir = '/home/jeonj1/proj/mi'
os.chdir(funcDir)
funcs = os.listdir(funcDir)
funcs.sort() # funcs = ['func1', 'func2', ... 'func7', 'func8']
meanImg = mi.get_img_slice(funcDir + sourceImg, verbose=verbose)
mi_listVar = []
cc_listVar = []
# loop by each functional run
for i in range(0, len(funcs)):
curr_func = funcs[i]
# let's first create list variables for each functional run
temp = ''
temp = 'mi_' + curr_func + ' = []'
exec(temp)
mi_listVar.append('mi_' + curr_func)
temp = ''
temp = 'cc_' + curr_func + ' = []'
exec(temp)
cc_listVar.append('cc_' + curr_func)
# now let's read in each functional run folder
cfuncDir = funcDir + '/' + curr_func
os.chdir(cfuncDir)
nii_files = os.listdir(cfuncDir)
nii_files = [x for x in nii_files if x.startswith(tag) and x.endswith('nii')]
nii_files.sort()
# sanity check: count scan files
assert len(nii_files) == nScans, "total scan files found do not match " + str(nScans) + " for func run " + str(i+1)
# loop by each scan within run
for j in range(0, nScans):
if verbose:
print('starting ' + curr_func + ' | scan ' + str(j+1))
curr_nii = mi.get_img_slice(nii_files[j], verbose=verbose)
corr = mi.get_core(meanImg, curr_nii)
mutual_info = mi.mutual_information(meanImg, curr_nii)
# append each list
temp_cc = 'cc_' + curr_func + '.append(corr)'
exec(temp_cc)
temp_mi = 'mi_' + curr_func + '.append(mutual_info)'
exec(temp_mi)
cc_sums = []
mi_sums = []
for r in range(0, len(funcs)):
cc_sums.append(sum(eval(cc_listVar[r])))
mi_sums.append(sum(eval(mi_listVar[r])))
# get all entries in a single list
cc_all = []
mi_all = []
for r in range(0, len(funcs)):
cc_all = cc_all + eval(cc_listVar[r])
mi_all = mi_all + eval(mi_listVar[r])
reports = []
reports.append(['avg cc', sum(cc_sums)/(nScans*len(funcs))])
reports.append(['avg mi', sum(mi_sums)/(nScans*len(funcs))])
reports.append(['cc by runs', '-----'])
for l in cc_listVar:
reports.append([l, sum(eval(l))/nScans])
reports.append(['mi by runs', '-----'])
for l in mi_listVar:
reports.append([l, sum(eval(l))/nScans])
print(tabulate(reports))
if saveText:
with open(resDir + '/' + subj + '_mi_' + os.path.basename(sourceImg) + '.txt', 'w') as f:
for item in reports:
f.write("%s\n" % item)
print('saved the table report to ' + resDir)
return cc_all, mi_all
T-test and Plotting
We can use the function get_avg_mi
above to calculate the grand average. Then we will run a t-test on the two averages and plot the differences for visual reference. The script below is currently somewhat hard coded for simplicity sake.
nScans = 187
nRuns = 8
# we use the get_avg_mi function stated above. Specify the subject and register type.
cc1, mi1 = get_avg_mi('cdcatmr066', 'rtf', verbose=True) # register to first scan
cc2, mi2 = get_avg_mi('cdcatmr066', 'rtm', verbose=True) # register to mean
assert len(cc1) == len(cc2) == nScans * nRuns
assert len(mi1) == len(mi2) == nScans * nRuns
cc_t, cc_p = stats.ttest_ind(cc1, cc2)
mi_t, mi_p = stats.ttest_ind(mi1, mi2)
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# switch matplotlib backend so that it knows it won't print anything.
plt.switch_backend('agg')
# sns.set(color_codes=True)
sns.set_color_codes('pastel')
# sns.set_palette("tab10")
plt.figure(0)
sns.distplot(cc1, label='rtf');
sns.distplot(cc2, label='rtm');
plt.legend()
plt.title('correlation coefficient n=' + str(len(cc1)))
plt.savefig('/home/jeonj1/proj/mi/cc_distplot.png')
plt.figure(1)
sns.distplot(mi1, label='rtf');
sns.distplot(mi2, label='rtm');
plt.legend()
plt.title('mutual info | n=' + str(len(mi1)))
plt.savefig('/home/jeonj1/proj/mi/mi_distplot.png')
# let's plot box-dist plot combined
f, (ax_box1, ax_box2, ax_dist) = plt.subplots(3, sharex=True, gridspec_kw= {"height_ratios": (0.3, 0.3, 1)})
cc1_mean = np.mean(cc1)
cc2_mean = np.mean(cc2)
sns.boxplot(cc1, ax=ax_box1, color='b')
sns.boxplot(cc2, ax=ax_box2, color='r')
ax_box1.axvline(cc1_mean, color='g', linestyle='--')
ax_box2.axvline(cc2_mean, color='m', linestyle='--')
plt.subplots_adjust(top=0.87)
plt.suptitle('correlation coefficient n=' + str(len(cc1)), fontsize = 16)
sns.distplot(cc1, ax=ax_dist, label='rtf', color='b', norm_hist=True)
sns.distplot(cc2, ax=ax_dist, label='rtm', color='r', norm_hist=True)
ax_dist.axvline(cc1_mean, color='g', linestyle='--')
ax_dist.axvline(cc2_mean, color='m', linestyle='--')
plt.legend()
ax_box1.set(xlabel='')
ax_box2.set(xlabel='')
plt.savefig('/home/jeonj1/proj/mi/cc_box_distplot.png')
# let's plot box-dist plot combined
f, (ax_box1, ax_box2, ax_dist) = plt.subplots(3, sharex=True, gridspec_kw= {"height_ratios": (0.3, 0.3, 1)})
mi1_mean = np.mean(mi1)
mi2_mean = np.mean(mi2)
sns.boxplot(mi1, ax=ax_box1, color='b')
sns.boxplot(mi2, ax=ax_box2, color='r')
ax_box1.axvline(mi1_mean, color='g', linestyle='--')
ax_box2.axvline(mi2_mean, color='m', linestyle='--')
plt.subplots_adjust(top=0.87)
plt.suptitle('mutual information n=' + str(len(cc1)), fontsize = 16)
sns.distplot(mi1, ax=ax_dist, label='rtf', color='b', norm_hist=True)
sns.distplot(mi2, ax=ax_dist, label='rtm', color='r', norm_hist=True)
ax_dist.axvline(mi1_mean, color='g', linestyle='--')
ax_dist.axvline(mi2_mean, color='m', linestyle='--')
plt.legend()
ax_box1.set(xlabel='')
ax_box2.set(xlabel='')
plt.savefig('/home/jeonj1/proj/mi/mi_box_distplot.png')
The following code above will generate two joint plots that combine the box plot and distribution plot for correlation coefficient and mutual information. The first image below shows the correlation coefficient distribution (note that the max is at 1.00). The N here represents the total number of scan images that were compared.
The image below is what we’re really interested in (!The title should be changed to MI!). You can see the values range from 0.6~1.4. Also, it shows evidence that RTF version has outliers depicted in the box plot, but the outlier is moderated with the RTM method.
References & Useful Links
https://people.csail.mit.edu/fisher/publications/papers/tsai99.pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.87.2130&rep=rep1&type=pdf
https://matthew-brett.github.io/teaching/mutual_information.html#t1-t2-scatter