Exploiting Spatial Structure for Improved False Discovery Rate: A Review of smoothfdr
Multiple-testing problems arise whenever a large number of hypotheses are tested simultaneously. A common way to limit false positives in this setting is to control the False Discovery Rate (FDR), the expected proportion of false discoveries among all discoveries. However, existing FDR-controlling methods may fail to detect spatially localized regions of significant test statistics, leading to false negatives or less biologically plausible results.
The `smoothfdr` package provides an empirical-Bayes method that addresses this limitation by exploiting the spatial structure inherent in the data. By automatically identifying spatially localized regions of significance and adjusting the statistical significance threshold accordingly, FDR smoothing enhances the power to detect signals and separate them from noise. This article reviews the key features of the `smoothfdr` package and its potential applications.
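For context, here is a minimal sketch of a conventional FDR procedure that ignores spatial structure: the Benjamini-Hochberg step-up rule applied to p-values. It is not part of `smoothfdr`; the function name `benjamini_hochberg` is an illustrative assumption, and the sketch only shows the kind of location-agnostic baseline that FDR smoothing aims to improve on.

```python
import numpy as np

def benjamini_hochberg(p_values, fdr_level=0.05):
    """Return a boolean mask of discoveries under the Benjamini-Hochberg rule.

    Illustrative helper (not from smoothfdr): every test is treated the same,
    regardless of where it sits on the grid.
    """
    p = np.asarray(p_values, dtype=float).ravel()
    m = p.size
    order = np.argsort(p)          # indices that sort p-values ascending
    ranked = p[order]
    # The k-th smallest p-value is compared with (k / m) * fdr_level
    thresholds = fdr_level * np.arange(1, m + 1) / m
    below = ranked <= thresholds
    discoveries = np.zeros(m, dtype=bool)
    if below.any():
        # Reject every hypothesis up to the largest rank that passes
        k = np.nonzero(below)[0].max()
        discoveries[order[:k + 1]] = True
    return discoveries.reshape(np.shape(p_values))
```

Because the threshold depends only on the rank of each p-value, a weak but spatially coherent cluster of signals gets no extra credit here, which is the gap that a spatially varying prior is meant to close.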
Installation and Getting Started
To get started, install the Python version of the `smoothfdr` package using pip:
pip install smoothfdr
Once installed, you can run the tool directly from the terminal with the `smoothfdr` command. Alternatively, to integrate it into your own code, import the `smoothfdr` module.
Running an Example
To illustrate the benefits of FDR smoothing, let’s consider a simple example using the provided dataset `data.csv`, a 128×128 grid. This dataset contains two plateaus of increased prior probability of a signal. To perform FDR smoothing on this dataset, you can use the following code:
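```python
import numpy as np
from smoothfdr.easy import smooth_fdr

# Load the 128x128 grid of test statistics
data = np.loadtxt('example/data.csv', delimiter=',')

# Target false discovery rate
fdr_level = 0.05

# Run FDR smoothing; missing_val marks the value used for missing grid points
results = smooth_fdr(data, fdr_level, verbose=5, missing_val=0)
```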
By default, the `smooth_fdr` function assumes a multidimensional grid that has the same shape as the `data` array. However, if your points are connected differently, you can transform `data` into a one-dimensional vector and pass a list of `(x1, x2)` pairs via the `edges` parameter. Additionally, if your grid has missing data points (e.g., in an fMRI scan), you can specify the value used to indicate missing data with the `missing_val` parameter.
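To make the edge-list idea concrete, the sketch below builds the `(x1, x2)` pairs for an ordinary 4-connected 2D grid, indexed by position in the flattened (row-major) array. The helper name `grid_edges` is hypothetical, and the exact edge format that `smooth_fdr` expects is an assumption based on the description above rather than on the package documentation.

```python
import numpy as np

def grid_edges(shape):
    """List (i, j) index pairs linking 4-connected neighbours of a 2D grid.

    Hypothetical helper: i and j are positions in the flattened,
    row-major version of an array with the given shape.
    """
    rows, cols = shape
    idx = np.arange(rows * cols).reshape(rows, cols)
    # Each cell paired with its right-hand neighbour
    horizontal = zip(idx[:, :-1].ravel(), idx[:, 1:].ravel())
    # Each cell paired with the neighbour directly below it
    vertical = zip(idx[:-1, :].ravel(), idx[1:, :].ravel())
    return [(int(a), int(b)) for a, b in list(horizontal) + list(vertical)]

edges = grid_edges((128, 128))

# Assumed call shape, following the text above (not verified against the package):
# results = smooth_fdr(data.ravel(), fdr_level, edges=edges)
```

For a custom graph, such as voxels linked along a cortical surface, the same list-of-pairs structure would simply contain whichever connections apply.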
Visualizing the Results
Once you have run the FDR smoothing algorithm, you can analyze the results using the returned dictionary. For visualization purposes, you can plot the raw data, the smoothed prior, the posteriors, and the discoveries at the specified FDR level.
```python
import matplotlib.pyplot as plt

# 2x2 panel: raw data, smoothed prior, posteriors, and discoveries
fig, ax = plt.subplots(2, 2)
ax[0, 0].imshow(data, cmap='gray_r')
ax[0, 0].set_title('Raw data')
ax[0, 1].imshow(results['priors'], cmap='gray_r', vmin=0, vmax=1)
ax[0, 1].set_title('Smoothed prior')
ax[1, 0].imshow(results['posteriors'], cmap='gray_r', vmin=0, vmax=1)
ax[1, 0].set_title('Posteriors')
ax[1, 1].imshow(results['discoveries'], cmap='gray_r', vmin=0, vmax=1)
ax[1, 1].set_title('Discoveries at FDR={0}'.format(fdr_level))
plt.savefig('results.png')
```
The generated image provides a visual representation of the FDR smoothing results, allowing you to observe the spatial separation of signals from noise.
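To see how posterior probabilities can relate to the discoveries panel, here is a minimal sketch of a standard Bayesian FDR rule: sort sites by posterior probability of being a signal and keep the largest set whose average posterior probability of being null stays at or below the target level. Whether `smoothfdr` computes its `discoveries` in exactly this way is an assumption; the function name `discoveries_from_posteriors` is illustrative.

```python
import numpy as np

def discoveries_from_posteriors(posteriors, fdr_level=0.05):
    """Boolean mask of discoveries from posterior signal probabilities.

    Illustrative helper, not the package's own routine.
    """
    post = np.asarray(posteriors, dtype=float)
    flat = post.ravel()
    order = np.argsort(-flat)            # most confident sites first
    local_fdr = 1.0 - flat[order]        # posterior probability of being null
    # Average null probability among the top-k sites, for every k
    running_fdr = np.cumsum(local_fdr) / np.arange(1, flat.size + 1)
    passing = np.nonzero(running_fdr <= fdr_level)[0]
    mask = np.zeros(flat.size, dtype=bool)
    if passing.size:
        # Keep the largest top-k set whose estimated FDR is within the target
        mask[order[:passing.max() + 1]] = True
    return mask.reshape(post.shape)
```

Under this rule, a site with a modest posterior can still be reported if enough high-confidence sites keep the running average low, which is why smoothing the prior over a region can change which sites are declared discoveries.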
Conclusion and Further Reading
The `smoothfdr` package introduces an empirical-Bayes method for exploiting spatial structure in large multiple-testing problems. By incorporating FDR smoothing into statistical analysis workflows, researchers and data scientists can enhance the power to detect signals and achieve more accurate and biologically plausible results.
For more in-depth understanding and detailed algorithmic information, you can refer to the paper on arXiv titled “False Discovery Rate Smoothing” by W. Tansey, O. Koyejo, R. A. Poldrack, and J. G. Scott.
In summary, the `smoothfdr` package offers a valuable tool for false discovery rate control on spatially structured data. Its simple installation, customizable parameters, and easily plotted outputs make it accessible to researchers and analysts across a wide range of fields.