Exploiting Spatial Structure for Improved False Discovery Rate: A Review of smoothfdr

Blake Bradford

Multiple-testing problems arise whenever a large number of hypotheses are tested simultaneously. A common way to keep false positives in check is to control the False Discovery Rate (FDR), the expected proportion of false discoveries among all discoveries. However, standard FDR-controlling methods treat every test as exchangeable and ignore where it sits in space. As a result, they can miss spatially localized regions of significant test statistics, producing false negatives or results that are less biologically plausible.
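For context, the kind of non-spatial baseline described above can be sketched with the Benjamini-Hochberg procedure. This is a standard textbook method, not part of smoothfdr, and the function name below is only illustrative. Note that the threshold depends on a p-value's rank alone, never on where the test sits in the grid.

```python
import numpy as np

def benjamini_hochberg(p_values, fdr_level=0.05):
    """Return a boolean mask of hypotheses rejected by the BH step-up rule."""
    p = np.asarray(p_values, dtype=float)
    flat = p.ravel()
    m = flat.size
    order = np.argsort(flat)
    # The k-th smallest p-value is compared with (k / m) * fdr_level.
    thresholds = fdr_level * np.arange(1, m + 1) / m
    below = flat[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up: reject everything up to the largest rank that passes.
        k = np.nonzero(below)[0].max()
        rejected[order[:k + 1]] = True
    return rejected.reshape(p.shape)
```

Because the rule looks only at sorted p-values, a cluster of moderately small p-values in one region gets no more credit than the same values scattered at random.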

The smoothfdr package provides an empirical-Bayes method that addresses this limitation by exploiting the spatial structure inherent in the data. FDR smoothing automatically identifies spatially localized regions of significance, then relaxes the significance threshold inside those regions and tightens it elsewhere, while still controlling the overall FDR at the chosen level. This increases the power to detect signals and separate them from noise. This article reviews the key features of the smoothfdr package and its potential applications.
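To make the empirical-Bayes idea concrete, here is a minimal sketch of the two-groups calculation that turns a site-specific prior probability of signal into a posterior. The densities are assumptions chosen for illustration (a standard normal null and a normal alternative with made-up parameters `alt_mean` and `alt_sd`); the package estimates its own quantities, and none of these function names come from smoothfdr.

```python
import numpy as np

def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a normal distribution, written out to avoid extra dependencies."""
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def posterior_signal_prob(z, prior, alt_mean=2.0, alt_sd=1.0):
    """Posterior probability that each z-score is signal under a two-groups model.

    prior is the (possibly site-specific) prior probability of signal.
    alt_mean and alt_sd are hypothetical parameters of the alternative density.
    """
    z = np.asarray(z, dtype=float)
    prior = np.asarray(prior, dtype=float)
    f0 = normal_pdf(z)                    # assumed null density
    f1 = normal_pdf(z, alt_mean, alt_sd)  # assumed alternative density
    return prior * f1 / (prior * f1 + (1.0 - prior) * f0)

# The same z-score evaluated under a low and a high prior probability of signal.
low_prior, high_prior = posterior_signal_prob([2.0, 2.0], [0.05, 0.5])
```

The identical test statistic receives a higher posterior where the prior is higher, which is how a smoothed spatial prior can effectively lower the bar inside a region of signal and raise it elsewhere.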

Installation and Getting Started

The Python version of smoothfdr can be installed with pip:

pip install smoothfdr

Once installed, you can run the tool directly from the terminal by entering the command smoothfdr. Alternatively, if you want to integrate it into your code, simply import the smoothfdr module.

Running an Example

To illustrate the benefits of FDR smoothing, let’s consider a simple example using the provided 128×128 dataset, example/data.csv. This dataset contains two plateaus of increased prior probability of a signal. To perform FDR smoothing on it, you can use the following code:

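```python
import numpy as np
from smoothfdr.easy import smooth_fdr

# Load the 128x128 example grid.
data = np.loadtxt('example/data.csv', delimiter=',')
fdr_level = 0.05

# verbose sets how much progress output is printed;
# missing_val is the value that marks missing data points.
results = smooth_fdr(data, fdr_level, verbose=5, missing_val=0)
```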

By default, the smooth_fdr function assumes a multidimensional grid with the same shape as the data array, so neighboring array entries are treated as neighboring points. If your points are connected differently, you can flatten data into a one-dimensional vector and pass a list of (x1, x2) pairs, one per connection, via the edges parameter. If your grid has missing data points (e.g., in an fMRI scan), you can specify the value that marks them with the missing_val parameter.
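As a rough illustration of what such an edge list could look like, the sketch below builds the nearest-neighbor edges of a regular grid by hand. The helper `grid_edges` is hypothetical and not part of smoothfdr, and it assumes that the pairs are indices into the row-major flattened vector; check the package documentation for the exact convention it expects.

```python
import numpy as np

def grid_edges(shape):
    """List (i, j) index pairs linking each cell to its next neighbor on every axis.

    Indices refer to positions in the row-major flattened array (an assumption
    about the convention, made here for illustration).
    """
    idx = np.arange(int(np.prod(shape))).reshape(shape)
    edges = []
    for axis in range(idx.ndim):
        n = idx.shape[axis]
        # Pair every cell with the cell one step further along this axis.
        first = np.take(idx, np.arange(n - 1), axis=axis).ravel()
        second = np.take(idx, np.arange(1, n), axis=axis).ravel()
        edges.extend(zip(first.tolist(), second.tolist()))
    return edges

small = grid_edges((2, 3))

# Possible usage, following the description above (untested against the package):
# results = smooth_fdr(data.ravel(), fdr_level, edges=grid_edges(data.shape))
```

For a custom layout, such as points on an irregular mesh, you would replace this helper with whatever produces one pair per connection in your own structure.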

Visualizing the Results

Once the FDR smoothing algorithm has run, you can analyze the results using the returned dictionary. For visualization, you can plot the raw data alongside the dictionary's smoothed priors, posteriors, and discoveries at the specified FDR level.

```python
import matplotlib.pyplot as plt

# 2x2 panel: raw data, smoothed prior, posteriors, discoveries.
fig, ax = plt.subplots(2, 2)

ax[0, 0].imshow(data, cmap='gray_r')
ax[0, 0].set_title('Raw data')

# Priors, posteriors, and discoveries all lie in [0, 1], so fix the color scale.
ax[0, 1].imshow(results['priors'], cmap='gray_r', vmin=0, vmax=1)
ax[0, 1].set_title('Smoothed prior')

ax[1, 0].imshow(results['posteriors'], cmap='gray_r', vmin=0, vmax=1)
ax[1, 0].set_title('Posteriors')

ax[1, 1].imshow(results['discoveries'], cmap='gray_r', vmin=0, vmax=1)
ax[1, 1].set_title('Discoveries at FDR={0}'.format(fdr_level))

plt.savefig('results.png')
```

The generated image provides a visual representation of the FDR smoothing results, allowing you to observe the spatial separation of signals from noise.
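To see how a discoveries map can relate to a posteriors map, here is a sketch of one common empirical-Bayes rule: rank points by posterior probability and report the largest set whose average posterior probability of being null stays at or below the FDR level. Whether smoothfdr computes its discoveries in exactly this way is an assumption, and the function name is illustrative.

```python
import numpy as np

def discoveries_from_posteriors(posteriors, fdr_level=0.05):
    """Flag the largest top-ranked set whose expected false discovery
    proportion, the mean of (1 - posterior), is at most fdr_level."""
    post = np.asarray(posteriors, dtype=float)
    flat = post.ravel()
    order = np.argsort(-flat)  # most confident points first
    # Expected false discovery proportion if the top-k points are reported.
    expected_fdr = np.cumsum(1.0 - flat[order]) / np.arange(1, flat.size + 1)
    passing = np.nonzero(expected_fdr <= fdr_level)[0]
    mask = np.zeros(flat.size, dtype=bool)
    if passing.size:
        mask[order[:passing.max() + 1]] = True
    return mask.reshape(post.shape)

# Toy posteriors: three strong candidates, one ambiguous point, one likely null.
toy_mask = discoveries_from_posteriors([0.5, 0.99, 0.1, 0.97, 0.9], fdr_level=0.05)
```

Under this rule, a point with a posterior somewhat below 1 - fdr_level can still be reported when the points ranked above it are confident enough to keep the running average within budget.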

Conclusion and Further Reading

The smoothfdr package introduces an empirical-Bayes method for exploiting spatial structure in large multiple-testing problems. By incorporating FDR smoothing into statistical analysis workflows, researchers and data scientists can enhance the power to detect signals and achieve more accurate and biologically plausible results.

For more in-depth understanding and detailed algorithmic information, you can refer to the paper on arXiv titled “False Discovery Rate Smoothing” by W. Tansey, O. Koyejo, R. A. Poldrack, and J. G. Scott.

In summary, the smoothfdr package offers a practical tool for spatially aware false discovery rate control. Its pip-based installation, its edges and missing_val options, and its plottable outputs make it approachable for researchers working with gridded or graph-structured test statistics.
