Installation

The following modules are needed to run TandemMod.

Required modules

module

version

minimap2

2.17-r941

python

3.7.12

h5py

3.7.0

statsmodels

0.10.0

joblib

0.16.0

scikit-learn

0.22

torch

1.9.1

guppy

6.1.5

ont-tombo

1.5.1

ont_vbz_hdf_plugin

1.0.1

ont-fast5-api

4.1.1

numpy

1.19.5

Conda is recommended for package management, you can create a new conda environment and then install the packages. Here’s an example of how you can do it. Create a new conda environment:

conda create -n TandemMod python=3.7.12

Activate the newly created environment:

conda activate TandemMod

Install the required modules:

conda config --add channels conda-forge
conda config --add channels bioconda

conda install -c conda-forge scipy=1.7.0
conda install -c bioconda minimap2=2.17
conda install -c conda-forge numpy=1.19.5
conda install -c anaconda h5py=3.7.0
conda install -c conda-forge joblib=0.16.0
conda install -c anaconda scikit-learn=0.22
conda install -c bioconda ont-tombo=1.5.1
conda install -c bioconda ont_vbz_hdf_plugin=1.0.1
conda install -c bioconda ont-fast5-api=4.1.1
conda install -c conda-forge statsmodels=0.10.0
pip install torch==1.9.1

Or, some of the modules can be installed by pip:

pip install numpy==1.19.5
pip install h5py==3.7.0
pip install statsmodels==0.10.0
pip install joblib==0.16.0
pip install scikit-learn==0.22
pip install ont-tombo==1.5.1
pip install ont-fast5-api==4.1.1
pip install scipy==1.7.0

Guppy can be obtained from Oxford Nanopore Technologies or from this mirror. Install Guppy using dpkg:

alien ont-guppy-cpu-6.1.5-1.el7.x86_64.rpm
dpkg -i ont-guppy-cpu-6.1.5-1.el7.x86_64.deb

libhdf5 and libcrypto are required for running guppy.

The entire installation will take about 10 minutes. After installing all the essential packages, reset the environment’s state by deactivating and reactivating the environment:

conda deactivate
conda activate TandemMod

We have also provided a yaml file in the repository so you can install the dependencies through the configuration file:

conda env create -f TandemMod.yaml

The source code and data processing scripts are available on GitHub. You can download them by using the git clone command:

git clone https://github.com/yulab2021/TandemMod.git

TandemMod offers three modes: de novo training, transfer learning, and prediction. Researchers can train from scratch, fine-tune pre-trained models, or apply existing models for predictions. It provides a user-friendly solution for studying RNA modifications.

In the provided repository, the pretrained models are located under the ./models directory, and the data processing scripts and the main script are located under the ./scripts directory:

.
├── data
│   ├── A_test.tsv
│   ├── A_train.tsv
│   ├── m5C
│   ├── m6A
│   ├── m6A_test.tsv
│   └── m6A_train.tsv
├── demo
│   ├── fast5
│   │   └── batch_0.fast5
│   ├── files.txt
│   ├── guppy
│   │   ├── fail
│   │   │   └── fastq_runid_71d544d3bd9e1fe7886a5d176c756a576d30ed50_0_0.fastq
│   │   ├── guppy_basecaller_log-2023-06-06_09-58-28.log
│   │   ├── pass
│   │   │   └── fastq_runid_71d544d3bd9e1fe7886a5d176c756a576d30ed50_0_0.fastq
│   │   ├── sequencing_summary.txt
│   │   ├── sequencing_telemetry.js
│   │   └── workspace
│   │       └── batch_0.fast5
├── models
│   ├── hm5C_transfered_from_m5C.pkl
│   ├── m1A_train_on_rice_cDNA.pkl
│   ├── m5C_train_on_rice_cDNA.pkl
│   ├── m6A_train_on_rice_cDNA.pkl
│   ├── m7G_transfered_from_m5C.pkl
│   ├── psU_transfered_from_m5C.pkl
│   ├── test.model
│   └── test.pkl
├── plot
├── README.md
├── scripts
│   ├── extract_feature_from_signal.py
│   ├── extract_signal_from_fast5.py
│   ├── __init__.py
│   ├── models.py
│   ├── TandemMod.py
│   ├── train_test_split.py
│   ├── transcriptome_loci_to_genome_loci.py
│   └── utils.py
└── TandemMod.yaml