Run examples

This section gives examples of how to use modCnet.

Train ac4C model using IVT datasets

To train an ac4C detection model, both ac4C-modified and unmodified samples are required. The ac4C-modified and unmodified IVT datasets generated in this study have been deposited in the GEO database under accession numbers GSE227087 and GSE267558. This demo demonstrates how to train an ac4C detection model from scratch using the IVT datasets.

1. Guppy basecalling

Basecalling converts the raw signal generated by Oxford Nanopore sequencing into a DNA/RNA sequence. Guppy is used for basecalling in this step. In some nanopore datasets, the sequence information is already contained within the FAST5 files; in such cases, the basecalling step can be skipped.

#ac4C-modified
guppy_basecaller -i demo_data/IVT_ac4C -s demo_data/IVT_ac4C_guppy --num_callers 40 --recursive --fast5_out --config rna_r9.4.1_70bps_hac.cfg

#unmodified
guppy_basecaller -i demo_data/IVT_unmod -s demo_data/IVT_unmod_guppy --num_callers 40 --recursive --fast5_out --config rna_r9.4.1_70bps_hac.cfg

2. Multi-read FAST5 files to single-read FAST5 files

Convert multi-read FAST5 files to single-read FAST5 files. If the data generated by the sequencing device is already in single-read format, this step can be skipped.

#ac4C-modified
multi_to_single_fast5 -i demo_data/IVT_ac4C_guppy -s demo_data/IVT_ac4C_guppy_single --recursive

#unmodified
multi_to_single_fast5 -i demo_data/IVT_unmod_guppy -s demo_data/IVT_unmod_guppy_single --recursive

3. Tombo resquiggling

In this step, the sequence obtained by basecalling is aligned to a reference genome or a known sequence, and the corrected sequence is then associated with the corresponding current signals. Resquiggling is performed in place; no separate files are generated in this step.

#ac4C-modified
tombo resquiggle --overwrite --basecall-group Basecall_1D_000 demo_data/IVT_ac4C_guppy_single  demo_data/IVT_DRS.reference.fasta --processes 40 --fit-global-scale --include-event-stdev

#unmodified
tombo resquiggle --overwrite --basecall-group Basecall_1D_000 demo_data/IVT_unmod_guppy_single  demo_data/IVT_DRS.reference.fasta --processes 40 --fit-global-scale --include-event-stdev
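For a quick sanity check of the resquiggled output, the corrected events inside a single-read FAST5 can be inspected with h5py. This is a minimal sketch assuming Tombo's default corrected-group layout; the read filename is a placeholder, not a file from the demo data:

import h5py

# Placeholder path: point this at any single-read FAST5 after resquiggling.
read_path = "demo_data/IVT_ac4C_guppy_single/0/example_read.fast5"

with h5py.File(read_path, "r") as f:
    # Tombo's default corrected group; adjust if --corrected-group was changed.
    events = f["Analyses/RawGenomeCorrected_000/BaseCalled_template/Events"][:]
    for ev in events[:5]:
        # With --include-event-stdev, each event stores the normalized signal
        # mean and stdev plus its length and the associated reference base.
        print(ev["norm_mean"], ev["norm_stdev"], ev["length"], ev["base"].decode())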

4. Map reads to reference

minimap2 is used to map the basecalled sequences to the reference transcripts. The output SAM file serves as the input for the subsequent feature extraction step.

#ac4C-modified
cat demo_data/IVT_ac4C_guppy/pass/*.fastq >demo_data/IVT_ac4C.fastq
minimap2 -ax map-ont demo_data/IVT_DRS.reference.fasta demo_data/IVT_ac4C.fastq >demo_data/IVT_ac4C.sam

#unmodified
cat demo_data/IVT_unmod_guppy/pass/*.fastq >demo_data/IVT_unmod.fastq
minimap2 -ax map-ont demo_data/IVT_DRS.reference.fasta demo_data/IVT_unmod.fastq >demo_data/IVT_unmod.sam

5. Feature extraction

Extract features from the resquiggled FAST5 files using the feature_extraction.py script in the GitHub repository.

#ac4C-modified
python script/feature_extraction.py --input demo_data/IVT_ac4C_guppy_single \
    --reference demo_data/IVT_DRS.reference.fasta  \
    --sam demo_data/IVT_ac4C.sam \
    --output demo_data/IVT_ac4C.feature.tsv \
    --clip 10 \
    --motif NNCNN

#unmodified
python script/feature_extraction.py --input demo_data/IVT_unmod_guppy_single \
    --reference demo_data/IVT_DRS.reference.fasta  \
    --sam demo_data/IVT_unmod.sam \
    --output demo_data/IVT_unmod.feature.tsv \
    --clip 10 \
    --motif NNCNN

In the feature extraction step, the motif pattern should be provided via the --motif argument. The base symbols of the motif follow the IUB (IUPAC) code standard; the full definition is given below, and a short sketch after the table shows how a degenerate motif expands to concrete sequences:

IUB Base    Expansion
A           A
C           C
G           G
T           T
M           AC
V           ACG
R           AG
H           ACT
W           AT
D           AGT
S           CG
B           CGT
Y           CT
N           ACGT
K           GT
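To make the motif semantics concrete, here is a minimal Python sketch, independent of modCnet's own implementation, that expands an IUB motif such as NNCNN into a regular expression and locates candidate sites in a reference sequence:

import re

# IUB/IUPAC degenerate base codes and their expansions.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

def motif_to_regex(motif):
    """Translate an IUB motif (e.g. NNCNN) into a regex of character classes."""
    return "".join(f"[{IUB[base]}]" for base in motif.upper())

def find_motif_sites(sequence, motif):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    pattern = re.compile(f"(?={motif_to_regex(motif)})")  # lookahead allows overlaps
    return [m.start() for m in pattern.finditer(sequence)]

print(motif_to_regex("NNCNN"))                   # [ACGT][ACGT][C][ACGT][ACGT]
print(find_motif_sites("AACCAGTCAA", "NNCNN"))   # [0, 1, 5]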

6. Train-test split

The train-test split is performed randomly, ensuring that the data points in each set are representative of the overall dataset. The default split ratios are 80% for training and 20% for testing. The train-test split ratio can be customized by using the argument --train_ratio to accommodate the specific requirements of the problem and the size of the dataset.

The training set is used to train the model, allowing it to learn patterns and relationships present in the data. The testing set, on the other hand, is used to assess the model’s performance on new, unseen data. It serves as an independent evaluation set to measure how well the trained model generalizes to data it has not encountered before. By evaluating the model on the testing set, we can estimate its performance, detect overfitting (when the model performs well on the training set but poorly on the testing set) and assess its ability to make accurate predictions on new data.
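For intuition, the logic of the split boils down to a few lines. The following is a simplified stand-in for train_test_split.py, not its actual code, with a fixed seed so the split is reproducible:

import random

def train_test_split(lines, train_ratio=0.8, seed=42):
    """Randomly shuffle instances and split them by train_ratio."""
    random.seed(seed)            # fix the seed so the split is reproducible
    lines = list(lines)
    random.shuffle(lines)
    cut = int(len(lines) * train_ratio)
    return lines[:cut], lines[cut:]

with open("demo_data/IVT_ac4C.feature.tsv") as f:
    instances = f.readlines()

train, test = train_test_split(instances, train_ratio=0.8)
with open("demo_data/IVT_ac4C.feature.train.tsv", "w") as f:
    f.writelines(train)
with open("demo_data/IVT_ac4C.feature.test.tsv", "w") as f:
    f.writelines(test)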

usage: train_test_split.py [-h] [--input_file INPUT_FILE]
                           [--train_file TRAIN_FILE] [--test_file TEST_FILE]
                           [--train_ratio TRAIN_RATIO]

Split a feature file into training and testing sets.

optional arguments:
  -h, --help                  show this help message and exit
  --input_file INPUT_FILE     Path to the input feature file
  --train_file TRAIN_FILE     Path to the train feature file
  --test_file TEST_FILE       Path to the test feature file
  --train_ratio TRAIN_RATIO   Ratio of instances to use for training (default: 0.8)

#ac4C-modified
python script/train_test_split.py --input_file demo_data/IVT_ac4C.feature.tsv --train_file demo_data/IVT_ac4C.feature.train.tsv --test_file demo_data/IVT_ac4C.feature.test.tsv --train_ratio 0.8

#unmodified
python script/train_test_split.py --input_file demo_data/IVT_unmod.feature.tsv --train_file demo_data/IVT_unmod.feature.train.tsv --test_file demo_data/IVT_unmod.feature.test.tsv --train_ratio 0.8

7. Train ac4C model

To train the modCnet model from scratch on your own dataset, set the --run_mode argument to “train” and the --model_type argument to “C/ac4C”. modCnet accepts both modified and unmodified feature files as input, and test feature files are required to evaluate the model’s performance. Specify the model save path with the --new_model argument. The number of training epochs is set with the --epoch argument, and the model state is saved at the end of each epoch. modCnet will preferentially use the GPU for training if CUDA is available on your device; otherwise, it falls back to the CPU. The training time varies with the size of your dataset and the available computational capacity, and may run for several hours. A minimal sketch of the device-selection and checkpointing pattern follows the command below.

python script/modCnet.py --run_mode train \
  --model_type C/ac4C \
  --new_model demo_data/model/C_ac4C.IVT.demo.pkl \
  --train_data_C demo_data/IVT_unmod.feature.train.tsv \
  --train_data_ac4C demo_data/IVT_ac4C.feature.train.tsv \
  --test_data_C demo_data/IVT_unmod.feature.test.tsv \
  --test_data_ac4C demo_data/IVT_ac4C.feature.test.tsv \
  --epoch 100
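The device selection and per-epoch checkpointing described above follow the usual PyTorch pattern. The sketch below illustrates that pattern with a tiny stand-in network and random data; it is not modCnet's actual training loop:

import torch
from torch import nn

# Prefer the GPU when CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("device=", device)

# Tiny stand-in classifier and random data, for illustration only;
# modCnet's real network and feature tensors are more elaborate.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2)).to(device)
optimizer = torch.optim.Adam(model.parameters())
features = torch.randn(256, 20, device=device)
labels = torch.randint(0, 2, (256,), device=device)

for epoch in range(5):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(features), labels)
    loss.backward()
    optimizer.step()
    # Save the model state at the end of every epoch, as the text describes.
    torch.save(model.state_dict(), f"checkpoint_epoch{epoch}.pkl")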

During the training process, the following output can be used to monitor and evaluate the performance of the model:

device= cpu
train process.
data loaded.
start training...
Epoch 0-0 Train acc: 0.522000,Test Acc: 0.500000,time0:00:24.898431
Epoch 1-0 Train acc: 0.756000,Test Acc: 0.750000,time0:00:42.953740
Epoch 2-0 Train acc: 0.824000,Test Acc: 0.769750,time0:00:27.752530
Epoch 3-0 Train acc: 0.804000,Test Acc: 0.790500,time0:00:29.946116
Epoch 4-0 Train acc: 0.816000,Test Acc: 0.797250,time0:00:24.155293
Epoch 5-0 Train acc: 0.816000,Test Acc: 0.793250,time0:00:23.675549
Epoch 6-0 Train acc: 0.830000,Test Acc: 0.823000,time0:00:27.202119
Epoch 7-0 Train acc: 0.852000,Test Acc: 0.834000,time0:00:36.018639
Epoch 8-0 Train acc: 0.830000,Test Acc: 0.823250,time0:00:27.230856
Epoch 9-0 Train acc: 0.836000,Test Acc: 0.846250,time0:00:58.296155
Epoch 10-0 Train acc: 0.832000,Test Acc: 0.830250,time0:00:22.394222
Epoch 11-0 Train acc: 0.858000,Test Acc: 0.857500,time0:00:18.485811

After the data processing and model training, the following files should have been generated by modCnet. The trained model C_ac4C.IVT.demo.pkl is saved in the ./demo_data/model/ folder and can be used for prediction later.

.
├── ac4C.feature.test.tsv
├── ac4C.feature.train.tsv
├── C.feature.test.tsv
├── C.feature.train.tsv
├── IVT_DRS.reference.fasta
├── IVT_fast5
│   └── batch_0.fast5
├── IVT_fast5_guppy
│   ├── fail
│   │   └── fastq_runid_71d544d3bd9e1fe7886a5d176c756a576d30ed50_0_0.fastq
│   ├── guppy_basecaller_log-2024-05-20_21-21-06.log
│   ├── pass
│   │   └── fastq_runid_71d544d3bd9e1fe7886a5d176c756a576d30ed50_0_0.fastq
│   ├── sequencing_summary.txt
│   ├── sequencing_telemetry.js
│   └── workspace
│       └── batch_0.fast5
├── IVT_fast5_guppy_single
│   ├── 0
│   │   ├── 00007b91-98f4-41c3-9eab-39f40625d550.fast5
│   │   ├── 00104315-e8fa-4031-a122-3741b7531396.fast5
│   │   ├── 0020eb7c-89f8-44bf-aeaf-acb2ea776b2c.fast5
│   │   ├── 0045dcf9-ac50-4e2e-b8dc-ea7a9157b2c4.fast5
│   │   ├── 005c48b0-72d1-4898-9fb2-00bebca69828.fast5
│   │   ├── 0433af9f-ec17-476e-93ff-6d77f8ff6e62.fast5
│   │   ├── 04343c9a-c88b-46e6-9b7d-1f97f7a28128.fast5
│   │   ├── 0b84f368-b4b9-4c63-af9c-7574f9a12d43.fast5
│   │   └── 0b8898ca-a2cc-4687-a53a-15fc159ceb3b.fast5
│   │
│   └── filename_mapping.txt
├── IVT.fastq
├── IVT.feature
├── IVT.sam
├── m5C.feature.test.tsv
├── m5C.feature.train.tsv
├── model
│   └── C_ac4C.IVT.demo.pkl
└── test.feature.tsv

Train m5C model using IVT datasets

m5C-modified and unmodified IVT datasets are publicly available at the GEO database under the accession code GSE227087.

1. Guppy basecalling

Basecalling converts the raw signal generated by Oxford Nanopore sequencing into a DNA/RNA sequence. Guppy is used for basecalling in this step. In some nanopore datasets, the sequence information is already contained within the FAST5 files; in such cases, the basecalling step can be skipped.

#m5C-modified
guppy_basecaller -i demo_data/IVT_m5C -s demo_data/IVT_m5C_guppy --num_callers 40 --recursive --fast5_out --config rna_r9.4.1_70bps_hac.cfg

#unmodified
guppy_basecaller -i demo_data/IVT_unmod -s demo_data/IVT_unmod_guppy --num_callers 40 --recursive --fast5_out --config rna_r9.4.1_70bps_hac.cfg

2. Multi-read FAST5 files to single-read FAST5 files

Convert multi-read FAST5 files to single-read FAST5 files. If the data generated by the sequencing device is already in single-read format, this step can be skipped.

#m5C-modified
multi_to_single_fast5 -i demo_data/IVT_m5C_guppy -s demo_data/IVT_m5C_guppy_single --recursive

#unmodified
multi_to_single_fast5 -i demo_data/IVT_unmod_guppy -s demo_data/IVT_unmod_guppy_single --recursive

3. Tombo resquiggling

In this step, the sequence obtained by basecalling is aligned to a reference genome or a known sequence, and the corrected sequence is then associated with the corresponding current signals. Resquiggling is performed in place; no separate files are generated in this step.

#m5C-modified
tombo resquiggle --overwrite --basecall-group Basecall_1D_000 demo_data/IVT_m5C_guppy_single  demo_data/IVT_DRS.reference.fasta --processes 40 --fit-global-scale --include-event-stdev

#unmodified
tombo resquiggle --overwrite --basecall-group Basecall_1D_000 demo_data/IVT_unmod_guppy_single  demo_data/IVT_DRS.reference.fasta --processes 40 --fit-global-scale --include-event-stdev

4. Map reads to reference

minimap2 is used to map the basecalled sequences to the reference transcripts. The output SAM file serves as the input for the subsequent feature extraction step.

#m5C-modified
cat demo_data/IVT_m5C_guppy/pass/*.fastq >demo_data/IVT_m5C.fastq
minimap2 -ax map-ont demo_data/IVT_DRS.reference.fasta  demo_data/IVT_m5C.fastq >demo_data/IVT_m5C.sam

#unmodified
cat demo_data/IVT_unmod_guppy/pass/*.fastq >demo_data/IVT_unmod.fastq
minimap2 -ax map-ont demo_data/IVT_DRS.reference.fasta demo_data/IVT_unmod.fastq >demo_data/IVT_unmod.sam

5. Feature extraction

Extract signals and features from the resquiggled FAST5 files using the following Python script.

#m5C-modified
python script/feature_extraction.py --input demo_data/IVT_m5C_guppy_single \
    --reference demo_data/IVT_DRS.reference.fasta  \
    --sam demo_data/IVT_m5C.sam \
    --output demo_data/IVT_m5C.feature.tsv \
    --clip 10 \
    --motif NNCNN

#unmodified
python script/feature_extraction.py --input demo_data/IVT_unmod_guppy_single \
    --reference demo_data/IVT_DRS.reference.fasta  \
    --sam demo_data/IVT_unmod.sam \
    --output demo_data/IVT_unmod.feature.tsv \
    --clip 10 \
    --motif NNCNN

In the feature extraction step, the motif pattern should be provided via the --motif argument. The base symbols of the motif follow the IUB code standard (see the table above).

6. Train-test split

The train-test split is performed randomly, ensuring that the data points in each set are representative of the overall dataset. The default split ratios are 80% for training and 20% for testing. The train-test split ratio can be customized by using the argument --train_ratio to accommodate the specific requirements of the problem and the size of the dataset.

The training set is used to train the model, allowing it to learn patterns and relationships present in the data. The testing set, on the other hand, is used to assess the model’s performance on new, unseen data. It serves as an independent evaluation set to measure how well the trained model generalizes to data it has not encountered before. By evaluating the model on the testing set, we can estimate its performance, detect overfitting (when the model performs well on the training set but poorly on the testing set) and assess its ability to make accurate predictions on new data.

usage: train_test_split.py [-h] [--input_file INPUT_FILE]
                           [--train_file TRAIN_FILE] [--test_file TEST_FILE]
                           [--train_ratio TRAIN_RATIO]

Split a feature file into training and testing sets.

optional arguments:
  -h, --help                  show this help message and exit
  --input_file INPUT_FILE     Path to the input feature file
  --train_file TRAIN_FILE     Path to the train feature file
  --test_file TEST_FILE       Path to the test feature file
  --train_ratio TRAIN_RATIO   Ratio of instances to use for training (default: 0.8)

#m5C-modified
python script/train_test_split.py --input_file demo_data/IVT_m5C.feature.tsv --train_file demo_data/IVT_m5C.feature.train.tsv --test_file demo_data/IVT_m5C.feature.test.tsv --train_ratio 0.8

#unmodified
python script/train_test_split.py --input_file demo_data/IVT_unmod.feature.tsv --train_file demo_data/IVT_unmod.feature.train.tsv --test_file demo_data/IVT_unmod.feature.test.tsv --train_ratio 0.8

7. Train m5C model

To train the modCnet model from scratch on your own dataset, set the --run_mode argument to “train” and the --model_type argument to “C/m5C”. modCnet accepts both modified and unmodified feature files as input, and test feature files are required to evaluate the model’s performance. Specify the model save path with the --new_model argument. The number of training epochs is set with the --epoch argument, and the model state is saved at the end of each epoch. modCnet will preferentially use the GPU for training if CUDA is available on your device; otherwise, it falls back to the CPU. The training time varies with the size of your dataset and the available computational capacity, and may run for several hours.

python script/modCnet.py --run_mode train \
  --model_type C/m5C \
  --new_model demo_data/model/C_m5C.IVT.demo.pkl \
  --train_data_C demo_data/IVT_unmod.feature.train.tsv \
  --train_data_m5C demo_data/IVT_m5C.feature.train.tsv \
  --test_data_C demo_data/IVT_unmod.feature.test.tsv \
  --test_data_m5C demo_data/IVT_m5C.feature.test.tsv \
  --epoch 100

During the training process, the following output can be used to monitor and evaluate the performance of the model:

device= cpu
train process.
data loaded.
start training...
Epoch 0-0 Train acc: 0.512000,Test Acc: 0.500000,time0:08:16.780508
Epoch 1-0 Train acc: 0.754000,Test Acc: 0.738250,time0:04:33.946534
Epoch 2-0 Train acc: 0.786000,Test Acc: 0.775250,time0:04:57.815192
Epoch 3-0 Train acc: 0.756000,Test Acc: 0.804750,time0:04:31.987233
Epoch 4-0 Train acc: 0.818000,Test Acc: 0.813000,time0:04:55.408595
Epoch 5-0 Train acc: 0.814000,Test Acc: 0.820000,time0:04:31.761226
Epoch 6-0 Train acc: 0.854000,Test Acc: 0.833250,time0:04:15.148943
Epoch 7-0 Train acc: 0.834000,Test Acc: 0.833250,time0:04:42.237964
Epoch 8-0 Train acc: 0.836000,Test Acc: 0.825000,time0:04:35.039245
Epoch 9-0 Train acc: 0.814000,Test Acc: 0.804250,time0:04:52.260900
Epoch 10-0 Train acc: 0.862000,Test Acc: 0.842750,time0:04:57.368643
Epoch 11-0 Train acc: 0.846000,Test Acc: 0.847750,time0:05:24.563390
Epoch 12-0 Train acc: 0.872000,Test Acc: 0.850250,time0:04:59.518973
Epoch 13-0 Train acc: 0.840000,Test Acc: 0.867000,time0:01:40.365091

After the data processing and model training, the following files should have been generated by modCnet. The trained model C_m5C.IVT.demo.pkl is saved in the ./demo_data/model/ folder and can be used for prediction later.

Predict ac4C sites in human cell line

HeLa nanopore data is publicly available and can be downloaded from the GEO database under accession number GSE211759. In this demo, a subset of the HeLa nanopore data is used for demonstration purposes because of the large size of the original dataset. The demo dataset is located under the ./demo_data/HeLa/HeLa_fast5/ directory.

demo_data
└── HeLa
    └── HeLa_fast5
        └── batch0.fast5

1. Guppy basecalling

Basecalling converts the raw signal generated by Oxford Nanopore sequencing into a DNA/RNA sequence. Guppy is used for basecalling in this step. In some nanopore datasets, the sequence information is already contained within the FAST5 files; in such cases, the basecalling step can be skipped.

guppy_basecaller -i demo_data/HeLa/HeLa_fast5 -s demo_data/HeLa/HeLa_fast5_guppy --num_callers 40 --recursive --fast5_out --config rna_r9.4.1_70bps_hac.cfg

2. Multi-read FAST5 files to single-read FAST5 files

Convert multi-read FAST5 files to single-read FAST5 files. If the data generated by the sequencing device is already in single-read format, this step can be skipped.

multi_to_single_fast5 -i demo_data/HeLa/HeLa_fast5_guppy -s demo_data/HeLa/HeLa_fast5_guppy_single --recursive

3. Tombo resquiggling

In this step, the sequence obtained by basecalling is aligned to a reference genome or a known sequence, and the corrected sequence is then associated with the corresponding current signals. Resquiggling is performed in place; no separate files are generated in this step. The GRCh38 transcript reference file can be downloaded here.

tombo resquiggle --overwrite --basecall-group Basecall_1D_000 demo_data/HeLa/HeLa_fast5_guppy_single  demo_data/GRCh38_subset_reference.fa --processes 40 --fit-global-scale --include-event-stdev

4. Map reads to reference

minimap2 is used to map the basecalled sequences to the reference transcripts. The output SAM file serves as the input for the subsequent feature extraction step.

cat demo_data/HeLa/HeLa_fast5_guppy/pass/*.fastq >demo_data/HeLa/HeLa.fastq
minimap2 -ax map-ont demo_data/GRCh38_subset_reference.fa demo_data/HeLa/HeLa.fastq >demo_data/HeLa/HeLa.sam

5. Feature extraction

Extract signals and features from the resquiggled FAST5 files using the following Python script.

python script/feature_extraction.py --input demo_data/HeLa/HeLa_fast5_guppy_single \
    --reference demo_data/GRCh38_subset_reference.fa   \
    --sam demo_data/HeLa/HeLa.sam  \
    --output demo_data/HeLa/HeLa.feature.tsv \
    --clip 10 \
    --motif NNCNN

In the feature extraction step, the motif pattern should be provided via the --motif argument (see the IUB code table above).

6. Predict ac4C sites

To predict ac4C sites in the HeLa nanopore data using a pretrained model, set the --run_mode argument to “predict” and specify the pretrained model with the --pretrained_model argument.

python script/modCnet.py --run_mode predict \
      --pretrained_model model/C_ac4C.pkl \
      --feature_file demo_data/HeLa/HeLa.feature.tsv \
      --predict_result demo_data/HeLa/HeLa.prediction.tsv

During the prediction process, modCnet generates the following files. The prediction result file is named “HeLa.prediction.tsv”.

demo_data
├── GRCh38_subset_reference.fa
├── HeLa
│   ├── HeLa_fast5
│   ├── HeLa_fast5_guppy
│   ├── HeLa_fast5_guppy_single
│   ├── HeLa.fastq
│   ├── HeLa.feature.tsv
│   ├── HeLa.prediction.tsv
│   └── HeLa.sam

The prediction result “demo_data/HeLa/HeLa.prediction.tsv” provides prediction labels along with the corresponding modification probabilities, which can be used for further analysis. A sketch for aggregating these per-read calls into per-site statistics follows the table below.

transcript_id   site    motif   read_id                                 prediction   probability
NM_001349947.2  552     AACCA   320a1a8b-7709-4335-8f6a-84f09ba6592a    unmod        0.00014777448
XM_006720125.3  2437    ACCAG   53dd21de-f74b-44db-baa3-06c68772b7e1    unmod        0.062309794
NM_001321485.2  498     TGCTG   1f8ce6a2-5fac-4a2f-ae25-0abdb0de412e    unmod        0.17353779
NM_001199673.2  2972    ATCAA   5781a0c4-ede0-452e-8789-9a43740451ab    unmod        0.26891512
NM_014364.5     1233    GACAA   47f7b914-a51e-4eab-adb2-e500d8a46fd1    unmod        0.029849814
NM_001321485.2  515     GCCTC   31fe54e8-7724-40c6-aaa2-025ab5de7754    unmod        0.004975981
NM_001136267.2  1780    GACTA   62b6ab58-5ee0-4871-95d5-5db66a9c56c7    unmod        0.0018304548
NM_001143883.4  714     TGCAG   4fb0be9b-9628-46aa-9ba4-40a6456d7d52    unmod        0.1989807
NM_006012.4     1058    ATCTT   7c7ff067-1ead-4838-97c8-5fca91fdfe8a    unmod        0.06284212
NM_001143883.4  714     TGCAG   13493367-a9ab-4f20-9f62-ad32c2cc6c2e    unmod        0.022585329
NM_001369747.1  920     ATCAT   5d2b59a7-4946-40b0-9c0e-16ba009ad4f5    unmod        0.0009560142
NM_001321485.2  515     GCCTC   1cbc2a9b-02d5-4906-b292-63fe6a30baaa    unmod        0.0013002371
XR_949965.1     271     GTCAA   5db89b35-738e-462d-b92b-7cded1ed2c21    unmod        0.005573378
NM_005566.4     1652    ACCTT   5fd3dff6-0a1e-4f22-9a10-cb439cf41393    unmod        0.03093134
NM_001024630.4  5513    TTCAA   0f39d0bc-63ac-4c55-a08c-6c88c2f1fcca    unmod        0.083354354
NM_001997.5     473     GGCTT   2f62c329-8d4e-4a2e-b9f8-11290e077d8f    unmod        0.09690974
NR_003286.4     1355    AGCGA   49c7e639-5681-473e-936b-c2a01eb94c6f    mod          0.7482356
NM_001997.5     112     AACGG   31ec8d67-a62d-4085-8983-75a5c6833b17    unmod        0.01882868
NM_001144943.1  1298    TTCTT   133fa83c-cf3a-4b10-9575-81f298fd0839    unmod        0.13784541
XM_017004733.1  2098    CCCTC   0fca07db-bfa9-4974-8cde-aa746a76301c    unmod        0.0036647602
NM_213725.2     421     TTCAA   3e5efe25-6e79-439d-8c9e-26bfd59216da    mod          0.8380922
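Because the output is per read, a common follow-up is to aggregate the calls per site. The following sketch uses pandas, with column names taken from the example table above; treat it as downstream analysis on modCnet's output, not part of modCnet itself:

import pandas as pd

# Load the per-read predictions produced by modCnet.
pred = pd.read_csv("demo_data/HeLa/HeLa.prediction.tsv", sep="\t")

# Aggregate per-read calls into per-site statistics: read coverage, number of
# reads called "mod", and the mean modification probability.
sites = (
    pred.groupby(["transcript_id", "site", "motif"])
        .agg(coverage=("read_id", "size"),
             mod_reads=("prediction", lambda s: (s == "mod").sum()),
             mean_prob=("probability", "mean"))
        .reset_index()
)
sites["mod_rate"] = sites["mod_reads"] / sites["coverage"]

# Sites supported by several reads with a high modification rate are the most
# confident candidates.
print(sites.sort_values("mod_rate", ascending=False).head())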

The execution time for each demonstration is estimated to be approximately 3-10 minutes.