.. _run_examples:

Run examples
==================================

This section gives examples of how to use modCnet.

Train ac4C model using IVT datasets
***********************************

To train an ac4C detection model, ac4C-modified and unmodified samples are required. The ac4C-modified and unmodified IVT datasets generated in this study have been uploaded to the GEO database under the accession numbers `GSE227087 `_ and `GSE267558 `_. This demo demonstrates how to train an ac4C detection model from scratch using the IVT datasets.

**1. Guppy basecalling**

Basecalling converts the raw signal generated by Oxford Nanopore sequencing into a DNA/RNA sequence. Guppy is used for basecalling in this step. In some nanopore datasets, the sequence information is already contained within the FAST5 files; in such cases, the basecalling step can be skipped because the sequence data is readily available.

::

    #ac4C-modified
    guppy_basecaller -i demo_data/IVT_ac4C -s demo_data/IVT_ac4C_guppy --num_callers 40 --recursive --fast5_out --config rna_r9.4.1_70bps_hac.cfg

    #unmodified
    guppy_basecaller -i demo_data/IVT_unmod -s demo_data/IVT_unmod_guppy --num_callers 40 --recursive --fast5_out --config rna_r9.4.1_70bps_hac.cfg

**2. Multi-read FAST5 files to single-read FAST5 files**

Convert multi-read FAST5 files to single-read FAST5 files. If the data generated by the sequencing device is already in single-read format, this step can be skipped.

::

    #ac4C-modified
    multi_to_single_fast5 -i demo_data/IVT_ac4C_guppy -s demo_data/IVT_ac4C_guppy_single --recursive

    #unmodified
    multi_to_single_fast5 -i demo_data/IVT_unmod_guppy -s demo_data/IVT_unmod_guppy_single --recursive

**3. Tombo resquiggling**

In this step, the sequence obtained from basecalling is mapped to a reference genome or a known sequence, and the corrected sequence is then associated with the corresponding current signals. The resquiggling process is performed in-place; no separate files are generated in this step.
::

    #ac4C-modified
    tombo resquiggle --overwrite --basecall-group Basecall_1D_000 demo_data/IVT_ac4C_guppy_single demo_data/IVT_DRS.reference.fasta --processes 40 --fit-global-scale --include-event-stdev

    #unmodified
    tombo resquiggle --overwrite --basecall-group Basecall_1D_000 demo_data/IVT_unmod_guppy_single demo_data/IVT_DRS.reference.fasta --processes 40 --fit-global-scale --include-event-stdev

**4. Map reads to reference**

minimap2 is used to map basecalled sequences to the reference transcripts. The output SAM file serves as the input for the subsequent feature extraction step.

::

    #ac4C-modified
    cat demo_data/IVT_ac4C_guppy/pass/*.fastq >demo_data/IVT_ac4C.fastq
    minimap2 -ax map-ont demo_data/IVT_DRS.reference.fasta demo_data/IVT_ac4C.fastq >demo_data/IVT_ac4C.sam

    #unmodified
    cat demo_data/IVT_unmod_guppy/pass/*.fastq >demo_data/IVT_unmod.fastq
    minimap2 -ax map-ont demo_data/IVT_DRS.reference.fasta demo_data/IVT_unmod.fastq >demo_data/IVT_unmod.sam

**5. Feature extraction**

Extract features from the resquiggled fast5 files using the ``feature_extraction.py`` script in the GitHub repository.

::

    #ac4C-modified
    python script/feature_extraction.py --input demo_data/IVT_ac4C_guppy_single \
        --reference demo_data/IVT_DRS.reference.fasta \
        --sam demo_data/IVT_ac4C.sam \
        --output demo_data/IVT_ac4C.feature.tsv \
        --clip 10 \
        --motif NNCNN

    #unmodified
    python script/feature_extraction.py --input demo_data/IVT_unmod_guppy_single \
        --reference demo_data/IVT_DRS.reference.fasta \
        --sam demo_data/IVT_unmod.sam \
        --output demo_data/IVT_unmod.feature.tsv \
        --clip 10 \
        --motif NNCNN

In the feature extraction step, the motif pattern should be provided through the ``--motif`` argument. The base symbols of the motif follow the IUB code standard.
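As a quick illustration of how an IUB-coded motif such as ``NNCNN`` constrains which k-mers are extracted, here is a small standalone helper. It is not part of modCnet; it is only a sketch of how IUB codes expand to concrete bases:

```python
# Standalone illustration (not part of modCnet): expand IUB (IUPAC) base
# codes and test whether a k-mer matches an IUB-coded motif like "NNCNN".
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

def matches_motif(kmer: str, motif: str) -> bool:
    """Return True if every base of `kmer` is allowed by `motif`."""
    if len(kmer) != len(motif):
        return False
    return all(base in IUB[code] for base, code in zip(kmer.upper(), motif.upper()))

# NNCNN accepts any 5-mer with C at the centre position:
print(matches_motif("AACGA", "NNCNN"))  # True
print(matches_motif("AAGGA", "NNCNN"))  # False
```

With ``--motif NNCNN``, features are therefore extracted around every cytosine with two flanking bases on each side.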
Here is the full definition of the IUB base symbols:

+-------------+-------------+
| IUB Base    | Expansion   |
+=============+=============+
| A           | A           |
+-------------+-------------+
| C           | C           |
+-------------+-------------+
| G           | G           |
+-------------+-------------+
| T           | T           |
+-------------+-------------+
| M           | AC          |
+-------------+-------------+
| V           | ACG         |
+-------------+-------------+
| R           | AG          |
+-------------+-------------+
| H           | ACT         |
+-------------+-------------+
| W           | AT          |
+-------------+-------------+
| D           | AGT         |
+-------------+-------------+
| S           | CG          |
+-------------+-------------+
| B           | CGT         |
+-------------+-------------+
| Y           | CT          |
+-------------+-------------+
| N           | ACGT        |
+-------------+-------------+
| K           | GT          |
+-------------+-------------+

**6. Train-test split**

The train-test split is performed randomly, ensuring that the data points in each set are representative of the overall dataset. The default split ratio is 80% for training and 20% for testing; it can be customized with the ``--train_ratio`` argument to accommodate the specific requirements of the problem and the size of the dataset. The training set is used to train the model, allowing it to learn the patterns and relationships present in the data. The testing set, on the other hand, is used to assess the model's performance on new, unseen data. It serves as an independent evaluation set to measure how well the trained model generalizes to data it has not encountered before. By evaluating the model on the testing set, we can estimate its performance, detect overfitting (when the model performs well on the training set but poorly on the testing set) and assess its ability to make accurate predictions on new data.

::

    usage: train_test_split.py [-h] [--input_file INPUT_FILE]
                               [--train_file TRAIN_FILE] [--test_file TEST_FILE]
                               [--train_ratio TRAIN_RATIO]

    Split a feature file into training and testing sets.
    optional arguments:
      -h, --help            show this help message and exit
      --input_file INPUT_FILE
                            Path to the input feature file
      --train_file TRAIN_FILE
                            Path to the train feature file
      --test_file TEST_FILE
                            Path to the test feature file
      --train_ratio TRAIN_RATIO
                            Ratio of instances to use for training (default: 0.8)

    #ac4C-modified
    python script/train_test_split.py --input_file demo_data/IVT_ac4C.feature.tsv --train_file demo_data/IVT_ac4C.feature.train.tsv --test_file demo_data/IVT_ac4C.feature.test.tsv --train_ratio 0.8

    #unmodified
    python script/train_test_split.py --input_file demo_data/IVT_unmod.feature.tsv --train_file demo_data/IVT_unmod.feature.train.tsv --test_file demo_data/IVT_unmod.feature.test.tsv --train_ratio 0.8

**7. Train ac4C model**

To train the modCnet model from scratch using your own dataset, set the ``--run_mode`` argument to "train" and the ``--model_type`` argument to "C/ac4C". modCnet accepts both modified and unmodified feature files as input. Additionally, test feature files are necessary to evaluate the model's performance. You can specify the model save path with the ``--new_model`` argument. The number of training epochs can be set with the ``--epoch`` argument, and the model state is saved at the end of each epoch. modCnet preferentially uses the ``GPU`` for training if CUDA is available on your device; otherwise, it falls back to ``CPU`` mode. The duration of training varies with the size of your dataset and the available computational capacity, and may last several hours.
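The save-at-every-epoch behaviour described above can be illustrated with a minimal, framework-free sketch. This is hypothetical code, not modCnet's actual training loop; it only shows the pattern of persisting model state with ``pickle`` at the end of each epoch so that training can be resumed or the best epoch kept:

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a model: a dict of parameters.
def train_one_epoch(params):
    params["w"] += 0.1  # placeholder "update" of a single weight
    return params

def train(epochs, model_path):
    params = {"w": 0.0}
    for _ in range(epochs):
        params = train_one_epoch(params)
        # modCnet-style behaviour: persist model state after every epoch.
        with open(model_path, "wb") as fh:
            pickle.dump(params, fh)
    return params

path = os.path.join(tempfile.gettempdir(), "demo_model.pkl")
final = train(epochs=3, model_path=path)
with open(path, "rb") as fh:
    restored = pickle.load(fh)
print(round(restored["w"], 1))  # 0.3
```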
::

    python script/modCnet.py --run_mode train \
        --model_type C/ac4C \
        --new_model demo_data/model/C_ac4C.IVT.demo.pkl \
        --train_data_C demo_data/IVT_unmod.feature.train.tsv \
        --train_data_ac4C demo_data/IVT_ac4C.feature.train.tsv \
        --test_data_C demo_data/IVT_unmod.feature.test.tsv \
        --test_data_ac4C demo_data/IVT_ac4C.feature.test.tsv \
        --epoch 100

During the training process, the following output can be used to monitor and evaluate the performance of the model:

::

    device= cpu
    train process.
    data loaded.
    start training...
    Epoch 0-0 Train acc: 0.522000,Test Acc: 0.500000,time0:00:24.898431
    Epoch 1-0 Train acc: 0.756000,Test Acc: 0.750000,time0:00:42.953740
    Epoch 2-0 Train acc: 0.824000,Test Acc: 0.769750,time0:00:27.752530
    Epoch 3-0 Train acc: 0.804000,Test Acc: 0.790500,time0:00:29.946116
    Epoch 4-0 Train acc: 0.816000,Test Acc: 0.797250,time0:00:24.155293
    Epoch 5-0 Train acc: 0.816000,Test Acc: 0.793250,time0:00:23.675549
    Epoch 6-0 Train acc: 0.830000,Test Acc: 0.823000,time0:00:27.202119
    Epoch 7-0 Train acc: 0.852000,Test Acc: 0.834000,time0:00:36.018639
    Epoch 8-0 Train acc: 0.830000,Test Acc: 0.823250,time0:00:27.230856
    Epoch 9-0 Train acc: 0.836000,Test Acc: 0.846250,time0:00:58.296155
    Epoch 10-0 Train acc: 0.832000,Test Acc: 0.830250,time0:00:22.394222
    Epoch 11-0 Train acc: 0.858000,Test Acc: 0.857500,time0:00:18.485811

After the data processing and model training, the following files should have been generated by modCnet. The trained model ``C_ac4C.IVT.demo.pkl`` is saved in the ``./demo_data/model/`` folder; you can use this model for making predictions in the future.

::

    .
    ├── ac4C.feature.test.tsv
    ├── ac4C.feature.train.tsv
    ├── C.feature.test.tsv
    ├── C.feature.train.tsv
    ├── IVT_DRS.reference.fasta
    ├── IVT_fast5
    │   └── batch_0.fast5
    ├── IVT_fast5_guppy
    │   ├── fail
    │   │   └── fastq_runid_71d544d3bd9e1fe7886a5d176c756a576d30ed50_0_0.fastq
    │   ├── guppy_basecaller_log-2024-05-20_21-21-06.log
    │   ├── pass
    │   │   └── fastq_runid_71d544d3bd9e1fe7886a5d176c756a576d30ed50_0_0.fastq
    │   ├── sequencing_summary.txt
    │   ├── sequencing_telemetry.js
    │   └── workspace
    │       └── batch_0.fast5
    ├── IVT_fast5_guppy_single
    │   ├── 0
    │   │   ├── 00007b91-98f4-41c3-9eab-39f40625d550.fast5
    │   │   ├── 00104315-e8fa-4031-a122-3741b7531396.fast5
    │   │   ├── 0020eb7c-89f8-44bf-aeaf-acb2ea776b2c.fast5
    │   │   ├── 0045dcf9-ac50-4e2e-b8dc-ea7a9157b2c4.fast5
    │   │   ├── 005c48b0-72d1-4898-9fb2-00bebca69828.fast5
    │   │   ├── 0433af9f-ec17-476e-93ff-6d77f8ff6e62.fast5
    │   │   ├── 04343c9a-c88b-46e6-9b7d-1f97f7a28128.fast5
    │   │   ├── 0b84f368-b4b9-4c63-af9c-7574f9a12d43.fast5
    │   │   └── 0b8898ca-a2cc-4687-a53a-15fc159ceb3b.fast5
    │   └── filename_mapping.txt
    ├── IVT.fastq
    ├── IVT.feature
    ├── IVT.sam
    ├── m5C.feature.test.tsv
    ├── m5C.feature.train.tsv
    ├── model
    │   └── C_ac4C.IVT.demo.pkl
    └── test.feature.tsv

Train m5C model using IVT datasets
**********************************

The m5C-modified and unmodified IVT datasets are publicly available at the GEO database under the accession code `GSE227087 `_.

**1. Guppy basecalling**

Basecalling converts the raw signal generated by Oxford Nanopore sequencing into a DNA/RNA sequence. Guppy is used for basecalling in this step. In some nanopore datasets, the sequence information is already contained within the FAST5 files; in such cases, the basecalling step can be skipped because the sequence data is readily available.
::

    #m5C-modified
    guppy_basecaller -i demo_data/IVT_m5C -s demo_data/IVT_m5C_guppy --num_callers 40 --recursive --fast5_out --config rna_r9.4.1_70bps_hac.cfg

    #unmodified
    guppy_basecaller -i demo_data/IVT_unmod -s demo_data/IVT_unmod_guppy --num_callers 40 --recursive --fast5_out --config rna_r9.4.1_70bps_hac.cfg

**2. Multi-read FAST5 files to single-read FAST5 files**

Convert multi-read FAST5 files to single-read FAST5 files. If the data generated by the sequencing device is already in single-read format, this step can be skipped.

::

    #m5C-modified
    multi_to_single_fast5 -i demo_data/IVT_m5C_guppy -s demo_data/IVT_m5C_guppy_single --recursive

    #unmodified
    multi_to_single_fast5 -i demo_data/IVT_unmod_guppy -s demo_data/IVT_unmod_guppy_single --recursive

**3. Tombo resquiggling**

In this step, the sequence obtained from basecalling is mapped to a reference genome or a known sequence, and the corrected sequence is then associated with the corresponding current signals. The resquiggling process is performed in-place; no separate files are generated in this step.

::

    #m5C-modified
    tombo resquiggle --overwrite --basecall-group Basecall_1D_000 demo_data/IVT_m5C_guppy_single demo_data/IVT_DRS.reference.fasta --processes 40 --fit-global-scale --include-event-stdev

    #unmodified
    tombo resquiggle --overwrite --basecall-group Basecall_1D_000 demo_data/IVT_unmod_guppy_single demo_data/IVT_DRS.reference.fasta --processes 40 --fit-global-scale --include-event-stdev

**4. Map reads to reference**

minimap2 is used to map basecalled sequences to the reference transcripts. The output SAM file serves as the input for the subsequent feature extraction step.
::

    #m5C-modified
    cat demo_data/IVT_m5C_guppy/pass/*.fastq >demo_data/IVT_m5C.fastq
    minimap2 -ax map-ont demo_data/IVT_DRS.reference.fasta demo_data/IVT_m5C.fastq >demo_data/IVT_m5C.sam

    #unmodified
    cat demo_data/IVT_unmod_guppy/pass/*.fastq >demo_data/IVT_unmod.fastq
    minimap2 -ax map-ont demo_data/IVT_DRS.reference.fasta demo_data/IVT_unmod.fastq >demo_data/IVT_unmod.sam

**5. Feature extraction**

Extract signals and features from the resquiggled fast5 files using the following Python script.

::

    #m5C-modified
    python script/feature_extraction.py --input demo_data/IVT_m5C_guppy_single \
        --reference demo_data/IVT_DRS.reference.fasta \
        --sam demo_data/IVT_m5C.sam \
        --output demo_data/IVT_m5C.feature.tsv \
        --clip 10 \
        --motif NNCNN

    #unmodified
    python script/feature_extraction.py --input demo_data/IVT_unmod_guppy_single \
        --reference demo_data/IVT_DRS.reference.fasta \
        --sam demo_data/IVT_unmod.sam \
        --output demo_data/IVT_unmod.feature.tsv \
        --clip 10 \
        --motif NNCNN

In the feature extraction step, the motif pattern should be provided through the ``--motif`` argument. The base symbols of the motif follow the IUB code standard.

**6. Train-test split**

The train-test split is performed randomly, ensuring that the data points in each set are representative of the overall dataset. The default split ratio is 80% for training and 20% for testing; it can be customized with the ``--train_ratio`` argument to accommodate the specific requirements of the problem and the size of the dataset. The training set is used to train the model, allowing it to learn the patterns and relationships present in the data. The testing set, on the other hand, is used to assess the model's performance on new, unseen data. It serves as an independent evaluation set to measure how well the trained model generalizes to data it has not encountered before.
By evaluating the model on the testing set, we can estimate its performance, detect overfitting (when the model performs well on the training set but poorly on the testing set) and assess its ability to make accurate predictions on new data.

::

    usage: train_test_split.py [-h] [--input_file INPUT_FILE]
                               [--train_file TRAIN_FILE] [--test_file TEST_FILE]
                               [--train_ratio TRAIN_RATIO]

    Split a feature file into training and testing sets.

    optional arguments:
      -h, --help            show this help message and exit
      --input_file INPUT_FILE
                            Path to the input feature file
      --train_file TRAIN_FILE
                            Path to the train feature file
      --test_file TEST_FILE
                            Path to the test feature file
      --train_ratio TRAIN_RATIO
                            Ratio of instances to use for training (default: 0.8)

    #m5C-modified
    python script/train_test_split.py --input_file demo_data/IVT_m5C.feature.tsv --train_file demo_data/IVT_m5C.feature.train.tsv --test_file demo_data/IVT_m5C.feature.test.tsv --train_ratio 0.8

    #unmodified
    python script/train_test_split.py --input_file demo_data/IVT_unmod.feature.tsv --train_file demo_data/IVT_unmod.feature.train.tsv --test_file demo_data/IVT_unmod.feature.test.tsv --train_ratio 0.8

**7. Train m5C model**

To train the modCnet model from scratch using your own dataset, set the ``--run_mode`` argument to "train" and the ``--model_type`` argument to "C/m5C". modCnet accepts both modified and unmodified feature files as input. Additionally, test feature files are necessary to evaluate the model's performance. You can specify the model save path with the ``--new_model`` argument. The number of training epochs can be set with the ``--epoch`` argument, and the model state is saved at the end of each epoch. modCnet preferentially uses the ``GPU`` for training if CUDA is available on your device; otherwise, it falls back to ``CPU`` mode. The duration of training varies with the size of your dataset and the available computational capacity, and may last several hours.
::

    python script/modCnet.py --run_mode train \
        --model_type C/m5C \
        --new_model demo_data/model/C_m5C.IVT.demo.pkl \
        --train_data_C demo_data/IVT_unmod.feature.train.tsv \
        --train_data_m5C demo_data/IVT_m5C.feature.train.tsv \
        --test_data_C demo_data/IVT_unmod.feature.test.tsv \
        --test_data_m5C demo_data/IVT_m5C.feature.test.tsv \
        --epoch 100

During the training process, the following output can be used to monitor and evaluate the performance of the model:

::

    device= cpu
    train process.
    data loaded.
    start training...
    Epoch 0-0 Train acc: 0.512000,Test Acc: 0.500000,time0:08:16.780508
    Epoch 1-0 Train acc: 0.754000,Test Acc: 0.738250,time0:04:33.946534
    Epoch 2-0 Train acc: 0.786000,Test Acc: 0.775250,time0:04:57.815192
    Epoch 3-0 Train acc: 0.756000,Test Acc: 0.804750,time0:04:31.987233
    Epoch 4-0 Train acc: 0.818000,Test Acc: 0.813000,time0:04:55.408595
    Epoch 5-0 Train acc: 0.814000,Test Acc: 0.820000,time0:04:31.761226
    Epoch 6-0 Train acc: 0.854000,Test Acc: 0.833250,time0:04:15.148943
    Epoch 7-0 Train acc: 0.834000,Test Acc: 0.833250,time0:04:42.237964
    Epoch 8-0 Train acc: 0.836000,Test Acc: 0.825000,time0:04:35.039245
    Epoch 9-0 Train acc: 0.814000,Test Acc: 0.804250,time0:04:52.260900
    Epoch 10-0 Train acc: 0.862000,Test Acc: 0.842750,time0:04:57.368643
    Epoch 11-0 Train acc: 0.846000,Test Acc: 0.847750,time0:05:24.563390
    Epoch 12-0 Train acc: 0.872000,Test Acc: 0.850250,time0:04:59.518973
    Epoch 13-0 Train acc: 0.840000,Test Acc: 0.867000,time0:01:40.365091

After the data processing and model training, the following files should have been generated by modCnet. The trained model ``C_m5C.IVT.demo.pkl`` is saved in the ``./demo_data/model/`` folder; you can use this model for making predictions in the future.

Predict ac4C sites in human cell line
*************************************

HeLa nanopore data is publicly available and can be downloaded from the GEO database under the accession code `GSE211759 `_.
In this demo, a subset of the HeLa nanopore data is used for demonstration purposes because of the large size of the original datasets. The demo datasets are located under the ``./demo_data/HeLa/HeLa_fast5/`` directory.

::

    demo_data
    └── HeLa
        └── HeLa_fast5
            └── batch0.fast5

**1. Guppy basecalling**

Basecalling converts the raw signal generated by Oxford Nanopore sequencing into a DNA/RNA sequence. Guppy is used for basecalling in this step. In some nanopore datasets, the sequence information is already contained within the FAST5 files; in such cases, the basecalling step can be skipped because the sequence data is readily available.

::

    guppy_basecaller -i demo_data/HeLa/HeLa_fast5 -s demo_data/HeLa/HeLa_fast5_guppy --num_callers 40 --recursive --fast5_out --config rna_r9.4.1_70bps_hac.cfg

**2. Multi-read FAST5 files to single-read FAST5 files**

Convert multi-read FAST5 files to single-read FAST5 files. If the data generated by the sequencing device is already in single-read format, this step can be skipped.

::

    multi_to_single_fast5 -i demo_data/HeLa/HeLa_fast5_guppy -s demo_data/HeLa/HeLa_fast5_guppy_single --recursive

**3. Tombo resquiggling**

In this step, the sequence obtained from basecalling is mapped to a reference genome or a known sequence, and the corrected sequence is then associated with the corresponding current signals. The resquiggling process is performed in-place; no separate files are generated in this step. The GRCh38 transcripts file can be downloaded `here `_.

::

    tombo resquiggle --overwrite --basecall-group Basecall_1D_000 demo_data/HeLa/HeLa_fast5_guppy_single demo_data/GRCh38_subset_reference.fa --processes 40 --fit-global-scale --include-event-stdev

**4. Map reads to reference**

minimap2 is used to map basecalled sequences to the reference transcripts. The output SAM file serves as the input for the subsequent feature extraction step.
::

    cat demo_data/HeLa/HeLa_fast5_guppy/pass/*.fastq >demo_data/HeLa/HeLa.fastq
    minimap2 -ax map-ont demo_data/GRCh38_subset_reference.fa demo_data/HeLa/HeLa.fastq >demo_data/HeLa/HeLa.sam

**5. Feature extraction**

Extract signals and features from the resquiggled fast5 files using the following Python script.

::

    python script/feature_extraction.py --input demo_data/HeLa/HeLa_fast5_guppy_single \
        --reference demo_data/GRCh38_subset_reference.fa \
        --sam demo_data/HeLa/HeLa.sam \
        --output demo_data/HeLa/HeLa.feature.tsv \
        --clip 10 \
        --motif NNCNN

In the feature extraction step, the motif pattern should be provided through the ``--motif`` argument.

**6. Predict ac4C sites**

To predict ac4C sites in the HeLa nanopore data using a pretrained model, set the ``--run_mode`` argument to "predict". You can specify the pretrained model with the ``--pretrained_model`` argument.

::

    python script/modCnet.py --run_mode predict \
        --pretrained_model model/C_ac4C.pkl \
        --feature_file demo_data/HeLa/HeLa.feature.tsv \
        --predict_result demo_data/HeLa/HeLa.prediction.tsv

During the prediction process, modCnet generates the following files. The prediction result file is named ``HeLa.prediction.tsv``.

::

    demo_data
    ├── GRCh38_subset_reference.fa
    ├── HeLa
    │   ├── HeLa_fast5
    │   ├── HeLa_fast5_guppy
    │   ├── HeLa_fast5_guppy_single
    │   ├── HeLa.fastq
    │   ├── HeLa.feature.tsv
    │   ├── HeLa.prediction.tsv
    │   └── HeLa.sam

The prediction result ``demo_data/HeLa/HeLa.prediction.tsv`` provides prediction labels along with the corresponding modification probabilities, which can be utilized for further analysis.
::

    transcript_id   site  motif  read_id                               prediction  probability
    NM_001349947.2  552   AACCA  320a1a8b-7709-4335-8f6a-84f09ba6592a  unmod       0.00014777448
    XM_006720125.3  2437  ACCAG  53dd21de-f74b-44db-baa3-06c68772b7e1  unmod       0.062309794
    NM_001321485.2  498   TGCTG  1f8ce6a2-5fac-4a2f-ae25-0abdb0de412e  unmod       0.17353779
    NM_001199673.2  2972  ATCAA  5781a0c4-ede0-452e-8789-9a43740451ab  unmod       0.26891512
    NM_014364.5     1233  GACAA  47f7b914-a51e-4eab-adb2-e500d8a46fd1  unmod       0.029849814
    NM_001321485.2  515   GCCTC  31fe54e8-7724-40c6-aaa2-025ab5de7754  unmod       0.004975981
    NM_001136267.2  1780  GACTA  62b6ab58-5ee0-4871-95d5-5db66a9c56c7  unmod       0.0018304548
    NM_001143883.4  714   TGCAG  4fb0be9b-9628-46aa-9ba4-40a6456d7d52  unmod       0.1989807
    NM_006012.4     1058  ATCTT  7c7ff067-1ead-4838-97c8-5fca91fdfe8a  unmod       0.06284212
    NM_001143883.4  714   TGCAG  13493367-a9ab-4f20-9f62-ad32c2cc6c2e  unmod       0.022585329
    NM_001369747.1  920   ATCAT  5d2b59a7-4946-40b0-9c0e-16ba009ad4f5  unmod       0.0009560142
    NM_001321485.2  515   GCCTC  1cbc2a9b-02d5-4906-b292-63fe6a30baaa  unmod       0.0013002371
    XR_949965.1     271   GTCAA  5db89b35-738e-462d-b92b-7cded1ed2c21  unmod       0.005573378
    NM_005566.4     1652  ACCTT  5fd3dff6-0a1e-4f22-9a10-cb439cf41393  unmod       0.03093134
    NM_001024630.4  5513  TTCAA  0f39d0bc-63ac-4c55-a08c-6c88c2f1fcca  unmod       0.083354354
    NM_001997.5     473   GGCTT  2f62c329-8d4e-4a2e-b9f8-11290e077d8f  unmod       0.09690974
    NR_003286.4     1355  AGCGA  49c7e639-5681-473e-936b-c2a01eb94c6f  mod         0.7482356
    NM_001997.5     112   AACGG  31ec8d67-a62d-4085-8983-75a5c6833b17  unmod       0.01882868
    NM_001144943.1  1298  TTCTT  133fa83c-cf3a-4b10-9575-81f298fd0839  unmod       0.13784541
    XM_017004733.1  2098  CCCTC  0fca07db-bfa9-4974-8cde-aa746a76301c  unmod       0.0036647602
    NM_213725.2     421   TTCAA  3e5efe25-6e79-439d-8c9e-26bfd59216da  mod         0.8380922

The execution time for each demonstration is estimated to be approximately 3-10 minutes.
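Because modCnet reports one prediction per read, downstream analyses often aggregate read-level calls into per-site modification rates. The following standalone sketch (not part of modCnet) assumes only the six tab-separated columns shown above and groups the output by transcript and site; the column names and the small inline example are illustrative:

```python
import csv
import io
from collections import defaultdict

def site_mod_rates(tsv_text):
    """Aggregate read-level predictions into per-site modified-read fractions.

    Assumes the tab-separated column layout shown above:
    transcript_id, site, motif, read_id, prediction, probability.
    """
    counts = defaultdict(lambda: [0, 0])  # (modified reads, total reads)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        key = (row["transcript_id"], int(row["site"]))
        counts[key][1] += 1
        if row["prediction"] == "mod":
            counts[key][0] += 1
    return {key: mod / total for key, (mod, total) in counts.items()}

# Tiny illustrative input with hypothetical read IDs:
demo = "\n".join([
    "transcript_id\tsite\tmotif\tread_id\tprediction\tprobability",
    "NM_001143883.4\t714\tTGCAG\tread1\tunmod\t0.199",
    "NM_001143883.4\t714\tTGCAG\tread2\tunmod\t0.023",
    "NR_003286.4\t1355\tAGCGA\tread3\tmod\t0.748",
])
rates = site_mod_rates(demo)
print(rates[("NM_001143883.4", 714)])  # 0.0
print(rates[("NR_003286.4", 1355)])    # 1.0
```

Sites covered by few reads will have noisy rates, so it is common to also require a minimum read depth per site before interpreting the fraction.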