Input file preparation

Sample STR abundance table :    row format example file : input_example_row.txt ; col format example file : input_example_col.txt
STR motif information table :    motif information table example file : motif_example.txt
   For file formats and parameter explanations, please refer to the 'Description' section of the webpage.

Analysis parameter settings
Abundance table format : row format or col format
Minimum sequencing depth threshold : default 30
Minimum relative copy number threshold : default 0

This STR typing software analyzes the read-depth information of STR sequences captured with Genesky next-generation sequencing technology.
1) Abundance file format: two formats are supported, 'row format' and 'col format'.
     row format: each row records, for one STR on one fragment, the number of reads in every sample. If a sample has no reads for that STR, the cell can be 0 or left blank.
     First column: fragment name
     Second column: STR sequence type
     Third and later columns: the number of reads of this STR sequence in each sample
     The file must include a header row. The first two columns can have any names, but we recommend 'target' and 'str'. The remaining columns should be named after the samples.
     example file: input_example_row.txt

     col format: each line records the number of reads of one STR on one fragment for one sample. If the sample has no reads for that STR, either write 0 or omit the line; a blank value is not allowed. Each line must have 4 columns.
     First column: fragment name
     Second column: STR sequence type
     Third column: sample name
     Fourth column: number of reads
     example file: input_example_col.txt
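
The relationship between the two formats can be illustrated with a minimal Python sketch that converts a row-format table into col-format records. It assumes the file is tab-separated, which is not stated here, and the function name row_to_col and the sample names S1 and S2 are hypothetical. Blank or 0 cells are dropped, which the col format permits.

```python
import csv
import io


def row_to_col(row_text, delimiter="\t"):
    """Convert row-format text into (fragment, STR type, sample, reads) records.

    Assumption: the table is tab-separated; adjust `delimiter` if it is not.
    """
    reader = csv.reader(io.StringIO(row_text), delimiter=delimiter)
    header = next(reader)      # row format must include a header row
    samples = header[2:]       # columns 3+ are named after the samples
    records = []
    for fields in reader:
        if not fields:         # skip empty lines
            continue
        target, str_type = fields[0], fields[1]
        for sample, cell in zip(samples, fields[2:]):
            cell = cell.strip()
            reads = int(cell) if cell else 0   # a blank cell means no reads
            if reads > 0:                      # zero-read lines may be omitted
                records.append((target, str_type, sample, reads))
    return records


example = (
    "target\tstr\tS1\tS2\n"
    "frag_1\tAGT(8)\t120\t\n"
    "frag_1\tAGT(9)\t0\t95\n"
)

for record in row_to_col(example):
    print("\t".join(str(value) for value in record))
```

The sketch writes no header line for the col-format output, because only the row format is described as requiring one.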

Notes:
    (a) Fragment name: any name may be used, but it may contain only letters, numbers, and underscores; spaces are not allowed. Each target fragment must have its own unique name, and all STRs from the same target fragment must use that same name; otherwise unexpected errors will occur.
    (b) STR sequence type: Genesky uses a fixed format, 'motif(n)', meaning that the STR consists of n repeats of the motif. The motif is a short sequence of the letters A, T, C, and G and must be in uppercase.
        e.g. AGT(8)
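
The two naming rules above can be checked before upload with a short Python sketch. The patterns are only one reading of the rules (ASCII letters, digits, and underscores for fragment names; a single uppercase motif followed by a repeat count in parentheses for the STR type), and the helper names are hypothetical; the software itself may be stricter or more lenient.

```python
import re

# Assumed patterns derived from rules (a) and (b); not taken from the software.
FRAGMENT_NAME = re.compile(r"[A-Za-z0-9_]+")
STR_TYPE = re.compile(r"([ACGT]+)\(([0-9]+)\)")


def is_valid_fragment_name(name):
    """Return True if the name uses only letters, digits, and underscores."""
    return FRAGMENT_NAME.fullmatch(name) is not None


def parse_str_type(text):
    """Split an STR type such as 'AGT(8)' into (motif, repeat count)."""
    match = STR_TYPE.fullmatch(text)
    if match is None:
        raise ValueError(f"not in 'motif(n)' format: {text!r}")
    return match.group(1), int(match.group(2))


print(is_valid_fragment_name("frag_1"))   # name without spaces
print(is_valid_fragment_name("frag 1"))   # name containing a space
print(parse_str_type("AGT(8)"))
```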

2) Motif file format: this file declares the motif information, copy number information, etc. for each fragment. It must include every fragment that appears in the abundance file, and the header is fixed.
    example file: motif_example.txt
    The file must contain 6 columns, with these column names:
    target        fragment name
    motif         the minimal repeat unit of the STR fragment. It is a short sequence of uppercase A, T, C, and G and must be consistent with the motif in the abundance file.
    ploid         ploidy of the species
    homology      the number of homologous copies of this fragment in the genome, usually 1. If there are n homologs (n different positions in the genome with exactly the same sequence), set it to n.
                  Note: copy number of the final type = ploid * homology
    noise_cutoff  the noise threshold used during copy number analysis (STRs with frequencies below this threshold are excluded directly), usually set to 0.6 * type_cutoff
    type_cutoff   the typing threshold used during copy number analysis (STRs with frequencies higher than this threshold are considered true), usually set to 0.5 * (1 / (ploid * homology))
    STRs whose frequencies fall between the noise threshold and the typing threshold are corrected and typed by a series of algorithms.
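
As a worked example of the suggested values, the Python sketch below computes the copy number and both cutoffs from ploid and homology, then sorts a frequency into one of the three ranges described above. The function names are hypothetical, and the correction algorithms applied to the middle range are not reproduced. For a diploid species with homology 1 this gives a copy number of 2, a type_cutoff of 0.25, and a noise_cutoff of 0.15.

```python
def suggested_cutoffs(ploid, homology=1):
    """Return (copy number, noise_cutoff, type_cutoff) using the usual settings."""
    copy_number = ploid * homology          # copy number of the final type
    type_cutoff = 0.5 * (1 / copy_number)   # usual typing threshold
    noise_cutoff = 0.6 * type_cutoff        # usual noise threshold
    return copy_number, noise_cutoff, type_cutoff


def classify(frequency, noise_cutoff, type_cutoff):
    """Place an STR frequency into one of the three ranges described above."""
    if frequency < noise_cutoff:
        return "excluded"           # below the noise threshold
    if frequency > type_cutoff:
        return "true"               # above the typing threshold
    return "needs correction"       # handled by the tool's own algorithms


copy_number, noise_cutoff, type_cutoff = suggested_cutoffs(ploid=2, homology=1)
print(copy_number, noise_cutoff, type_cutoff)
for frequency in (0.05, 0.20, 0.45):
    print(frequency, classify(frequency, noise_cutoff, type_cutoff))
```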

3) Minimum sequencing depth threshold: the minimum number of reads required for typing. Default: 30
4) Minimum relative copy number threshold: the minimum relative copy number required for typing, in the range 0 to 1. Alleles whose relative copy number in a sample is lower than this value are excluded directly. Default: 0
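
The effect of these two thresholds can be sketched in Python as follows. This is an illustration under assumptions, not the software's implementation: it assumes the depth threshold is compared against the total reads of one sample on one fragment, and it takes relative copy numbers as ready-made input because their calculation is not described here. Both function names are hypothetical.

```python
def passes_depth(read_counts, min_depth=30):
    """Check whether one sample has enough reads on one fragment.

    Assumption: depth is the sum of reads over all STR types of the fragment.
    """
    return sum(read_counts.values()) >= min_depth


def filter_alleles(relative_copy_numbers, min_relative=0.0):
    """Drop alleles whose relative copy number is lower than the threshold."""
    return {
        allele: value
        for allele, value in relative_copy_numbers.items()
        if value >= min_relative
    }


reads = {"AGT(8)": 20, "AGT(9)": 15}
print(passes_depth(reads))                  # total of 35 reads against default 30
print(passes_depth(reads, min_depth=50))    # stricter depth threshold

relative = {"AGT(8)": 0.55, "AGT(9)": 0.40, "AGT(7)": 0.05}
print(filter_alleles(relative, min_relative=0.1))
```

With the default threshold of 0, filter_alleles keeps every allele, which matches the description of the default as applying no exclusion.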