the input
file extensions
Plain-text files (*.txt) and zip-compressed text files (*.zip) are accepted as input files; in every other case only text files are processed. A single text file is read directly by the program. If you prefer to use multiple text files, compress them into a single zip file without any folder structure in it, and they will all be processed.
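The rule above can be sketched in a few lines of Python. This is only an illustration and not GDF's own code: the function name read_input_texts, the UTF-8 encoding and the decision to reject (rather than ignore) archives containing folders are all assumptions.

```python
import zipfile
from pathlib import Path


def read_input_texts(path):
    """Return {file name: text} for a .txt file or a flat .zip of .txt files.

    Hypothetical helper: the name, the UTF-8 encoding and the error handling
    are assumptions, not part of GDF.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        return {path.name: path.read_text(encoding="utf-8")}
    if suffix == ".zip":
        texts = {}
        with zipfile.ZipFile(path) as archive:
            for info in archive.infolist():
                # The zip file must not contain any folder structure.
                if info.is_dir() or "/" in info.filename:
                    raise ValueError(f"zip must be flat, found: {info.filename}")
                # Only text files inside the archive are read.
                if info.filename.lower().endswith(".txt"):
                    texts[info.filename] = archive.read(info).decode("utf-8")
        return texts
    raise ValueError(f"unsupported extension: {suffix!r}")
```

Calling read_input_texts("plates.zip") on a flat archive would return one entry per text file it contains.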
input data file
The data file should be the direct output of the genotyping platform software, since GDF relies on pattern recognition to distinguish the input of the different platforms. It should be a tabulated plain-text file, and it must contain all the relevant information (sample name, snp code and genotype) without any processing of the default output format of the machine's software:
Sequenom typical line format
SAMPLE_ID CALL ASSAY_ID WELL_POSITION DESCRIPTION CALIBRATION MASS_SHIFT
(these default fields are set by the vendor)
SNPlex typical line format
Sample_File Sample_Name Panel Marker Dye SNP Allele_1 Allele_2 ... GQ
(defaults defined in GeneMapper's 'SNPlex_v2' table)
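To illustrate what recognizing a platform from these line formats might look like, here is a minimal Python sketch. It is not GDF's actual pattern recognition, which is not described here: the function guess_platform, the subsets of field names it checks and the returned labels are all assumptions made for the example.

```python
# Field names taken from the typical line formats listed above.
# Which subset is checked is an assumption made for this sketch.
SEQUENOM_FIELDS = {"SAMPLE_ID", "CALL", "ASSAY_ID", "WELL_POSITION"}
SNPLEX_FIELDS = {"Sample_File", "Sample_Name", "Marker", "Allele_1", "Allele_2"}


def guess_platform(header_line):
    """Guess the input type from the first (header) line of a tabulated file."""
    fields = set(header_line.rstrip("\r\n").split("\t"))
    if SEQUENOM_FIELDS <= fields:
        return "sequenom"
    if SNPLEX_FIELDS <= fields:
        return "snplex"
    # A genotype table starts with a tabulation (empty first header).
    if header_line.startswith("\t"):
        return "table"
    return "unknown"
```

Because the check depends on the untouched header names, renaming or removing columns before running the program would defeat this kind of recognition, which is why the default output should not be processed.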
Things to consider when inputting genotypes directly in tables:
1. Although all columns should have a header to describe the data below,
the first column should not have any header. The file must therefore start
with a tabulation character to be recognized as a table.
2. The default expected tables are samples vs. snps, so if a snps vs. samples
table is used, this should be indicated in "running options".
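The two points above can be sketched as a small Python parser. It is an illustration only: the function name parse_genotype_table and its snps_in_rows flag (standing in for the "running options" setting) are hypothetical, and the genotypes are kept as the raw cell text.

```python
def parse_genotype_table(text, snps_in_rows=False):
    """Parse a tabulated genotype table into {(sample, snp): genotype}.

    By default rows are samples and columns are snps; set snps_in_rows=True
    for a snps vs. samples table. Names and flag are assumptions.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    # The first column has no header, so the file starts with a tabulation.
    if not lines or not lines[0].startswith("\t"):
        raise ValueError("table must start with a tabulation character")
    columns = lines[0].split("\t")[1:]
    genotypes = {}
    for line in lines[1:]:
        cells = line.split("\t")
        row_name, values = cells[0], cells[1:]
        for col_name, value in zip(columns, values):
            if snps_in_rows:
                sample, snp = col_name, row_name
            else:
                sample, snp = row_name, col_name
            genotypes[(sample, snp)] = value
    return genotypes
```

Both orientations end up in the same dictionary layout, so later steps do not need to know how the table was originally arranged.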
Note that this input file is the only required one (the rest are optional). When using Sequenom's output file, a first column with the name of the plate used should be added, so that the skews and the failed genotypes are easier to locate later (for SNPlex this information is currently obtained from the "Sample_File" field, which uses the "platename_snp_well.fsa" format).
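As an illustration of the "platename_snp_well.fsa" format, the following Python sketch splits such a value into its three parts. The function split_sample_file is hypothetical, and it assumes that the snp and well parts never contain an underscore (the plate name may).

```python
def split_sample_file(sample_file):
    """Split "platename_snp_well.fsa" into (plate, snp, well).

    Assumption: only the plate name may contain underscores, so the value
    is split from the right.
    """
    stem = sample_file[:-4] if sample_file.lower().endswith(".fsa") else sample_file
    plate, snp, well = stem.rsplit("_", 2)
    return plate, snp, well
```

A value with fewer than three underscore-separated parts raises a ValueError in this sketch.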
snp configuration file
A tabulated plain-text configuration file can be used to provide information on the genes, the snps and the positions that are going to be used. The first row should have the headers "GENE", "SNP_ID" and "POSITION". The rest of the file should consist of three columns: the first one with the name of the gene, the second one with the name of the snp and the third one with the position of the snp in that gene. All of them are case-sensitive.
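A file of this shape could be read as in the Python sketch below. The function read_snp_config is hypothetical and not part of GDF; positions are kept as text because their exact format is not specified here, and header matching is exact since the names are case-sensitive.

```python
import csv
import io


def read_snp_config(text):
    """Return a list of (gene, snp, position) tuples from a configuration file.

    Hypothetical helper; positions are returned as strings, unchanged.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    headers = reader.fieldnames or []
    missing = [h for h in ("GENE", "SNP_ID", "POSITION") if h not in headers]
    if missing:
        raise ValueError(f"missing headers: {missing}")
    return [(row["GENE"], row["SNP_ID"], row["POSITION"]) for row in reader]
```

For instance, a file with a header row followed by the row "BRCA1", "rs799917", "1" (made-up values) would yield a single tuple with those three strings.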
pedigree file
In order to add pedigree information to the genotypes, a pedigree file may also be used. It is an optional file that is considered only if it exists, and its information is included only in the output for Phase, Haploview and Unphased. The standard format of a pedigree file is a tabulated plain-text file containing the pedigree, sample, father, mother, sex and affectation status to be used for each sample. As in the gene file, the first row with headers is needed, but in this case the only mandatory columns are "SAMPLE_ID" and "AFFECTATION_STATUS" (the rest of the columns will be considered if their information is present). The code used in the "SEX" column for "male" and "female" is "1" and "2" respectively. The code used in the "AFFECTATION_STATUS" column for "control" and "case" is "1" and "2" respectively (note that Phase uses "0" and "1" respectively).
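The difference between the two affectation status codings can be made explicit with a short Python sketch. The names read_pedigree, phase_status and GDF_TO_PHASE_STATUS are hypothetical; only the "SAMPLE_ID", "SEX" and "AFFECTATION_STATUS" headers and the code values come from the description above.

```python
import csv
import io

# control/case are "1"/"2" in the pedigree file and "0"/"1" for Phase.
GDF_TO_PHASE_STATUS = {"1": "0", "2": "1"}


def read_pedigree(text):
    """Return {sample id: row dict}; only two columns are mandatory."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    headers = reader.fieldnames or []
    for required in ("SAMPLE_ID", "AFFECTATION_STATUS"):
        if required not in headers:
            raise ValueError(f"missing mandatory header: {required}")
    return {row["SAMPLE_ID"]: row for row in reader}


def phase_status(status):
    """Translate a pedigree-file affectation status into the Phase coding."""
    return GDF_TO_PHASE_STATUS[status]
```

With this mapping a case coded "2" in the pedigree file becomes "1" for Phase, and a control coded "1" becomes "0".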
population file
Information on the population to which each sample belongs may also be entered. It must be a tabulated plain-text file where the first row contains the headers for the columns: SAMPLE, POPULATION (the name "unknown" is reserved) and POP_ID (a numeric id that identifies the population; the code "0" is reserved).
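A check of the reserved values could look like the Python sketch below. It assumes that "reserved" means the user's file should not use the name "unknown" or the id "0"; that reading, the exact-match comparison and the function name read_populations are assumptions.

```python
import csv
import io


def read_populations(text):
    """Return {sample: (population name, numeric id)} from a population file.

    Assumption: the reserved name "unknown" and the reserved id 0 are
    rejected when they appear in the user's file.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    headers = reader.fieldnames or []
    missing = [h for h in ("SAMPLE", "POPULATION", "POP_ID") if h not in headers]
    if missing:
        raise ValueError(f"missing headers: {missing}")
    populations = {}
    for row in reader:
        raw_id = row["POP_ID"] or ""
        if not raw_id.isdigit():
            raise ValueError(f"POP_ID must be numeric: {raw_id!r}")
        pop_id = int(raw_id)
        if row["POPULATION"] == "unknown" or pop_id == 0:
            raise ValueError(f"reserved value used for sample {row['SAMPLE']}")
        populations[row["SAMPLE"]] = (row["POPULATION"], pop_id)
    return populations
```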
allele translation file
A tabulated plain-text file can be used to translate the two possible allele options into the appropriate bases. This was necessary for older versions of GeneMapper, which reported "A1" and "A2" when the genotypes were edited manually. That issue has been solved, but the option remains available to aid TaqMan input. The first row should have the headers "SNP", "ALLELE1" and "ALLELE2", and the rest of the file should consist of three columns: the first one with the name of the snp, and the other two with the bases corresponding to the first and the second allele respectively. The translated alleles are shown in the output snps vs. samples table with a '*' sign next to the genotype.
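The translation step can be sketched in Python as follows. The function names are hypothetical, and two details are assumptions made for the example: that untranslated calls appear as the labels "A1" and "A2", and that a genotype is written as the two bases joined together, followed by '*' when a translation was applied.

```python
import csv
import io


def read_allele_translation(text):
    """Return {snp: (base for allele 1, base for allele 2)}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["SNP"]: (row["ALLELE1"], row["ALLELE2"]) for row in reader}


def translate_genotype(snp, calls, translation):
    """Translate "A1"/"A2" calls for one snp; mark translated genotypes with '*'.

    The "A1"/"A2" labels and the joined output format are assumptions.
    """
    if snp not in translation:
        return "".join(calls)
    allele1, allele2 = translation[snp]
    lookup = {"A1": allele1, "A2": allele2}
    translated = [lookup.get(call, call) for call in calls]
    mark = "*" if translated != list(calls) else ""
    return "".join(translated) + mark
```

Genotypes that already contain bases pass through unchanged and unmarked, so the '*' sign only flags values that really came from the translation file.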
snps and samples not-to-be-processed files
A list of snps and a list of samples not to be processed can be added if you do not want to edit your configuration file to remove them. These snps and samples will not be processed, and therefore their genotypes will neither be taken into account in the statistics nor be present in the later input files. These files must be single-columned, containing one snp or one sample per row.
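The effect of these two lists can be sketched in Python as a simple filter. The function names are hypothetical, and the genotypes are assumed to be held in a dictionary keyed by (sample, snp), which is a choice made only for this example.

```python
def read_exclusion_list(text):
    """Return the set of names in a single-column file, one name per row."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def drop_excluded(genotypes, excluded_snps, excluded_samples):
    """Return a copy of {(sample, snp): genotype} without the excluded entries."""
    return {
        (sample, snp): genotype
        for (sample, snp), genotype in genotypes.items()
        if sample not in excluded_samples and snp not in excluded_snps
    }
```

Filtering before any statistics are computed keeps the excluded genotypes out of every later step.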