A fragmented all-against-all comparison analyzes genomes by fragmenting them and comparing all fragments with all genomes. A phylogenetic dataset can be extracted from this all-against-all approach. It is also possible to group the genomes and search for unique genomic regions that have a high specificity towards a "target group".
To create a fragmented all-all comparison, click on “New” and then “Fragmented all-all comparison” in the menu bar.
A wizard will help you set up the comparison by first letting you choose a name and some alignment settings. The resolution of the alignment is controlled by two parameters, the "Fragment size" and the "Sliding step size". The fragment size represents the "scanning window size" and should be smaller than the genomic region you expect to find in the analysis. For bacteria, we recommend either 200/100 (fragment size/sliding step size), which is more accurate, or 500/500, which is much faster and usually sufficient. Smaller fragment sizes and sliding step sizes make the calculations more demanding. When working with viruses and small sequences, smaller settings may be needed. It is also possible to use tblastx, which compares sequences on the translated level, i.e. as amino acids. This is much more demanding and the datasets should be smaller, but if the sequences are phylogenetically far apart, this may be a useful operating mode. Then select the genomes from your database to include in the comparison and click “Finish”. Clicking “Finish” takes you to the analysis perspective.
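To make the effect of the two parameters concrete, the following is a minimal Python sketch of sliding-window fragmentation. It is an illustration only, not the program's own code: the function name `fragment_sequence` is hypothetical, and dropping a trailing window that is shorter than the fragment size is an assumption, since the text does not say how the end of a sequence is handled.

```python
def fragment_sequence(sequence, fragment_size, step_size):
    """Cut a sequence into windows of fragment_size, advancing by step_size.

    Returns a list of (start_position, fragment) tuples.
    """
    if fragment_size <= 0 or step_size <= 0:
        raise ValueError("fragment_size and step_size must be positive")
    fragments = []
    # Assumption: a trailing window shorter than fragment_size is dropped.
    for start in range(0, len(sequence) - fragment_size + 1, step_size):
        fragments.append((start, sequence[start:start + fragment_size]))
    return fragments


# A toy 1000-base sequence, cut with the two recommended settings.
genome = "ACGT" * 250
fine = fragment_sequence(genome, 200, 100)    # overlapping windows
coarse = fragment_sequence(genome, 500, 500)  # non-overlapping windows
print(len(fine), len(coarse))
```

With the 200/100 setting every position is covered by overlapping windows and there are several times more fragments to align than with 500/500, which is why the finer setting is slower.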
When you wish to start the alignment process, click the start button. The calculation progress will be shown, and the log window in the right part will show messages about what is happening. Typically, a lot of conversion and preparation messages appear first. Then a BLAST list is created and executed in parallel "threads". Typically, each "thread" should not take more than a few minutes to complete. The number of simultaneously calculating threads is indicated, along with the current thread number and the total number of threads to be run. It is possible to send a pause signal and then resume the calculation later. After all the threads have been run, some data analysis is performed and the alignment is then complete. Once an alignment is completed, it is possible to analyze the data in the other tabs in the analysis perspective. An alignment is represented on the hard drive by a folder with the prefix "alignment_analysis".
The Included genomes tab lists all genomes that are included in the alignment. It is also possible to add genomes to or remove genomes from a finished alignment by clicking on the "modify comparison" button at the bottom of the window.
There are two ways to define groups in a fragmented all-all comparison, and one of them is by using the Group settings tab.
Several group settings can be created from the same dataset (e.g. different subtypes) with the "New.." or "Make a copy..." buttons.
The heat plot tab gives a phylogenomic overview of the data. The values shown are the average normalized BLAST scores of all fragments. It is also possible to define a threshold value, meaning that fragments falling under the threshold are not used to calculate the average similarity value. This gives a better phylogenetic signal since the similarity value is then based only on conserved genetic material (the core genome). It is also possible to see how large the core genome is at the specified threshold (select "show core" instead of "show score"). The "color profile" of the heat plot can be changed so that differences are highlighted as well as possible for the particular dataset. The number of decimals shown can also be changed. The genomes are sorted alphabetically, which is often sufficient, but sorting options are also built into the heat plots. Genomes or groups of genomes can be moved with the "Move selection to row" field or by right-clicking the selection and selecting "move" from the context menu. The target and background group settings can also be modified from the right-click context menu. If "Drag and drop, single sorting" is selected, genomes can be dragged with the mouse, one by one. The sort is saved, and if several sorts are wanted, new ones can be created with the "new..." button. There is also an "autosort" function that tries to minimize the score distances between the rows. There is also an export button that allows export of the phylogenomic data.
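The thresholded average behind one heat plot cell can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program's implementation: the function name `similarity_and_core` is hypothetical, the input is assumed to be a list of already normalized fragment scores between 0 and 1, and the core genome size is reported here as the fraction of fragments kept (the program may express it differently).

```python
def similarity_and_core(normalized_scores, threshold=0.0):
    """Return (average score of kept fragments, fraction of fragments kept).

    Fragments scoring below the threshold are excluded from the average.
    """
    if not normalized_scores:
        return 0.0, 0.0
    kept = [score for score in normalized_scores if score >= threshold]
    core_fraction = len(kept) / len(normalized_scores)
    average = sum(kept) / len(kept) if kept else 0.0
    return average, core_fraction


# Hypothetical normalized scores of five fragments of one genome against another.
scores = [1.0, 0.75, 0.5, 0.25, 0.0]
print(similarity_and_core(scores))        # (0.5, 1.0)
print(similarity_and_core(scores, 0.4))   # (0.75, 0.6)
```

In this toy example, raising the threshold removes the two poorly matching fragments, so the similarity value rises from 0.5 to 0.75 while the core shrinks to 60% of the fragments.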
If a remote genome is already in the local database, it will be colored red in the remote list.
The score overview tab shows a graphic representation of the "biomarker scores". Biomarker scores are score values that rank all genomic regions (fragments) by how discriminating they are for the target group in terms of conservation (no false negatives) and uniqueness (no false positives in the background). There are three types of scores with different stringency:
It is possible to export an interesting sub-sequence from the genome (or the whole genome if it is completed) into a format that can be viewed in Artemis.
The export will end up in a directory called "export" under the workspace directory. It will be a "*.gbk" file that is essentially the same as the original "gbk" file (any problems or warnings that occur when loading the original file in Artemis will remain). The "gene" and "misc feature" tracks are replaced by the biomarker scores. Five files are exported.
In the "score table" tab, there are details about the fragments representing either:
Show sequence displays the actual sequences of the fragments, and it is possible to fuse adjacent and overlapping fragments into continuous sequences. The sequences can be exported to a FASTA file or sent to a web page ready for a BLAST comparison at NCBI.
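Fusing adjacent and overlapping fragments amounts to merging intervals. The following Python sketch shows the general technique, assuming each fragment is described by hypothetical half-open (start, end) coordinates on the genome. The function name `fuse_fragments` is illustrative and not part of the program.

```python
def fuse_fragments(intervals):
    """Merge (start, end) half-open intervals that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or is directly adjacent to the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Four selected fragments: two overlap, one is adjacent, one is isolated.
hits = [(0, 200), (100, 300), (300, 500), (900, 1100)]
regions = fuse_fragments(hits)
print(regions)  # [(0, 500), (900, 1100)]
```

Each fused region can then be cut from the genome sequence as a single continuous stretch, for example `genome[start:end]`.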
The detailed scores show how each fragment scores against each genome in the target and background groups. This may help to identify which particular strain is causing a cross-reaction.
It is also possible to export the table (as it is shown) or the full data table (without filtering) as a tab-delimited text file for further analysis in, e.g., a spreadsheet program.
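As an illustration of how such detailed scores can point to a cross-reacting strain, here is a small Python sketch. Everything in it is hypothetical: the data layout (a dictionary of per-genome normalized scores for each fragment), the genome and fragment names, the function name `cross_reacting_genomes`, and the cutoff of 0.4 are assumptions made for the example, not the program's own format or defaults.

```python
def cross_reacting_genomes(detailed_scores, background, cutoff=0.4):
    """For each fragment, list background genomes scoring at or above cutoff.

    Returns {fragment: [(genome, score), ...]} with the strongest hit first;
    fragments without any background hit are left out.
    """
    report = {}
    for fragment, per_genome in detailed_scores.items():
        hits = []
        for genome in background:
            score = per_genome.get(genome, 0.0)  # a missing entry counts as no hit
            if score >= cutoff:
                hits.append((genome, score))
        if hits:
            report[fragment] = sorted(hits, key=lambda item: item[1], reverse=True)
    return report


# Hypothetical detailed scores for two fragments.
detailed = {
    "frag_001": {"target_A": 1.0, "target_B": 0.75, "bg_X": 0.0, "bg_Y": 0.5},
    "frag_002": {"target_A": 1.0, "target_B": 1.0, "bg_X": 0.0, "bg_Y": 0.25},
}
report = cross_reacting_genomes(detailed, ["bg_X", "bg_Y"], cutoff=0.4)
print(report)  # {'frag_001': [('bg_Y', 0.5)]}
```

In this toy data, only the first fragment has a background hit above the cutoff, and the genome responsible is "bg_Y".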