From: https://www.jianshu.com/p/e6a5e1f85dda
Genome annotation with BRAKER2
BRAKER2 is a genome annotation pipeline that combines GeneMark, AUGUSTUS, and transcriptome data.
Before using the software, there are several points to note:
- Provide as high-quality a genome assembly as you can. With the price of third-generation sequencing falling, this is no longer much of a problem.
- Keep the sequence names in the genome simple, ideally of the form ">contig1" or ">tig000001".
- Repeats in the genome need to be masked.
- The default parameters usually give good results, but they may need adjusting for your species.
- Be sure to check the annotation results; do not use them as they are without review.
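The naming advice in the list above can be applied with a short awk pass. This is only a minimal sketch: the function name simplify_fasta_headers and the file name raw_assembly.fasta are made up for illustration, and the "contig" prefix is just one possible choice.

```shell
# Rename every FASTA header to >contig1, >contig2, ... in input order.
# Sequence lines are passed through unchanged.
simplify_fasta_headers() {
    awk '/^>/ { n++; print ">contig" n; next } { print }' "$1"
}

# Example (hypothetical input file name):
# simplify_fasta_headers raw_assembly.fasta > genome.fasta
```

If the original names matter later, keep the unrenamed file so the two can be matched up by order.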
Software Installation
BRAKER depends on a lot of software and also needs many Perl modules to be installed. Conda can take care of all of this (the bioconda channel needs to be added).
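A conda installation along these lines might look as follows. This is a sketch under assumptions: the environment name braker2 matches the one mentioned later in this post, the package is taken to be the bioconda package braker2, and adding conda-forge next to bioconda is my assumption. The command is wrapped in a function (a made-up name) so that nothing is installed until you call it.

```shell
# Create a dedicated environment named "braker2" and install the
# bioconda braker2 package into it. The channel list is an assumption.
install_braker2_env() {
    conda create -y -n braker2 -c conda-forge -c bioconda braker2
}

# Run once, then activate the environment:
# install_braker2_env
# conda activate braker2
```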
When the installation finishes, conda prints some messages. In summary:
- Make sure you have write permission for the AUGUSTUS config directory (with a conda installation this is normally not a concern).
- GeneMark and GenomeThreader are not included and need to be downloaded and installed separately.
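If the AUGUSTUS config directory does turn out to be read-only (for example in a shared installation), one common workaround is to copy it somewhere writable and point AUGUSTUS_CONFIG_PATH at the copy. The helper below is a hedged sketch: the function name and the example paths are assumptions, not part of BRAKER.

```shell
# Copy an AUGUSTUS config directory to a writable location and print the
# export line that points AUGUSTUS (and BRAKER) at the copy.
# $1: existing config directory, $2: destination directory
make_writable_augustus_config() {
    local src="$1" dest="$2"
    mkdir -p "$dest" || return 1
    cp -r "$src"/. "$dest"/ || return 1
    chmod -R u+w "$dest" || return 1
    echo "export AUGUSTUS_CONFIG_PATH=$dest"
}

# Example (both paths are assumptions for a conda install):
# make_writable_augustus_config "$CONDA_PREFIX/config" "$HOME/augustus_config"
```

The function only prints the export line; paste it into your shell or startup file to make it take effect.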
GeneMark is the one that must be installed. Download it from http://exon.gatech.edu/GeneMark/license_download.cgi, install it, and then add it to your environment variables.
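The environment setup for GeneMark could be sketched like this. The directory and key file names below are hypothetical and depend on the version you downloaded and where you unpacked it. GENEMARK_PATH is the variable BRAKER reads, and ~/.gm_key is where GeneMark looks for its license key.

```shell
# Assumed locations: adjust to wherever you unpacked the GeneMark download.
GENEMARK_DIR="$HOME/opt/gmes_linux_64"    # hypothetical unpack directory
GENEMARK_KEY="$HOME/Downloads/gm_key_64"  # hypothetical, already gunzipped key

# BRAKER looks for gmes_petap.pl in the directory named by GENEMARK_PATH.
export GENEMARK_PATH="$GENEMARK_DIR"
export PATH="$GENEMARK_DIR:$PATH"

# GeneMark expects its license key as ~/.gm_key.
if [ -f "$GENEMARK_KEY" ]; then
    cp "$GENEMARK_KEY" "$HOME/.gm_key"
fi
```

Putting the two export lines into ~/.bashrc makes them persistent across sessions.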
BRAKER2 also recommends some software that conda does not install; install these yourself as needed:
- DIAMOND 0.9.24: an alternative to NCBI BLAST+
- cdbfasta 0.99: used to correct AUGUSTUS-predicted genes that contain an in-frame stop codon
- cdbyank 0.981: used to correct AUGUSTUS-predicted genes that contain an in-frame stop codon
- GenomeThreader: only needed when you annotate with protein data
For the software that conda does not install, see https://github.com/Gaius-Augustus/BRAKER#optional-tools
Take cdbfasta and cdbyank as an example.
You can then add them to the PATH environment variable.
You can also copy them into the braker2 environment created with conda, where ~/miniconda3 is my conda path.
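The cdbfasta and cdbyank installation described above can be sketched as follows. I am assuming the source is the gpertea/cdbfasta repository on GitHub and that "make all" builds both programs. The function name is made up, and the envs/braker2/bin path assumes the standard conda layout under ~/miniconda3.

```shell
# Clone and build cdbfasta and cdbyank in the current directory.
# The repository URL and the "all" target are assumptions.
build_cdbfasta() {
    git clone https://github.com/gpertea/cdbfasta.git || return 1
    (cd cdbfasta && make all) || return 1
}

# build_cdbfasta
# Option 1: leave the binaries in place and extend PATH
# export PATH="$PWD/cdbfasta:$PATH"
# Option 2: copy them into the conda environment
# cp cdbfasta/cdbfasta cdbfasta/cdbyank ~/miniconda3/envs/braker2/bin/
```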
After the installation is complete, it is recommended to check that all the software BRAKER depends on is available.
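A quick way to run such a check is sketched below. The check_tools helper is my own illustration and only tests whether commands are on PATH; the tool list in the example is an assumption about what your setup needs. BRAKER also has its own check, which to my knowledge is invoked as braker.pl --checkSoftware.

```shell
# Report any required command that is not on PATH; return non-zero if
# at least one is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (tool list is an assumption):
# check_tools braker.pl augustus gmes_petap.pl STAR cdbfasta cdbyank
# BRAKER's own check:
# braker.pl --checkSoftware
```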
Running the software
BRAKER has different run modes depending on the type of data available. In practice, the most common scenario is that you have sequenced a genome, also have second-generation (short-read) transcriptome data, and perhaps have some protein sequences from closely related species. So suppose you have the following data on hand:
- Genomic sequences: genome.fasta
- Transcriptome data: XX_1.fq.gz, XX_2.fq.gz
- Protein sequence: proteins.fa
Step 1: Mask the repetitive sequences in the genome. For this step, refer to the notes on annotating genomic repeat sequences with RepeatModeler and RepeatMasker.
The genome.fasta.masked file produced by this step is the input for the annotation that follows.
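A typical RepeatModeler plus RepeatMasker run for this step might look roughly like the sketch below. The database name genome_db, the function name, and the job count are assumptions. Soft-masking with -xsmall is my choice here because BRAKER can take soft-masked input. Depending on the RepeatModeler version, the parallelism option may be -threads rather than -pa.

```shell
# Build a de novo repeat library with RepeatModeler, then soft-mask the
# genome with RepeatMasker. $1: genome FASTA, $2: parallel jobs (default 8).
mask_repeats() {
    local genome="$1" jobs="${2:-8}"
    BuildDatabase -name genome_db "$genome" || return 1
    RepeatModeler -database genome_db -pa "$jobs" || return 1
    # -xsmall writes repeats in lowercase (soft-masking) instead of N
    RepeatMasker -xsmall -pa "$jobs" -lib genome_db-families.fa "$genome"
}

# mask_repeats genome.fasta 8   # RepeatMasker writes genome.fasta.masked
```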
Step 2: Align the FASTQ reads to the reference genome with STAR. For STAR usage, refer to "[RNA-seq analysis software] Study notes on the RNA-seq alignment tool STAR".
The resulting xx.bam is used as input in the next step. If several transcriptome samples were sequenced, run the alignment once per sample to produce multiple BAM files.
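The alignment step can be sketched as below, assuming paired-end files named as in the data list (XX_1.fq.gz, XX_2.fq.gz). The index directory star_index, the thread count, and the function names are assumptions. STAR names its sorted output by appending Aligned.sortedByCoord.out.bam to the chosen prefix.

```shell
# Name of the sorted BAM that STAR writes for a given --outFileNamePrefix.
star_bam_name() {
    echo "${1}Aligned.sortedByCoord.out.bam"
}

# Build the index once for the masked genome.
star_index() {
    mkdir -p star_index || return 1
    STAR --runMode genomeGenerate --runThreadN 8 \
         --genomeDir star_index --genomeFastaFiles genome.fasta.masked
}

# Align one paired-end sample; $1 is the sample name, e.g. XX.
star_align() {
    local sample="$1"
    STAR --runThreadN 8 --genomeDir star_index \
         --readFilesIn "${sample}_1.fq.gz" "${sample}_2.fq.gz" \
         --readFilesCommand zcat \
         --outSAMtype BAM SortedByCoordinate \
         --outFileNamePrefix "${sample}_"
}

# star_index
# for s in XX YY; do star_align "$s"; done   # YY is a hypothetical second sample
```

With the prefix "XX_", star_bam_name returns the BAM file name to hand to BRAKER in the next step.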
Step 3: Run BRAKER2
braker.pl supports up to 48 threads.
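A BRAKER2 call for the genome plus RNA-seq case could be sketched as follows. The species name and the helper functions are placeholders of my own; --species, --genome, --bam, --softmasking, and --cores are braker.pl options, and --bam takes a comma-separated list when there are several BAM files. --softmasking assumes the genome was soft-masked in step 1.

```shell
# Join arguments with commas, the list format that --bam accepts.
join_by_comma() {
    local IFS=,
    echo "$*"
}

# $1: species name for the AUGUSTUS parameter set, remaining args: BAM files.
run_braker() {
    local species="$1"
    shift
    braker.pl --species="$species" \
              --genome=genome.fasta.masked \
              --bam="$(join_by_comma "$@")" \
              --softmasking \
              --cores=48
}

# run_braker my_species XX_Aligned.sortedByCoord.out.bam
```

Protein input (proteins.fa) is left out of this sketch. braker.pl has a --prot_seq option, but how it combines with --bam differs between BRAKER versions, so check the README for the version you installed.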
The final output includes the protein sequences, the CDS sequences, and a GFF file.
Possible problems
An error may occur at runtime when BRAKER was installed with conda.
The cause is that the faToTwoBit program fails to run.
This happens because conda did not resolve the dependencies properly and the installed openssl version is too new. The fix is to switch to an older openssl version in the environment.
A warning may also appear at runtime. It can be ignored.
References
- BRAKER2 official tutorial: https://github.com/Gaius-Augustus/BRAKER