Genome annotation with BRAKER2

From: https://www.jianshu.com/p/e6a5e1f85dda

BRAKER2 is a genome annotation pipeline that combines GeneMark, AUGUSTUS and transcriptome data.

Before using the software, there are several points to note:

  • Provide as high-quality a genome as you can. With the price of third-generation sequencing falling, this is no longer a problem.
  • Keep the sequence names in the genome FASTA simple, ideally ">contig1" or ">tig000001".
  • Repeats in the genome must be masked.
  • The default parameters usually give good results, but they may need adjusting for your species.
  • Be sure to check the annotation results; do not use them as they are.
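
The advice on sequence names can be applied with a few lines of awk. This is only a minimal sketch: the function name `rename_contigs` and the `contig` prefix are illustrative choices, not part of BRAKER2 or of the original workflow.

```shell
# rename_contigs: read FASTA on stdin and write FASTA on stdout, replacing
# every header with >contig1, >contig2, ... in order of appearance.
# (Hypothetical helper; the original header text is discarded.)
rename_contigs() {
    awk '/^>/ { printf(">contig%d\n", ++n); next } { print }'
}
```

It would be used as `rename_contigs < assembly.fasta > genome.fasta`, where `assembly.fasta` stands for whatever your assembler produced.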

Software Installation

BRAKER depends on a lot of software and also needs many Perl modules. conda can take care of both (the bioconda channel must be added).

Some messages are printed once the installation is complete. In summary:

  • Make sure the AUGUSTUS config directory is writable (if you installed with your own conda, you do not need to worry about this).
  • GeneMark and GenomeThreader have to be downloaded and installed separately.
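
The write-permission point can be checked before a long run. The sketch below assumes the config directory is the one named by the `AUGUSTUS_CONFIG_PATH` environment variable; the function name `check_augustus_config` is made up for illustration.

```shell
# check_augustus_config: succeed only if $AUGUSTUS_CONFIG_PATH points to an
# existing directory that the current user can write to.
# (Hypothetical helper; BRAKER stores trained species parameters below this directory.)
check_augustus_config() {
    dir=${AUGUSTUS_CONFIG_PATH:-}
    if [ -z "$dir" ]; then
        echo "AUGUSTUS_CONFIG_PATH is not set"
        return 1
    fi
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "writable: $dir"
        return 0
    fi
    echo "missing or not writable: $dir"
    return 1
}
```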

GeneMark is the one that must be installed. Download it from http://exon.gatech.edu/GeneMark/license_download.cgi, install it, and then set the environment variables.
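
Setting the variables might look like the sketch below. The install location `$HOME/opt/gmes_linux_64` is an assumption for illustration; use the directory where you actually unpacked the download. `GENEMARK_PATH` is the variable BRAKER reads to locate GeneMark.

```shell
# Assumed install location (hypothetical): change it to your own path.
GENEMARK_DIR="$HOME/opt/gmes_linux_64"

# Tell BRAKER where GeneMark lives and make its scripts callable.
export GENEMARK_PATH="$GENEMARK_DIR"
export PATH="$GENEMARK_DIR:$PATH"

# GeneMark also expects its license key at ~/.gm_key; the key comes from the
# same download page and has to be put there by hand.
```

These lines usually go into `~/.bashrc` so that they apply to every new shell.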

BRAKER2 also recommends some software that conda does not install. Install these yourself as needed:

  • DIAMOND 0.9.24: an alternative to NCBI BLAST+
  • cdbfasta 0.99: used to correct AUGUSTUS-predicted genes whose open reading frame contains an internal stop codon
  • cdbyank 0.981: used to correct AUGUSTUS-predicted genes whose open reading frame contains an internal stop codon
  • GenomeThreader: needed only when you annotate with protein data

For the software that conda does not install, see https://github.com/Gaius-Augustus/BRAKER#optional-tools

Take cdbfasta and cdbyank as an example. Once they are built, you can add the directory containing them to the PATH environment variable.

You can also copy them into the bin directory of the braker2 environment created by conda, where ~/miniconda3 is my conda install path.

After the installation is complete, it is advisable to check the software dependencies before going further.
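
One generic way to do such a check is to test whether each program is on the PATH. This is a sketch of that idea, not BRAKER's own mechanism and not necessarily the command the original post used; the function name `check_tools` is hypothetical.

```shell
# check_tools: report which of the named programs are on the PATH.
# Returns 0 if all were found, 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For the tools named in this post the call would be `check_tools braker.pl augustus gmes_petap.pl diamond cdbfasta cdbyank`, with `gth` added if you plan to use GenomeThreader.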

Running the software

BRAKER has different run modes depending on the type of data. In practice, the most common scenario is that you have sequenced a genome, also have second-generation (short-read) transcriptome data, and perhaps have protein sequences from closely related species. So suppose you have the following data on hand:

  • Genomic sequences: genome.fasta
  • Transcriptome data: XX_1.fq.gz, XX_2.fq.gz
  • Protein sequence: proteins.fa

Step 1: Mask the repetitive sequences in the genome. For this step, refer to "Annotating genomic repeat sequences with RepeatModeler and RepeatMasker".

The genome.fasta.masked file output by this step is the input for the annotation that follows.
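
A typical command sequence for this step is sketched below. The database name `mydb` is arbitrary, and the `-xsmall` option (soft-masking, where repeats become lower case instead of N) is my assumption rather than something the post states. The second function is a small hypothetical helper for confirming that the output really is soft-masked.

```shell
# mask_genome: build a de novo repeat library, then mask the genome with it.
# Assumes genome.fasta is in the current directory; "mydb" is an arbitrary name.
mask_genome() {
    BuildDatabase -name mydb genome.fasta       # database for RepeatModeler
    RepeatModeler -database mydb                # writes mydb-families.fa
    RepeatMasker -xsmall -lib mydb-families.fa genome.fasta   # writes genome.fasta.masked
}

# count_softmasked: count lower-case bases in FASTA read from stdin.
# A result of 0 means the file is not soft-masked.
count_softmasked() {
    awk '!/^>/ { n += gsub(/[acgtn]/, "") } END { print n + 0 }'
}
```

After `mask_genome` has finished, `count_softmasked < genome.fasta.masked` should print a number well above zero.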

Step 2: Align the FASTQ reads to the reference genome with STAR. For STAR usage, refer to "RNA-seq analysis software: study notes on the RNA-seq alignment tool STAR".

If several transcriptome samples were sequenced, run the alignment once per sample, which produces several BAM files.
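
The alignment could be sketched as follows. The index directory `star_index`, the thread count and the function names are placeholders of mine; the read files are assumed to sit in the current directory and to follow the `XX_1.fq.gz` / `XX_2.fq.gz` naming used above.

```shell
# sample_name: derive the sample name from the first read file,
# e.g. XX_1.fq.gz -> XX
sample_name() {
    basename "$1" _1.fq.gz
}

# star_index: build the STAR index once. Masking does not change
# coordinates, so the unmasked genome.fasta is used here.
star_index() {
    mkdir -p star_index
    STAR --runMode genomeGenerate --runThreadN 8 \
         --genomeDir star_index --genomeFastaFiles genome.fasta
}

# star_align: align one paired-end sample given its _1.fq.gz file.
# Output: <sample>.Aligned.sortedByCoord.out.bam
star_align() {
    s=$(sample_name "$1")
    STAR --runThreadN 8 --genomeDir star_index \
         --readFilesIn "${s}_1.fq.gz" "${s}_2.fq.gz" --readFilesCommand zcat \
         --outSAMtype BAM SortedByCoordinate --outFileNamePrefix "${s}."
}
```

With several samples, `for f in *_1.fq.gz; do star_align "$f"; done` yields one BAM file per sample.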

Step 3: Run BRAKER2

braker.pl supports up to 48 threads.
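
A run with one or more BAM files might be sketched like this. The function names and the species name are placeholders. `--softmasking` assumes the genome was soft-masked in step 1.

```shell
# join_by_comma: turn "a.bam b.bam" into "a.bam,b.bam", the form --bam expects
# when several alignment files are given.
join_by_comma() {
    ( IFS=,; echo "$*" )
}

# run_braker: usage: run_braker <species_name> <bam> [<bam> ...]
# (Hypothetical wrapper; the species name is one you choose for this run.)
run_braker() {
    species=$1
    shift
    braker.pl --species="$species" \
              --genome=genome.fasta.masked \
              --bam="$(join_by_comma "$@")" \
              --softmasking \
              --cores=48
}
```

For example, `run_braker mySpecies XX.Aligned.sortedByCoord.out.bam`. Protein sequences would be supplied with `--prot_seq=proteins.fa`; check the BRAKER README for the companion options that apply to your version.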

The final output consists of the protein sequences, the CDS sequences and a GFF file.

Possible problems

Problems may occur at run time when BRAKER2 was installed with conda.

One of them is an error raised by the faToTwoBit program. It occurs because conda failed to resolve the dependencies properly and installed an openssl version that is too new. The fix is to downgrade openssl in that environment.

A warning may also appear at run time. It can be ignored.

Origin www.cnblogs.com/zhanmaomao/p/11671000.html