Base calling | Vector marking | The assembly problem

Bioinformatics

Chromosomes can be numbered according to their staining patterns. Chromosomes 1 to 22 get shorter in turn and affect the body's development, while the 23rd pair determines sex. Cancer is caused by variation in the genetic code, so interpreting the genetic code is very important. However, the genetic code is very long, so although all of it has been sequenced, many problems remain in deciphering it.

 

Bioinformatics is a discipline whose object of study is the genome, so it was initially defined as genome informatics. Its main content is the acquisition, processing, storage, distribution, analysis, and interpretation of biological data, that is, the acquisition, management, and mining of biological information.

Deciphering specifically means sequence analysis: for a coding sequence, finding out which protein it encodes; for a non-coding sequence, finding out what role it plays. Within today's natural sciences and technology, bioinformatics is an interdisciplinary subject that combines three major classes of problems: the genome, information structure, and complexity.
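As a small illustration of "which protein a coding sequence encodes", here is a minimal Python sketch that translates a coding DNA sequence with the standard genetic code. The names `CODON_TABLE` and `translate` are made up for this example, and it assumes the input is already in frame and starts at the first codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order per position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding DNA sequence into a one-letter protein string.

    Assumption: the sequence is in frame. Translation stops at the first
    stop codon, trailing bases that do not fill a codon are ignored, and
    codons with unknown bases (for example N) are written as 'X'.
    """
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # ATG -> M, GCC -> A, TAA -> stop
```

Real projects would normally use an existing library for this rather than a hand-written table; the sketch only shows the idea.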

 

Bioinformatics:

1. Genome informatics is a scientific discipline that encompasses all aspects of genome information acquisition, processing, storage, distribution, analysis, and interpretation.

2. Bioinformatics takes genomic DNA sequence information as its source for analysis, in order to decipher the genetic language hidden in the DNA sequence, particularly the nature of the non-coding regions. Once new gene information is found, the protein conformation is modeled and predicted.

3. The research goal of bioinformatics is to reveal "the complexity of the genome information structure and the fundamental laws of the genetic language." It combines three major scientific issues in the natural sciences and technology of this century and the next: "genome", "information structure", and "complexity".

 

 

With the completion of the Human Genome Project and the rapid growth of biological data, the number of database types has gradually increased, and so has the rate of data growth. There are four main types of database: the DNA nucleotide database; the expressed sequence tag (EST) database, which annotates the function of genes expressed in living cells and covers about 90% of human genes; the single nucleotide polymorphism (SNP) database; and genome datasets for a single species. Over time, other comprehensive databases and secondary databases appeared. The comprehensive databases GenBank, EMBL, and DDBJ exchange data with one another every day.

The general research process goes from the gene to the primary sequence of the protein, then to the 3D structure of the protein, and then to the annotation of biological function. Part of what was previously considered junk DNA has now been reclassified as non-coding genes, and research on it is expanding.

Sequence assembly and the annotation of large genomes rely mainly on bioinformatics.
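One simple way to picture the first step, from a gene sequence toward a protein primary sequence, is to look for an open reading frame (ORF): a stretch that starts with ATG and runs to the first in-frame stop codon. The sketch below is a deliberately naive scan of the forward strand only; `find_orfs` and `min_codons` are names invented for this example, and real gene finders do much more.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2):
    """Return (start, end, frame) tuples for ORFs on the forward strand.

    An ORF here runs from an ATG to the first in-frame stop codon.
    `end` is the index just past the stop codon, so dna[start:end] is
    the whole ORF. `min_codons` counts codons before the stop, ATG
    included. The reverse strand is not scanned (a simplification).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return orfs


seq = "CCATGGCCAAATAGCC"
for start, end, frame in find_orfs(seq):
    print(frame, seq[start:end])
```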

 

 

After sequencing, base calling reads the bases from the measured fluorescence signals, and vector marking then marks the vector bases (removing the primer-specific bases). After these physical methods, bioinformatics is applied for assembly. The assembly problem is that many fragments cannot be spliced together correctly. The main idea of assembly is to cut the same data in different ways; the different fragments produced this way give us clues for splicing the gene back together. Supercomputers now make this faster, and coverage can reach 99%. Even so, there are parts where finishing (filling the gaps) is not possible, which is a big problem. So far, though, most of the base sequence information can be read out, and repeat marking + ORF prediction + gene annotation then decipher more information to solve biological problems.
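The splicing idea can be sketched with a toy greedy overlap assembler in Python: fragments cut at different places overlap each other, and the overlaps are the clues used to join them. This is only an illustrative sketch under strong assumptions (error-free reads, one strand, no repeats); `overlap`, `greedy_assemble`, and `min_len` are names made up here, and real assemblers are far more sophisticated.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the two fragments with the longest overlap.

    Returns a list of contigs. More than one contig means some gaps
    could not be closed from the given fragments.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: the remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


print(greedy_assemble(["ATGGCC", "GCCAAA", "AAATAG"]))
print(greedy_assemble(["AAAA", "CCCC"]))  # no overlap, so two contigs remain
```

The second call shows the finishing problem in miniature: when no fragment bridges two regions, the assembler is left with separate contigs and a gap between them.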

 


Origin www.cnblogs.com/yuanjingnan/p/11546253.html