I work on 64-bit Linux, so every command below was run on Linux.
1. Decompressing a file.gz file on Linux
[References]
[1]https://blog.csdn.net/z69183787/article/details/81739901
[2]https://www.cnblogs.com/wangshouchang/p/7748527.html
[Solution]
gzip -d gencode.v29.annotation.gff3.gz
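Note that gzip -d deletes the .gz file once decompression succeeds. If you would rather keep the archive, the same step can be sketched in Python 3 with the standard library. The function name gunzip_keep is made up for this note and is not part of any tool mentioned here.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_keep(gz_path):
    """Decompress gz_path next to itself and leave the .gz file in place."""
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # copies in chunks, so large files are fine
    return out_path
```

Calling gunzip_keep("gencode.v29.annotation.gff3.gz") would write gencode.v29.annotation.gff3 into the same directory.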
2. Downloading the GFF file
The GENCODE database provides GFF3 annotation files for the human genome in which the sequence names start with chr.
https://www.gencodegenes.org/human/release_29.html
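For this experiment I downloaded the Comprehensive gene annotation (Regions: CHR) file, which annotates the human reference chromosomes.
I needed it mainly to fix a run where htseq-count reported a count of 0 for every gene: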
ENSG00000000003 0
ENSG00000000005 0
ENSG00000000419 0
ENSG00000000457 0
ENSG00000000460 0
ENSG00000000938 0
ENSG00000000971 0
ENSG00000001036 0
ENSG00000001084 0
ENSG00000001167 0
ENSG00000001460 0
ENSG00000001461 0
ENSG00000001497 0
ENSG00000001561 0
ENSG00000001617 0
ENSG00000001626 0
ENSG00000001629 0
ENSG00000001630 0
ENSG00000001631 0
ENSG00000002016 0
[Question link] https://www.bioinfo.info/?/question/462
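Since switching to an annotation with chr-prefixed names fixed the all-zero counts, the likely cause was that the sequence names in the annotation did not match those in the alignment. A rough way to check this before running htseq-count is sketched below in Python 3. The function names and the two demo records are illustrative assumptions, not output from my data.

```python
def gff_seqnames(gff_lines):
    """Collect sequence names from column 1 of GFF3 lines."""
    names = set()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def sam_header_seqnames(sam_header_lines):
    """Collect sequence names from the SN: field of @SQ header lines."""
    names = set()
    for line in sam_header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def shared_seqnames(gff_lines, sam_header_lines):
    """Names present in both files; an empty set means no read can match."""
    return gff_seqnames(gff_lines) & sam_header_seqnames(sam_header_lines)


# Illustrative input: the annotation says "chr1", the alignment says "1".
gff = ["##gff-version 3\n",
       "chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tID=demo_gene\n"]
sam = ["@HD\tVN:1.0\tSO:coordinate\n",
       "@SQ\tSN:1\tLN:248956422\n"]
print(shared_seqnames(gff, sam))  # prints set()
```

The SAM header lines can be obtained with samtools view -H on the BAM file. If the two name sets do not overlap, every read ends up outside every feature.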
See also someone else's summary of four ways to download GFF files:
https://blog.csdn.net/u011262253/article/details/89363809
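3. Interpreting the htseq-count output file
The output file has two columns: the first is the gene ID (for example ENSMUSG00000000001.4) and the second is the number of reads counted for that gene.
The end of the file contains summary counters.
__no_feature 42987809 # reads that could not be assigned to any feature
__ambiguous 183025 # reads that overlap more than one feature, so the feature cannot be decided
__too_low_aQual 0 # reads whose mapping quality is below the threshold set with -a
__not_aligned 0 # reads that are present in the SAM file but have no alignment
__alignment_not_unique 0 # reads that align to more than one position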
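To see quickly how many reads were actually assigned to genes, the two kinds of lines can be separated with a few lines of Python 3. This is a minimal sketch: parse_htseq_counts is a name invented here, and the demo lines simply reuse values quoted in these notes for illustration. It assumes the default counting mode, in which every read lands in exactly one gene or one summary counter.

```python
def parse_htseq_counts(lines):
    """Split htseq-count output into per-gene counts and the trailing summary."""
    genes, summary = {}, {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Only the first two whitespace-separated fields are used, so any
        # trailing "# comment" text is ignored.
        name, count = line.split()[:2]
        target = summary if name.startswith("__") else genes
        target[name] = int(count)
    return genes, summary


demo = [
    "ENSG00000000003\t0",
    "ENSG00000000005\t0",
    "__no_feature\t42987809",
    "__ambiguous\t183025",
]
genes, summary = parse_htseq_counts(demo)
assigned = sum(genes.values())
total = assigned + sum(summary.values())
print("assigned %d of %d reads" % (assigned, total))
```

A very large __no_feature value next to all-zero gene counts is the pattern described in section 2.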
In the next step, the read counts are analyzed further and combined.
For details, see: https://www.jianshu.com/p/d8d5e0b2e33b
4. Creating a new file on Linux

touch new_file_name.sh
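For completeness, the same thing can be done from a Python 3 script with pathlib. The helper name touch below is only chosen to mirror the shell command.

```python
from pathlib import Path


def touch(path):
    """Create an empty file, or update its timestamp if it already exists."""
    p = Path(path)
    p.touch(exist_ok=True)  # existing content is left untouched
    return p
```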
5. Installing HTSeq
[See the official installation guide]
https://htseq.readthedocs.io/en/release_0.11.1/install.html#installation-on-linux
My platform is 64-bit Ubuntu.
The Python version is 2.7.1.
So the command I used for this installation was:
sudo apt-get install build-essential python2.7-dev python-numpy python-matplotlib python-pysam python-htseq
The installation succeeded!
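A quick way to confirm that the package can be imported by the interpreter you intend to use is sketched below. The helper try_import is a made-up name. It relies only on importlib.import_module, which is available in both Python 2.7 and Python 3, and it falls back to "unknown" in case a module does not expose a __version__ attribute.

```python
import importlib


def try_import(module_name):
    """Return the module's version string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


# Prints None if HTSeq is not importable from this interpreter.
print(try_import("HTSeq"))
```

Run it with the same python executable that htseq-count uses, since a package installed for Python 2.7 is not visible to Python 3 and vice versa.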
Before that, I had followed several other people's notes and tried many approaches, none of which worked.
https://www.cnblogs.com/triple-y/p/9338890.html
http://blog.sina.com.cn/s/blog_68ddca510102wts6.html
Those attempts failed with errors such as:
symlinking folders for python2
Could not import 'setuptools', falling back to 'distutils'.
Traceback (most recent call last):
File "setup.py", line 200, in <module>
**kwargs
File "/usr/lib/python2.7/distutils/core.py", line 111, in setup
_setup_distribution = dist = klass(attrs)
File "/usr/lib/python2.7/distutils/dist.py", line 259, in __init__
getattr(self.metadata, "set_" + key)(val)
File "/usr/lib/python2.7/distutils/dist.py", line 1220, in set_requires
distutils.versionpredicate.VersionPredicate(v)
File "/usr/lib/python2.7/distutils/versionpredicate.py", line 113, in __init__
raise ValueError("expected parenthesized list: %r" % paren)
ValueError: expected parenthesized list: '>=0.9.0'
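I also tried the pip command on Windows. (HTSeq reportedly cannot be used on Windows; in the output below, however, the immediate failure is a connection timeout to pypi.org.)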
C:\Users\Administrator>pip install HTSeq
Collecting HTSeq
Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0x000002074F161A20>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/htseq/
Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0x000002074F161908>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/htseq/
Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0x000002074F161EF0>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/htseq/
Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0x000002074F161518>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/htseq/
Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0x000002074F161550>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/htseq/
Could not find a version that satisfies the requirement HTSeq (from versions: )
No matching distribution found for HTSeq
You are using pip version 18.0, however version 19.1.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.
6. The "file may be truncated" error when processing a BAM file
[Error output]
Error occured when reading beginning of SAM/BAM file.
no BGZF EOF marker; file may be truncated
[Checking whether the BAM file is complete]
samtools view 42_align_sorted.bam|tail
Reference link: https://www.jianshu.com/p/c6dd7edd6e80
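The message refers to the fixed 28-byte empty BGZF block that the SAM/BAM specification places at the very end of a complete BAM file. The Python 3 sketch below checks for that marker directly; has_bgzf_eof is a name invented for this note. samtools also offers a quickcheck subcommand that, as far as I know, performs a similar test. A missing marker means the file is truncated, but a present marker alone does not prove that the rest of the file is intact.

```python
import os

# The empty BGZF block that terminates a complete BAM file (28 bytes).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "4243" "0200"
    "1b00" "0300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF end-of-file marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        fh.seek(-len(BGZF_EOF), os.SEEK_END)  # jump to the last 28 bytes
        return fh.read() == BGZF_EOF
```

For example, has_bgzf_eof("42_align_sorted.bam") should return False for the file that triggered the error above.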
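[Likely causes]
(1) The command that was generating the file was interrupted partway through.
(2) The file was not transferred completely. (In my case, the copy to a USB drive was incomplete.)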