python: converting R Markdown to HTML

In R, the rmarkdown package calls pandoc-1.19.2 to do the conversion:

library(rmarkdown)
render("test.Rmd", "html_document")

I tried it and found that the converted HTML looked poor.

So I decided to write my own R Markdown to HTML converter in Python.

Rmd2htm.py

# -*- coding: utf-8 -*-
import os, sys 
import re

# expect exactly one argument: the .Rmd file to convert
if len(sys.argv) ==2:
    f1 = sys.argv[1]
else:
    print('usage: Rmd2htm.py file1.Rmd ')
    sys.exit(1)

if not os.path.exists(f1):
    print('Error: %s not found\n' % f1)
    sys.exit(1)

# only .Rmd / .rmd input is accepted
fn,ext =os.path.splitext(f1)
if ext != '.Rmd' and ext !='.rmd':
    print('Error: %s is not .Rmd' % ext)
    sys.exit(1)

head = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style type="text/css">
code {
  color: inherit;
  background-color: rgba(0, 0, 0, 0.05);
}
</style>
</head>
<body>
"""
foot = """
</body>
</html>
"""

fp = open(f1, 'r')
f2 = fn +'.htm'
fp2 = open(f2, 'w')
fp2.write(head)
iscode = False
for line in fp:
    aline = line.strip()
    if aline.startswith("```"):
        # a fence closes an open block, otherwise it opens one (```{r ...} included)
        if iscode: fp2.write("</code></pre>\n");iscode=False
        else: fp2.write("<pre><code>");iscode=True
    elif iscode: fp2.write(line)   # copy code verbatim, blank and --- lines included
    elif len(aline) ==0: fp2.write("<p>\n")
    elif aline.startswith("---"): fp2.write("<hr/>\n")
    else:
        if  aline.startswith("######"): aline = "<h6>"+aline[6:]+" </h6>"
        elif aline.startswith("#####"): aline = "<h5>"+aline[5:]+" </h5>"
        elif aline.startswith("####"): aline = "<h4>"+aline[4:]+" </h4>"
        elif aline.startswith("###"): aline = "<h3>"+aline[3:]+" </h3>"
        elif aline.startswith("##"): aline = "<h2>"+aline[2:]+" </h2>"
        elif aline.startswith("#"): aline = "<h1>"+aline[1:]+" </h1>"
        elif aline.startswith("+"): aline = "<li>"+aline[1:]+" </li>"
        elif aline.startswith("-"): aline = "<li>"+aline[1:]+" </li>"
        elif aline.startswith("**"):
            aline = "<strong>"+aline[2:].replace("**","</strong>",1)
            if aline.find("`",0) > -1:
                aline = aline.replace("`r","<code>").replace("`","</code>")
        elif aline.startswith("*"): aline = "<li>"+aline[1:]+" </li>"
        elif aline.find("*",0) > -1:
            i = aline.find("*",0)
            j = aline.find("*",i+1)
            aline = aline.replace("*","<em>",1)
            if j>i: aline = aline.replace("*","</em>",1)
        else: 
            if aline.find("`",0) > -1:
                aline = aline.replace("`r","<code>").replace("`","</code>")
        if aline.startswith("<h") or aline.startswith("<li>"): fp2.write(aline+"\n")
        else: fp2.write(aline+"<br>\n")
#
fp.close()
fp2.write(foot)
fp2.close()
print(f2)
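
The per-line rules in the script only convert the first marker they find on a line. As a rough sketch (not the script's own logic), the same rules could be written with regular expressions so that inline code, bold and emphasis are converted wherever they appear. The names `convert_inline` and `convert_line` are made up for illustration, and list items are left out to keep it short.

```python
import re

# Hypothetical helpers, not part of Rmd2htm.py.
INLINE_CODE = re.compile(r"`(?:r\s+)?([^`]+)`")   # `r expr` or `expr`
STRONG = re.compile(r"\*\*(.+?)\*\*")             # **bold**
EM = re.compile(r"\*(\S(?:[^*]*\S)?)\*")          # *emphasis*, not a lone " * "
HEADING = re.compile(r"^(#{1,6})\s*(.*)$")        # one to six leading '#'


def convert_inline(text):
    """Convert inline code, bold and emphasis markers in one line of text."""
    text = INLINE_CODE.sub(r"<code>\1</code>", text)
    text = STRONG.sub(r"<strong>\1</strong>", text)
    text = EM.sub(r"<em>\1</em>", text)
    return text


def convert_line(line):
    """Convert one non-code line: a heading, or plain text ending in <br>."""
    stripped = line.strip()
    m = HEADING.match(stripped)
    if m:
        level = len(m.group(1))
        return "<h%d>%s</h%d>" % (level, convert_inline(m.group(2).strip()), level)
    return convert_inline(stripped) + "<br>"


if __name__ == "__main__":
    print(convert_line("# Regression Report"))
    print(convert_line("in a sample of *n* women."))
```

Inline code is handled first so that the bold pattern sees the line with backticks already replaced. An asterisk inside an inline R expression would still be misread as emphasis, so this is only a starting point.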

Run it from the command line:

Rmd2htm.py test.Rmd

The test sample comes from [R in Action, 2nd], Chapter 22, on creating dynamic reports with R and Markdown:

# Regression Report
---
```{r echo=FALSE, results='hide'}
n <- nrow(women)
fit <- lm(weight ~ height, data=women)
sfit <- summary(fit)
b <- coefficients(fit)
```
Linear regression was used to model the relationship between
weights and height in a sample of *n* women. The equation
**weight = `r b[[1]]` + `r b[[2]]` * height**
accounted for `r round(sfit$r.squared,2)`% of the variance
in weights. The ANOVA table is given below.

---
```{r echo=FALSE, results='asis'}
library(xtable)
options(xtable.comment=FALSE)
print(xtable(sfit), type="html", html.table.attributes="border=0")
```
---
The regression is plotted in the following figure.

```{r echo=FALSE, fig.width=5, fig.height=4}
library(ggplot2)
ggplot(data=women, aes(x=height, y=weight)) +
	geom_point() + geom_smooth(method="lm")
```
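
The chunks in this sample contain `<-`, and the script copies chunk lines into the HTML unchanged. A minimal sketch of escaping them first, assuming Python 3 (where `html.escape` is in the standard library); `render_chunk` is a hypothetical helper, not part of the script above:

```python
import html


def render_chunk(lines):
    """Wrap the lines of one code chunk in <pre><code>, escaping &, < and >."""
    # quote=False leaves quotation marks alone; they are safe in element text
    body = "".join(html.escape(line, quote=False) for line in lines)
    return "<pre><code>" + body + "</code></pre>\n"


if __name__ == "__main__":
    chunk = ["n <- nrow(women)\n", "fit <- lm(weight ~ height, data=women)\n"]
    print(render_chunk(chunk))
```

The lines are expected to keep their trailing newlines, as they do when read from a file with `for line in fp`.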

Bonus: python : convert markdown to html

belldeep
