Django Development Series - Interacting with Elasticsearch

Building a full-text index application for CSDN blogs

–2018.08.11

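The first problems to solve are:
The interface Python uses to access the Elasticsearch database
On top of Django's web framework, sending the user's request to Elasticsearch and returning the results
Saving the keyword of every search a user makes
Providing concurrency and reliability guarantees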
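
As a rough illustration of the third requirement, here is a minimal sketch of recording each search keyword. It uses Python's built-in sqlite3 module instead of the Django ORM so that it runs on its own; the table name search_log and all function names are hypothetical and not part of the original project, where a Django model would be the more natural place for this.

```python
import sqlite3
from datetime import datetime


def open_log(db_path=":memory:"):
    """Open (or create) the keyword log. The table name is a hypothetical choice."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS search_log ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "keyword TEXT NOT NULL, "
        "searched_at TEXT NOT NULL)"
    )
    return conn


def record_keyword(conn, keyword):
    """Store one search keyword together with the time it was submitted."""
    # "with conn" wraps the insert in a transaction:
    # commit on success, rollback if an exception is raised.
    with conn:
        conn.execute(
            "INSERT INTO search_log (keyword, searched_at) VALUES (?, ?)",
            (keyword, datetime.now().isoformat()),
        )


def recent_keywords(conn, limit=10):
    """Return the most recently stored keywords, newest first."""
    rows = conn.execute(
        "SELECT keyword FROM search_log ORDER BY id DESC LIMIT ?", (limit,)
    )
    return [row[0] for row in rows]
```

A view would call record_keyword once per submitted search, before or after querying Elasticsearch.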

There are two Python clients that can interact with Elasticsearch:

elasticsearch-py
elasticsearch-dsl

elasticsearch-dsl is built on top of elasticsearch-py and, by comparison, fits the habits of Python users better.
elasticsearch-py is more flexible and easier to extend.

Since I was using Django, which is written in Python, it was easy to interact with Elasticsearch. There are two client libraries for interacting with Elasticsearch from Python. There's elasticsearch-py, which is the official low-level client. And there's elasticsearch-dsl, which is built upon the former but gives a higher-level abstraction with a bit less functionality.

elasticsearch-py is used as follows:

from datetime import datetime
from elasticsearch import Elasticsearch
es = Elasticsearch()

doc = {
    'author': 'kimchy',
    'text': 'Elasticsearch: cool. bonsai cool.',
    'timestamp': datetime.now(),
}
res = es.index(index="test-index", doc_type='tweet', id=1, body=doc)
print(res['result'])

res = es.get(index="test-index", doc_type='tweet', id=1)
print(res['_source'])

es.indices.refresh(index="test-index")

res = es.search(index="test-index", body={"query": {"match_all": {}}})
print("Got %d Hits:" % res['hits']['total'])
for hit in res['hits']['hits']:
    print("%(timestamp)s %(author)s: %(text)s" % hit["_source"])
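
One caveat when running the example above against newer servers: from Elasticsearch 7.0 on, hits.total in a search response is an object such as {"value": 2, "relation": "eq"} rather than a plain integer, so the "%d" formatting fails. A small helper, given the hypothetical name total_hits here, can accept both shapes:

```python
def total_hits(response):
    """Return the hit count of a search response as an int."""
    total = response["hits"]["total"]
    # Elasticsearch 7+ returns a dict; earlier versions return a bare integer.
    if isinstance(total, dict):
        return total["value"]
    return total
```

The print line would then read print("Got %d Hits:" % total_hits(res)).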

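The query part stays closer to the raw Elasticsearch DSL syntax.

When debugging Elasticsearch queries in Kibana, we usually write them in the syntax taught by the Elasticsearch documentation. That same syntax runs directly under elasticsearch-py: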

from elasticsearch import Elasticsearch
client = Elasticsearch()

response = client.search(
    index="my-index",
    body={
      # bool query; the old "filtered" query was removed in Elasticsearch 5.0
      "query": {
        "bool": {
          "must": [{"match": {"title": "python"}}],
          "must_not": [{"match": {"description": "beta"}}],
          "filter": [{"term": {"category": "search"}}]
        }
      },
      "aggs" : {
        "per_tag": {
          "terms": {"field": "tags"},
          "aggs": {
            "max_lines": {"max": {"field": "lines"}}
          }
        }
      }
    }
)

for hit in response['hits']['hits']:
    print(hit['_score'], hit['_source']['title'])

for tag in response['aggregations']['per_tag']['buckets']:
    print(tag['key'], tag['max_lines']['value'])

elasticsearch-dsl is used as follows; it wraps the DSL in a layer of functions:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch()

s = Search(using=client, index="my-index") \
    .filter("term", category="search") \
    .query("match", title="python")   \
    .exclude("match", description="beta")

s.aggs.bucket('per_tag', 'terms', field='tags') \
    .metric('max_lines', 'max', field='lines')

response = s.execute()

for hit in response:
    print(hit.meta.score, hit.title)

for tag in response.aggregations.per_tag.buckets:
    print(tag.key, tag.max_lines.value)

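Seen this way, elasticsearch-py provides the complete foundation, including access to the raw DSL, while elasticsearch-dsl only wraps the search functionality and still depends on the low-level framework that elasticsearch-py provides.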
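
To make the idea of wrapping the DSL in functions concrete, here is a hand-written sketch that builds the same kind of request body as a plain dict. The function name build_body and its parameters are made up for illustration; elasticsearch-dsl's real implementation is far more general.

```python
def build_body(title, excluded_description, category):
    """Build a search body: match the title, exclude a description term,
    filter by category, and aggregate the maximum line count per tag."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": title}}],
                "must_not": [{"match": {"description": excluded_description}}],
                "filter": [{"term": {"category": category}}],
            }
        },
        "aggs": {
            "per_tag": {
                "terms": {"field": "tags"},
                "aggs": {"max_lines": {"max": {"field": "lines"}}},
            }
        },
    }
```

The resulting dict can be passed straight to the low-level client, for example client.search(index="my-index", body=build_body("python", "beta", "search")).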

try:
    import _pickle as pickle
except ImportError:
    import pickle
import os
from elasticsearch import Elasticsearch

class loadJson(object):
    def __init__(self):
        # Create the client once and reuse it for every document.
        self.es = Elasticsearch("http://192.168.1.112:9200")

    def loadAllFiles(self, path):
        for file in os.listdir(path):
            filename = os.path.join(path, file)
            # Each file holds one pickled document; the with block closes
            # the file even if unpickling fails.
            with open(filename, 'rb') as filehandler:
                jsonObj = pickle.load(filehandler)
            self.saveToElasticSearch(jsonObj)
            print(jsonObj)

    def saveToElasticSearch(self, doc):
        # Index the document; Elasticsearch generates the document id.
        self.es.index(index="csdnblog", doc_type="CSDNPost", body=doc)


utlLoader = loadJson()
utlLoader.loadAllFiles("G:\\SideProjects\\CSDN_Blogs\\PostThread\\")

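The code above takes the blog posts I crawled from CSDN and saved locally as serialized JSON objects (the files are read back with pickle), deserializes them, and stores them in Elasticsearch for full-text indexing.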
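
Indexing one document per request becomes slow for a large crawl. As a sketch of an alternative, the generator below turns the same pickled files into action dicts of the shape accepted by elasticsearch.helpers.bulk. The name iter_actions and its parameters are hypothetical; doc_type is optional because newer Elasticsearch versions no longer use mapping types.

```python
import os
import pickle


def iter_actions(path, index, doc_type=None):
    """Yield one bulk action per pickled document found in `path`."""
    for name in sorted(os.listdir(path)):
        filename = os.path.join(path, name)
        if not os.path.isfile(filename):
            continue  # skip sub-directories
        with open(filename, "rb") as handle:
            doc = pickle.load(handle)
        action = {"_index": index, "_source": doc}
        if doc_type is not None:
            action["_type"] = doc_type  # only for servers that still use types
        yield action
```

With a client es and from elasticsearch import helpers, the documents could then be sent in batches with helpers.bulk(es, iter_actions(path, "csdnblog", "CSDNPost")).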

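Installing the Elasticsearch client

Before implementing the features above, we must install the Elasticsearch client in the Django project that was set up under virtualenv.

Go to the virtualenv directory, activate the virtualenv environment, and install the elasticsearch client: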

activate.bat
pip3 install elasticsearch
pip3 list

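After installation, use pip3 list to check the installed packages.
What has been installed at this point is the low-level elasticsearch client, the one that stays close to the Elasticsearch DSL syntax; elasticsearch-dsl is a library built on top of it. To install that one, just add the -dsl suffix: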

pip3 install elasticsearch-dsl
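
As a quick sanity check after installation, the sketch below reports whether the two packages can be found in the active environment. It relies only on the standard library; the helper name is_installed is hypothetical. Note that the elasticsearch-dsl package is imported as elasticsearch_dsl.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


for name in ("elasticsearch", "elasticsearch_dsl"):
    print(name, "installed" if is_installed(name) else "missing")
```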

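Providing an entry point to Elasticsearch

In the earlier Django project, we already got requests and view functions working together in the SqlHub app. Building on that, add a form to Index.html that points to a view function we are about to create, which returns the results requested from Elasticsearch.

The key points are to create the action FullTextSearch in SqlHub\Index.html and to set up its view function fulltextsearch in views.py so that the results can be displayed.

Creating the search form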

<form action="/SqlHub/FullTextSearch" method = "post">

{% csrf_token %}

    Search Key Word:<input type = text name = keyword><br>

    <input type = submit>

</form>

Reference articles:

https://medium.freecodecamp.org/elasticsearch-with-django-the-easy-way-909375bc16cb

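That article shows how to implement CRUD operations with elasticsearch-dsl. It also shows that the Django project needs no Elasticsearch-specific configuration; installing the elasticsearch library and importing it correctly is enough.

I only used it as a reference, so this time I work with plain elasticsearch-py.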

https://elasticsearch-py.readthedocs.io/en/master/

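This is the official documentation of the Elasticsearch Python client. Everything about accessing Elasticsearch from Python can be found there.

Implementing a simple view function for Elasticsearch full-text search

This view function receives the request submitted by the user, hands it to Elasticsearch, and, once the result arrives, renders the Elasticsearch result page (es.html) to show the result of that request.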

import datetime

from django.http import HttpResponseRedirect
from django.shortcuts import render
from elasticsearch import Elasticsearch

from SqlHub.models import SqlNew

def archive(request):
    posts = SqlNew.objects.all()
    curtime = datetime.datetime.now()
    context = {"posts": posts, "curtime": curtime}
    return render(request, 'Index.html', context)



def newone(request):
    curtime = datetime.datetime.now()
    oneblog = SqlNew()
    oneblog.title = request.POST["title"]
    oneblog.body = request.POST["body"]
    oneblog.timestamp = curtime
    oneblog.save()
    return HttpResponseRedirect('/SqlHub')



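def fulltextsearch(request):
    # Use the keyword submitted by the search form; fall back to the
    # original hard-coded example term when none is posted.
    keyword = request.POST.get("keyword", "cluster")

    es = Elasticsearch(["192.168.1.10:9200"])
    ret = es.search(
        index="csdnblog2",
        body={
            "query": {
                "term": {"pageContent": keyword}
            }
        },
    )
    # Each hit is a dict with metadata keys such as _id, _score and _source.
    resultback = ret["hits"]["hits"]
    context_rs = {"results": resultback}
    return render(request, 'es.html', context_rs)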
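
A term query compares the keyword with the indexed tokens exactly, without analyzing it, so input such as "Cluster" or a multi-word phrase may find nothing in an analyzed text field. As a hedged alternative, the sketch below builds a match query from the submitted keyword and falls back to match_all when the keyword is empty. The function name build_search_body is hypothetical; pageContent is the field used in the view above.

```python
def build_search_body(keyword, field="pageContent"):
    """Build the body for es.search() from a user-supplied keyword."""
    keyword = (keyword or "").strip()
    if not keyword:
        # Nothing to search for: return every document.
        return {"query": {"match_all": {}}}
    # match analyzes the keyword with the field's analyzer before searching.
    return {"query": {"match": {field: keyword}}}
```

Inside the view this could be used as es.search(index="csdnblog2", body=build_search_body(request.POST.get("keyword"))).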

Providing a template to display the Elasticsearch full-text query results

This template must also let the user submit an Elasticsearch request.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>




<form action="/SqlHub/FullTextSearch" method = "post">

{% csrf_token %}

    Search Key Word:<input type = text name = keyword><br>

    <input type = submit>

</form>


{% for item in results %}

{% for key,value in item.items %}
    {% if key == "_source" %}
        {% for key1,value1 in value.items %}
                {% if key1 == "article_url" %}
                    {{ value1 }}<br>
                {% endif %}
        {% endfor %}
    {% endif %}
{% endfor %}

{% endfor %}


</body>
</html>

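Django templates can read dictionary keys with dot notation, but not keys that begin with an underscore such as _source, so the nested loops above are a workaround. The alternative is to convert the dictionaries into objects, or to pass only the needed fields, in the view.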
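
One way to avoid the nested loops, sketched here under the assumption that only the _source fields are needed in the template, is to unwrap each hit in the view before rendering. The helper name extract_sources is hypothetical.

```python
def extract_sources(hits):
    """Return the _source dict of every hit, skipping hits without one."""
    return [hit["_source"] for hit in hits if "_source" in hit]
```

The view would then pass {"results": extract_sources(ret["hits"]["hits"])} to render, and the body of the template loop reduces to {{ item.article_url }}.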

Finally, configure the mapping between the form actions and the view functions:

from django.urls import path
import SqlHub.views

urlpatterns = [
    path('', SqlHub.views.archive),
    path('New', SqlHub.views.newone),
    path('FullTextSearch', SqlHub.views.fulltextsearch),
]
