Optimizing high-volume node deletion in Neo4j with Cypher

Having read quite a few articles on optimizing the deletion of large numbers of nodes, I wrote this post to record a number of details.

Here is a common scenario: when running experiments, we often need to remove all data from the database and start over. The operation sounds simple, but it turns out not to be as simple as you might think. This post records some of my lessons for your reference.

The Neo4j database I am working with uses the default Neo4j Desktop configuration, which means the maximum heap size is 1 GB.

This article assumes that you have installed the Neo4j APOC library; if not, you can see the installation guide here (https://neo4j-contrib.github.io/neo4j-apoc-procedures/#_installation_with_neo4j_desktop).

Cypher Shell

All Cypher queries in this post are executed in the Cypher Shell. See the Neo4j Desktop screenshot below for where to find it.

Translator's note: to find this interface in Neo4j Desktop, you must select "Create a Local Graph" when creating the database, then click "Start". Only then does the "Manage" button appear; click it to see the tab marked in the screenshot above.

[Screenshot: location of the Cypher Shell in Neo4j Desktop]

Data structure

Use the APOC library's apoc.periodic.iterate procedure to create one million graph nodes.

neo4j> CALL apoc.periodic.iterate(
         "UNWIND range(1, 1000000) as id RETURN id",
         "CREATE (:Node {id: id})",
         {}
       )
       YIELD timeTaken, operations
       RETURN timeTaken, operations;
+-------------------------------------------------------------------------+
| timeTaken | operations                                                  |
+-------------------------------------------------------------------------+
| 8         | {total: 1000000, committed: 1000000, failed: 0, errors: {}} |
+-------------------------------------------------------------------------+

1 row available after 8249 ms, consumed after another 0 ms

After running the statement above, let's check how many nodes the graph now contains:

neo4j> MATCH () RETURN count(*);
+----------+
| count(*) |
+----------+
| 1000000  |
+----------+

1 row available after 0 ms, consumed after another 0 ms

Good, the one million nodes were created successfully; now let's delete them.

Deleting nodes

My first attempt at deleting these nodes used the following query: first match everything, then delete it.

neo4j> MATCH (n)
       DETACH DELETE n;
There is not enough memory to perform the current task. Please try increasing "dbms.memory.heap.max_size" in the neo4j configuration (normally in "conf/neo4j.conf" or, if you are using Neo4j Desktop, found through the user interface) or if you are running an embedded installation increase the heap by using "-Xmx" command line flag, and then restart the database.
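The most direct fix suggested by the error itself is to raise the heap limit. A sketch of the relevant conf/neo4j.conf settings follows; the 2g value is an assumption and should be sized to your machine. The rest of this post instead keeps the 1 GB default and fixes the query.

```
# conf/neo4j.conf — hypothetical sizes, adjust to available RAM
dbms.memory.heap.initial_size=2g
dbms.memory.heap.max_size=2g
```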

Oh, an out-of-memory error! Why did we run out of memory? The best way to find out is to dump the heap and inspect it.

We can change Neo4j's configuration so that, on an out-of-memory error, it dumps the heap contents to a file. The setting is as follows:

dbms.jvm.additional=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/neo4jdump.hprof

If you are using Neo4j Desktop, this can be configured on its Settings tab.

[Screenshot: the Settings tab in Neo4j Desktop]

To inspect the dump, we need a tool such as YourKit or VisualVM. I used VisualVM; the screenshot below shows the heap contents:

[Screenshot: VisualVM view of the heap dump]

Our write query generates a huge transaction log, and the commands used to record it occupy most of the heap.

Batch deletion

If we delete the nodes in batches, there will not be so many commands in memory at once, which sounds like a good idea. apoc.periodic.iterate is exactly the tool for executing statements in batches, so let's try it:

CALL apoc.periodic.iterate(
  "MATCH (n) RETURN n",
  "DELETE n",
  {batchSize: 10000}
)
YIELD timeTaken, operations
RETURN timeTaken, operations
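A side note: plain DELETE fails if a node still has relationships. For a graph with relationships, a sketch of the same batched call would use DETACH DELETE instead (the batchSize value here is illustrative, and parallel is left at its default of false because concurrent deletes can conflict):

```cypher
CALL apoc.periodic.iterate(
  "MATCH (n) RETURN n",
  "DETACH DELETE n",
  {batchSize: 10000, parallel: false}
)
YIELD timeTaken, operations
RETURN timeTaken, operations;
```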

As the chart below shows, this sometimes runs fine, but at other times it still fills the entire heap, causing garbage-collection pauses.

[Screenshot: heap usage chart during the batched delete]

The debug log can be reached via the Terminal tab in Neo4j Desktop; searching it reveals all the garbage-collector pause warnings.

$ grep VmPauseMonitorComponent logs/debug.log | tail -n 10
2019-04-14 16:14:22.377+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=9143, gcTime=4619, gcCount=7}
2019-04-14 16:14:28.845+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=6367, gcTime=6451, gcCount=10}
2019-04-14 16:14:35.730+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=2131, gcTime=6875, gcCount=12}
2019-04-14 16:14:44.455+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=9080, gcTime=4523, gcCount=5}
2019-04-14 16:14:46.721+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=6364, gcTime=6449, gcCount=18}
2019-04-14 16:15:09.106+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=19938, gcTime=22355, gcCount=28}
2019-04-14 16:15:13.288+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=6428, gcTime=4176, gcCount=7}
2019-04-14 16:15:17.807+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=4418, gcTime=4515, gcCount=5}
2019-04-14 16:16:00.108+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=19724, gcTime=42279, gcCount=40}
2019-04-14 16:16:00.209+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=22476, gcTime=10, gcCount=1}

Looking at the heap in VisualVM at this point, we see the following:

[Screenshot: VisualVM view of the heap during the batched delete]

This time the space is occupied not by commands but by the nodes we want to delete. To avoid loading all the nodes into memory, we can use apoc.periodic.commit instead of apoc.periodic.iterate. The query passed to apoc.periodic.commit must contain a LIMIT clause as well as a RETURN clause; as long as the query returns a result, it keeps iterating.

neo4j> CALL apoc.periodic.commit(
         "MATCH (n) WITH n LIMIT $limit DELETE n RETURN count(*)",
         {limit: 10000}
       )
       YIELD updates, executions, runtime, batches
       RETURN updates, executions, runtime, batches;
+------------------------------------------+
| updates | executions | runtime | batches |
+------------------------------------------+
| 1000000 | 100        | 7       | 101     |
+------------------------------------------+

1 row available after 7540 ms, consumed after another 0 ms

OK, all the nodes have now been deleted successfully, and we can get on with other things.
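To confirm, the count query from earlier can be rerun; after the deletion it should report zero nodes:

```cypher
MATCH () RETURN count(*);
```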


Origin blog.51cto.com/14325182/2401356