Neo4j: Worker for session..crashed. Java Heap Space OutOfMemoryError

apache-spark neo4j


Spark job I am running:

It is a pretty simple program that was converted from Java to Scala and 'parallelized' (it was not intended to be run in parallel, but this is an experiment to a) learn Spark and Neo4j and b) see if I can get some speed gains just by running on a Spark cluster with more nodes doing more work). The reason is that the big bottleneck is a spatial call within the Neo4j Cypher script (a withinDistance call). The test data set is pretty small: 52,000 nodes and a database of about 140 MB.

Also, when Neo4j starts up it gives me this warning:

Starting Neo4j.
WARNING: Max 4096 open files allowed, minimum of 40000 recommended. See the Neo4j manual.
/usr/share/neo4j/bin/neo4j: line 411: /var/run/neo4j/neo4j.pid: No such file or directory

Which is strange, since I believe that warning is about open files and I asked the system admin to set the limit much higher. ulimit -Hn seems to confirm this (it reports 90,000), though ulimit -a shows open files at 4096 (the soft limit); I guess that is what Neo4j sees and complains about.
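
If it really is just the soft limit the Neo4j service sees, my understanding is it could be raised with a systemd override roughly like the sketch below (this assumes the RPM install runs Neo4j as a systemd service; the path and value are only illustrative, and I haven't confirmed this is related to the crashes):

    # /etc/systemd/system/neo4j.service.d/override.conf  (hypothetical override file)
    [Service]
    LimitNOFILE=60000

    # afterwards: systemctl daemon-reload && systemctl restart neo4j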

Also, when I ran this locally on my Mac OS X machine, the software would run for about 14 hours or so (maybe 9) and then I would see in the console that the database had just stopped talking to Spark. It wasn't down or anything: the job would time out, yet I could still cypher-shell into the database. But it would somehow lose the connection to the Spark jobs, so they would keep retrying until the spark-submit finally gave up and stopped.

C02RH2U9G8WM:scala-2.11 little.mac$ ulimit -Hn
unlimited

(Also, since the last edit I have upped my limits further in the Neo4j conf, now with 4 GB max memory for the heap.)

Some code bits from the job (using the code ported to Scala, with Spark DataFrames added; I know it is not properly parallelized, but I was hoping to get something working before pressing forward). I was building a hybrid program that works like the Java code I ported, but uses Spark DataFrames (connected to Neo4j).
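
For context, the neo handle used in the snippet below comes from the neo4j-spark-connector and is created roughly like this (the bolt URL and credentials are placeholders for my real settings):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.neo4j.spark.Neo4j

    val sparkConf = new SparkConf()
      .setAppName("neo4j-spatial-experiment")                    // illustrative app name
      .set("spark.neo4j.bolt.url", "bolt://neo4j-host:7687")     // placeholder host
      .set("spark.neo4j.bolt.user", "neo4j")
      .set("spark.neo4j.bolt.password", "password")              // placeholder credentials
    val sc  = new SparkContext(sparkConf)
    val neo = Neo4j(sc)                                          // the handle used by neo.cypher(...) below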

Essentially (pseudo code):

while (going through all these lats and lons)
{
    def DoCalculation()
    {
        val noBbox = "call spatial.bbox('geom', {lat:" + minLat + ",lon:" + minLon + "}, {lat:" + maxLat + ",lon:" + maxLon + "}) yield node return node.altitude as altitude, node.gtype as gtype, node.toDateFormatLong as toDateFormatLong, node.latitude as latitude, node.longitude as longitude, node.fromDateFormatLong as fromDateFormatLong, node.fromDate as fromDate, node.toDate as toDate ORDER BY node.toDateFormatLong DESC"
        try {
            // not overly sure what partitions and batch are really doing for me
            val initialDf2 = neo.cypher(noBbox).partitions(5).batch(10000).loadDataFrame

            val theRow = initialDf2.collect() // was someStr

            for (i <- 0 until theRow.length) {
                // do more calculations

                var radius2 = 100
                // this call is the biggest bottleneck; the spatial withinDistance is where I thought
                // I could put this work on Spark, make the calls through DataFrames, and get some
                // speed gains by batching it out to many nodes
                val pointQuery = "call spatial.withinDistance('geom', {lat:" + lat + ",lon:" + lon + "}, " + radius2 + ") yield node, distance WITH node, distance match (node:POINT) WHERE node.toDateFormatLong < " + toDateFormatLong + " return node.fromDateFormatLong as fromDateFormatLong, node.toDateFormatLong as toDateFormatLong"
                try {
                    val pointResults = neo.cypher(pointQuery).loadDataFrame // did I need to batch here?
                    var prRow = pointResults.collect()
                    // do stuff with prRow
                } catch {
                    case e: Exception => e.printStackTrace
                }
                // do way more stuff with the data, just in some Scala/Java data structures
            }
        } catch {
            case e: Exception => println("EMPTY COLLECTION")
        }
    }
}
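
One direction I have been sketching (not what currently runs; the host, credentials and sample points are placeholders) is to push the per-point withinDistance call into the partitions themselves with the plain Bolt driver, so only small per-point results come back to the driver instead of a collected DataFrame per point:

    import org.neo4j.driver.v1.{AuthTokens, GraphDatabase, Values}
    import scala.collection.JavaConverters._

    // illustrative (lat, lon, toDateFormatLong) triples; in practice these come from the bbox pass
    val points = Seq((44.98, -93.27, 20171227L), (45.01, -93.30, 20171227L))

    val perPoint = sc.parallelize(points, 8).mapPartitions { iter =>
      // one Bolt connection per partition instead of one DataFrame per point
      val driver  = GraphDatabase.driver("bolt://neo4j-host:7687",               // placeholder host
                                         AuthTokens.basic("neo4j", "password"))  // placeholder credentials
      val session = driver.session()
      val out = iter.map { case (lat, lon, toDate) =>
        val result = session.run(
          "CALL spatial.withinDistance('geom', {lat: $lat, lon: $lon}, 100) YIELD node, distance " +
            "WITH node, distance MATCH (node:POINT) WHERE node.toDateFormatLong < $toDate " +
            "RETURN node.fromDateFormatLong AS fromDateFormatLong, node.toDateFormatLong AS toDateFormatLong",
          Values.parameters("lat", Double.box(lat), "lon", Double.box(lon), "toDate", Long.box(toDate)))
        result.list().asScala.map(r =>
          (r.get("fromDateFormatLong").asLong(), r.get("toDateFormatLong").asLong())).toList
      }.toList            // materialise the results before closing the session
      session.close()
      driver.close()
      out.iterator
    }.collect()           // only small per-point summaries reach the driver

I have no idea yet whether that would help with the heap errors on the Neo4j side, since each withinDistance call still runs inside the database.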

Running a spark-submit job that uses the Spark connector to connect to Neo4j, I get these errors in /var/log/neo4j/neo4j.log:

java.lang.OutOfMemoryError: Java heap space
2017-12-27 03:17:13.969+0000 ERROR Worker for session '13662816-0a86-4c95-8b7f-cea9d92440c8' crashed. Java heap space
java.lang.OutOfMemoryError: Java heap space
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1855)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2068)
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
        at org.neo4j.bolt.v1.runtime.concurrent.RunnableBoltWorker.run(RunnableBoltWorker.java:88)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
        at org.neo4j.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:109)
2017-12-27 03:17:23.244+0000 ERROR Worker for session '75983e7c-097a-4770-bcab-d63f78300dc5' crashed. Java heap space
java.lang.OutOfMemoryError: Java heap space

I know that in the neo4j.conf file I can change the heap sizes (currently commented out but set to 512m). The thing I am asking about is what it says in the conf file:

# Java Heap Size: by default the Java heap size is dynamically
# calculated based on available system resources.
# Uncomment these lines to set specific initial and maximum
# heap size.

So doesn't this mean I should leave the heap sizes in the conf alone if they are calculated dynamically and will surely end up larger than anything I could set? (These machines have 8 cores and 8 GB of RAM.) Or would specifically setting these really help, maybe to 2000 (if it is in megabytes) to get two gigs? I ask because I feel the error log is reporting this out-of-memory error, but the real cause is something else.
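
For reference, setting them explicitly just means putting something like this in neo4j.conf (presumably these are the settings behind the -Xms2000m/-Xmx2000m that show up in the AFTER log below):

    dbms.memory.heap.initial_size=2000m
    dbms.memory.heap.max_size=2000m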

EDIT: my JVM values from the debug.log

BEFORE:

2017-12-26 16:24:06.768+0000 INFO [o.n.k.i.DiagnosticsManager] NETWORK
2017-12-26 16:24:06.768+0000 INFO [o.n.k.i.DiagnosticsManager] System memory information:
2017-12-26 16:24:06.771+0000 INFO [o.n.k.i.DiagnosticsManager] Total Physical memory: 7.79 GB
2017-12-26 16:24:06.772+0000 INFO [o.n.k.i.DiagnosticsManager] Free Physical memory: 5.49 GB
2017-12-26 16:24:06.772+0000 INFO [o.n.k.i.DiagnosticsManager] Committed virtual memory: 5.62 GB
2017-12-26 16:24:06.773+0000 INFO [o.n.k.i.DiagnosticsManager] Total swap space: 16.50 GB
2017-12-26 16:24:06.773+0000 INFO [o.n.k.i.DiagnosticsManager] Free swap space: 16.49 GB
2017-12-26 16:24:06.773+0000 INFO [o.n.k.i.DiagnosticsManager] JVM memory information:
2017-12-26 16:24:06.773+0000 INFO [o.n.k.i.DiagnosticsManager] Free  memory: 85.66 MB
2017-12-26 16:24:06.773+0000 INFO [o.n.k.i.DiagnosticsManager] Total memory: 126.00 MB
2017-12-26 16:24:06.774+0000 INFO [o.n.k.i.DiagnosticsManager] Max   memory: 1.95 GB
2017-12-26 16:24:06.776+0000 INFO [o.n.k.i.DiagnosticsManager] Garbage Collector: G1 Young Generation: [G1 Eden Space, G1 Survivor Space]
2017-12-26 16:24:06.776+0000 INFO [o.n.k.i.DiagnosticsManager] Garbage Collector: G1 Old Generation: [G1 Eden Space, G1 Survivor Space, G1 Old Gen]
2017-12-26 16:24:06.777+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Code Cache (Non-heap memory): committed=4.94 MB, used=4.93 MB, max=240.00 MB, threshold=0.00 B
2017-12-26 16:24:06.777+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Metaspace (Non-heap memory): committed=14.38 MB, used=13.41 MB, max=-1.00 B, threshold=0.00 B
2017-12-26 16:24:06.777+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Compressed Class Space (Non-heap memory): committed=1.88 MB, used=1.64 MB, max=1.00 GB, threshold=0.00 B
2017-12-26 16:24:06.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Eden Space (Heap memory): committed=39.00 MB, used=35.00 MB, max=-1.00 B, threshold=?
2017-12-26 16:24:06.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Survivor Space (Heap memory): committed=3.00 MB, used=3.00 MB, max=-1.00 B, threshold=?
2017-12-26 16:24:06.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Old Gen (Heap memory): committed=84.00 MB, used=1.34 MB, max=1.95 GB, threshold=0.00 B
2017-12-26 16:24:06.778+0000 INFO [o.n.k.i.DiagnosticsManager] Operating system information:
2017-12-26 16:24:06.779+0000 INFO [o.n.k.i.DiagnosticsManager] Operating System: Linux; version: 3.10.0-693.5.2.el7.x86_64; arch: amd64; cpus: 8
2017-12-26 16:24:06.779+0000 INFO [o.n.k.i.DiagnosticsManager] Max number of file descriptors: 90000
2017-12-26 16:24:06.780+0000 INFO [o.n.k.i.DiagnosticsManager] Number of open file descriptors: 103
2017-12-26 16:24:06.782+0000 INFO [o.n.k.i.DiagnosticsManager] Process id: 26252@hp380-1
2017-12-26 16:24:06.782+0000 INFO [o.n.k.i.DiagnosticsManager] Byte order: LITTLE_ENDIAN
2017-12-26 16:24:06.793+0000 INFO [o.n.k.i.DiagnosticsManager] Local timezone: Etc/GMT
2017-12-26 16:24:06.793+0000 INFO [o.n.k.i.DiagnosticsManager] JVM information:
2017-12-26 16:24:06.794+0000 INFO [o.n.k.i.DiagnosticsManager] VM Name: OpenJDK 64-Bit Server VM
2017-12-26 16:24:06.794+0000 INFO [o.n.k.i.DiagnosticsManager] VM Vendor: Oracle Corporation
2017-12-26 16:24:06.794+0000 INFO [o.n.k.i.DiagnosticsManager] VM Version: 25.151-b12
2017-12-26 16:24:06.794+0000 INFO [o.n.k.i.DiagnosticsManager] JIT compiler: HotSpot 64-Bit Tiered Compilers
2017-12-26 16:24:06.795+0000 INFO [o.n.k.i.DiagnosticsManager] VM Arguments: [-XX:+UseG1GC, -XX:-OmitStackTraceInFastThrow, -XX:+AlwaysPreTouch, -XX:+UnlockExperimentalVMOptions, -XX:+TrustFinalNonStaticFields, -XX:+DisableExplicitGC, -Djdk.tls.ephemeralDHKeySize=2048, -Dunsupported.dbms.udc.source=rpm, -Dfile.encoding=UTF-8]
2017-12-26 16:24:06.795+0000 INFO [o.n.k.i.DiagnosticsManager] Java classpath:

AFTER:

2017-12-27 16:17:30.740+0000 INFO [o.n.k.i.DiagnosticsManager] System memory information:
2017-12-27 16:17:30.749+0000 INFO [o.n.k.i.DiagnosticsManager] Total Physical memory: 7.79 GB
2017-12-27 16:17:30.750+0000 INFO [o.n.k.i.DiagnosticsManager] Free Physical memory: 4.23 GB
2017-12-27 16:17:30.750+0000 INFO [o.n.k.i.DiagnosticsManager] Committed virtual memory: 5.62 GB
2017-12-27 16:17:30.751+0000 INFO [o.n.k.i.DiagnosticsManager] Total swap space: 16.50 GB
2017-12-27 16:17:30.751+0000 INFO [o.n.k.i.DiagnosticsManager] Free swap space: 16.19 GB
2017-12-27 16:17:30.751+0000 INFO [o.n.k.i.DiagnosticsManager] JVM memory information:
2017-12-27 16:17:30.751+0000 INFO [o.n.k.i.DiagnosticsManager] Free  memory: 1.89 GB
2017-12-27 16:17:30.751+0000 INFO [o.n.k.i.DiagnosticsManager] Total memory: 1.95 GB
2017-12-27 16:17:30.752+0000 INFO [o.n.k.i.DiagnosticsManager] Max   memory: 1.95 GB
2017-12-27 16:17:30.777+0000 INFO [o.n.k.i.DiagnosticsManager] Garbage Collector: G1 Young Generation: [G1 Eden Space, G1 Survivor Space]
2017-12-27 16:17:30.777+0000 INFO [o.n.k.i.DiagnosticsManager] Garbage Collector: G1 Old Generation: [G1 Eden Space, G1 Survivor Space, G1 Old Gen]
2017-12-27 16:17:30.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Code Cache (Non-heap memory): committed=4.94 MB, used=4.89 MB, max=240.00 MB, threshold=0.00 B
2017-12-27 16:17:30.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Metaspace (Non-heap memory): committed=14.38 MB, used=13.42 MB, max=-1.00 B, threshold=0.00 B
2017-12-27 16:17:30.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Compressed Class Space (Non-heap memory): committed=1.88 MB, used=1.64 MB, max=1.00 GB, threshold=0.00 B
2017-12-27 16:17:30.779+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Eden Space (Heap memory): committed=105.00 MB, used=59.00 MB, max=-1.00 B, threshold=?
2017-12-27 16:17:30.779+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Survivor Space (Heap memory): committed=0.00 B, used=0.00 B, max=-1.00 B, threshold=?
2017-12-27 16:17:30.779+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Old Gen (Heap memory): committed=1.85 GB, used=0.00 B, max=1.95 GB, threshold=0.00 B
2017-12-27 16:17:30.779+0000 INFO [o.n.k.i.DiagnosticsManager] Operating system information:
2017-12-27 16:17:30.780+0000 INFO [o.n.k.i.DiagnosticsManager] Operating System: Linux; version: 3.10.0-693.5.2.el7.x86_64; arch: amd64; cpus: 8
2017-12-27 16:17:30.780+0000 INFO [o.n.k.i.DiagnosticsManager] Max number of file descriptors: 90000
2017-12-27 16:17:30.781+0000 INFO [o.n.k.i.DiagnosticsManager] Number of open file descriptors: 103
2017-12-27 16:17:30.785+0000 INFO [o.n.k.i.DiagnosticsManager] Process id: 20774@hp380-1
2017-12-27 16:17:30.785+0000 INFO [o.n.k.i.DiagnosticsManager] Byte order: LITTLE_ENDIAN
2017-12-27 16:17:30.814+0000 INFO [o.n.k.i.DiagnosticsManager] Local timezone: Etc/GMT
2017-12-27 16:17:30.815+0000 INFO [o.n.k.i.DiagnosticsManager] JVM information:
2017-12-27 16:17:30.815+0000 INFO [o.n.k.i.DiagnosticsManager] VM Name: OpenJDK 64-Bit Server VM
2017-12-27 16:17:30.815+0000 INFO [o.n.k.i.DiagnosticsManager] VM Vendor: Oracle Corporation
2017-12-27 16:17:30.815+0000 INFO [o.n.k.i.DiagnosticsManager] VM Version: 25.151-b12
2017-12-27 16:17:30.815+0000 INFO [o.n.k.i.DiagnosticsManager] JIT compiler: HotSpot 64-Bit Tiered Compilers
2017-12-27 16:17:30.816+0000 INFO [o.n.k.i.DiagnosticsManager] VM Arguments: [-Xms2000m, -Xmx2000m, -XX:+UseG1GC, -XX:-OmitStackTraceInFastThrow, -XX:+AlwaysPreTouch, -XX:+UnlockExperimentalVMOptions, -XX:+TrustFinalNonStaticFields, -XX:+DisableExplicitGC, -Djdk.tls.ephemeralDHKeySize=2048, -Dunsupported.dbms.udc.source=rpm, -Dfile.encoding=UTF-8]
2017-12-27 16:17:30.816+0000 INFO [o.n.k.i.DiagnosticsManager] Java classpath:

Just an FYI: I still seem to get Java heap errors. These machines (not for production, just dev) have only 8 GB of RAM each.

Asked by Codejoy on 27 December 2017

1 Answer



We usually recommend setting these yourself. You can check your debug.log file for the startup logs, which report the values it chose to use as defaults. You're looking for an excerpt like this:

JVM memory information:
Free  memory: 204.79 MB
Total memory: 256.00 MB
Max   memory: 4.00 GB

I believe the Total memory is the initial heap size and Max memory is the max heap size.

When setting this yourself, we usually recommend keeping the initial and max set to the same value. Here's a knowledge base article on estimating initial memory configuration that may be helpful.

If the defaults seem sufficient, then it may be better to look for other areas to optimize, or see if the issue is known on the apache-spark side of things.

Answered by InverseFalcon on 27 December 2017