Tests fail on GTX 1080 Ti with CUDA_ERROR_OUT_OF_MEMORY #83

@Jiede1

Description

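The environment is CentOS 7.4 with CUDA 9.0 and a single GeForce GTX 1080 Ti. The test suite output is below; two tests fail, the first with CUDA_ERROR_OUT_OF_MEMORY during context creation and the second with CUDA_ERROR_INVALID_CONTEXT while loading a module.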

- Run map + reduce on datasets with 100,000,000 elements - multiple partitions
- Run map + map + reduce on datasets - multiple partitions
- Run map + map + map + collect on datasets
- Run map + map + map + reduce on datasets - multiple partitions
- Run map on dataset with a single primitive array column
- Run map with free variables on dataset with a single primitive array column
- Run reduce on dataset with a single primitive array column
- Run map & reduce on a single primitive array in a structure *** FAILED ***
  jcuda.CudaException: CUDA_ERROR_OUT_OF_MEMORY
  at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:312)
  at jcuda.driver.JCudaDriver.cuCtxCreate(JCudaDriver.java:1444)
  at com.ibm.gpuenabler.GPUSparkEnv$.get(GPUSparkEnv.scala:143)
  at com.ibm.gpuenabler.CUDADSFunctionSuite$$anonfun$47.apply$mcV$sp(CUDADSFunctionSuite.scala:743)
  at com.ibm.gpuenabler.CUDADSFunctionSuite$$anonfun$47.apply(CUDADSFunctionSuite.scala:740)
  at com.ibm.gpuenabler.CUDADSFunctionSuite$$anonfun$47.apply(CUDADSFunctionSuite.scala:740)
  at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  ...
- Run logistic regression *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 1.0 failed 1 times, most recent failure: Lost task 5.0 in stage 1.0 (TID 13, localhost, executor driver): jcuda.CudaException: CUDA_ERROR_INVALID_CONTEXT
        at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:312)
        at jcuda.driver.JCudaDriver.cuModuleLoadData(JCudaDriver.java:2014)
        at com.ibm.gpuenabler.CUDAManager$$anonfun$cachedLoadModule$1.apply(CUDAManager.scala:102)
        at com.ibm.gpuenabler.CUDAManager$$anonfun$cachedLoadModule$1.apply(CUDAManager.scala:87)
        at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
        at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
        at com.ibm.gpuenabler.CUDAManager.cachedLoadModule(CUDAManager.scala:87)
        at com.ibm.gpuenabler.CUDAManager.getModule(CUDAManager.scala:62)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$JCUDAIteratorImpl.processGPU(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$JCUDAIteratorImpl.hasNext(Unknown Source)
        at com.ibm.gpuenabler.MAPGPUExec$$anonfun$doExecute$1.apply(CUDADSUtils.scala:152)
        at com.ibm.gpuenabler.MAPGPUExec$$anonfun$doExecute$1.apply(CUDADSUtils.scala:73)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      ...