parallel processing - OpenCL, Python and parallelisation -


As a beginner in OpenCL, I have a simple conceptual question about optimizing GPU computing.

As far as I understand, I can take a 1000 x 1000 matrix and run code on every single element (pixel) on the GPU at the same time. What about the following options:

  • I have 100 matrices of 100 x 100 and they need to be calculated separately. Should I do this serially, or can I start 100 instances, i.e. launch 100 Python multiprocessing processes, each of which fires off one matrix calculation on the GPU (assuming there are enough resources)?

  • On the other hand, if I have one 1000 x 1000 matrix and 100 different cases to calculate on it, can I run them all at the same time, or do I have to process them serially?

Any advice or concepts on how to solve this the fastest way?

Thanks, Adrian

The OpenCL execution model revolves around the kernel, which is executed once for each point in your problem domain. When you enqueue a kernel for execution on your OpenCL device, you define a 1-, 2-, or 3-dimensional index space for this domain (a.k.a. the NDRange, or global work size). It is entirely up to you how you map the NDRange onto your actual problem. For example, you could launch a single kernel with a 100x100x100 NDRange to process 100 matrices of 100x100 (assuming they are all independent). Your kernel then defines the calculation for one element of one of these matrices. Alternatively, you could launch 100 kernels, each with a 100x100 NDRange, to achieve the same thing. The former is probably faster, because it avoids the overhead of launching several kernels.
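For concreteness, here is a minimal sketch of the single-launch approach using pyopencl. The kernel name per_element and the squaring operation are placeholders for whatever per-element calculation you actually need; none of this code is from the original answer.

    # Minimal sketch: 100 independent 100x100 matrices processed by one kernel
    # launch with a 3D NDRange, so no Python multiprocessing is needed.
    import numpy as np
    import pyopencl as cl

    N_MATRICES, ROWS, COLS = 100, 100, 100

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    # Host data: 100 matrices of 100x100, stored contiguously.
    data = np.random.rand(N_MATRICES, ROWS, COLS).astype(np.float32)

    mf = cl.mem_flags
    buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=data)

    kernel_src = """
    __kernel void per_element(__global float *data, int rows, int cols)
    {
        // One work-item per matrix element: (matrix index, row, column).
        int m = get_global_id(0);
        int r = get_global_id(1);
        int c = get_global_id(2);
        int idx = (m * rows + r) * cols + c;
        data[idx] = data[idx] * data[idx];   // placeholder calculation
    }
    """

    prg = cl.Program(ctx, kernel_src).build()

    # A single launch with a 100x100x100 global work size covers all matrices.
    prg.per_element(queue, (N_MATRICES, ROWS, COLS), None,
                    buf, np.int32(ROWS), np.int32(COLS))

    result = np.empty_like(data)
    cl.enqueue_copy(queue, result, buf)

The second scenario (one 1000x1000 matrix, 100 different cases) can be handled the same way, for example with a 3D NDRange of 100x1000x1000 where the first dimension selects the case, as long as the cases are independent of each other.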

I strongly recommend reading up on the OpenCL execution model; section 3.2 of the OpenCL specification in particular gives a good description of the key concepts surrounding kernel execution.
