performance - NVIDIA CUDA Thrust device vector allocation is too slow
Does anyone know why the first device vector allocation takes so much longer when the code is compiled in debug mode? In my particular case (NVIDIA Quadro 3000M, CUDA Toolkit 6.0, Windows 7, MSVC 2010) the debug build takes more than 40 seconds on the first run; the next run (with no recompilation) takes about 10 times less (the device vector allocation in the release build takes about 1 second).
    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/generate.h>
    #include <thrust/sort.h>
    #include <thrust/copy.h>
    #include <cstdio>
    #include <cstdlib>
    #include <ctime>
    #include <iostream>

    int main(void)
    {
        clock_t t;

        t = clock();
        thrust::host_vector<int> h_vec(100);
        clock_t dt = clock() - t;
        printf("Allocation on host - %f seconds.\n", (float)dt / CLOCKS_PER_SEC);

        t = clock();
        thrust::generate(h_vec.begin(), h_vec.end(), rand);
        dt = clock() - t;
        printf("Initialization on host - %f seconds.\n", (float)dt / CLOCKS_PER_SEC);

        t = clock();
        thrust::device_vector<int> d_vec(100); // the first run of the debug build takes more than 40 seconds...
        dt = clock() - t;
        printf("Allocation on device - %f seconds.\n", (float)dt / CLOCKS_PER_SEC);

        t = clock();
        d_vec[0] = h_vec[0];
        dt = clock() - t;
        printf("Copy one to device - %f seconds.\n", (float)dt / CLOCKS_PER_SEC);

        t = clock();
        d_vec = h_vec;
        dt = clock() - t;
        printf("Copy all to device - %f seconds.\n", (float)dt / CLOCKS_PER_SEC);

        t = clock();
        thrust::sort(d_vec.begin(), d_vec.end());
        dt = clock() - t;
        printf("Sort on device - %f seconds.\n", (float)dt / CLOCKS_PER_SEC);

        t = clock();
        thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());
        dt = clock() - t;
        printf("Copy to host - %f seconds.\n", (float)dt / CLOCKS_PER_SEC);

        t = clock();
        for (int i = 0; i < 10; i++)
            printf("%d\n", h_vec[i]);
        dt = clock() - t;
        printf("Output - %f seconds.\n", (float)dt / CLOCKS_PER_SEC);

        std::cin.ignore();
        return 0;
    }

The time you are measuring for the vector instantiation isn't the cost of the vector allocation and initialization; it is the overhead of CUDA runtime and driver context establishment. I suspect that if you changed your code like this:
    int main(void)
    {
        clock_t t;

        ....

        cudaFree(0); // this forces context establishment and pays the lazy runtime overheads

        t = clock();
        thrust::device_vector<int> d_vec(100); // the first run of the debug build takes more than 40 seconds...
        dt = clock() - t;
        printf("Allocation on device - %f seconds.\n", (float)dt / CLOCKS_PER_SEC);

        .....

you should see that the measured vector allocation time becomes the same between the first and second runs, even though the wall clock time still shows a big difference.
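To make that one-time startup cost visible on its own, you could also time the cudaFree(0) call separately before timing the device_vector constructor. A minimal sketch of that idea (the extra "Context establishment" timing is mine, not from the original post):

    #include <thrust/device_vector.h>
    #include <cuda_runtime.h>
    #include <cstdio>
    #include <ctime>

    int main(void)
    {
        clock_t t = clock();
        cudaFree(0); // forces CUDA context creation and lazy runtime initialization
        clock_t dt = clock() - t;
        printf("Context establishment - %f seconds.\n", (float)dt / CLOCKS_PER_SEC);

        t = clock();
        thrust::device_vector<int> d_vec(100); // now this should measure only the allocation itself
        dt = clock() - t;
        printf("Allocation on device - %f seconds.\n", (float)dt / CLOCKS_PER_SEC);

        return 0;
    }

With the startup cost separated out like this, the allocation time itself should be small and stable across runs.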
I don't have a good explanation for why there is such a big difference in startup time between the first and second runs, but if I had to guess, I would say that some driver-level JIT recompilation happens on the first run and the driver caches the resulting code for later runs. One thing to check is that you are compiling the code for the correct architecture for your GPU, which would eliminate driver recompilation as the source of the time difference.
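To check which architecture to compile for, you could query the device's compute capability at runtime (the Quadro 3000M should report a Fermi-class 2.x capability, but the query avoids having to guess). A minimal sketch, assuming device 0 is the GPU in question:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main(void)
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0); // query properties of device 0
        printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
        // Then build with a matching -gencode option, e.g. for a 2.1 device:
        //   nvcc -gencode arch=compute_20,code=sm_21 ...
        // so the driver does not have to JIT-recompile PTX on the first run.
        return 0;
    }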
The nvprof utility can give you an API trace and timings. You might want to run it and see where in the API call sequence the timing difference arises. It isn't beyond the realm of possibility that you are seeing the effect of some sort of driver bug, but it is impossible to say without more information.
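As an illustration (the exact command line is my assumption, not from the original answer, and ./your_app is a placeholder), an API trace could be collected with something like:

    nvprof --print-api-trace ./your_app

Comparing the traces from the first and second runs should show which API call accounts for the extra time.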