python - numpy: default copy behavior of integer indexing vs boolean indexing -


I have recently started studying McKinney's Python for Data Analysis. This passage in the book tripped me up:

Array slices are views on the original array. This means the data is not copied, and any modifications to the view will be reflected in the source array ... As NumPy has been designed with large data use cases in mind, you could imagine performance and memory problems if NumPy insisted on copying data left and right.

OK, that looks like a smart design choice, but two pages later it says this:

Selecting data from an array by boolean indexing always creates a copy of the data, even if the returned array is unchanged.

Wait, what? In addition,

You can even mix and match boolean arrays with slices ... for example data[name == 'Bob', 2:]

What will that return now? A view on a copy of the data? And why is the behavior like this? Coming from R, I see boolean indexing and location-based indexing as equally common techniques. If NumPy is designed to avoid copying, what drives this design choice?

Thank you.

Let's assume a 1D array; the data in memory would look something like this:

  10 | 11 | 12 | 13 | 14 | 15 | 16

Accessing an element by index is trivial. Just take the position of the first element and jump n steps. So, for arr[2]:

  10 | 11 | 12 | 13 | 14 | 15 | 16
            ^

I can get the position in memory with just one multiplication. Fast and easy.

I can take a slice, and say "take only arr2 = arr[2:-1]":

  10 | 11 | 12 | 13 | 14 | 15 | 16
            ^----^----^----^

Now, the memory layout is very similar. Getting an element is still one multiplication, just counted from the new starting point. For arr2[1]:

  10 | 11 | 12 | 13 | 14 | 15 | 16
  (ignored)------^----------
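To make the "new starting point" concrete, here is a minimal sketch using the same seven values. It only checks that the slice borrows the original buffer and starts two elements in; the variable names are the ones used in the diagrams above.

```python
import numpy as np

arr = np.array([10, 11, 12, 13, 14, 15, 16])
arr2 = arr[2:-1]                 # basic slice: a view, nothing is copied

print(arr2.tolist())             # [12, 13, 14, 15]
print(arr2.base is arr)          # True: arr2 borrows arr's buffer

# The view begins two elements into the same block of memory.
start_arr = arr.__array_interface__["data"][0]
start_arr2 = arr2.__array_interface__["data"][0]
offset = start_arr2 - start_arr
print(offset == 2 * arr.itemsize)  # True

arr2[1] = 99                     # writing through the view...
print(arr.tolist())              # ...changes the source: [10, 11, 12, 99, 14, 15, 16]
```

This is the behavior the book describes: changes made through the view show up in the source array.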

You can do a nicer trick and say arr3 = arr[::2], taking every other element.

  10 | 11 | 12 | 13 | 14 | 15 | 16
  ^---------^---------^---------^

Again, getting an element of arr3 by index is very simple: it is still just one multiplication, but now the step is bigger. That is what strides are: they tell you the size of the step between elements, and so how to reach an element from its index. Strides are even more powerful in more dimensions; they are, by the way, how flat (1D) memory gets presented as a matrix (2D).
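As a rough illustration, the strides can be inspected directly. The sketch below assumes 8-byte integers (dtype fixed to int64 so the numbers are the same on every platform); `flat` and `mat` are hypothetical names added for the 2D case.

```python
import numpy as np

arr = np.arange(10, 17, dtype=np.int64)   # 10..16, 8 bytes per element
arr3 = arr[::2]                           # every other element, still a view

print(arr.strides)                  # (8,)  -> step of one element
print(arr3.strides)                 # (16,) -> step of two elements
print(arr3.tolist())                # [10, 12, 14, 16]
print(np.shares_memory(arr, arr3))  # True

# The same idea in 2D: one flat buffer, two strides.
flat = np.arange(6, dtype=np.int64)
mat = flat.reshape(2, 3)
print(mat.strides)                  # (24, 8): 24 bytes to the next row, 8 to the next column
print(np.shares_memory(flat, mat))  # True
```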

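Now, we go to boolean arrays. If my mask is T F T T F F T and I ask you for the third element, you would need to traverse the mask, find which position holds the third True, and then get its index. That is very slow. The selected elements are also not evenly spaced in general, so no single starting point and stride can describe them. So, when we index with a boolean mask, we have to copy the data.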
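A small sketch of what this means in practice, using the mask above. The 2D array `data` is a made-up stand-in for the book's example, and the mask plays the role of name == 'Bob'. `np.shares_memory` reports whether two arrays overlap in memory.

```python
import numpy as np

arr = np.array([10, 11, 12, 13, 14, 15, 16])
mask = np.array([True, False, True, True, False, False, True])

picked = arr[mask]                     # boolean indexing: a new array
print(picked.tolist())                 # [10, 12, 13, 16]
print(np.shares_memory(arr, picked))   # False

picked[0] = -1                         # does not touch the original
print(arr[0])                          # 10

# Mixing a boolean array with a slice (hypothetical 7x4 data):
data = np.arange(28).reshape(7, 4)
mixed = data[mask, 2:]
print(mixed.shape)                     # (4, 2)
print(np.shares_memory(data, mixed))   # False: the result is a copy as well

# Assigning through a boolean index still writes into the original.
arr[mask] = 0
print(arr.tolist())                    # [0, 11, 0, 0, 14, 15, 0]
```

So the mixed case from the question returns a copy, not a view, while assignment of the form arr[mask] = value modifies the original in place.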

As a side note, sometimes the cost of copying is worth paying even when a view would do. If you take "every fifth element of the array" and then want to do many operations on it, the data will not be contiguous in memory, so the CPU will have to wait to fetch it every time. It would then be faster to make a (contiguous) copy and work with that.
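A minimal sketch of that last point, assuming a float64 array of a million elements (the size is arbitrary). It only shows how to get a contiguous copy and check the layout; whether the copy actually pays off depends on the workload, so it is worth timing on your own data.

```python
import numpy as np

arr = np.arange(1_000_000, dtype=np.float64)

every_fifth = arr[::5]                        # a view with a 40-byte stride
print(every_fifth.flags["C_CONTIGUOUS"])      # False
print(every_fifth.strides)                    # (40,)

packed = np.ascontiguousarray(every_fifth)    # explicit contiguous copy
print(packed.flags["C_CONTIGUOUS"])           # True
print(packed.strides)                         # (8,)
print(np.shares_memory(arr, packed))          # False

# Same values, different memory layout.
print(np.array_equal(every_fifth, packed))    # True
```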
