In NumPy, how do I efficiently list all submatrices of a fixed size? (getting the matrix size in NumPy)

25-02-04 16


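Want to know how to efficiently list all fixed-size submatrices in NumPy? This article covers that question in detail, along with related questions about getting the matrix size in NumPy. It also covers character arrays without a fixed size in C; slicing a NumPy 2D array, or how to extract an m×m submatrix from an n×n array (n > m); adding a very repetitive matrix to a sparse matrix in numpy/scipy; and efficiently applying a matrix transformation to every row of a NumPy array.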

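Contents:

In NumPy, how do I efficiently list all submatrices of a fixed size? (getting the matrix size in NumPy)
Character arrays without a fixed size in C
Slicing a NumPy 2D array, or: how do I extract an m×m submatrix from an n×n array (n > m)?
python – Adding a very repetitive matrix to a sparse matrix in numpy/scipy?
python – How do I efficiently apply a matrix transformation to every row of a NumPy array?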

In NumPy, how do I efficiently list all submatrices of a fixed size? (getting the matrix size in NumPy)

I have an arbitrary NxM matrix, for example:

1 2 3 4 5 6
7 8 9 0 1 2
3 4 5 6 7 8
9 0 1 2 3 4

I want to get a list of all 3x3 submatrices in this matrix:

1 2 3       2 3 4               0 1 2
7 8 9   ;   8 9 0   ;  ...  ;   6 7 8
3 4 5       4 5 6               2 3 4

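I can do this with two nested loops:

import numpy as np

rows, cols = input_matrix.shape
patches = []
# A 3x3 window fits at top-left rows 0 .. rows-3 inclusive, hence rows - 3 + 1
for row in range(rows - 3 + 1):
    for col in range(cols - 3 + 1):
        patches.append(input_matrix[row:row+3, col:col+3])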

But this is slow for large input matrices. Is there a way to do it faster with NumPy?

I have looked at np.split, but it gives me non-overlapping submatrices, whereas I want all possible submatrices, regardless of whether they overlap.
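A minimal sketch of one vectorized approach, assuming NumPy 1.20 or later, which provides numpy.lib.stride_tricks.sliding_window_view. It returns a read-only view containing every 3x3 window without copying any data, so the Python-level double loop disappears. The matrix m below is the example from the question.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

m = np.array([[1, 2, 3, 4, 5, 6],
              [7, 8, 9, 0, 1, 2],
              [3, 4, 5, 6, 7, 8],
              [9, 0, 1, 2, 3, 4]])

# View of shape (rows - 2, cols - 2, 3, 3): windows[r, c] is the 3x3 patch
# whose top-left corner is m[r, c]. No data is copied at this point.
windows = sliding_window_view(m, (3, 3))
print(windows.shape)   # (2, 4, 3, 3)

# Merge the two window-position axes to get a flat "list" of patches.
# This reshape copies, because overlapping patches share elements.
patches = windows.reshape(-1, 3, 3)
print(len(patches))    # 8
print(patches[0])      # rows 0-2, cols 0-2
print(patches[-1])     # rows 1-3, cols 3-5
```

If a copy is not needed, iterating over or computing on windows directly keeps memory use at the size of the original matrix.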

Character arrays without a fixed size in C

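How can I read in a character array in C without using a fixed-length array?
I was given an assignment centered on C strings and was told not to use fixed-size arrays.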

Solution

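The usual way to create an array whose size is not fixed at compile time is to use malloc, which takes the size of the memory to allocate, in bytes. You then use the result as a char *, which also supports array syntax (str[i]). Don't forget to check that the return value is not NULL; that is how malloc tells you it is out of memory.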

When you are done with the memory, you are responsible for releasing it back to the system with free.

For example:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t size = 42; // you can read this from user input or any other source
    char *str = malloc(size);

    if (str == NULL) {
        printf("Insufficient memory available\n");
        return 1;
    }

    // Use the memory and then...
    free(str);
    return 0;
}

Slicing a NumPy 2D array, or: how do I extract an m×m submatrix from an n×n array (n > m)?

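I want to slice a NumPy n×n array. I want to extract an arbitrary selection of m of its rows and columns (i.e., with no pattern in the row/column numbers) and make it a new m×m array. For this example, let's say the array is 4×4 and I want to extract a 2×2 array from it.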

Here is our array:

import numpy as np

x = np.arange(16).reshape((4, 4))

print(x)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

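The rows and columns to remove are the same. The simplest case is when I want to extract a 2×2 submatrix that sits at the beginning or the end, i.e.: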

In [33]: x[0:2,0:2]
Out[33]: 
array([[0,1],[4,5]])

In [34]: x[2:,2:]
Out[34]: 
array([[10,11],[14,15]])

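But what if I need to remove a different mix of rows/columns? What if I need to remove the first and third rows/columns, thus extracting the submatrix [[5,7],[13,15]]? Any combination of rows/columns is possible. I read somewhere that I just need to index my array with lists of row and column indices, but that doesn't seem to work: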

In [35]: x[[1,3],[1,3]]
Out[35]: array([ 5,15])

I found one way to do it:

In [61]: x[[1,3]][:,[1,3]]
Out[61]: 
array([[ 5,  7],
       [13, 15]])

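The first problem is that it is hard to read, although I can live with that. If anyone has a better solution, I would certainly like to hear it.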

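The other thing is that I read on a forum that indexing arrays with arrays forces NumPy to make a copy of the desired array, so this could become a problem when working with large arrays. Why is that, and how does this mechanism work?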
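A minimal sketch of the usual tidier spelling, np.ix_, which turns the row list and the column list into an open mesh (shapes (2, 1) and (1, 2)) so that a single indexing step selects the full cross product. The second half illustrates the copy question: integer-array ("fancy") indexing always returns a copy, because an arbitrary set of rows and columns cannot in general be described by one base offset plus fixed strides, whereas regularly spaced slices can and therefore return views. The chained form x[[1,3]][:,[1,3]] makes two such copies, one per indexing step.

```python
import numpy as np

x = np.arange(16).reshape(4, 4)
rows = [1, 3]
cols = [1, 3]

# np.ix_ builds index arrays of shape (2, 1) and (1, 2); they broadcast to
# select every (row, col) combination, i.e. the 2x2 submatrix.
sub = x[np.ix_(rows, cols)]
print(sub)
# [[ 5  7]
#  [13 15]]

# Fancy indexing produces a new array that does not share memory with x.
print(np.shares_memory(sub, x))    # False

# These particular rows/columns happen to be evenly spaced, so basic slicing
# with a step can describe them with strides and returns a view instead.
view = x[1::2, 1::2]
print(np.shares_memory(view, x))   # True
```

For truly arbitrary selections a copy is unavoidable; its size is only m×m, so the cost matters mainly when m is itself large.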

python – Adding a very repetitive matrix to a sparse matrix in numpy/scipy?

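I am trying to implement a function in NumPy/SciPy that computes the Jensen-Shannon divergence between a single (training) vector and a large number of other (observation) vectors. The observation vectors are stored in a very large (500,000 × 65536) SciPy sparse matrix (a dense matrix would not fit in memory).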

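As part of the algorithm, I need to compute T + Oi for each observation vector Oi, where T is the training vector. I could not find a way to do this with NumPy's usual broadcasting rules, because sparse matrices don't seem to support them: if T is left as a dense array, SciPy tries to make the sparse matrix dense first, which runs out of memory; if I turn T into a sparse matrix, T + Oi fails because the shapes are inconsistent.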

Currently I am taking the very inefficient step of tiling the training vector into a 500,000 × 65536 sparse matrix:

import numpy as np
import scipy.sparse as sp

training = sp.csr_matrix(training.astype(np.float32))
# Row pointers for observations.shape[0] identical rows, each len(training.indices) long
tindptr = np.arange(0, len(training.indices) * observations.shape[0] + 1, len(training.indices), dtype=np.int32)
tindices = np.tile(training.indices, observations.shape[0])
tdata = np.tile(training.data, observations.shape[0])
mtraining = sp.csr_matrix((tdata, tindices, tindptr), shape=observations.shape)

But it takes up a huge amount of memory (about 6 GB) when it only stores ~1,500 "real" elements, and it is also slow to build.

I tried to get clever by using stride_tricks so that the CSR matrix's data and indices members would not use extra memory for the repeated data:

import numpy as np
import scipy.sparse as sp

training = sp.csr_matrix(training)
mtraining = sp.csr_matrix(observations.shape, dtype=np.int32)
tdata = training.data
# 2-D views that repeat the training row once per observation row (stride 0 on axis 0)
vdata = np.lib.stride_tricks.as_strided(tdata, (mtraining.shape[0], tdata.size), (0, tdata.itemsize))
indices = training.indices
vindices = np.lib.stride_tricks.as_strided(indices, (mtraining.shape[0], indices.size), (0, indices.itemsize))
mtraining.indptr = np.arange(0, len(indices) * mtraining.shape[0] + 1, len(indices), dtype=np.int32)
mtraining.data = vdata
mtraining.indices = vindices

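But this doesn't work, because the strided views mtraining.data and mtraining.indices have the wrong shape (and according to this answer, there is no way to give them the right shape). Trying to make them look flat with the .flat iterator fails, because it does not look enough like an array (for example, it has no dtype member), and using the flatten() method ends up making a copy.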
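The failure can be reproduced on a tiny array. This sketch swaps in np.broadcast_to, a read-only equivalent of the zero-stride as_strided call above, and uses small stand-in arrays (tdata and n_rows here are illustrative, not the asker's data). The repeated 2-D view costs no extra memory, but CSR needs flat 1-D data and indices arrays, and the sequence 1, 2, 3, 1, 2, 3, ... cannot be described by a single stride, so any 1-D version must be a copy.

```python
import numpy as np

tdata = np.array([1.0, 2.0, 3.0])   # stand-in for training.data
n_rows = 4                          # stand-in for mtraining.shape[0]

# Read-only view with stride 0 along axis 0: every row aliases tdata.
repeated = np.broadcast_to(tdata, (n_rows, tdata.size))
print(repeated.strides)                   # (0, 8)
print(np.shares_memory(repeated, tdata))  # True: nothing was copied

# Flattening to 1-D needs one stride for the whole sequence, which does
# not exist here, so reshape has to allocate and copy.
flat = repeated.reshape(-1)
print(np.shares_memory(flat, tdata))      # False
print(flat.nbytes)                        # 96: all 12 float64 values stored
```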

Is there any way to make this work?

Solution

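Another option, which you haven't even considered, is to implement the sum in sparse format yourself, so that you can take full advantage of the periodic nature of your array. This can be quite easy if you abuse this particular behavior of SciPy's sparse matrices: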

>>> import numpy as np
>>> import scipy.sparse as sps
>>> a = sps.csr_matrix([1, 2, 3, 4])
>>> a.data
array([1, 2, 3, 4])
>>> a.indices
array([0, 1, 2, 3])
>>> a.indptr
array([0, 4])

>>> b = sps.csr_matrix((np.array([1, 2, 3, 4, 5]),
...                     np.array([0, 1, 2, 3, 0]),
...                     np.array([0, 5])), shape=(1, 4))
>>> b
<1x4 sparse matrix of type '<type 'numpy.int32'>'
    with 5 stored elements in Compressed Sparse Row format>
>>> b.todense()
matrix([[6, 2, 3, 4]])

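So you don't even have to look for coincidences between the training vector and each row of the observation matrix in order to add them: just fill in all the data with the right pointers, and whatever needs to be summed will be summed when the data is accessed.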
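A vectorized sketch of the same duplicate-summing idea, offered as an alternative rather than as the answerer's code. It lists every (row, col, value) triple in COO form, including one copy of the vector's entries per row, and relies on SciPy summing duplicate entries when converting COO to CSR. The function name add_sparse_vec_coo is hypothetical. The result still stores roughly rows × nnz(vec) extra entries, so for the 500,000-row matrix in the question it would probably have to be applied to blocks of rows at a time.

```python
import numpy as np
import scipy.sparse as sps


def add_sparse_vec_coo(mat, vec):
    """Return mat + vec, with the 1 x n sparse vec added to every row of mat.

    Hypothetical helper (not a SciPy API). Both inputs are SciPy sparse
    matrices with the same number of columns; vec must have exactly one row.
    """
    mat = sps.coo_matrix(mat)
    vec = sps.coo_matrix(vec)
    rows = mat.shape[0]
    k = vec.nnz

    # Row index of every appended vector entry: 0 (k times), 1 (k times), ...
    vec_rows = np.repeat(np.arange(rows), k)
    # Column indices and values of the vector, repeated once per row.
    vec_cols = np.tile(vec.col, rows)
    vec_data = np.tile(vec.data, rows)

    data = np.concatenate((mat.data, vec_data))
    r = np.concatenate((mat.row, vec_rows))
    c = np.concatenate((mat.col, vec_cols))

    # COO -> CSR conversion sums duplicate (row, col) pairs; that is the addition.
    return sps.coo_matrix((data, (r, c)), shape=mat.shape).tocsr()


a = sps.csr_matrix(np.array([[1, 0, 0, 2],
                             [0, 0, 3, 0]]))
b = sps.csr_matrix(np.array([[5, 0, 1, 0]]))
print(add_sparse_vec_coo(a, b).toarray())
# [[6 0 1 2]
#  [5 0 4 0]]
```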

Edit

Given how slow the first version of the code is, you can trade memory for speed as follows:

import numpy as np
import scipy.sparse as sps


def csr_add_sparse_vec(sps_mat, sps_vec):
    """Adds a sparse vector to every row of a sparse matrix"""
    # No checks done, but both arguments should be sparse matrices in CSR
    # format, both should have the same number of columns, and the vector
    # should be a vector and have only one row.

    rows, cols = sps_mat.shape
    nnz_vec = len(sps_vec.data)
    nnz_per_row = np.diff(sps_mat.indptr)
    longest_row = np.max(nnz_per_row)

    # Lay the matrix out as (rows, longest_row), zero-padding shorter rows
    old_data = np.zeros((rows * longest_row,), dtype=sps_mat.data.dtype)
    old_cols = np.zeros((rows * longest_row,), dtype=sps_mat.indices.dtype)

    data_idx = np.arange(longest_row) < nnz_per_row[:, None]
    data_idx = data_idx.reshape(-1)
    old_data[data_idx] = sps_mat.data
    old_cols[data_idx] = sps_mat.indices
    old_data = old_data.reshape(rows, -1)
    old_cols = old_cols.reshape(rows, -1)

    # Append the vector's entries after the padded entries of every row
    new_data = np.zeros((rows, longest_row + nnz_vec), dtype=sps_mat.data.dtype)
    new_data[:, :longest_row] = old_data
    del old_data
    new_cols = np.zeros((rows, longest_row + nnz_vec), dtype=sps_mat.indices.dtype)
    new_cols[:, :longest_row] = old_cols
    del old_cols
    new_data[:, longest_row:] = sps_vec.data
    new_cols[:, longest_row:] = sps_vec.indices
    new_data = new_data.reshape(-1)
    new_cols = new_cols.reshape(-1)
    new_pointer = np.arange(0, (rows + 1) * (longest_row + nnz_vec), longest_row + nnz_vec)

    # Duplicate (row, col) entries are summed on access; the padding zeros are dropped
    ret = sps.csr_matrix((new_data, new_cols, new_pointer), shape=sps_mat.shape)
    ret.eliminate_zeros()

    return ret

It is still not blazingly fast, but it can do 10,000 rows in about one second:

In [2]: a
Out[2]: 
<10000x65536 sparse matrix of type '<type 'numpy.float64'>'
    with 15000000 stored elements in Compressed Sparse Row format>

In [3]: b
Out[3]: 
<1x65536 sparse matrix of type '<type 'numpy.float64'>'
    with 1500 stored elements in Compressed Sparse Row format>

In [4]: csr_add_sparse_vec(a,b)
Out[4]: 
<10000x65536 sparse matrix of type '<type 'numpy.float64'>'
    with 30000000 stored elements in Compressed Sparse Row format>

In [5]: %timeit csr_add_sparse_vec(a, b)
1 loops, best of 3: 956 ms per loop

Edit: this first version of the code is very, very slow:

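import numpy as np
import scipy.sparse as sps


def csr_add_sparse_vec(sps_mat, sps_vec):
    rows, cols = sps_mat.shape

    new_data = sps_mat.data
    new_pointer = sps_mat.indptr.copy()
    new_cols = sps_mat.indices

    aux_idx = np.arange(rows + 1)

    # For each nonzero of the vector, insert it at the end of every row, then
    # shift the row pointers: the start of row k moves forward by k positions.
    for value, col in zip(sps_vec.data, sps_vec.indices):
        new_data = np.insert(new_data, new_pointer[1:], [value] * rows)
        new_cols = np.insert(new_cols, new_pointer[1:], [col] * rows)
        new_pointer += aux_idx

    return sps.csr_matrix((new_data, new_cols, new_pointer), shape=sps_mat.shape)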

python – How do I efficiently apply a matrix transformation to every row of a NumPy array?

Suppose I have a 2D NumPy ndarray like this:

[[ 0,1,2,3 ],[ 4,5,6,7 ],[ 8,9,10,11 ]]

Conceptually, what I want to do is:

For each row:
    Transpose the row
    Multiply the transposed row by a transformation matrix
    Transpose the result
    Store the result in the original ndarray, overwriting the original row data

I have an extremely slow, brute-force method that functionally achieves this:

import numpy as np

transform_matrix = np.matrix(...)  # 4x4 matrix setup clipped for brevity
for i, row in enumerate(data):
    tr = row.reshape((4, 1))                # the row as a 4x1 column vector
    new_row = np.dot(transform_matrix, tr)  # (4x4) x (4x1) -> 4x1
    data[i] = new_row.reshape((1, 4))       # write it back as a row

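However, this seems like exactly the kind of operation NumPy should be good at. I suspect that, as a NumPy newcomer, I'm just missing something basic in the documentation. Any pointers?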

Note that if creating a new ndarray is faster than editing it in place, that also works for what I'm doing; speed of the operation is the primary concern.

Solution

The series of operations you want to perform is equivalent to the following:

data[:] = data.dot(transform_matrix.T)

or, using a new array instead of modifying the original data, which should be a bit faster:

data.dot(transform_matrix.T)

Here is the explanation:

For each row:
    Transpose the row

is equivalent to transposing the matrix and then iterating over its columns.

Multiply the transposed row by a transformation matrix

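Left-multiplying each column of a matrix by a second matrix is equivalent to left-multiplying the whole matrix by that second matrix. At this point, what you have is transform_matrix.dot(data.T).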

Transpose the result

One of the basic properties of the matrix transpose is that transform_matrix.dot(data.T).T is equivalent to data.dot(transform_matrix.T).
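Written out, with M standing for transform_matrix and D for data (symbols introduced here only for brevity), this is the rule (AB)^T = B^T A^T applied once:

```latex
\left(M D^{\top}\right)^{\top} = \left(D^{\top}\right)^{\top} M^{\top} = D\, M^{\top}
```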

Store the result in the original ndarray, overwriting the original row data

Slice assignment (data[:] = ...) does exactly this.
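A quick numerical check of the equivalence, assuming transform is a plain 4×4 ndarray rather than the np.matrix used in the question; the random transform and the float dtype are illustrative choices. It compares the row-by-row loop with the single matrix product, then shows the in-place variant, which evaluates the product into a temporary array and copies it back into data's existing buffer.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(12, dtype=float).reshape(3, 4)  # the question's 3x4 example, as floats
transform = rng.standard_normal((4, 4))          # illustrative 4x4 transform

# Reference: the row-by-row loop from the question (M applied to each row).
expected = np.empty_like(data)
for i, row in enumerate(data):
    expected[i] = transform @ row

# Vectorized: all rows at once, using (M D^T)^T == D M^T.
result = data @ transform.T
print(np.allclose(result, expected))   # True

# In-place variant: slice assignment writes the product back into data.
data[:] = data @ transform.T
print(np.allclose(data, expected))     # True
```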
