NumPy 'where' on strings (numpy strings)

25-01-29 24


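This article answers two questions in detail: NumPy 'where' on strings, and numpy strings. It also covers the NumPy dtype issue in genfromtxt() (strings read as byte strings), whether numpy string arrays are faster than Python strings, numpy.where() usage explained, and a request for detailed step-by-step numpy.where() examples.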

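Contents:

NumPy 'where' on strings (numpy strings)
NumPy dtype issue in genfromtxt(): strings read as byte strings
Are numpy string arrays faster than Python strings?
numpy.where() usage explained
numpy.where() detailed step-by-step explanation/examples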

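NumPy 'where' on strings (numpy strings)

I want to use the numpy.where function on an array of strings, but I have not been able to make it work. Can someone help me with this?

For example, when I use numpy.where as in the following example, I run into a problem:

import numpy as np
A = ['apple', 'orange', 'apple', 'banana']
arr_index = np.where(A == 'apple', 1, 0)

I get the following:

>>> arr_index
array(0)
>>> print(A[arr_index])
apple

However, I want the indices in the string array A where the string 'apple' matches. In the array above, that happens at 0 and 2, but np.where returns only 0 and not 2.

So how do I make numpy.where work on strings? Thanks in advance.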

Answer 1

小编典典

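Use print(a[arr_index]), not array_index!

In the question, A is a plain Python list, so A == 'apple' compares the whole list with the string and yields a single False; np.where(False, 1, 0) then returns array(0). Convert the list to a NumPy array so that == compares element by element, and pass only the condition to np.where to get the indices:

a = np.array(['apple', 'orange', 'apple', 'banana'])
arr_index = np.where(a == 'apple')
print(arr_index)
print(a[arr_index])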
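The difference between the list and the array can be checked side by side. The following is a minimal sketch; the variable names (words, mask, indices, flags) are illustrative and not from the original question.

```python
import numpy as np

# A plain Python list: `words == 'apple'` is one False, not an elementwise test.
words = ['apple', 'orange', 'apple', 'banana']
scalar_result = np.where(words == 'apple', 1, 0)   # 0-dimensional array holding 0

# An ndarray compares element by element, giving a boolean mask.
arr = np.array(words)
mask = arr == 'apple'

indices = np.where(mask)[0]     # positions where the mask is True
flags = np.where(mask, 1, 0)    # 1 where it matches, 0 elsewhere

print(indices)       # positions of 'apple'
print(flags)         # one flag per element
print(arr[indices])  # the matching strings
```

The one-argument form returns a tuple with one index array per dimension, which is why the sketch takes element [0] for a 1-D input.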

NumPy dtype issue in genfromtxt(): strings read as byte strings

I want to read a standard ASCII CSV file containing floats and strings into numpy.

For example,

ZINC00043096,C.3,C1,-0.1540,methyl
ZINC00043096,C.3,C2,0.0638,methylene
ZINC00043096,C.3,C4,0.0669,methylene
ZINC00090377,C.3,C7,0.2070,methylene
...

Whatever I try, the resulting array looks like this, for example:

all_data = np.genfromtxt(csv_file, dtype=None, delimiter=',')

[(b'ZINC00043096', b'C.3', b'C1', -0.154, b'methyl')
 (b'ZINC00043096', b'C.3', b'C2', 0.0638, b'methylene')
 (b'ZINC00043096', b'C.3', b'C4', 0.0669, b'methylene')

However, I would like to save a step for the byte-string conversion, and I am wondering how to read the string columns directly as regular strings.

I tried several things from the numpy.genfromtxt() documentation, e.g. dtype='S,S,S,f,S' or dtype='a25,a25,a25,f,a25', but nothing really helped.

I am afraid I just do not understand how the dtype conversion really works. It would be great if you could give me some hints here!

Thanks

Answer 1

小编典典

In Python 2.7:

array([('ZINC00043096', 'C.3', 'C1', -0.154, 'methyl'),
       ('ZINC00043096', 'C.3', 'C2', 0.0638, 'methylene'),
       ('ZINC00043096', 'C.3', 'C4', 0.0669, 'methylene'),
       ('ZINC00090377', 'C.3', 'C7', 0.207, 'methylene')],
      dtype=[('f0', 'S12'), ('f1', 'S3'), ('f2', 'S2'), ('f3', '<f8'), ('f4', 'S9')])

In Python 3:

array([(b'ZINC00043096', b'C.3', b'C1', -0.154, b'methyl'),
       (b'ZINC00043096', b'C.3', b'C2', 0.0638, b'methylene'),
       (b'ZINC00043096', b'C.3', b'C4', 0.0669, b'methylene'),
       (b'ZINC00090377', b'C.3', b'C7', 0.207, b'methylene')],
      dtype=[('f0', 'S12'), ('f1', 'S3'), ('f2', 'S2'), ('f3', '<f8'), ('f4', 'S9')])

The "regular" strings in Python 3 are unicode, but your text file contains byte strings. all_data is the same in both cases (136 bytes), but Python 3 displays a byte string as b'C.3' rather than just 'C.3'.

What kinds of operations do you plan to perform on these strings? 'ZIN' in all_data['f0'][1] works in 2.7, but in 3 you have to use b'ZIN' in all_data['f0'][1].

The related topic "variable/unknown-length string/unicode dtype in numpy" reminds me that you can specify a unicode string type in the dtype. However, this becomes more complicated if you do not know the string lengths in advance.

alttype = np.dtype([('f0', 'U12'), ('f1', 'U3'), ('f2', 'U2'), ('f3', '<f8'), ('f4', 'U9')])
all_data_u = np.genfromtxt(csv_file, dtype=alttype, delimiter=',')

produces

array([('ZINC00043096', 'C.3', 'C1', -0.154, 'methyl'),
       ('ZINC00043096', 'C.3', 'C2', 0.0638, 'methylene'),
       ('ZINC00043096', 'C.3', 'C4', 0.0669, 'methylene'),
       ('ZINC00090377', 'C.3', 'C7', 0.207, 'methylene')],
      dtype=[('f0', '<U12'), ('f1', '<U3'), ('f2', '<U2'), ('f3', '<f8'), ('f4', '<U9')])

In Python 2.7, all_data_u displays as

(u'ZINC00043096', u'C.3', u'C1', -0.154, u'methyl')

all_data_u is 448 bytes, because numpy allocates 4 bytes for each unicode character. Each U4 item is 16 bytes long.
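The byte counts quoted in this answer can be reproduced from the dtypes alone, without reading a file. This is a small sketch that builds empty four-row arrays with the two dtypes from the answer; the variable names are illustrative.

```python
import numpy as np

# The two record layouts discussed in the answer.
bytes_dtype = np.dtype([('f0', 'S12'), ('f1', 'S3'), ('f2', 'S2'),
                        ('f3', '<f8'), ('f4', 'S9')])
unicode_dtype = np.dtype([('f0', 'U12'), ('f1', 'U3'), ('f2', 'U2'),
                          ('f3', '<f8'), ('f4', 'U9')])

n_rows = 4  # the sample file has four records

# 'S' fields use 1 byte per character, 'U' fields use 4 bytes per character.
bytes_total = np.zeros(n_rows, dtype=bytes_dtype).nbytes
unicode_total = np.zeros(n_rows, dtype=unicode_dtype).nbytes

print(bytes_dtype.itemsize, bytes_total)      # bytes per record, bytes in total
print(unicode_dtype.itemsize, unicode_total)
print(np.dtype('U4').itemsize)                # size of one U4 item
```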

Changes in v1.14: https://docs.scipy.org/doc/numpy/release.html#encoding-argument-for-text-io-functions
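With NumPy 1.14 or later, the encoding argument of genfromtxt offers another route to regular strings. The sketch below assumes such a version is installed; it uses an in-memory io.StringIO copy of the sample rows as a stand-in for csv_file.

```python
import io
import numpy as np

# In-memory stand-in for the CSV file from the question.
csv_text = (
    "ZINC00043096,C.3,C1,-0.1540,methyl\n"
    "ZINC00043096,C.3,C2,0.0638,methylene\n"
    "ZINC00043096,C.3,C4,0.0669,methylene\n"
    "ZINC00090377,C.3,C7,0.2070,methylene\n"
)

# dtype=None lets genfromtxt infer each column's type; an explicit encoding
# makes the inferred string columns unicode ('U') instead of bytes ('S').
all_data = np.genfromtxt(io.StringIO(csv_text), dtype=None,
                         delimiter=',', encoding='utf-8')

print(all_data.dtype)
print(all_data['f0'][0])           # a regular str, no b'' prefix
print('ZIN' in all_data['f0'][1])  # plain-string membership test works
```

Because the field widths are inferred from the data, this avoids having to know the string lengths in advance.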

Are numpy string arrays faster than Python strings?


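I am building a string that is about 30 million words long. As you can imagine, a for loop that adds roughly 100 words per iteration takes a very long time. Is there a more memory-friendly way to represent the string, such as a numpy array? I have very little experience with numpy.

bigStr = ''
for tweet in df['text']:
  bigStr = bigStr + ' ' + tweet
len(bigStr)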

Solution

If you want to build a single string, use ' '.join. It creates the final string in O(n) time, instead of building it up one piece at a time, which takes O(n^2) time.

bigStr = ' '.join(df['text'])
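To compare the two approaches on something small, here is a sketch that uses a short hypothetical list named tweets in place of df['text'].

```python
# Hypothetical stand-in for df['text'].
tweets = ['first tweet', 'second tweet', 'third tweet']

# Repeated concatenation: each step can copy everything built so far.
loop_str = ''
for tweet in tweets:
    loop_str = loop_str + ' ' + tweet

# join sizes the result once and copies each piece a single time.
joined = ' '.join(tweets)

print(repr(loop_str))
print(repr(joined))
```

The only difference in the result is that the loop version starts with a leading space, since it prepends a space before every tweet, including the first.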

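I can see that you are trying to get the total length of all the data. For that you do not need to concatenate the strings at all. (I see that you add a space for each element.)

Just take the length of each tweet and add it to an integer variable (plus 1 for each space):

number_of_texts = 0
for tweet in df['text']:
  number_of_texts += 1 + len(tweet)

print(number_of_texts)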
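The same count can be written as a single sum over a generator expression. This sketch again uses a hypothetical tweets list rather than the original DataFrame column.

```python
# Hypothetical stand-in for df['text'].
tweets = ['first tweet', 'second tweet', 'third tweet']

# One space plus the tweet's length per element, without building the string.
total_length = sum(len(tweet) + 1 for tweet in tweets)

print(total_length)
```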

numpy.where() usage explained

numpy.where(condition[, x, y])

numpy.where() has two usages:

###1. np.where(condition, x, y)

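Where the condition holds, the output is taken from x; otherwise it is taken from y. For one-dimensional arrays this is equivalent to [xv if c else yv for (c, xv, yv) in zip(condition, x, y)].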

>>> aa = np.arange(10)
>>> np.where(aa,1,-1)
array([-1,  1,  1,  1,  1,  1,  1,  1,  1,  1])  # 0 is False, so the first output is -1
>>> np.where(aa > 5,1,-1)
array([-1, -1, -1, -1, -1, -1,  1,  1,  1,  1])

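>>> np.where([[True,False], [True,True]],    # example from the official documentation
             [[1,2], [3,4]],
             [[9,8], [7,6]])
array([[1, 8],
       [3, 4]])

In the example above the condition is [[True,False], [True,True]], and its entries correspond to the four values of the output. The first value is chosen from [1,9]; the condition is True, so 1 is chosen. The second value is chosen from [2,8]; the condition is False, so 8 is chosen, and so on for the rest. Here is another example of the same kind: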

>>> a = 10
>>> np.where([[a > 5,a < 5], [a == 10,a == 7]],
             [["chosen","not chosen"], ["chosen","not chosen"]],
             [["not chosen","chosen"], ["not chosen","chosen"]])

array([['chosen', 'chosen'],
       ['chosen', 'chosen']], dtype='<U10')
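The equivalence with the list comprehension mentioned earlier can be checked directly. The sketch below also shows x and y given as an array and a scalar, which np.where broadcasts against the condition; the variable names are illustrative.

```python
import numpy as np

aa = np.arange(10)
cond = aa > 5

# Three-argument form with arrays for both x and y.
by_where = np.where(cond, aa, -aa)

# The 1-D equivalent: pick xv when the condition is true, else yv.
by_loop = [xv if c else yv for (c, xv, yv) in zip(cond, aa, -aa)]

# x or y may also be scalars; they are broadcast to the condition's shape.
clipped = np.where(cond, aa, 0)
labels = np.where(aa % 2 == 0, 'even', 'odd')

print(by_where)
print(clipped)
print(labels)
```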

###2. np.where(condition)

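With only a condition and no x or y, it returns the coordinates of the elements that satisfy the condition (that is, the non-zero ones), which is equivalent to numpy.nonzero. The coordinates are returned as a tuple. The tuple contains one array per dimension of the input, and each array holds the coordinates of the matching elements along that dimension.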

>>> a = np.array([2,4,6,8,10])
>>> np.where(a > 5)             # returns the indices
(array([2, 3, 4]),)
>>> a[np.where(a > 5)]          # equivalent to a[a>5]
array([ 6,  8, 10])

>>> np.where([[0, 1], [1, 0]])
(array([0, 1]), array([1, 0]))

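In the example above, the condition [[0,1],[1,0]] has two true values (the two 1s). Their coordinates along the first dimension are [0,1] and along the second dimension [1,0]. Now a more complex example: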

>>> a = np.arange(27).reshape(3,3,3)
>>> a
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

>>> np.where(a > 5)
(array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2]),
 array([2, 2, 2, 0, 0, 0, 1, 1, 1, 2, 2, 2, 0, 0, 0, 1, 1, 1, 2, 2, 2]),
 array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]))


# The elements that satisfy the condition are
       [ 6,  7,  8]],

      [[ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17]],

      [[18, 19, 20],
       [21, 22, 23],
       [24, 25, 26]]]

So np.where outputs the coordinates of every matching element, and because the original array has three dimensions, the tuple contains three arrays.
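To read the three coordinate arrays as (i, j, k) triples, they can be compared with np.nonzero and np.argwhere. This is a sketch on the same 3x3x3 array; the names coords, triples and first are illustrative.

```python
import numpy as np

a = np.arange(27).reshape(3, 3, 3)

coords = np.where(a > 5)      # tuple of three index arrays, one per axis
same = np.nonzero(a > 5)      # the one-argument form matches np.nonzero
triples = np.argwhere(a > 5)  # one (i, j, k) row per matching element

# Coordinates of the first match, read across the three arrays.
first = tuple(int(axis[0]) for axis in coords)

print(first, a[first])
print(triples.shape)
print(a[coords])              # the matching values, flattened
```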

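numpy.where() detailed step-by-step explanation/examples

Can someone provide step-by-step annotated examples with one-dimensional and two-dimensional arrays?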
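One possible walk-through, offered as a sketch rather than a definitive answer, applies both forms of np.where to a small 1-D array and a small 2-D array. The arrays v and m are made up for illustration.

```python
import numpy as np

# --- 1-D array ---
v = np.array([3, 8, 1, 9])
mask_1d = v > 4                          # step 1: boolean mask, one entry per element
idx_1d = np.where(mask_1d)               # step 2: tuple with one array of positions
picked_1d = np.where(mask_1d, v, -1)     # step 3: keep v where True, else -1

# --- 2-D array ---
m = np.array([[3, 8],
              [1, 9]])
mask_2d = m > 4                          # step 1: mask with the same shape as m
rows, cols = np.where(mask_2d)           # step 2: one index array per axis
picked_2d = np.where(mask_2d, m, -1)     # step 3: elementwise choice, shape preserved

print(idx_1d, picked_1d)
print(rows, cols)
print(picked_2d)
```

Pairing rows[i] with cols[i] gives the position of the i-th match, so m[rows, cols] returns the matching values.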
