Selecting pandas DataFrame rows where a list column contains any of a list of strings (selecting by a column in pandas)

25-01-24 28


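This article answers two questions: how to select the rows of a pandas DataFrame whose list column contains any string from a given list, and how to select data by a column in pandas. It also covers related topics: dropping rows that do not contain any of a list of strings, pivot tables where a column holds strings with multiple categories, turning a column of key lists and a column of value lists into a column of dicts, and converting a string into a list of strings.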

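Contents:

Selecting pandas DataFrame rows where a list column contains any of a list of strings (selecting by a column in pandas)
Pandas: dropping rows that do not contain any of a list of strings
pandas pivot table where the column contains strings with multiple categories
Pandas: a column of key lists and a column of value lists to a column of dicts
pandas: converting a string to a list of strings

Selecting pandas DataFrame rows where a list column contains any of a list of strings (selecting by a column in pandas)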

I have a Pandas DataFrame that looks like this:

  molecule            species
0        a              [dog]
1        b       [horse, pig]
2        c         [cat, dog]
3        d  [cat, horse, pig]
4        e     [chicken, pig]

I want to extract a DataFrame containing only the rows whose species list includes any item of selection = ['cat', 'dog']. The result should look like this:

  molecule            species
0        a              [dog]
1        c         [cat, dog]
2        d  [cat, horse, pig]

What is the simplest way to do this?

For testing:

import pandas as pd

selection = ['cat', 'dog']
df = pd.DataFrame({'molecule': ['a', 'b', 'c', 'd', 'e'],
                   'species': [['dog'], ['horse', 'pig'], ['cat', 'dog'],
                               ['cat', 'horse', 'pig'], ['chicken', 'pig']]})

Answer 1


IIUC, rebuild a DataFrame from the list column and then use isin with any; this should be faster than apply.

df[pd.DataFrame(df.species.tolist()).isin(selection).any(axis=1).values]
Out[64]:
  molecule            species
0        a              [dog]
2        c         [cat, dog]
3        d  [cat, horse, pig]
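For comparison, here is a minimal sketch of two other ways to build the same row mask, using the test data from the question. Neither variant comes from the original answer: one tests each list against a set, the other uses explode and then collapses the flags back to one per original row. The names wanted, mask_apply and mask_explode are illustrative only.

```python
import pandas as pd

selection = ['cat', 'dog']
df = pd.DataFrame({
    'molecule': ['a', 'b', 'c', 'd', 'e'],
    'species': [['dog'], ['horse', 'pig'], ['cat', 'dog'],
                ['cat', 'horse', 'pig'], ['chicken', 'pig']],
})

# Variant 1: per-row set test. A row matches when its list shares
# at least one element with the selection.
wanted = set(selection)
mask_apply = df['species'].apply(lambda items: not wanted.isdisjoint(items))

# Variant 2: one row per list element, test membership, then reduce
# back to a single boolean per original row index.
mask_explode = df['species'].explode().isin(selection).groupby(level=0).any()

result = df[mask_apply]
print(result)
```

Both masks keep the original index labels, so the filtered frame retains labels 0, 2 and 3 unless you call reset_index(drop=True).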

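Pandas: dropping rows that do not contain any of a list of strings

How can I drop the rows that do not contain any of a list of strings in Pandas?

I have a csv file with this data structure: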

timestamp.  message.         name.   destinationUserName sourceUserName 
time.        login.          hello.    
time.        logout.         hello
time.        successful      hello1
time.        hello.          no
time.        notsuccessful   no

In my current code I can filter the name column by whether it contains hello or hello1, but I also want to check the message column and return only the messages that contain successful or notsuccessful.

So far I have this code:

f = pd.read_csv('file.csv')
f = f[f['name'].isin(names_to_keep)]

This correctly returns every row whose name is in the list I declared as names_to_keep. So I tried updating the code to add the message check:

f = f[f['name'].isin(names_to_keep) & f['message'].isin(message_to_keep)]

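In this case, using & returns an empty result, because the current file has no message with those strings. That is fine, but I want the script to return the names even when there is no matching message.

I hope my example is clear enough; let me know if you need more information.

Expected result: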

timestamp.  message.         name.   destinationUserName sourceUserName 
time.        login.          hello.    
time.        logout.         hello
time.        successful      hello1
time.        notsuccessful   no

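Solution

If you want the rows where the name column holds a value from one list or the message column holds a value from another list, you can use this.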

import pandas as pd

df = pd.read_csv('test.csv')

names_to_keep = ['hello', 'hello1', 'hello2']
messages_to_keep = ['successful', 'notsuccessful']

print(df)

# Keep a row if its name OR its message is in the corresponding list
df = df[df['name'].isin(names_to_keep)
        | df['message'].isin(messages_to_keep)]

print(df)

Sample Input
  timestamp        message    name destinationuserna
0      time          login   hello             user1
1      time         logout  hello1             user2
2      time     successful  hello2             user3
3      time          hello      no             user3
4      time  notsuccessful   don't            random

Sample Output
0      time          login   hello             user1         8-8103
1      time         logout  hello1             user2         8-8103
2      time     successful  hello2             user3         8-8103
4      time  notsuccessful   don't            random         8-8103
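The difference between & and | in this filter can be checked without a csv file. The sketch below rebuilds only the message and name columns of the sample input as an in-memory DataFrame (an assumption made to keep it self-contained) and compares the two masks.

```python
import pandas as pd

# In-memory stand-in for the two relevant columns of the sample input.
df = pd.DataFrame({
    'message': ['login', 'logout', 'successful', 'hello', 'notsuccessful'],
    'name': ['hello', 'hello1', 'hello2', 'no', "don't"],
})
names_to_keep = ['hello', 'hello1', 'hello2']
messages_to_keep = ['successful', 'notsuccessful']

name_ok = df['name'].isin(names_to_keep)
message_ok = df['message'].isin(messages_to_keep)

both = df[name_ok & message_ok]    # AND: both conditions must hold
either = df[name_ok | message_ok]  # OR: one condition is enough

print(both)
print(either)
```

With &, a row survives only if both tests pass, which is why the asker's version shrank to nothing on a file without matching messages. With |, rows 0, 1, 2 and 4 are kept, matching the sample output.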

pandas pivot table where the column contains strings with multiple categories

You can split the string into a list, use .explode(), and then pivot as usual:

df['cat'] = df['cat'].str.split(',')
df = df.explode('cat').pivot_table(index=df.explode('cat').index,columns='cat',values='value')

Output:

cat a   b   c
0   1.0 NaN NaN
1   2.0 2.0 NaN
2   3.0 3.0 3.0
3   NaN 2.0 2.0
4   NaN 1.0 NaN

You can then reset or rename the axis label (for example with rename_axis) if you do not want it to be named cat.
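The input frame is not shown in this section, so the sketch below reconstructs one from the printed output; the cat and value contents are an assumption. It splits, explodes, pivots on the original row index, and then clears the cat label from the column axis.

```python
import pandas as pd

# Hypothetical input, inferred from the output table above.
df = pd.DataFrame({
    'cat': ['a', 'a,b', 'a,b,c', 'b,c', 'b'],
    'value': [1, 2, 3, 2, 1],
})

# One row per (original row, category) pair; the original index repeats.
exploded = df.assign(cat=df['cat'].str.split(',')).explode('cat')

# Pivot back to one row per original index, one column per category.
wide = exploded.pivot_table(index=exploded.index, columns='cat', values='value')

# Remove the 'cat' label from the column axis.
wide = wide.rename_axis(None, axis=1)
print(wide)
```

pivot_table aggregates with the mean by default, so the cells come back as floats and a category repeated within one row would be averaged.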

Alternatively, try str.get_dummies and multiply by the value column (then replace 0 with NaN if necessary):

df['cat'].str.get_dummies(",").mul(df['value'],axis=0).replace(0,np.nan)

     a    b    c
0  1.0  NaN  NaN
1  2.0  2.0  NaN
2  3.0  3.0  3.0
3  NaN  2.0  2.0
4  NaN  1.0  NaN
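One caveat with replace(0, np.nan) is that it also erases values that are genuinely 0. A possible variation, sketched below on a hypothetical input whose first value is 0, masks by category membership instead. This is a suggestion, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical input; row 0 carries a real value of 0.
df = pd.DataFrame({
    'cat': ['a', 'a,b', 'a,b,c', 'b,c', 'b'],
    'value': [0, 2, 3, 2, 1],
})

dummies = df['cat'].str.get_dummies(',')  # 1 where the row has the category
membership = dummies.astype(bool)

# Approach from the text: real zeros are lost along with the absent cells.
via_replace = dummies.mul(df['value'], axis=0).replace(0, np.nan)

# Variation: keep a cell only where the row belongs to that category.
via_where = dummies.mul(df['value'], axis=0).where(membership)
print(via_where)
```

If value can never be 0 in your data, the two approaches give the same table.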

Pandas: a column of key lists and a column of value lists to a column of dicts

Try:

df['dicts'] = [dict(zip(x,y)) for x,y in zip(df['keys'],df['values'])]

Or, with unpacking:

df['dicts'] = [dict(zip(*u)) for u in zip(df['keys'],df['values'])]
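A small runnable check of the first form, on a made-up two-row frame (the data is an assumption, since the section shows no input):

```python
import pandas as pd

# Hypothetical data: each row holds a list of keys and a list of values.
df = pd.DataFrame({
    'keys': [['a', 'b'], ['c']],
    'values': [[1, 2], [3]],
})

# Pair the two lists row by row and build one dict per row.
df['dicts'] = [dict(zip(k, v)) for k, v in zip(df['keys'], df['values'])]
print(df['dicts'].tolist())
```

Bracket access matters here: df.keys and df.values are existing DataFrame attributes, so df['keys'] and df['values'] are the safe way to reach columns with those names. Also note that zip stops at the shorter list, so a row with more keys than values silently drops the extra keys.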

pandas: converting a string to a list of strings

I have this 'file.csv' file to read with pandas:

Title|Tags
T1|"[Tag1,Tag2]"
T1|"[Tag1,Tag2,Tag3]"
T2|"[Tag3,Tag1]"

Using

df = pd.read_csv('file.csv',sep='|')

the output is:

  Title              Tags
0    T1       [Tag1,Tag2]
1    T1  [Tag1,Tag2,Tag3]
2    T2       [Tag3,Tag1]

I know the Tags column holds plain strings, because:

In [64]: df['Tags'][0][0]
Out[64]: '['

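I need to read it as a list of strings, like ["Tag1","Tag2"]. I tried the solution provided in this question, but with no luck, because I have the [ and ] characters, which really mess things up.

The expected output should be: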

In [64]: df['Tags'][0][0]
Out[64]: 'Tag1'
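No answer is included in this section. One way that may work is to strip the surrounding brackets and split on commas with the .str accessor. The sketch below reads the same content from an in-memory string instead of file.csv, which is an assumption made to keep it self-contained.

```python
import io

import pandas as pd

# Same content as file.csv, held in memory for the example.
csv_text = '''Title|Tags
T1|"[Tag1,Tag2]"
T1|"[Tag1,Tag2,Tag3]"
T2|"[Tag3,Tag1]"
'''

df = pd.read_csv(io.StringIO(csv_text), sep='|')

# Remove the leading '[' and trailing ']', then split on commas.
df['Tags'] = df['Tags'].str.strip('[]').str.split(',')

print(df['Tags'][0][0])
```

Because the tags are unquoted, ast.literal_eval would not parse these strings, which is why plain string operations are used here. An empty "[]" cell would become a list holding one empty string, so handle that case separately if it can occur.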
