How to randomly split a json.bz2 file in Python without iterating over it?

25-03-10 16

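This article addresses the question of how to randomly split a json.bz2 file in Python without iterating over it. It also covers four related questions: c# – How to abandon an IEnumerator without iterating to the end?; python – Iteratively parsing a large XML file without using the DOM approach; python – How to save a file in django without creating a model; and Python - How to write to an existing Excel file (using pandas) without overwriting data?

Contents:

How to randomly split a json.bz2 file in Python without iterating over it?
c# – How to abandon an IEnumerator without iterating to the end?
python – Iteratively parsing a large XML file without using the DOM approach
python – How to save a file in django without creating a model
Python - How to write to an existing Excel file (using pandas) without overwriting data?

How to randomly split a json.bz2 file in Python without iterating over it?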

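I have a json.bz2 file that is over 50 GB in size. I want to split it into multiple partitions so that I can process them in multiple threads using Python.

Can you suggest an ideal way to randomly split the json.bz2 file in Python code, without reading or iterating over it?

Note: the process must not take several hours.

Solution

No working solution to this question has been found yet.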
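
For readers who can tolerate a single streaming pass, here is a minimal sketch rather than an answer to the question as asked. It assumes the file is newline-delimited JSON (one record per line), which the question does not state. Python's standard-library bz2 module decompresses a stream sequentially, so this does read the file once, but it never holds more than one line in memory. The function name split_jsonl_bz2 and the output naming scheme are illustrative only.

```python
import bz2
import random


def split_jsonl_bz2(src_path, out_prefix, n_parts, seed=0):
    """Stream src_path once and send each line to a randomly chosen partition.

    Assumes one JSON record per line; lines are copied as raw bytes and
    never parsed, so malformed records pass through unchanged.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    out_paths = [f"{out_prefix}.part{i}.jsonl.bz2" for i in range(n_parts)]
    outs = [bz2.open(path, "wb") for path in out_paths]
    try:
        with bz2.open(src_path, "rb") as src:
            for line in src:
                outs[rng.randrange(n_parts)].write(line)
    finally:
        # Close every partition even if reading fails part-way through.
        for out in outs:
            out.close()
    return out_paths
```

Each resulting partition can then be handed to its own worker. Whether one pass over 50 GB fits the "not several hours" constraint depends on the machine and would need to be measured.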


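c# – How to abandon an IEnumerator without iterating to the end?

Consider the following code. The first pass demonstrates that the "cleanup" runs when we finish iterating over an IEnumerable of strings. The second pass is the one causing me grief. I need to be able to abandon the IEnumerable before reaching the end and still have the cleanup code run. But if you run this, you will see that in the second pass the cleanup never fires.

What is the preferred way to abandon an IEnumerable like this?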

static void Main(string[] args)
{
    // first pass
    foreach (String color in readColors())
        Console.WriteLine(color);

    // second pass
    IEnumerator<string> reader = readColors().GetEnumerator();
    if (reader.MoveNext())
    {
        Console.WriteLine(reader.Current);
        reader.Dispose();
    }
}
static IEnumerable<string> readColors()
{
    string[] colors = { "red","green","blue" };
    for (int i = 0; i < colors.Length; i++)
        yield return colors[i];

    Console.WriteLine("Cleanup goes here");
}

Solution

You need to put the main part of the iterator method inside a try..finally, with the cleanup code in the finally block:

static IEnumerable<string> readColors()
{
    try
    {
        string[] colors = { "red", "green", "blue" };
        for (int i = 0; i < colors.Length; i++)
            yield return colors[i];
    }
    finally
    {
        // Runs when iteration completes or when the enumerator is disposed early
        Console.WriteLine("Cleanup goes here");
    }
}

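Remember that, under the hood, an iterator method causes a separate class to be generated that implements both IEnumerable and IEnumerator. By putting the cleanup in a finally block, it ends up in the generated class's Dispose method.

[Edit: (as other answers point out) prefer a using statement over calling Dispose by hand. I assume you did it this way to highlight the issue under discussion, but it is worth pointing out anyway.]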
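
Python generators behave in a comparable way, which may help readers who arrived here from the Python questions in this article. The sketch below is not part of the original answer; read_colors and the log list are illustrative names. Calling close() on a generator that is suspended at a yield raises GeneratorExit at that point, so the finally block runs, much as Dispose triggers the finally block of the C# iterator.

```python
def read_colors(log):
    """Yield colors one at a time; record the cleanup in `log`."""
    try:
        for color in ("red", "green", "blue"):
            yield color
    finally:
        # Runs on normal exhaustion and also when the generator is closed early.
        log.append("Cleanup goes here")


log = []
reader = read_colors(log)
first = next(reader)   # advance to the first yield
reader.close()         # abandon the generator before it is exhausted
print(first, log)
```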

python – Iteratively parsing a large XML file without using the DOM approach

I have an xml file:

<temp>
  <email id="1" Body="abc"/>
  <email id="2" Body="fre"/>
  .
  .
  <email id="998349883487454359203" Body="hi"/>
</temp>

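I want to read the xml file one email tag at a time. That is, read the email with id=1 and extract its Body, then read the email with id=2 and extract its Body, and so on.

I tried parsing the XML with the DOM model, but because my file is 100 GB in size that approach does not work. Then I tried: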

from xml.etree import ElementTree as ET
tree = ET.parse('myfile.xml')
root = ET.parse('myfile.xml').getroot()
for i in root.findall('email/'):
    print(i.get('Body'))

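Now, once I get the root... I don't know why my code fails to parse.

When I use iterparse, the code throws the following error:

 "UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 437: ordinal not in range(128)"

Can someone help?

Solution:

An example using iterparse: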

import io
from xml.etree.ElementTree import iterparse

# In-memory stand-in for the real file (bytes, as a file opened in 'rb' mode would give)
fakefile = io.BytesIO(b"""<temp>
  <email id="1" Body="abc"/>
  <email id="2" Body="fre"/>
  <email id="998349883487454359203" Body="hi"/>
</temp>
""")
for _, elem in iterparse(fakefile):
    if elem.tag == 'email':
        print(elem.attrib['id'], elem.attrib['Body'])
    elem.clear()  # release the element's contents once it has been handled

Just replace fakefile with your real file.
Also read this for more details.
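
As a further sketch for a very large file, assuming Python 3: the helper below (iter_emails is a hypothetical name, not from the answer) also clears the root element after each email, so that already-processed children do not pile up under the root. In Python 3, str is Unicode, so a character such as U+20AC in a Body attribute should not hit the 'ascii' codec error quoted above when it is merely read; whether it prints correctly still depends on the terminal encoding.

```python
import xml.etree.ElementTree as ET


def iter_emails(source):
    """Yield (id, Body) for each <email> element, keeping memory use flat.

    `source` may be a filename or a binary file object.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "email":
            yield elem.get("id"), elem.get("Body")
            root.clear()  # drop children that have already been processed
```

A typical call would be `for email_id, body in iter_emails('myfile.xml'): ...`, with the filename taken from the question.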

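python – How to save a file in django without creating a model

I want to upload an excel file and save it to a specific location in django, without creating a model for that file.

Here is what I tried.
My forms.py file: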

from django import forms

class UploadFileForm(forms.Form):
    file = forms.FileField(label='Select a file', help_text='max. 42 megabytes')

My views.py:

import xlrd
from property.forms import UploadFileForm

def excel(request):
    if request.method == 'POST':
        form = UploadFileForm(request.POST, request.FILES)
        if form.is_valid():
            newdoc = handle_uploaded_file(request.FILES['file'])
            print(newdoc)
            print("you in")
            newdoc.save()

            return HttpResponseRedirect(reverse('upload.views.excel'))
    else:
        form = UploadFileForm()  # An empty, unbound form
    #documents = Document.objects.all()
    return render_to_response('property/list.html', {'form': form}, context_instance=RequestContext(request))

def handle_uploaded_file(f):
    destination = open('media/file/sheet.xls', 'wb+')
    for chunk in f.chunks():
        destination.write(chunk)
    destination.close()

When I try this, I get the following error:

IOError at /property/excel/
[Errno 2] No such file or directory: 'media/file/sheet.xls'
Request Method: POST
Request URL:    http://127.0.0.1:8000/property/excel/
Django Version: 1.5
Exception Type: IOError
Exception Value:
[Errno 2] No such file or directory: 'media/file/sheet.xls'
Exception Location: D:\Django_workspace\6thmarch\dtz\property\views.py in handle_uploaded_file, line 785

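Please help me solve this; something is wrong with the handle_uploaded_file() function.

Solution

If you use open (as in open('path', 'wb')), you need to give it the FULL path.

What you can do is: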

import os
from django.conf import settings
# os.path.join avoids a missing separator between MEDIA_ROOT and filename
destination = open(os.path.join(settings.MEDIA_ROOT, filename), 'wb+')
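
To make the idea concrete without depending on a Django project, here is a framework-free sketch. The name save_uploaded_chunks and its parameters are hypothetical; in a view, media_root would presumably be settings.MEDIA_ROOT and chunks would be f.chunks(). It also creates the target directory first, since opening a file for writing raises "[Errno 2] No such file or directory" when the parent directory does not exist.

```python
import os


def save_uploaded_chunks(chunks, media_root, relative_name):
    """Write an iterable of byte chunks to media_root/relative_name.

    Returns the absolute path of the file that was written.
    """
    destination_path = os.path.abspath(os.path.join(media_root, relative_name))
    # Make sure the target directory exists before opening the file.
    os.makedirs(os.path.dirname(destination_path), exist_ok=True)
    with open(destination_path, "wb") as destination:
        for chunk in chunks:
            destination.write(chunk)
    return destination_path
```

Using a with block closes the file even if a write fails, which the explicit destination.close() in the question would not do.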

Python - How to write to an existing Excel file (using pandas) without overwriting data?

I use pandas to write to an excel file in the following way:

import pandas

writer = pandas.ExcelWriter('Masterfile.xlsx')

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()

Masterfile.xlsx already contains a number of different tabs. However, it does not yet contain "Main".

Pandas writes to the "Main" sheet correctly; unfortunately, it also deletes all the other tabs.

Solution

The pandas docs say it uses openpyxl for xlsx files. A quick look through the code in ExcelWriter gives a clue that something like this might work:

import pandas
from openpyxl import load_workbook

book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()
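
The code in this section targets an older pandas API (cols= and writer.save()). As a hedged sketch for recent versions, assuming pandas 1.3 or later with openpyxl installed, ExcelWriter can open an existing workbook in append mode and add a sheet while keeping the others. The helper name append_sheet is hypothetical and not part of pandas.

```python
def append_sheet(df, path, sheet_name, columns=None):
    """Write `df` to `sheet_name` in an existing .xlsx file, keeping other sheets.

    Assumes pandas >= 1.3 and openpyxl; `path` must already exist.
    """
    # Third-party imports kept local so a module holding this helper
    # can still be loaded where pandas is not installed.
    import pandas as pd

    with pd.ExcelWriter(
        path,
        engine="openpyxl",
        mode="a",                   # append to the existing workbook
        if_sheet_exists="replace",  # overwrite only a sheet with the same name
    ) as writer:
        df.to_excel(writer, sheet_name=sheet_name, columns=columns)
```

With the names from the question, the call would look like `append_sheet(data_filtered, 'Masterfile.xlsx', 'Main', columns=['Diff1', 'Diff2'])`.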
