
Inserting data into MySQL starting from the last index while scraping (last index in Python)



This article looks in detail at inserting data into MySQL starting from the last index while scraping, and in particular at handling the last index in Python. Through case studies and worked examples we aim to give you a fuller picture of the topic, and we also touch on related questions: Bootstrap Modal Form won't insert data into MySQL, Fintechs divided on screen scraping ban, Gmail Scraping: a faster way to get the sender's email address?, and Golang Colly Scraping - the website captcha catches my scraper.


Inserting data into MySQL starting from the last index while scraping (last index in Python)


How to solve: inserting data into MySQL from the last index while scraping

I am scraping a website and inserting the data into a MySQL database at the same time, something like the code below. I had to remove the scraping parts, otherwise the code would be too long.

import time

# driver (a Selenium WebDriver) and cursor (a MySQL cursor) are set up earlier;
# the scraping steps that fill the variables below were removed by the poster
# to keep the code short.

def get_page(links):
    parent_window = driver.current_window_handle
    for link in links:
        driver.execute_script('window.open(arguments[0]);', link)
        all_windows = driver.window_handles
        child_window = [window for window in all_windows if window != parent_window][0]
        driver.switch_to.window(child_window)
        # scraping: name, tag, website, introduction
        try:
            cursor.execute(
                "INSERT INTO Investors(name, tags, website, introduction) VALUES (%s, %s, %s, %s)",
                (name, tag, website, introduction),
            )
        except Exception as e:
            raise e
        parent_window1 = driver.current_window_handle
        for lin in team_div:
            driver.execute_script('window.open(arguments[0]);', lin)
            all_windows = driver.window_handles
            child_window1 = [window for window in all_windows if window != parent_window1][1]
            driver.switch_to.window(child_window1)
            time.sleep(2)
            # scraping: port_name, headshot, work_ex
            driver.close()
            driver.switch_to.window(parent_window1)
            cursor.execute("SELECT inv_id FROM Investors WHERE name = %s", (name,))
            pid = cursor.fetchone()[0]  # fetchone() returns a tuple; take the id itself
            try:
                cursor.execute(
                    "INSERT INTO team_members(inv_id, mem_name, picture, experience) VALUES (%s, %s, %s, %s)",
                    (pid, port_name, headshot, work_ex),
                )
            except:
                pass
        driver.refresh()
        time.sleep(3)
        driver.execute_script("window.scrollBy(0, 2825)", "")
        time.sleep(2)
        # scraping: pid1, p_name, p_icon, p_short_des
        try:
            cursor.execute(
                "INSERT INTO portfolio(inv_id, port_name, port_icon, port_desc) VALUES (%s, %s, %s, %s)",
                (pid1, p_name, p_icon, p_short_des),
            )
        except:
            pass
        driver.close()
        driver.switch_to.window(parent_window)

def get_links(page):
    if page == 1:
        url = 'https://www.cypherhunter.com/en/search/?q=investments'
        driver.get(url)
        time.sleep(2)
        # the attribute filter inside //div[@] was lost when the question was posted
        links = driver.find_elements_by_xpath('//div[@]//a')
        return links
    else:
        url = f'https://www.cypherhunter.com/en/search/page/{page}/?q=investments'
        driver.get(url)
        time.sleep(2)
        links = driver.find_elements_by_xpath('//div[@]//a')
        return links

for p in range(1, 48):
    z = get_links(p)
    get_page(z)

I have a feeling this may be an inefficient way of sending the data, but that would turn into two questions. My question is: how can I make it so that, if the script fails for some reason, it starts from the same place on the next run? We can get the last index from MySQL, but how do I do that in code rather than manually?
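
No accepted answer is recorded in this thread, so the following is only a minimal sketch of one common approach: derive the resume point from what is already in MySQL. It assumes the listing pages are visited in a fixed order and each holds a fixed number of links; PAGE_SIZE, the connection parameters, and the resume_position helper are hypothetical names used for illustration, not part of the original code.

import pymysql

PAGE_SIZE = 20  # hypothetical: number of investor links per listing page

def resume_position(cursor, page_size=PAGE_SIZE):
    # Count the rows previous runs already inserted and translate that count
    # into "which listing page to reopen" and "how many links on it to skip".
    cursor.execute("SELECT COUNT(*) FROM Investors")
    rows_done = cursor.fetchone()[0]
    page = rows_done // page_size + 1   # 1-based page to restart on
    offset = rows_done % page_size      # links on that page already handled
    return page, offset

# Usage with the get_links/get_page functions from the question:
conn = pymysql.connect(host="localhost", user="user", password="pass", database="scraper")
cursor = conn.cursor()
start_page, start_offset = resume_position(cursor)
for p in range(start_page, 48):
    links = get_links(p)
    if p == start_page:
        links = links[start_offset:]    # skip the links that are already in MySQL
    get_page(links)

An alternative that avoids counting rows is to make the inserts idempotent, for example by adding a UNIQUE key on Investors.name and using INSERT IGNORE (or checking for the name before inserting), then simply re-running from page 1 so that already-stored rows are skipped and the script quickly catches up to where it previously failed.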

Bootstrap Modal Form won't insert data into MySQL

How to solve: Bootstrap Modal Form won't insert data into MySQL

I have a problem with my Bootstrap modal: it won't insert data into MySQL, even though a similar modal on the same page that updates data works fine. The data is shown in a table, and every row has delete and update buttons; both of those work on the same page. I am not using Ajax and I would like this to work without any other technology. It is a bit strange, because the update query uses the same approach and still works.

<?php
require_once "../db.php";
session_start();
if(!isset($_SESSION['admin'])){
    header('location:login.php');
}
function redirect($url)
{
    if (!headers_sent())
    {
        header('Location: '.$url);
        exit;
    }
    else
    {
        echo '<script type="text/javascript">';
        echo 'window.location.href="'.$url.'";';
        echo '</script>';
        echo '<noscript>';
        echo '<meta http-equiv="refresh" content="0;url='.$url.'" />';
        echo '</noscript>'; exit;
    }
}
?>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width,initial-scale=1.0">
<title>Document</title>
<link rel="stylesheet" href="../css/all.css">
<link rel="stylesheet" href="../css/bootstrap.min.css">
<script src="../js/jquery-3.5.1.min.js"></script>
<script src="../js/bootstrap.min.js"></script>
<style>
.nav-link{
    font-size:18px;
    font-weight:bold;
}
</style>
</head>
<body>
<?php
if(isset($_POST['confirmadd'])){
    $code = $_POST['voyage'];
    $newrow0 = $_POST['depart'];
    $newrow1 = $_POST['villedep'];
    $newrow2 = $_POST['arrivee'];
    $newrow3 = $_POST['villearr'];
    $newrow4 = $_POST['price'];
    $addTrip = mysqli_query($sql,"INSERT INTO voyage VALUES('$code','$newrow0','$newrow1',$newrow2','$newrow3','$newrow4');");
    if($addTrip){
        redirect('dashboard.php');
    }else{
        echo "error";
    }
}
?>
<div class="container-fluid bg-info sticky-top">
<div class="row d-flex align-items-center">
<div class="col-6 offset-1">
<h2 class="display-4 text-white">AnouBus</h2>
</div>
<div class="col-4">
<div class="navbar navbar-expand-lg navbar-light navbar-fixed-top">
<ul class="navbar navbar-nav">
<li class="nav-item"><a class="nav-link" href="logout.php">logout</a></li>
</ul>
</div>
</div>
</div>
</div>
<h1 class="text-center">Your Trips</h1>
<button class="btn btn-primary mx-2" style="float:right;" data-toggle="modal" data-target="#addmodal">Add Trip</button>
<div class="modal fade" id="addmodal">
<div class="modal-dialog modal-dialog-centered">
<div class="modal-content">
<div class="modal-header"><h2 class="text-center">Update Trip</h2></div>
<div class="modal-body">
<form method="post">
<div class="form-row">
<div class="form-group col-12">
<input type="text" name="voyage" class="form-control" placeholder="Trip Code">
</div>
<div class="form-group col-12">
<input type="text" name="depart" class="form-control" placeholder="Departure Time" ?>
</div>
<div class="form-group col-12">
<input type="text" name="villedep" class="form-control" placeholder="Departure City">
</div>
<div class="form-group col-12">
<input type="text" name="arrivee" class="form-control" placeholder="Arrival Time">
</div>
<div class="form-group col-12">
<input type="text" name="villearr" class="form-control" placeholder="Arrival City">
</div>
<div class="form-group col-12">
<input type="text" name="price" class="form-control" placeholder="Price">
</div>
<div class="form-group col-12">
<input type="submit" class="form-control btn btn-success" name="confirmadd" value="confirm">
</div>
</div>
</form>
</div>
</div>
</div>
</div>

Fintechs divided on screen scraping ban


http://www.zdnet.com/article/fintechs-divided-on-screen-scraping-ban/


Founding director of The Regtech Association and Verifier CEO, Lisa Schutz, has urged the Australian government to follow in the footsteps of the European Union (EU) and cease screen scraping.

"I believe passionately that screen scraping should be prohibited as it is in the EU," she told the Senate Committee of Financial Technology and Regulatory Technology on Thursday.

Screen scraping is the process where customers give a third-party company, such as a fintech firm, permission to access their data before taking a "snapshot" of it and using it to deliver a service or product to the customer.

She explained that while Verifier has the option to use screen scraping, it has instead chosen to access customer data under the 12 principles of the Privacy Act so it does not impinge on the privacy of consumers.

"It''s the long way to get the right outcome … [but] it comes back to what is the 2050 Australia that we want to live in," she said.

Schutz also took the opportunity to address how the Consumer Data Right, specifically Open Banking, would enable the sharing of data in a controlled and respectable manner.

The other alternative, she proposed, was to follow in the footsteps of the EU, which "put [a] sunset on screen scraping and that was for 18 months".

The hearing, chaired by NSW Liberal Senator Andrew Bragg, also heard from Dave Stein, head of corporate development at Melbourne-based fintech startup Airwallex, who agreed with Schutz's call to ban screen scraping.

"Screen scraping is bad technology. It''s just aided bad technology. It''s a way around barriers that exist, but it''s not actually trying to solve the underlying problem, which is helping people communicate and do what they want with their finances, pay the way they want," he said.

"We don''t do that, we don''t use that, but for us it''s a technology decision. We just don''t want to invest in a dated technology."

Read more: Rules drafted on how to access data under Consumer Data Right

On the other end of the spectrum was Raiz Invest general counsel Astrid Raetze, who argued on Thursday that screen scraping will always have two camps.

"There''s the banks and their views, and then there are fintechs who are not bank affiliated. Largely, the argument centres around the banks saying, ''it''s bad, it''s wrong you have to shut it down'', and then there''s the fintechs who say, ''we need it''."

Raiz Invest sits in the fintech camp and currently uses screen scraping, Raetze said, alongside other players such as Xero, ANZ, and Macquarie Bank.

Raetze highlighted that without screen scraping, the only other alternative tool for the company to access data would be to develop APIs under open banking.

"What that doesn''t take into consideration is the disparity of resources between the two camps," she explained

She said, based on guesstimates, the development process would cost the company a minimum of AU$1 to AU$2 million and require 6-12 months to complete.  

"If you switch on open banking and turn off screen scraping … what you will do is hamstring the fintech industry," she said.

At the same time, Raetze said it would also mean data holders, such as the large banking institutions, would have to be prepared to enable the building of APIs.

"We''d have to persuade CBA to build an API and negotiate with CBA (Commonwealth Bank of Australia), as an example," she said, while also describing how based on past conversations with banks like CBA that it would be difficult to do. 

The committee also questioned Raiz Invest about screen scraping in relation to data security, which Raetze said puts customers and their data at "no risk".

"We have the same level security and we do not transact on your account, so there is no risk to you," she said.

Similarly, Illion managing director Luke Howes said banning screen scraping would be "simplistic and misguided".

"I have never seen in six years, any consumer harm because it''s safe. Banning it will cripple millions of users and businesses who rely on it. If you ban it, you''ll send an industry back five or 10 years," he warned.

Related Coverage

  • Senate committee to probe Australia's fintech and regtech opportunity
  • CyberCX floats government loans to help startups comply with open banking
  • Stone and Chalk to support Sydney fintech startups
  • APRA granted 20 banking licences in 12 years, with 10 since 2018
  • OAIC seeks feedback on draft CDR privacy safeguard guidelines
  • Australia's open banking regime: Generic product data available from 1 July
  • Energy sector asks not to be forgotten in changes to Aussie data-sharing rules
  • ANZ asks for third party Consumer Data Right access
  • NAB asks to not be disadvantaged in opening CDR up to new players

Gmail Scraping: a faster way to get the sender's email address?


How to solve: Gmail Scraping - a faster way to get the sender's email address

I run a script on the Gmail website (because that is easier than building an email client from scratch) to get the sender's email address. This can be done by running

document.getElementsByClassName('gF gK')[0].querySelector('[email]').attributes.email.value

or alternatively by running

document.querySelector('[]').lastElementChild.attributes.jid.value

I run these every 50 ms and it works very reliably, but I do run into one problem: when I switch from email A to email B, even though B's sender name is shown immediately, the two divs I scrape still hold email A's address for about a second before the actual address for email B shows up.

In other words, I have to wait a second before the actual email address shows up, and I would like to eliminate that wait.

I cannot find any other occurrence of the email address in the div contents, but I am fairly confident that the "new" email address appears somewhere on the page with zero delay.

Golang Colly Scraping - the website captcha catches my scraper


How to solve: Golang Colly Scraping - the website captcha catches my scraper

I am scraping Amazon product titles, but Amazon's captcha catches my scraper. I tried `go run main.go` 10 times: 8 times it caught me, 2 times I got the product title.

I looked into this but could not find any solution for Golang (only Python). Is there a solution that would work for me?


package main

import (
    "fmt"
    "strings"

    "github.com/gocolly/colly"
)

func main() {

    // Create a Collector restricted to the Amazon domains
    c := colly.NewCollector(
        colly.AllowedDomains("www.amazon.com", "amazon.com"),
    )
    c.OnHTML("div", func(h *colly.HTMLElement) {
        captcha := h.Text
        title := h.ChildText("span#productTitle")
        fmt.Println(strings.TrimSpace(title))
        fmt.Println(strings.TrimSpace(captcha))
    })

    // Start the collector
    c.Visit("https://www.amazon.com/Bluetooth-Over-Ear-Headphones-Foldable-Prolonged/dp/B07K5214NZ")
}

Output:

Enter the characters you see below. Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies.

Solution

If you don't mind using a different package, I wrote one for searching HTML (essentially a thin wrapper around x/net/html):

Result:

package main

import (
   "github.com/89z/mech"
   "net/http"
)

func main() {
   r,err := http.Get("https://www.amazon.com/dp/B07K5214NZ")
   if err != nil {
      panic(err)
   }
   defer r.Body.Close()
   doc,err := mech.Parse(r.Body)
   if err != nil {
      panic(err)
   }
   span := doc.ByAttr("id","productTitle")
   span.Scan()
   println(span.Text())
}

https://pkg.go.dev/github.com/89z/mech

That wraps up our discussion of inserting data into MySQL from the last index while scraping and of the last index in Python. Thanks for reading. If you would like to learn more about Bootstrap Modal Form won't insert data into MySQL, Fintechs divided on screen scraping ban, Gmail Scraping: a faster way to get the sender's email address?, or Golang Colly Scraping - the website captcha catches my scraper, you can find more on this site.
