How to find a form in ScrapySharp when it only has attributes, i.e. no name or id
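I'm new to ScrapySharp and to web scraping in general. I'm trying to scrape a site that is secured behind a login screen. The form element has no name or id attribute, which makes things harder. I haven't been able to figure out how to load the form with the code below. Any insight is greatly appreciated!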
C#:
ScrapingBrowser browser = new ScrapingBrowser();
var homepage = browser.NavigateToPage(new Uri("https://somedomain.com/ProviderLogin.action/"));
var form1 = homepage.Find("form", ScrapySharp.Html.By.Text("form"));
var form2 = homepage.FindFormById("form[action='provider-login']");
HTML:
<form action="provider-login" method="post">
<div class="login-box">
<input type="text" name="username" id="username" autocomplete="false" placeholder="Username"
class="form-control input-lg login-input login-input-username" value="" />
<input type="password" id="password" name="password" placeholder="Password" type="password"
class="form-control input-lg login-input login-input-password" />
<button name="login" type="submit" class="btn btn-primary btn-block btn-md login-btn" >
Login
</button>
</div>
</form>
You can't do this with "By" in ScrapySharp, because it offers only four "Element Search Kinds":
{
Text,
Id,
Name,
Class
}
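Your form has neither an id nor a name to search by, so consider using "CssSelect" instead:
// Requires: using ScrapySharp.Extensions;
// Exact match on the action attribute (CssSelect returns a sequence of matching nodes)
var form = homepage.Html.CssSelect("form[action='provider-login']");
// Or: substring match on the action attribute
var form = homepage.Html.CssSelect("form[action*='provider-login']");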
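To check the two selectors without hitting the real site, one option is to run them against a local copy of the markup. The sketch below is only an illustration: it assumes the ScrapySharp and HtmlAgilityPack packages are referenced, uses a trimmed copy of the form from the question, and the class name `SelectorCheck` is made up for the example.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;
using ScrapySharp.Extensions;

class SelectorCheck
{
    static void Main()
    {
        // Trimmed copy of the login form from the question (assumption: the
        // real page wraps it in more markup, which should not affect the selector).
        const string html = @"<html><body>
<form action=""provider-login"" method=""post"">
  <input type=""text"" name=""username"" id=""username"" />
  <input type=""password"" name=""password"" id=""password"" />
  <button name=""login"" type=""submit"">Login</button>
</form>
</body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Exact match on the action attribute.
        var exact = doc.DocumentNode.CssSelect("form[action='provider-login']").ToList();
        // Substring match on the action attribute.
        var partial = doc.DocumentNode.CssSelect("form[action*='provider-login']").ToList();

        Console.WriteLine($"exact matches: {exact.Count}");
        Console.WriteLine($"substring matches: {partial.Count}");

        // CssSelect yields a sequence, so pick a single node before using it.
        var formNode = exact.FirstOrDefault();
        if (formNode != null)
        {
            Console.WriteLine(formNode.GetAttributeValue("action", ""));
        }
    }
}
```

If either count comes back as zero against the live page, the action value in the served HTML probably differs from what the browser's inspector shows, and the substring form is the more forgiving of the two.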
You can find the first form node by its tag and then use the PageWebForm constructor:
var browser = new ScrapingBrowser();
var homepage = browser.NavigateToPage(new Uri("https://somedomain.com/ProviderLogin.action/"));
var form1node = homepage.Html.SelectSingleNode("//form");
var form1 = new PageWebForm(form1node, browser); // this is where it happens!
form1["username"] = "some username";
form1["password"] = "some password";
form1.Method = HttpVerb.Post;
var webpage = form1.Submit();
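The two suggestions can probably be combined: select the form node by its action attribute with CssSelect, then hand that node to the PageWebForm constructor. This is a rough sketch rather than a tested solution; the URL and credentials are the placeholders from the question, the class name `LoginSketch` is hypothetical, and it assumes the site accepts a plain form post with no extra hidden tokens.

```csharp
using System;
using System.Linq;
using ScrapySharp.Extensions;
using ScrapySharp.Html.Forms;
using ScrapySharp.Network;

class LoginSketch
{
    static void Main()
    {
        var browser = new ScrapingBrowser();
        // Placeholder URL taken from the question.
        var homepage = browser.NavigateToPage(new Uri("https://somedomain.com/ProviderLogin.action/"));

        // Pick the form by its action attribute instead of relying on it
        // being the first <form> on the page.
        var formNode = homepage.Html
            .CssSelect("form[action='provider-login']")
            .FirstOrDefault();

        if (formNode == null)
        {
            Console.WriteLine("Login form not found; check the selector against the page source.");
            return;
        }

        // Wrap the raw node so its fields can be set and submitted.
        var form = new PageWebForm(formNode, browser);
        form["username"] = "some username";
        form["password"] = "some password";
        form.Method = HttpVerb.Post;

        WebPage result = form.Submit();
        // Print the size of the returned markup as a quick sanity check.
        Console.WriteLine(result.Html.OuterHtml.Length);
    }
}
```

Selecting by attribute is a little more robust than "//form" if the page ever gains a second form (a search box, for instance), at the cost of breaking if the action value changes.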