Filter on Controller to check User Agent and then redirect if the result is true
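------------ Note (edit): I may be completely wrong here. If this is actually the wrong approach, any guidance would be appreciated (I'm new to MVC).
The solution contains a robots.txt file that blocks all crawlers from the site. The only problem is that the Facebook crawler/scraper does not obey the rules and keeps crawling/scraping the site. This causes an error to be logged and an email to be sent every few minutes. The error sent for this is: "A public action method 'Customer' was not found on controller 'SolutionName.Web.Controllers.QuoteController'."
My solution was to create a filter on the controller that checks the user agent. If the user agent belongs to Facebook, the request is redirected to a "no robots authentication" page. The filter has to be on the controller because the site serves 3 different routes. Each route has custom links, and customers can visit direct links shared on Facebook, so creating a route for this in the route config will not work.
The problem I'm facing is that the solution does not redirect immediately at the controller filter. It goes on into the action methods (which are partial pages) and then fails because it cannot redirect. The view has already started rendering, which is correct. Is there a way to redirect immediately the first time this filter is hit? Or is there a better solution?
For testing and troubleshooting, I'm changing the user agent in code to match what was logged.
Error when redirecting from the filter: "Child actions are not allowed to perform redirect actions."
Error currently logged because of Facebook's crawler: "A public action method 'Customer' was not found on controller 'SolutionName.Web.Controllers.QuoteController'."
User agent from the stack trace:
Here is what I've done:
Custom filter: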
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Web;
using System.Web.Mvc;
namespace SolutionName.Web.Classes
{
    public class UserAgentActionFilterAttribute : ActionFilterAttribute
    {
        public override void OnActionExecuting(ActionExecutingContext filterContext)
        {
            try
            {
                List<string> Crawlers = new List<string>()
                {
                    "facebookexternalhit/1.1", "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)", "facebookexternalhit/1.1", "Facebot"
                };
                string userAgent = HttpContext.Current.Request.UserAgent.ToLower();
                bool iscrawler = Crawlers.Exists(x => userAgent.Contains(x));
                if (userAgent != null && iscrawler)
                {
                    filterContext.Result = new RedirectResult("~/Home/NoRobotsAuthentication");
                    return;
                }
                base.OnActionExecuting(filterContext);
            }
            catch (Exception errException)
            {
                LogHelper.LogException(Severity.Error, errException);
                SessionHelper.PolicyBase = null;
                SessionHelper.ClearQuoteSession();
                filterContext.Result = new RedirectResult("~/Home/NoRobotsAuthentication");
                return;
            }
        }
    }
}
NoRobotsAuthentication.cshtml:
@{
    ViewBag.PageTitle = "Robots not authorized";
    Layout = "~/Views/Shared/_LayoutClean2.cshtml";
}
<div class="container body-content">
    <div class="row">
        <div class="col-lg-12 col-md-12 col-sm-12 col-xs-12 container-solid">
            <div class="form-horizontal">
                <h3>@ViewBag.NotAuthorized</h3>
            </div>
        </div>
    </div>
</div>
No-robots action method:
#region Bot Detection
public ActionResult NoRobotsAuthentication()
{
    ViewBag.NotAuthorized = "Robots / Scrapers not authorized!";
    return View();
}
#endregion
One of the controllers I want to check:
using System.Web.Mvc;
using SolutionName.Web.Classes;
namespace SolutionName.Web.Controllers
{
    [UserAgentActionFilter]
    public class QuoteController : Controller
    {
        public ActionResult Customer()
        {
            // Some logic
        }
    }
}
The partial-page ActionResult that throws the error when the filter runs:
public ActionResult _Sidebar()
{
    var model = SessionHelper.PolicyBase;
    return PartialView("_Sidebar", model);
}
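This is because you're using an ActionFilterAttribute. The documentation here explains the lifecycle of filters: https://docs.microsoft.com/en-us/aspnet/core/mvc/controllers/filters?view=aspnetcore-3.1. Basically, by the time you reach an action filter, it's too late. You need an authorization filter or a resource filter so that you can short-circuit the request.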
Each filter type is executed at a different stage in the filter pipeline:
Authorization filters
- Authorization filters run first and are used to determine whether the user is authorized for the request.
- Authorization filters short-circuit the pipeline if the request is not authorized.
Resource filters
- Run after authorization.
- OnResourceExecuting runs code before the rest of the filter pipeline. For example, OnResourceExecuting runs code before model binding.
- OnResourceExecuted runs code after the rest of the pipeline has completed.
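The example below is taken from the documentation and implements a resource filter. Presumably an authorization filter could be implemented in a similar way. However, I believe that returning a normal (non-error) HTTP status code after an authorization filter fails may be a bit of an anti-pattern.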
// See that it's implementing IResourceFilter
public class ShortCircuitingResourceFilterAttribute : Attribute, IResourceFilter
{
    public void OnResourceExecuting(ResourceExecutingContext context)
    {
        context.Result = new ContentResult()
        {
            Content = "Resource unavailable - header not set."
        };
    }
    public void OnResourceExecuted(ResourceExecutedContext context)
    {
    }
}
I've tried to merge it with what you provided. Note that this may not work out of the box.
using System;
using System.Collections.Generic;
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.Filters;
public class ShortCircuitingResourceFilterAttribute : Attribute, IResourceFilter
{
    public void OnResourceExecuting(ResourceExecutingContext context)
    {
        try
        {
            // You had duplicates in your list; a HashSet suits .Contains lookups.
            // StringComparer.OrdinalIgnoreCase makes the lookup case-insensitive,
            // so "Facebot" still matches without calling .ToLower().
            var crawlerSet = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
            {
                "facebookexternalhit/1.1",
                "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
                "Facebot"
            };
            // HttpContext.Current does not exist in ASP.NET Core; read the header from the filter context.
            string userAgent = context.HttpContext.Request.Headers["User-Agent"].ToString();
            // Your original code called .ToLower() before checking for null,
            // so a request without a User-Agent threw before the null check ran.
            if (!string.IsNullOrEmpty(userAgent) && crawlerSet.Contains(userAgent))
            {
                // Some crawler
                context.Result = new RedirectResult("~/Home/NoRobotsAuthentication");
            }
        }
        catch (Exception errException)
        {
            LogHelper.LogException(Severity.Error, errException);
            SessionHelper.PolicyBase = null;
            SessionHelper.ClearQuoteSession();
            context.Result = new RedirectResult("~/Home/NoRobotsAuthentication");
        }
    }
    public void OnResourceExecuted(ResourceExecutedContext context)
    {
    }
}
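IResourceFilter and the ASP.NET Core documentation linked above do not apply to System.Web.Mvc (ASP.NET MVC 5), which is the framework the question's code uses. In MVC 5, a filter placed on the controller also runs for child actions rendered by the view, such as _Sidebar if it is rendered with Html.Action. A child action that returns a RedirectResult throws "Child actions are not allowed to perform redirect actions." The sketch below assumes you keep the MVC 5 action filter. It redirects only the top-level request and matches crawler tokens as case-insensitive substrings. CrawlerCheck, IsCrawler, ShouldRedirect and the token list are hypothetical names chosen for illustration. The logic is kept free of System.Web so that it can be compiled and tested on its own.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of ASP.NET MVC). It contains only the
// decision logic, so it can be compiled and tested without System.Web.
public static class CrawlerCheck
{
    // Substring tokens rather than complete user-agent strings, so version
    // numbers or a trailing "(+http://...)" part do not prevent a match.
    private static readonly string[] CrawlerTokens = { "facebookexternalhit", "Facebot" };

    // True if the user agent contains any crawler token, ignoring case.
    // A null or empty user agent is treated as "not a crawler" instead of
    // throwing, unlike calling .ToLower() before the null check.
    public static bool IsCrawler(string userAgent)
    {
        if (string.IsNullOrEmpty(userAgent))
        {
            return false;
        }

        return CrawlerTokens.Any(token =>
            userAgent.IndexOf(token, StringComparison.OrdinalIgnoreCase) >= 0);
    }

    // Child actions cannot redirect, so only the top-level request is
    // redirected; child actions simply fall through.
    public static bool ShouldRedirect(string userAgent, bool isChildAction)
    {
        return !isChildAction && IsCrawler(userAgent);
    }
}
```

In the MVC 5 filter, OnActionExecuting would call CrawlerCheck.ShouldRedirect(filterContext.HttpContext.Request.UserAgent, filterContext.IsChildAction). It would set filterContext.Result to the RedirectResult only when that call returns true. Once the parent request is redirected, its view is never rendered, so the child actions never run.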