When securing a comment form and related API endpoint, should input be sanitized, validated and encoded in the browser, on the server, or both?

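I am trying to make a comment form as secure as possible in a non-CMS environment with no user authentication.

The form should be secure against both browser and curl/postman type requests.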

Environment

Backend - Node.js, MongoDB Atlas and Azure Web Apps.
Frontend - jQuery.

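Below is a detailed, but hopefully not overwhelming, overview of my current working implementation.

Following that are my questions about the implementation.

Related Libraries Used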

Helmet - helps secure Express apps by setting various HTTP headers, including Content Security Policy
reCaptcha v3 - protects against spam and other types of automated abuse
DOMPurify - an XSS sanitizer
validator.js - a library of string validators and sanitizers
he - an HTML entity encoder/decoder

The general data flow is:

/*
on click event:  
- get sanitized data
- perform some validations
- html encode the values
- get recaptcha v3 token from google
- send all data, including token, to server
- send token to google to verify
- if the response 'score' is above 0.5, add the submission to the database  
- return the entry to the client and populate the DOM with the submission   
*/ 

POST Request - Browser

// test input:  
// <script>alert("hi!")</script><h1>hello there!</h1> <a href="">link</a>

// sanitize the input  
var sanitized_input_1_text = DOMPurify.sanitize($input_1.val().trim(), { SAFE_FOR_JQUERY: true });
var sanitized_input_2_text = DOMPurify.sanitize($input_2.val().trim(), { SAFE_FOR_JQUERY: true });

// validation - make sure input is between 1 and 140 characters
var input_1_text_valid_length = validator.isLength(sanitized_input_1_text, { min: 1, max: 140 });
var input_2_text_valid_length = validator.isLength(sanitized_input_2_text, { min: 1, max: 140 });

// if validations pass
if (input_1_text_valid_length === true && input_2_text_valid_length === true) {

/* 
encode the sanitized input 
not sure if i should encode BEFORE adding to MongoDB  
or just add to database "as is" and encode BEFORE displaying in the DOM with $("#output").html(html_content);
*/  
var sanitized_encoded_input_1_text = he.encode(sanitized_input_1_text);
var sanitized_encoded_input_2_text = he.encode(sanitized_input_2_text);

// define parameters to send to database  
var parameters = {};
parameters.input_1_text = sanitized_encoded_input_1_text; 
parameters.input_2_text = sanitized_encoded_input_2_text; 

// get token from google and send token and input to database
// see:  https://developers.google.com/recaptcha/docs/v3#programmatically_invoke_the_challenge
grecaptcha.ready(function() {
    grecaptcha.execute('site-key-here', { action: 'submit' }).then(function(token) {
        parameters.token = token;
        jquery_ajax_call_to_my_api(parameters);
    });
});
}

POST Request - Server

var secret_key = process.env.RECAPTCHA_SECRET_SITE_KEY;
var token = req.body.token;
var url = `https://www.google.com/recaptcha/api/siteverify?secret=${secret_key}&response=${token}`;

// verify recaptcha token with google
var response = await fetch(url);
var response_json = await response.json();
var score = response_json.score;
var document = {};

/*
if google's response 'score' is greater than 0.5, 
add submission to the database and populate client DOM with $("#output").prepend(html); 
see: https://developers.google.com/recaptcha/docs/v3#interpreting_the_score
*/
if (score >= 0.5) {

    // add submission to database 
    // return submission to client to update the DOM
    // DOM will just display this text:  <h1>hello there!</h1> <a href="">link</a>
}

GET Request on Page Load

Logic/Assumptions:

POST Requests from curl, postman, etc.

Logic/Assumptions:

Helmet Configuration on Server

app.use(
    helmet({
        contentSecurityPolicy: {
            directives: {
                defaultSrc: ["'self'"],
                scriptSrc: ["'self'", "https://somedomain.io", "https://maps.googleapis.com", "https://www.google.com", "https://www.gstatic.com"],
                styleSrc: ["'self'", "fonts.googleapis.com", "'unsafe-inline'"],
                fontSrc: ["'self'", "fonts.gstatic.com"],
                imgSrc: ["'self'", "https://maps.gstatic.com", "https://maps.googleapis.com", "data:"],
                frameSrc: ["'self'", "https://www.google.com"]
            }
        },
    })
);

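Questions

  1. Should I add values to the MongoDB database as HTML encoded entities OR store them "as is" and just encode them before populating the DOM with them?

  2. If the values were to be saved as HTML entities in MongoDB, would this make searching the database for content difficult? For example, searching for <h1>hello there!</h1> <a href="">link</a> wouldn't return any results because the value in the database was &#x3C;h1&#x3E;hello there!&#x3C;/h1&#x3E; &#x3C;a href=&#x22;&#x22;&#x3E;link&#x3C;/a&#x3E;

  3. In my reading about securing web forms, much has been said about client side practices being fairly redundant, as anything can be changed in the DOM, JavaScript can be disabled, and requests can be made directly to the API endpoint using curl or postman and therefore bypass any client side approaches.

  4. With that said, should sanitization (DOMPurify), validation (validator.js) and encoding (he) be performed: 1) client side only, 2) client side and server side, or 3) server side only?

For completeness, here is another related question:

Do any of the following components do any automatic escaping or HTML encoding when sending data from the client to the server? I ask because if they do, it may make some manual escaping or encoding unnecessary.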

You should always make sure that every piece of data you use has been sanitized on the backend before you use it!

https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html

After reading more on the subject, this is the approach I came up with:

On click event:

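  • Sanitize the data (DOMPurify)
  • Validate the data (validator.js)
  • Get a recaptcha v3 token from google (reCaptcha v3)
  • Send all data, including the token, to the server
  • Server is using Helmet
  • Server is using Express Rate Limit and Rate Limit Mongo to limit POST requests on a particular route to X per X milliseconds (by IP address)
  • Server is behind a Cloudflare proxy which provides some security and caching features (requires setting app.set('trust proxy', true) in node server file in order for rate limiter to pick up the user's actual IP address - see Express behind proxies)
  • Send the token from the server to google to verify (reCaptcha v3)
  • If the response 'score' is above 0.5, perform the same sanitizations and validations again
  • If the validations pass, add the entry to the database with a moderated flag value of false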
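The server-side re-validation step in the list above could look roughly like the sketch below. It is only an illustration in plain JavaScript: validateField and validateEntry are hypothetical names, and it stands in for the validator.js length check rather than reproducing it. The type check is there because a JSON body sent with curl or postman can carry an object where a string is expected.

```javascript
// Hypothetical helpers: the names and limits are illustrative, not a library API.
var MIN_LENGTH = 1;
var MAX_LENGTH = 140;

function validateField(value) {
    // Reject anything that is not a string before it gets near MongoDB.
    // A crafted JSON body could send an object such as { "$gt": "" } instead.
    if (typeof value !== 'string') {
        return { ok: false, reason: 'not a string' };
    }
    var trimmed = value.trim();
    if (trimmed.length < MIN_LENGTH || trimmed.length > MAX_LENGTH) {
        return { ok: false, reason: 'length out of range' };
    }
    return { ok: true, value: trimmed };
}

function validateEntry(body) {
    var input1 = validateField(body && body.input_1_text);
    var input2 = validateField(body && body.input_2_text);
    if (!input1.ok || !input2.ok) {
        return { ok: false };
    }
    // Build the document from validated values only, never from the raw body.
    return {
        ok: true,
        document: {
            input_1_text: input1.value,
            input_2_text: input2.value,
            moderated: false
        }
    };
}
```

In a route handler, something like validateEntry(req.body) would run after the reCaptcha check, and only result.document would be passed to the insert call.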

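Rather than immediately returning the entry to the browser, I decided to require a manual moderation process, which involves changing an entry's moderated value to true. Whilst this removes the immediacy of the response for the user, it makes the form less tempting to spammers etc., because a submission is not published straight away.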

  • GET request on page load then returns all entries with moderated: true
  • HTML encode the values before displaying them (he)
  • Populate the DOM with the HTML encoded entries
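As a rough sketch of the display steps above: the snippet filters out unmoderated entries and encodes each value right before building the markup. escapeHtml is a hand-written stand-in for he.encode so the example has no dependencies, and renderModeratedEntries and the li markup are assumptions for illustration only.

```javascript
// Minimal stand-in for he.encode: escapes the characters that matter in HTML text
// and quoted attribute values. This is an illustrative helper, not the he library.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

// Hypothetical renderer: keeps only moderated entries and encodes at output time.
function renderModeratedEntries(entries) {
    return entries
        .filter(function (entry) { return entry.moderated === true; })
        .map(function (entry) {
            return '<li>' + escapeHtml(entry.input_1_text) + ': ' +
                escapeHtml(entry.input_2_text) + '</li>';
        })
        .join('');
}
```

The returned string is what would be handed to something like $("#output").html(...), so the stored values stay unencoded and the encoding happens only at the point of display.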

The code looks something like this:

POST Request - Browser

// sanitize the input  
var sanitized_input_1_text = DOMPurify.sanitize($input_1.val().trim(), { SAFE_FOR_JQUERY: true });
var sanitized_input_2_text = DOMPurify.sanitize($input_2.val().trim(), { SAFE_FOR_JQUERY: true });

// validation - make sure input is between 1 and 140 characters
var input_1_text_valid_length = validator.isLength(sanitized_input_1_text, { min: 1, max: 140 });
var input_2_text_valid_length = validator.isLength(sanitized_input_2_text, { min: 1, max: 140 });

// validation - regex to only allow certain characters
// for pattern, see:  
var pattern = /^(?!.*([ ,'-]))[a-zA-Z]+(?:[ ,'-]+[a-zA-Z]+)*$/;
var input_1_text_valid_characters = validator.matches(sanitized_input_1_text, pattern, "gm");
var input_2_text_valid_characters = validator.matches(sanitized_input_2_text, pattern, "gm");

// if validations pass
if (input_1_text_valid_length === true && input_2_text_valid_length === true && input_1_text_valid_characters === true && input_2_text_valid_characters === true) {

// define parameters to send to database  
var parameters = {};
parameters.input_1_text = sanitized_input_1_text; 
parameters.input_2_text = sanitized_input_2_text; 

// get token from google and send token and input to database
// see:  https://developers.google.com/recaptcha/docs/v3#programmatically_invoke_the_challenge
grecaptcha.ready(function() {
    grecaptcha.execute('site-key-here', { action: 'submit_entry' }).then(function(token) {
        parameters.token = token;
        jquery_ajax_call_to_my_api(parameters);
    });
});
}

POST Request - Server

var secret_key = process.env.RECAPTCHA_SECRET_SITE_KEY;
var token = req.body.token;
var url = `https://www.google.com/recaptcha/api/siteverify?secret=${secret_key}&response=${token}`;

// verify recaptcha token with google
var response = await fetch(url);
var response_json = await response.json();
var score = response_json.score;
var document = {};

// if google's response 'score' is greater than 0.5, 
// see: https://developers.google.com/recaptcha/docs/v3#interpreting_the_score  

if (score >= 0.5) {

// perform all the same sanitizations and validations to protect against
// POST requests direct to the API via curl or postman etc  
// if validations pass, add entry to the database with `moderated: false` property   


}
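The score check above could be tightened a little. The siteverify response also carries a success flag and the action name, so a sketch like the following checks all three before trusting the token. isTrustedRecaptchaResult is a hypothetical helper name; the 'submit_entry' action and the 0.5 threshold are the values used in the code above.

```javascript
// Hypothetical helper: decides whether a parsed siteverify response is acceptable.
// "result" is the JSON object returned by google (fields: success, score, action).
function isTrustedRecaptchaResult(result, expectedAction, minScore) {
    // A failed or malformed verification has no usable score.
    if (!result || result.success !== true) {
        return false;
    }
    // The action should match the one passed to grecaptcha.execute in the browser.
    if (result.action !== expectedAction) {
        return false;
    }
    return typeof result.score === 'number' && result.score >= minScore;
}
```

With this in place, the condition would read something like if (isTrustedRecaptchaResult(response_json, 'submit_entry', 0.5)) { ... }, so a missing score or a token generated for a different action is rejected as well.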

GET Request - Browser

Logic:

  • Get all entries with the moderated: true property
  • HTML encode the values before populating the DOM

Helmet Configuration on Server

app.use(
    helmet({
        contentSecurityPolicy: {
            directives: {
                defaultSrc: ["'self'"],
                scriptSrc: ["'self'", "https://maps.googleapis.com", "https://www.google.com", "https://www.gstatic.com"],
                connectSrc: ["'self'", "https://some-domain.com", "https://some.other.domain.com"],
                styleSrc: ["'self'", "fonts.googleapis.com", "'unsafe-inline'"],
                fontSrc: ["'self'", "fonts.gstatic.com"],
                imgSrc: ["'self'", "https://maps.gstatic.com", "https://maps.googleapis.com", "data:", "https://another-domain.com"],
                frameSrc: ["'self'", "https://www.google.com"]
            }
        },
    })
);

To answer my questions in the OP:

  1. Should I add values to the MongoDB database as HTML encoded entities OR store them "as is" and just encode them before populating the DOM with them?

As long as the input has been sanitized and validated on both the client and the server, you should only need to HTML encode just before populating the DOM.

  2. If the values were to be saved as HTML entities in MongoDB, would this make searching the database for content difficult? For example, searching for <h1>hello there!</h1> <a href="">link</a> wouldn't return any results because the value in the database was &#x3C;h1&#x3E;hello there!&#x3C;/h1&#x3E; &#x3C;a href=&#x22;&#x22;&#x3E;link&#x3C;/a&#x3E;

I think database entries would look messy if they were full of HTML encoded values, so I store the sanitized, validated entries "as is".
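The search concern from the question can be illustrated with the two strings quoted there. In this small sketch, findMatches is a hypothetical in-memory stand-in for a database text search, not a MongoDB call:

```javascript
// The same comment, stored "as is" and stored as HTML entities (both from the question).
var rawValue = '<h1>hello there!</h1> <a href="">link</a>';
var encodedValue = '&#x3C;h1&#x3E;hello there!&#x3C;/h1&#x3E; &#x3C;a href=&#x22;&#x22;&#x3E;link&#x3C;/a&#x3E;';

// Hypothetical substring search over stored values.
function findMatches(storedValues, query) {
    return storedValues.filter(function (value) {
        return value.indexOf(query) !== -1;
    });
}

var matchesInRaw = findMatches([rawValue], '<h1>hello');
var matchesInEncoded = findMatches([encodedValue], '<h1>hello');
```

Here matchesInRaw contains the stored comment while matchesInEncoded is empty, which is the practical argument for storing values unencoded and encoding only on output.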

  3. In my reading about securing web forms, much has been said about client side practices being fairly redundant, as anything can be changed in the DOM, JavaScript can be disabled, and requests can be made directly to the API endpoint using curl or postman and therefore bypass any client side approaches.

  4. With that said, should sanitization (DOMPurify), validation (validator.js) and encoding (he) be performed: 1) client side only, 2) client side and server side, or 3) server side only?

Option 2: sanitize and validate the input on both the client and the server.