Retrieving website HTML page using cURL with current session and cookie data on secured page

Question completely revised on: February 19

What I want (in short):

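I want to fetch an HTML page with cURL. The page is protected by a user login (at the time of the cURL request the user is logged in and has access to the page).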

In more detail:

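The situation is that a user is visiting a page such as index.php?w=2344&y=lalala&x=something (which goes through the security script class.Firewizz.Security.php). That page has a "print as pdf" button. The button sends the user to getPDF.php, which looks at where the request came from, fetches that page with cURL, and sends the output to the browser as a PDF.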

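For now, though, I have set the page variable in getPDF.php to a static value, so it does not check the referrer, and I am 100% sure the page it tries to fetch is the correct one.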

Also, the output is simply echoed as is and not yet converted to PDF, so that step cannot interfere with the problem.

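The expected output is the same as what the user sees when visiting the page directly. That is not what happens, however: the request gets nothing back.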

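What do we know? We know the $_SESSION data is not passed along with the cURL request. I know this for a fact because I echoed the $_SESSION data in the output file, and it was empty.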

After a lot of attempts we still have no solution and still no $_SESSION data.

I do not want to weaken the security script in any way, so the solution "remove ini_set('session.use_only_cookies', 1);" is NOT what I am looking for.

On request (for those who want to help in depth) I can send the full script files, but I will post the relevant snippets below.

class.Firewizz.Security.php

<?php

/*
 * Firewizz UserLogin
 */

namespace Firewizz;



class Security
{

     // Start the session, with Cookie data
    public function Start_Secure_Session()
    {
        // Forces sessions to only use cookies.
        ini_set('session.use_only_cookies', 1);

        // Gets current cookies params
        $cookieParams = session_get_cookie_params();

        // Set Cookie Params
        session_set_cookie_params($cookieParams["lifetime"], $cookieParams["path"], $cookieParams["domain"], $this->isHTTPS, $this->deny_java_session_id);
        // Sets the session name
        session_name($this->session_name);

        // Start the php session
        session_start();

        // If new session or expired, generate new id
        if (!isset($_SESSION['new_session']))
        {
            $_SESSION['new_session'] = "true";

            // regenerate the session, delete the old one.
            session_regenerate_id(true);
        }
    }

    // Check if user is logged in to current session, return true or false;
    public function LOGGED_IN()
    {
        return $this->_login_check();
    }

    public function LOGOUT()
    {
    // Unset all session values
        $_SESSION = array();

        // get session parameters
        $params = session_get_cookie_params();

        // Delete the actual cookie.
        setcookie(session_name(), '', time() - 42000, $params["path"], $params["domain"], $params["secure"], $params["httponly"]);
        // Destroy session
        session_destroy();
        if (!headers_sent())
        {
            header("Location: " . $this->login_string, true);
        }
        else
        {
            echo '<script>window.location="/"</script>';
        }
    }

    // Must pass variables or send to login page!
    public function BORDER_PATROL($user_has_to_be_logged_in, $page_loaded_from_index)
    {
        $pass_border_partrol = true;

        if (!$this->LOGGED_IN() && $user_has_to_be_logged_in)
        {
            $pass_border_partrol = false;
        }
        if (filter_input(INPUT_SERVER, "PHP_SELF") != "/index.php" && $page_loaded_from_index)
        {
            $pass_border_partrol = false;
        }

        // Kick to login on fail
        if (!$pass_border_partrol)
        {
            $this->LOGOUT();
            exit();
        }

    }

    // Catch login, returns fail string or false if no errors
    public function CATCH_LOGIN()
    {
        if (filter_input(INPUT_POST, "id") == "login" && filter_input(INPUT_POST, "Verzenden") == "Verzenden")
        {
            // Variables from form.
            $email = filter_input(INPUT_POST, "email");
            $sha512Pass = filter_input(INPUT_POST, "p");

            // Database variables
            $db_accounts = mysqli_connect($this->mySQL_accounts_host, $this->mySQL_accounts_username, $this->mySQL_accounts_password, $this->mySQL_accounts_database);

            // Prepare SQL
            if ($stmt = $db_accounts->prepare("SELECT account_id, verified, blocked ,login_email, login_password, login_salt, user_voornaam, user_tussenvoegsel, user_achternaam FROM accounts WHERE login_email = ? LIMIT 1"))
            {
                $stmt->bind_param('s', $email); // Bind "$email" to parameter.
                $stmt->execute(); // Execute the prepared query.
                $stmt->store_result();

                $stmt->bind_result($user_id, $verified, $blocked, $email, $db_password, $salt, $voornaam, $tussenvoegsel, $achternaam); // get variables from result.

                $stmt->fetch();
                $password = hash('sha512', $sha512Pass . $salt); // hash the password with the unique salt.
                $tussen = ' ';
                if ($tussenvoegsel != "")
                {
                    $tussen = " " . $tussenvoegsel . " ";
                }
                $username = $voornaam . $tussen . $achternaam;



                if ($stmt->num_rows == 1)
                { // If the user exists
                    // Check blocked
                    if ($blocked == "1")
                    {
                        return 'Deze acount is geblokkeerd, neem contact met ons op.';
                    }

                    // We check if the account is locked from too many login attempts
                    if ($this->_checkBrute($user_id, $db_accounts) == true)
                    {
                        // Account is locked
                        // Send an email to user saying their account is locked
                        return "Te vaak fout ingelogd,<br />uw account is voor " . $this->blockout_time . " minuten geblokkerd.";
                    }
                    else
                    {
                        if ($db_password == $password && $verified == 1)
                        {
                            // Password is correct!, update lastLogin
                            if ($stmt = $db_accounts->prepare("UPDATE accounts SET date_lastLogin=? WHERE account_id=?"))
                            {
                                $lastlogin = date("Y-m-d H:i:s");

                                $stmt->bind_param('ss', $lastlogin, $user_id); // Bind the timestamp and the account id.
                                $stmt->execute();
                                $stmt->close();
                            }

                            $ip_address = $_SERVER['REMOTE_ADDR']; // Get the IP address of the user.
                            $user_browser = $_SERVER['HTTP_USER_AGENT']; // Get the user-agent string of the user.

                            $user_id = preg_replace("/[^0-9]+/", "", $user_id); // XSS protection as we might print this value
                            $_SESSION['user_id'] = $user_id;
                            $username = $username; // XSS protection as we might print this value
                            $_SESSION['username'] = $username;
                            $_SESSION['login_string'] = hash('sha512', $password . $ip_address . $user_browser);
                            // Login successful.

                            if ($this->MailOnLogin != FALSE)
                            {
                                mail($this->MailOnLogin, 'SECUREPLAY - LOGIN', $username . ' logged in to the secureplay platform..');
                            }
                            return false;
                        }
                        else
                        {
                            // Password is not correct
                            // We record this attempt in the database
                            $now = time();
                            $db_accounts->query("INSERT INTO login_attempts (userID, timestamp) VALUES (" . $user_id . ", " . $now . ")");

                            return "Onbekende gebruikersnaam en/of wachtwoord.";
                        }
                    }
                }
                else
                {
                    return "Onbekende gebruikersnaam en/of wachtwoord.";
                }
            }
            else
            {
                return 'SQL FAIL! ' . mysqli_error($db_accounts);
            }
            return "Onbekende fout!";
        }


        return false;
    }

    private function _checkBrute($user_id, $db_accounts)
    {
        // Get timestamp of current time
        $now = time();
        // Login attempts are counted within the blockout window ($this->blockout_time minutes).
        $valid_attempts = $now - ($this->blockout_time * 60);

        if ($stmt = $db_accounts->prepare("SELECT timestamp FROM login_attempts WHERE userID = ? AND timestamp > $valid_attempts"))
        {
            $stmt->bind_param('i', $user_id);
            // Execute the prepared query.
            $stmt->execute();
            $stmt->store_result();
            // If there have been more failed logins than the configured maximum
            if ($stmt->num_rows > $this->max_login_fails)
            {
                return true;
            }
            else
            {
                return false;
            }
        }
        else
        {
            return true;
        }
    }

    // Login Check if user is logged in correctly
    private function _login_check()
    {
        // Database variables
        $db_accounts = mysqli_connect($this->mySQL_accounts_host, $this->mySQL_accounts_username, $this->mySQL_accounts_password, $this->mySQL_accounts_database);

        // Check if all session variables are set
        if (isset($_SESSION['user_id'], $_SESSION['username'], $_SESSION['login_string']))
        {
            $user_id = $_SESSION['user_id'];
            $login_string = $_SESSION['login_string'];
            $username = $_SESSION['username'];
            $ip_address = $_SERVER['REMOTE_ADDR']; // Get the IP address of the user.
            $user_browser = $_SERVER['HTTP_USER_AGENT']; // Get the user-agent string of the user.

            if ($stmt = $db_accounts->prepare("SELECT login_password FROM accounts WHERE account_id = ? LIMIT 1"))
            {
                $stmt->bind_param('i', $user_id); // Bind "$user_id" to parameter.
                $stmt->execute(); // Execute the prepared query.
                $stmt->store_result();

                if ($stmt->num_rows == 1)
                { // If the user exists
                    $stmt->bind_result($password); // get variables from result.
                    $stmt->fetch();
                    $login_check = hash('sha512', $password . $ip_address . $user_browser);
                    if ($login_check == $login_string)
                    {
                        // Logged In!!!!
                        return $user_id;
                    }
                    else
                    {
                        // Not logged in
                        return false;
                    }
                }
                else
                {
                    // Not logged in
                    return false;
                }
            }
            else
            {
                // Not logged in
                //die("f3");
                return false;
            }
        }
        else
        {
            // Not logged in
            return false;
        }
    }

}

secured_page

<?php
require_once 'assets/class.Firewizz.Security.php';

if (!isset($SECURITY))
{
    $SECURITY = new Firewizz\Security();
}

// Check if user is logged in or redirect to login page;
$SECURITY->BORDER_PATROL(true, true);


// CONTENT bla bla

?>

getPDF.php

<?php
// Requires
require_once 'assets/class.FirePDF.php';
require_once 'assets/class.Firewizz.Security.php';
$SECURITY = new \Firewizz\Security();
$SECURITY->Start_Secure_Session();

// HTML file to scrape; if this works, replace it with the referer so that the requesting page gets printed (guarded by security so it can only be done from securePlay).
$html_file = 'http://www.website.nl/?p=overzichten&sort=someSort&s=67';

// Output pdf filename
$pdf_fileName = 'Test_Pdf.pdf';

/*
 * cURL part
 */

// create curl resource
$ch = curl_init();

// set source url
curl_setopt($ch, CURLOPT_URL, $html_file);

// set cookies
$cookiesIn = "user_id=" . $_SESSION['user_id'] . "; username=" . $_SESSION['username'] . "; login_string=" . $_SESSION['login_string'] . ";";

// set cURL Options
$tmp = tempnam("/tmp", "CURLCOOKIE");
if ($tmp === FALSE)
{
    die('Could not generate a temporary cookie jar.');
}

$options = array(
    CURLOPT_RETURNTRANSFER => true, // return web page
    //CURLOPT_HEADER => true, //return headers in addition to content
    CURLOPT_ENCODING => "", // handle all encodings
    CURLOPT_AUTOREFERER => true, // set referer on redirect
    CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
    CURLOPT_TIMEOUT => 120, // timeout on response
    CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
    CURLINFO_HEADER_OUT => true,
    CURLOPT_SSL_VERIFYPEER => false, // Disabled SSL Cert checks
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_COOKIEJAR => $tmp,
    //CURLOPT_COOKIEFILE => $tmp,
    CURLOPT_COOKIE => $cookiesIn
);

// $output contains the output string
curl_setopt_array($ch, $options);
$output = curl_exec($ch);

// close curl resource to free up system resources
curl_close($ch);

// output the cURL
echo $output;
?>

How we test

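We currently test by logging a user in, going to the page we want to fetch with cURL, and verifying that he sees the page (this works). We then run getPDF.php in a new tab. There we see a blank page, because the security check fails. If we add echo "session data:" . $_SESSION["login_string"]; to the security script, we see that the variable in $_SESSION is blank. When we paste the same line into getPDF.php, we see that it is set there. So we know for a fact that it is not being carried over by cURL.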

Some quick information.

Your $cookiesIn needs to define your cookies. I will build an example based on your code snippet:

$cookiesIn = "user_id=" . $_SESSION['user_id'] . "; username=" . $_SESSION['username'] . "; login_string=" . $_SESSION['login_string'] . ";";

Try setting this in your pdfCreator page. Replace $cookiesIn = ""; with the line above and see whether that gives you a different result.
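One thing to watch with a hand-built cookie string: a value that contains a semicolon or a space (a user agent string, for instance) would be cut short on the receiving side. Below is a minimal sketch, assuming a hypothetical helper named buildCookieHeader() that is not part of the original scripts. It URL-encodes each value before joining the pairs; PHP decodes cookie values again when it fills $_COOKIE. The sample values stand in for the $_SESSION entries.

```php
<?php
// Hypothetical helper, not part of the original scripts.
// Builds a string suitable for CURLOPT_COOKIE from name => value pairs.
function buildCookieHeader(array $cookies): string
{
    $pairs = array();
    foreach ($cookies as $name => $value) {
        // rawurlencode() escapes ';', '=', spaces and similar characters in the value.
        $pairs[] = $name . '=' . rawurlencode((string) $value);
    }
    return implode('; ', $pairs);
}

// Sample values; in getPDF.php these would come from $_SESSION.
$cookiesIn = buildCookieHeader(array(
    'user_id' => '42',
    'username' => 'Jan de Vries',
    'login_string' => 'abc123',
));

echo $cookiesIn, PHP_EOL;
```

The resulting string can be passed as the CURLOPT_COOKIE value in place of the concatenated one.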

Also, here is a good reference for the cURL cookie option:

https://curl.haxx.se/libcurl/c/CURLOPT_COOKIE.html

If you would rather send all cookies instead of specifying them one by one, use this code:

$tmp = tempnam("/tmp", "CURLCOOKIE");
if($tmp === FALSE) die('Could not generate a temporary cookie jar.');

$options = array(
    CURLOPT_RETURNTRANSFER => true, // return web page
    //CURLOPT_HEADER => true, //return headers in addition to content
    CURLOPT_ENCODING => "", // handle all encodings
    CURLOPT_AUTOREFERER => true, // set referer on redirect
    CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
    CURLOPT_TIMEOUT => 120, // timeout on response
    CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
    CURLINFO_HEADER_OUT => true,
    CURLOPT_SSL_VERIFYPEER => false, // Disabled SSL Cert checks
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_COOKIEJAR => $tmp,
    CURLOPT_COOKIEFILE => $tmp,
);

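This code uses the COOKIEJAR option to dump all cookies cURL currently knows about; cURL writes that file when the handle is closed, and it contains only cookies cURL itself has received. By also specifying COOKIEFILE, we tell cURL where to look for cookies to include in the request.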

That said, I have removed the $cookiesIn reference, because you do not need it if you use the code above.

OK, solved.

After a lot of research.

The cookie data was being passed, but it was not turned into session data. This was fixed with the following:

private function Cookie2Session($name)
{
    if (filter_input(INPUT_COOKIE, $name))
    {
        $_SESSION[$name] = filter_input(INPUT_COOKIE, $name);
    }
}

// following lines put within the BORDER_PATROL Method
if (filter_input(INPUT_COOKIE, 'pdfCurl'))
{
    $this->Cookie2Session('user_id');
    $this->Cookie2Session('username');
    $this->Cookie2Session('login_string');
    $this->Cookie2Session('REMOTE_ADDR');
    $this->Cookie2Session('HTTP_USER_AGENT');
    $_SESSION['new_session'] = "true";
}

A small change to the _login_check() method:

// Login Check if user is logged in correctly
private function _login_check()
{
    // Database variables
    $db_accounts = mysqli_connect($this->mySQL_accounts_host, $this->mySQL_accounts_username, $this->mySQL_accounts_password, $this->mySQL_accounts_database);

    // Check if all session variables are set
    if (isset($_SESSION['user_id'], $_SESSION['username'], $_SESSION['login_string']))
    {
        $user_id = $_SESSION['user_id'];
        $login_string = $_SESSION['login_string'];
        $username = $_SESSION['username'];
        $ip_address = $_SERVER['REMOTE_ADDR']; // Get the IP address of the user.
        $user_browser = $_SERVER['HTTP_USER_AGENT']; // Get the user-agent string of the user.

// =====>> add this code, because cURL req comes from server. <<=====
        if (isset($_SESSION["REMOTE_ADDR"]) && ($_SERVER['REMOTE_ADDR'] == $_SERVER['SERVER_ADDR']))
        {
            $ip_address = $_SESSION["REMOTE_ADDR"];
        }

// {rest of code}
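The added check can be read as a small decision rule: use the forwarded address only when the request comes from the server itself. Below is a minimal sketch of that rule as a standalone function so it can be tried in isolation. The name resolveClientIp() and its array parameters are illustrative assumptions, not part of the original class, and unlike the original it also checks that SERVER_ADDR is set.

```php
<?php
// Hypothetical helper mirroring the check added to _login_check().
// $server stands in for $_SERVER, $session for $_SESSION.
function resolveClientIp(array $server, array $session): string
{
    $ip = $server['REMOTE_ADDR'];

    // Trust the forwarded address only for requests made by the server itself (the cURL call).
    if (isset($session['REMOTE_ADDR'], $server['SERVER_ADDR'])
        && $server['REMOTE_ADDR'] === $server['SERVER_ADDR']) {
        $ip = $session['REMOTE_ADDR'];
    }

    return $ip;
}

// Request made by cURL on the server: the browser's address is used.
echo resolveClientIp(
    array('REMOTE_ADDR' => '10.0.0.5', 'SERVER_ADDR' => '10.0.0.5'),
    array('REMOTE_ADDR' => '203.0.113.7')
), PHP_EOL;

// Request made directly by a browser: the forwarded value is ignored.
echo resolveClientIp(
    array('REMOTE_ADDR' => '198.51.100.9', 'SERVER_ADDR' => '10.0.0.5'),
    array('REMOTE_ADDR' => '203.0.113.7')
), PHP_EOL;
```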

A small update to the getPDF.php file:

<?php
// Requires
require_once 'assets/class.FirePDF.php';
require_once 'assets/class.Firewizz.Security.php';
$SECURITY = new \Firewizz\Security();
$SECURITY->Start_Secure_Session();

// HTML file to scrape; if this works, replace it with the referer so that the requesting page gets printed (guarded by security so it can only be done from securePlay).
$html_file = 'http://www.secureplay.nl/?p=overzichten&sort=SpeelplaatsInspecties&s=67';

// Output pdf filename
$pdf_fileName = 'Test_Pdf.pdf';

/*
 * cURL part
 */

// create curl resource
$ch = curl_init();

// set source url
curl_setopt($ch, CURLOPT_URL, $html_file);

// set cookies
$cookiesIn = "user_id=" . $_SESSION['user_id'] . "; username=" . $_SESSION['username'] . "; login_string=" . $_SESSION['login_string'] . "; pdfCurl=true; REMOTE_ADDR=" . $_SERVER['REMOTE_ADDR'] . "; HTTP_USER_AGENT=" . $_SERVER['HTTP_USER_AGENT'];
$agent = $_SERVER['HTTP_USER_AGENT'];

// set cURL Options
$tmp = tempnam("/tmp", "CURLCOOKIE");
if ($tmp === FALSE)
{
    die('Could not generate a temporary cookie jar.');
}

$options = array(
    CURLOPT_RETURNTRANSFER => true, // return web page
    //CURLOPT_HEADER => true, //return headers in addition to content
    CURLOPT_ENCODING => "", // handle all encodings
    CURLOPT_AUTOREFERER => true, // set referer on redirect
    CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
    CURLOPT_TIMEOUT => 120, // timeout on response
    CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
    CURLINFO_HEADER_OUT => true,
    CURLOPT_SSL_VERIFYPEER => false, // Disabled SSL Cert checks
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_COOKIEJAR => $tmp,
    //CURLOPT_COOKIEFILE => $tmp,
    CURLOPT_COOKIE => $cookiesIn,
    CURLOPT_USERAGENT => $agent
);

// $output contains the output string
curl_setopt_array($ch, $options);
$output = curl_exec($ch);

// close curl resource to free up system resources
curl_close($ch);

// output the cURL
echo $output;
?>

With the above you can use cURL to access a secured page using the current session data, with only a small impact on your security.

In this case, if the session control algorithm is sound, all you need to change is the format in which the page is sent.

Using cURL to re-fetch the page is one way to do it, but this looks like an XY problem: you do not actually want to use cURL, you want to control the output format, HTML or PDF.

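One workable option is to reload the page with an added parameter, which is injected into the page context and changes the output function. For example, you can wrap the whole page in an output buffering bubble: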

// Security checks as usual, then:

if (array_key_exists('output', $_GET)) {
    $format = $_GET['output']; // e.g. "pdf"
    // We could check whether the response handler has a printAs<FORMAT> method
    switch ($format) {
        case 'pdf': $outputFn = 'printAsPDF'; break;
        default:
            throw new \Exception("Output in {$format} format not supported");
    }
    ob_start($outputFn);
}
// Page is generated normally

The 'printAsPDF' output function receives the page content, formats it as a PDF file using something like dompdf or wkhtmltopdf, adds the appropriate Content-Type headers, and returns the formatted PDF.

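Security is unchanged, and the modification can actually be implemented in the request decoding phase. Other objects can access a state variable holding the output format currently in use, which lets them behave differently as needed (for example, a generateMenu() function might choose to return immediately rather than output content that makes no sense in a PDF).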
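To make the output buffering idea concrete, here is a minimal runnable sketch. It assumes a stand-in callback, printAsPlaceholder(), in place of a real printAsPDF; an actual version would pass the HTML to a PDF library and return the PDF bytes. renderPage() and the sample markup are likewise hypothetical, and the outer buffer exists only so the sketch can return the result as a string.

```php
<?php
// Stand-in for printAsPDF: reports what would be converted instead of building a PDF.
function printAsPlaceholder(string $html): string
{
    return '[' . strlen($html) . ' bytes of HTML would be converted here]';
}

// Hypothetical page renderer; $format plays the role of $_GET['output'].
function renderPage(?string $format): string
{
    ob_start(); // outer buffer, only so this sketch can return the output

    if ($format !== null) {
        // The callback receives the buffered page when the buffer is flushed.
        ob_start('printAsPlaceholder');
    }

    echo '<h1>Report</h1>'; // the page is generated normally

    if ($format !== null) {
        ob_end_flush(); // runs the callback and passes its result on
    }

    return ob_get_clean();
}

echo renderPage(null), PHP_EOL;  // plain HTML
echo renderPage('pdf'), PHP_EOL; // output rewritten by the callback
```

The page code itself does not change between the two calls; only the presence of the buffer callback decides what reaches the client.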