Unable to insert multiple rows containing huge Twitter Response strings in MySQL DB?
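The following code adds the 200 most recent tweets (JSON provided by the Twitter API) for a Twitter user to a database. Here the $users array contains only one user (e.g. @katyperry), but eventually it will contain more. For each user in the array, 200 tweets are pulled in through the Twitter API. All of this data collection works fine.
Now the problem: for each user, I insert 200 tweets into a MySQL table (I have to use MySQL only, there is no other option). I understand that each TwitterResp JSON, once stringified, is huge (maybe this is the problem).
Example TwitterResp: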
{"created_at":"Thu Jul 23 18:25:30 +0000 2015","id":624284214704390145,"id_str":"624284214704390145","text":"when your fragrance is \ud83d\udd25#madpotion https://t.co/UfyPQIwIj4","source":"<a href=\"http://instagram.com\" rel=\"nofollow\">Instagram</a>","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":21447363,"id_str":"21447363","name":"KATY PERRY","screen_name":"katyperry","location":"","description":"CURRENTLY\u2728BEAMING\u2728ON THE PRISMATIC WORLD TOUR 2014/2015!","url":"http://t.co/fxFJjKX30d","entities":{"url":{"urls":[{"url":"http://t.co/fxFJjKX30d","expanded_url":"http://www.katyperry.com","display_url":"katyperry.com","indices":[0,22]}]},"description":{"urls":[]}},"protected":false,"followers_count":73404466,"friends_count":157,"listed_count":143175,"created_at":"Fri Feb 20 23:45:56 +0000 2009","favourites_count":1663,"utc_offset":-28800,"time_zone":"Alaska","geo_enabled":false,"verified":true,"statuses_count":6566,"lang":"en","contributors_enabled":false,"is_translator":false,"is_translation_enabled":true,"profile_background_color":"CECFBC","profile_background_image_url":"http://pbs.twimg.com/profile_background_images/378800000168797027/kSZ-ewZo.jpeg","profile_background_image_url_https":"https://pbs.twimg.com/profile_background_images/378800000168797027/kSZ-ewZo.jpeg","profile_background_tile":false,"profile_image_url":"http://pbs.twimg.com/profile_images/609748341119844352/7dUd606e_normal.png","profile_image_url_https":"https://pbs.twimg.com/profile_images/609748341119844352/7dUd606e_normal.png","profile_banner_url":"https://pbs.twimg.com/profile_banners/21447363/1428015534","profile_link_color":"D55732","profile_sidebar_border_color":"FFFFFF","profile_sidebar_fill_color":"78C0A8","profile_text_color":"5E412F","profile_use_background_image":true,"has_extended_profile":false,"default_profile":false,"default_profile_image":false,"following":true,"follow_request_sent":false,"notifications":false},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"retweet_count":5366,"favorite_count":10510,"entities":{"hashtags":[{"text":"madpotion","indices":[24,34]}],"symbols":[],"user_mentions":[],"urls":[{"url":"https://t.co/UfyPQIwIj4","expanded_url":"https://instagram.com/p/5fRc5mP-YB/","display_url":"instagram.com/p/5fRc5mP-YB/","indices":[35,58]}]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"lang":"en"}
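So when I insert in a loop, as shown in the code, with TwitterResp being a huge string, I end up seeing something like 154 rows in the table instead of 200. For some other users I see 186 instead of 200, and so on. No matter how many times I run the code for katy perry, I only ever get 154, and likewise for the other users. I'm wondering why this happens. Is the insertion in this loop so slow that it skips some rows when inserting huge strings?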
$users = array("result" => ["@katyperry"]);
foreach ($users['result'] as $user) {
    // strip the initial character '@' and get 200 Twitter Responses for that screen-name.
    $twitterResp = getTweet(substr($user, 1), 200);
    $count = 1;
    mysqli_query(getConnection(), "START TRANSACTION;");
    foreach ($twitterResp as $response) {
        $object = $response;
        $query = "INSERT INTO Tweets(Number, TwitterHandle, TwitterResp) VALUES('".(string)$count."', '".$user."', '".json_encode($object)."');";
        $count += 1;
        $res = mysqli_query(getConnection(), $query);
    }
    mysqli_query(getConnection(), "COMMIT;");
}
PS: I also tried adding the tweets in a single query, basically by appending lots of VALUES(), VALUES(), ..... That didn't work either.
How can I fix this? Any suggestions?
First of all, modify your code to handle MySQL errors. That will surely give you a hint about what is going wrong.
$conn = getConnection();
$res = mysqli_query($conn, $query);
if (false === $res) {
    // mysqli_error() requires the connection the query ran on.
    echo "Insertion error: " . mysqli_error($conn);
}
My guess is that you are exceeding the maximum length of the TEXT or BLOB type, whichever you used for the TwitterResp column.
The data length of the TEXT or BLOB types may seem unlimited, but it is not. TEXT / BLOB can hold up to 65,535 bytes (~64 KB), MEDIUMTEXT / MEDIUMBLOB up to about 16 MB, and LONGTEXT / LONGBLOB up to ~4 GB.
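Note that these are only the type limits; you also have to take into account the size of the connection buffers, the amount of available memory, and so on, which can also cause data truncation. See the MySQL documentation for details.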
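To test the guess about column limits before changing the schema, it may help to measure the encoded payloads. Below is a minimal sketch; the names `MYSQL_TEXT_LIMITS` and `smallestTextTypeFor` are made up for illustration and are not part of the code in the question. It only checks the per-type byte limits, not connection buffers or memory.

```php
<?php
// Hypothetical helper, not part of the original code: maps an encoded
// payload to the smallest MySQL text type whose byte limit can hold it.
const MYSQL_TEXT_LIMITS = [
    'TEXT'       => 65535,       // 2^16 - 1 bytes
    'MEDIUMTEXT' => 16777215,    // 2^24 - 1 bytes
    'LONGTEXT'   => 4294967295,  // 2^32 - 1 bytes
];

function smallestTextTypeFor(string $payload): ?string
{
    // strlen() counts bytes, which is the unit the MySQL limits are expressed in.
    $bytes = strlen($payload);
    foreach (MYSQL_TEXT_LIMITS as $type => $limit) {
        if ($bytes <= $limit) {
            return $type;
        }
    }
    return null; // larger than even LONGTEXT can hold
}

// Usage: check one encoded tweet before inserting it (sample data for illustration).
$json = json_encode(['id_str' => '624284214704390145', 'text' => 'when your fragrance is #madpotion']);
echo strlen($json) . ' bytes fits in ' . smallestTextTypeFor($json) . PHP_EOL;
```

If every payload already fits in the column type you use, the type limit is probably not what is dropping rows, and the error message printed by the check above is the better lead.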
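In short, you can try changing the column type to the larger MEDIUMTEXT or MEDIUMBLOB. If that is still not enough, however, I'd suggest storing the data in files and saving the file paths in the database.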
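The file-based fallback could look roughly like the sketch below. The helper name `saveTweetToFile`, the directory layout, and the file naming scheme are all assumptions made for illustration; nothing in the question prescribes them.

```php
<?php
// Hypothetical helper, not from the original post: writes one tweet as a JSON
// file and returns the path that would be stored in the database instead.
// $tweet may be an array or an object, whatever the API wrapper returns.
function saveTweetToFile($tweet, string $dir, string $handle, int $number): string
{
    // Create the target directory on first use.
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create directory: $dir");
    }

    $json = json_encode($tweet);
    if ($json === false) {
        throw new RuntimeException('json_encode failed: ' . json_last_error_msg());
    }

    // Keep only characters that are valid in a Twitter handle, dropping the '@'.
    $safeHandle = preg_replace('/[^A-Za-z0-9_]/', '', $handle);
    $path = rtrim($dir, '/') . '/' . $safeHandle . '_' . $number . '.json';

    if (file_put_contents($path, $json) === false) {
        throw new RuntimeException("Cannot write file: $path");
    }
    return $path;
}

// Usage: the directory name here is an assumption for illustration.
$path = saveTweetToFile(['id_str' => '624284214704390145'], sys_get_temp_dir() . '/tweets', '@katyperry', 1);
echo $path . PHP_EOL;
```

The returned path is short, so it would fit in an ordinary VARCHAR column in place of the full TwitterResp string.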