Objective-C: Using NSScanner to obtain <body> from HTML
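I am trying to create an iOS app that extracts part of a web page.
My code can connect to the URL and store the HTML in an NSString.
I have tried the following, but the result I get is an empty string: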
NSScanner* newScanner = [NSScanner scannerWithString:htmlData];
// Create a new scanner and give it the HTML data to parse.
while (![newScanner isAtEnd])
{
    [newScanner scanUpToString:@"<body>" intoString:NULL];
    // Scan until the <body> tag is found
    [newScanner scanUpToString:@"</body>" intoString:&bodyText];
    // Everything up to the end tag is stored in bodyText
}
I also tried another approach:
NSScanner* newScanner = [NSScanner scannerWithString:htmlData];
// Create a new scanner and give it the HTML data to parse.
while (![newScanner isAtEnd])
{
    [newScanner scanUpToString:@"<body" intoString:NULL];
    // Scan until the start of the opening <body tag is found
    [newScanner scanUpToString:@">" intoString:NULL];
    // Go to the end of the opening <body> tag
    [newScanner scanUpToString:@"</body>" intoString:&bodyText];
    // Everything up to the end tag is stored in bodyText
}
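The second approach returns a string that starts with ><script... and so on.
To be honest, I don't have a good URL to test this with. I think some help with removing the tags inside the body (such as <p></p>) would also make things easier.
Any help is much appreciated.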
I don't know why your first approach doesn't work. I assume you declared bodyText before that snippet. This code works fine for me:
- (void)viewDidLoad {
    [super viewDidLoad];
    NSString *htmlData = @"This is some stuff before <body> this is the body </body> with some more stuff";
    NSScanner* newScanner = [NSScanner scannerWithString:htmlData];
    NSString *bodyText;
    while (![newScanner isAtEnd]) {
        [newScanner scanUpToString:@"<body>" intoString:NULL];
        [newScanner scanString:@"<body>" intoString:NULL];
        [newScanner scanUpToString:@"</body>" intoString:&bodyText];
    }
    NSLog(@"%@",bodyText); // 2015-01-28 15:58:00.360 ScanningOfHTMLProblem[1373:661934] this is the body
}
Note that I added a call to scanString:intoString: to get past the first "<body>".
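The same idea may also explain the leading ">" in the second attempt: scanUpToString:intoString: stops just before the string it is looking for and does not consume it, so the ">" is still waiting to be read when the next scan starts. Below is a minimal sketch of how both cases could be handled, assuming the opening tag may carry attributes (for example <body class="x">). The function name ExtractBody is hypothetical and not part of the original post, and the sketch does not cope with a ">" inside an attribute value or with other tags whose names begin with "body".

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): returns the text between
// the opening <body ...> tag and </body>, or nil if either tag is missing.
static NSString *ExtractBody(NSString *html) {
    NSScanner *scanner = [NSScanner scannerWithString:html];
    // Do not skip whitespace, so the body is returned exactly as written.
    scanner.charactersToBeSkipped = nil;

    // Move to the opening tag; "<body" also matches <body class="x">.
    [scanner scanUpToString:@"<body" intoString:NULL];
    if (![scanner scanString:@"<body" intoString:NULL]) {
        return nil; // no opening tag found
    }

    // Skip any attributes, then consume the ">" that ends the opening tag.
    [scanner scanUpToString:@">" intoString:NULL];
    if (![scanner scanString:@">" intoString:NULL]) {
        return nil; // opening tag never closed
    }

    // Everything up to </body> is the body content.
    NSString *body = nil;
    [scanner scanUpToString:@"</body>" intoString:&body];
    if (![scanner scanString:@"</body>" intoString:NULL]) {
        return nil; // no closing tag found
    }

    // body stays nil when the tag is empty, e.g. <body></body>.
    return body ?: @"";
}
```

Calling ExtractBody(htmlData) on the downloaded string would then give the body without the tag itself. For real-world pages a proper HTML parser is likely to be more robust than string scanning.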