Objective-C: Using NSScanner to obtain <body> from HTML

Question

I'm trying to build an iOS app that extracts part of a web page.

My code can connect to the URL and store the HTML in an NSString.

I have tried this, but all I get as a result is an empty string:

    NSScanner* newScanner = [NSScanner scannerWithString:htmlData];
    // Create a new scanner and give it the html data to parse.

    while (![newScanner isAtEnd])
    {
        [newScanner scanUpToString:@"<body>" intoString:NULL];
        // Scan until the <body> tag is found

        [newScanner scanUpToString:@"</body>" intoString:&bodyText];
        // Everything up to the end tag will get placed into the memory address of the result string

    }

I also tried a different approach...

    NSScanner* newScanner = [NSScanner scannerWithString:htmlData];
    // Create a new scanner and give it the html data to parse.

    while (![newScanner isAtEnd])
    {
        [newScanner scanUpToString:@"<body" intoString:NULL];
        // Scan until the start of the opening <body tag is found

        [newScanner scanUpToString:@">" intoString:NULL];
        // Go to end of opening <body> tag

        [newScanner scanUpToString:@"</body>" intoString:&bodyText];
        // Everything up to the end tag will get placed into the memory address of the result string

    }

The second approach returns a string that starts with >< script... and so on.

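To be honest, I don't have a good URL to test this with, and I think some help with removing the tags inside the body (like <p></p>) would also make things easier.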

Any help is much appreciated.

Answer 1

I don't know why your first approach didn't work. I'm assuming you defined bodyText before that snippet. This code works fine for me:

    - (void)viewDidLoad {
        [super viewDidLoad];
        NSString *htmlData = @"This is some stuff before <body> this is the body </body> with some more stuff";
        NSScanner* newScanner = [NSScanner scannerWithString:htmlData];
        NSString *bodyText;
        while (![newScanner isAtEnd]) {
            [newScanner scanUpToString:@"<body>" intoString:NULL];
            [newScanner scanString:@"<body>" intoString:NULL];
            [newScanner scanUpToString:@"</body>" intoString:&bodyText];
        }
        NSLog(@"%@",bodyText); // 2015-01-28 15:58:00.360 ScanningOfHTMLProblem[1373:661934] this is the body 
    }

Note that I added a call to scanString:intoString: to get past the first "<body>".
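
The same idea probably explains why the second attempt in the question returned a string starting with ">": scanUpToString:intoString: stops just before the string it is looking for and does not consume it, so the scanner is still sitting on the ">" when the next scan begins. Below is a minimal sketch, assuming the opening tag may carry attributes (for example `<body class="x">`). It consumes the ">" with scanString:intoString: and then strips the remaining tags the same way. The helper names BodyFromHTML and StripTags are made up for illustration. This is a rough approach rather than a real HTML parser, and it will trip on a ">" inside attribute values, comments, or scripts.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text between the opening <body ...> tag
// and </body>, or nil if no opening tag is found.
static NSString *BodyFromHTML(NSString *html) {
    NSScanner *scanner = [NSScanner scannerWithString:html];
    scanner.charactersToBeSkipped = nil; // keep whitespace exactly as it appears
    NSString *body = nil;

    // scanUpToString: stops *before* the match, so consume "<body" explicitly.
    [scanner scanUpToString:@"<body" intoString:NULL];
    if (![scanner scanString:@"<body" intoString:NULL]) return nil;

    // Skip any attributes, then consume the ">" that ends the opening tag.
    [scanner scanUpToString:@">" intoString:NULL];
    if (![scanner scanString:@">" intoString:NULL]) return nil;

    // Everything up to the closing tag is the body (empty string if nothing was scanned).
    [scanner scanUpToString:@"</body>" intoString:&body];
    return body ?: @"";
}

// Hypothetical helper: drops everything between "<" and ">" and keeps the rest.
static NSString *StripTags(NSString *html) {
    NSScanner *scanner = [NSScanner scannerWithString:html];
    scanner.charactersToBeSkipped = nil;
    NSMutableString *text = [NSMutableString string];
    NSString *chunk = nil;

    while (![scanner isAtEnd]) {
        // Collect plain text up to the next tag, if there is any.
        if ([scanner scanUpToString:@"<" intoString:&chunk]) {
            [text appendString:chunk];
        }
        // Skip the tag itself: "<", its contents, and the closing ">".
        if ([scanner scanString:@"<" intoString:NULL]) {
            [scanner scanUpToString:@">" intoString:NULL];
            [scanner scanString:@">" intoString:NULL];
        }
    }
    return text;
}

int main(void) {
    @autoreleasepool {
        NSString *html = @"<html><body class=\"x\"><p>Hello</p> <p>world</p></body></html>";
        NSString *body = BodyFromHTML(html);
        NSLog(@"%@", body);            // <p>Hello</p> <p>world</p>
        NSLog(@"%@", StripTags(body)); // Hello world
    }
    return 0;
}
```

Setting charactersToBeSkipped to nil is a design choice: by default NSScanner skips leading whitespace and newlines before each scan, which would silently drop the spaces between words that are separated only by tags.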
