Using Python modules like BeautifulSoup and requests in Visual Studio C#

I am trying to run an external Python script from C# in Visual Studio. The script uses modules such as BeautifulSoup and requests, but I get the following error:

No module named requests

Earlier I got the same error for BeautifulSoup. Adding the following line to my Python script resolved it:

sys.path.append(r"[Path to Python]\Python\Python35-32\Lib\site-packages")
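A quick way to see which interpreter is actually running the script, and whether a given module is visible on its search path, is to attempt the import and report the result. The following is a minimal diagnostic sketch, not part of the original script; the helper name `can_import` is hypothetical.

```python
import sys


def can_import(module_name):
    """Return True if the current interpreter can import module_name."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False


# Show which interpreter is running and where it looks for modules.
print(sys.version)
for entry in sys.path:
    print("search path entry: %s" % entry)

# Report on the modules the script depends on.
for name in ("requests", "bs4"):
    print("%s importable: %s" % (name, can_import(name)))
```

Running this from the same host that reports `No module named requests` shows whether the site-packages directory added with `sys.path.append` is really on the path that interpreter uses.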

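I am using IronPython in Visual Studio 2015. Is there a way to overcome this error? If not, is there any other way to run Python scripts (that use the modules above) from a C# environment?

I tried the solution given by denfromufa, but then I got the following error.

Here is my Python code: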

import sys
import requests
import re
import io
from bs4 import BeautifulSoup
from math import floor

r = requests.get("https://www.google.com/")
data = r.text
soup = BeautifulSoup(data, 'html.parser')
result = []

for item in soup.find_all(attrs={'class': 'something'}):
    # Collect the cleaned link target and the link text for each matching anchor
    for m in item.select('a[href^="something"]'):
        m1 = m['href'].replace("something", "", 1)
        m2 = re.sub(r'&.*$', "", m1)
        m3 = re.sub(r'%3F.*$', "", m2)
        m4 = m3.replace("%2F", "/")
        m5 = m4.replace("%3A", ":")
        result.append(m5)
        result.append(m.get_text())

    # Collect the cleaned image URL for each image inside the same element
    for image in item.find_all('img'):
        k1 = re.sub(r'&cfs.*$', "", image['src'])
        k2 = re.sub(r'^https://something.*$', "", k1)
        k3 = re.sub(r'.*url=', "", k2)
        k4 = re.sub(r'%3F.*$', "", k3)
        k5 = k4.replace("%2F", "/")
        k6 = k5.replace("%3A", ":")
        k7 = re.sub(r'.*\.gif', "", k6)
        result.append(k7)

# Remove duplicates while preserving the original order
seen = set()
result_final = []
for item in result:
    if item not in seen:
        seen.add(item)
        result_final.append(item)
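The chained `replace` calls in the script decode only `%2F` and `%3A` by hand. As a hedged alternative, the standard library's `urllib.parse.unquote` decodes every percent-escape in one step, and `dict.fromkeys` removes duplicates while keeping first-seen order (guaranteed from Python 3.7 onward, so not on the Python 3.5 install mentioned above). The helper names `clean_link` and `unique_in_order` and the sample URL are illustrative assumptions, not part of the original script.

```python
import re
from urllib.parse import unquote


def clean_link(href, prefix):
    """Strip a leading prefix and trailing parameters, then percent-decode."""
    target = href.replace(prefix, "", 1)     # drop the leading redirect part
    target = re.sub(r'&.*$', "", target)     # drop any further query parameters
    target = re.sub(r'%3F.*$', "", target)   # drop the encoded "?..." tail
    return unquote(target)                   # decode %2F, %3A and other escapes


def unique_in_order(items):
    """Remove duplicates, keeping the first occurrence of each item."""
    return list(dict.fromkeys(items))


# Hypothetical sample input for illustration only.
sample = "https://something.com/a.php?u=https%3A%2F%2Fexample.org%2Fpage%3Fid%3D1&h=abc"
links = [clean_link(sample, "https://something.com/a.php?u=")] * 2
print(unique_in_order(links))
```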

My C# code is as follows:

using (Py.GIL())
{
    dynamic sys = Py.Import("sys");
    dynamic requests = Py.Import("requests");
    dynamic re = Py.Import("re");
    dynamic io = Py.Import("io");
    dynamic BeautifulSoup = Py.Import("bs4");
    dynamic math = Py.Import("math");
    Console.WriteLine(5);
    dynamic r = requests.get("https://www.google.com/");
    dynamic data = r.text;
    dynamic soup = BeautifulSoup.BeautifulSoup(data, "html.parser");
}

I used:

    var divExp = new { _class = "something" };
    var item = soup.find_all(Py.kw("class", divExp._class));

This returns results. But when I try to call the select method on the item variable, I get an error stating that the Python object does not contain a definition for 'select':

item.select("a[href^='https://www.google.com/']");
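The error is consistent with how Beautiful Soup behaves: `find_all` returns a list-like `ResultSet` rather than a single tag, and `select` is defined on individual tags (and on the soup itself), not on that list. A minimal Python sketch of the per-element call follows, using inline HTML invented for illustration instead of a live request:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page.
html = """
<div class="something"><a href="https://www.google.com/a">first</a></div>
<div class="something"><a href="https://example.org/b">second</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like ResultSet, so select must be called per element.
items = soup.find_all(attrs={"class": "something"})

links = []
for item in items:
    for anchor in item.select("a[href^='https://www.google.com/']"):
        links.append(anchor["href"])

print(links)
```

The same idea should carry over to the C# side: iterate over the result of `find_all` and call `select` on each element, or call `select` directly on `soup`.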

Final answer

using (Py.GIL())
{
    dynamic sys = Py.Import("sys");
    dynamic requests = Py.Import("requests");
    dynamic re = Py.Import("re");
    dynamic io = Py.Import("io");
    dynamic BeautifulSoup = Py.Import("bs4");
    dynamic math = Py.Import("math");
    Console.WriteLine(5);
    dynamic r = requests.get(url);
    dynamic data = r.text;
    dynamic soup = BeautifulSoup.BeautifulSoup(data, "html.parser");

    var divExp = new { _class = "className" };
    var item = soup.find_all(Py.kw("class", divExp._class));
    dynamic tag = soup.select("a[href^='https://something.com/']");
    for (var i = 1; i < item.Length(); i++)
    {
        // Extracting the required info using regex
        String input = Convert.ToString(item[i]);
        // Verbatim string: "\/" and "\?" are not valid escapes in a regular C# string literal
        string pattern_link = @"(.*href=""https:[/][/]something.com[/]a.php\?u=)|(&.*)";
        string replacement_link = " ";
        Regex rgx_link = new Regex(pattern_link);
        string result_link = rgx_link.Replace(input, replacement_link);

        // ... (steps omitted in the original post)

        string pattern_link_1 = "(http|https)%.*";
        Regex rgx_link_1 = new Regex(pattern_link_1);
        Match result_link_1 = rgx_link_1.Match(result_link);
        String input_1_1 = Convert.ToString(result_link_1.Value);

        // Decode the percent-escapes for "/" and ":"
        result_link_2 = result_link_2.Replace("%2F", "/").Replace("%3A", ":");
    }
}
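The regular expressions above isolate the value of the `u` query parameter and then decode it by hand. If that step is kept on the Python side instead, the standard library can parse the query string and decode the value in one go. This is a minimal sketch under that assumption; the function name `redirect_target` and the sample URL are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def redirect_target(href, param="u"):
    """Return the decoded value of `param` in href's query string, or None."""
    query = urlsplit(href).query
    values = parse_qs(query).get(param)  # parse_qs percent-decodes the values
    return values[0] if values else None


# Hypothetical link in the shape the regex above expects.
href = "https://something.com/a.php?u=https%3A%2F%2Fexample.org%2Fpage&h=abc"
print(redirect_target(href))
```

Parsing the query string avoids depending on the order of the parameters, which the `(&.*)` pattern implicitly assumes.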

Why not use the HTML Agility Pack? It is the C# equivalent.

http://html-agility-pack.net/

You can import it into your solution.

For running external Python scripts, see:

https://www.codeproject.com/articles/121374/step-by-step-guidance-of-calling-iron-python-funct

  • Install CPython, one of versions 2.7 or 3.4+
  • pip install pythonnet
  • Reference the installed Python.Runtime.dll in your .NET project
  • Follow the tutorial at www.python4.net, in the embedding section.

```

> scriptcs (ctrl-c to exit or :help for help)

> #r "C:\Python\Anaconda3_64b\Lib\site-packages\Python.Runtime.dll"
> using Python.Runtime;
> dynamic bs4;
> using (Py.GIL()) {bs4=Py.Import("bs4");}
> bs4.__file__.ToString()
C:\Python\Anaconda3_64b\lib\site-packages\bs4\__init__.py
> dynamic rq;
> using (Py.GIL()) {rq=Py.Import("requests");}
> dynamic r=rq.get("https://www.google.com/")
> dynamic soup = bs4.BeautifulSoup(r.text,"html.parser");
> soup.ToString()

```