How to count all domains in a list of http:// addresses using regular expressions?
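So, I have a list of http:// addresses, and I need to count the domains in JS using a regular expression. I don't know how to do it, because the addresses have different lengths and some of them look alike. How can I do this? Regular expressions are my nightmare. Here is my list.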
You can use the String.prototype.match() method.
Using a modified regular expression from the thread "What is a good regular expression to match a URL?", you can count the number of matches like this:
// Your original list of addresses
const data = `
http://www.gaba.ch/fr_CH/519/Netuschil-L-et-al-Eur-J-Oral-Sci-103-1995-355-361.htm?Subnav2=ResearchProducts&Article=17516
http://www.gaba.fi/fi_FI/725/Suche.htm?Page=42
http://www.gaba.ch/fr_CH/538/Recomend-Page.htm?LinkID=576&Brand=meridolHalitosis&Subnav=&Product=312435
http://www.gaba.com/en/1071/Professor-Edwin-G-Winkel.htm
http://www.gaba.ch/fr_CH/580/Congress-Calendar.htm?CongressId=289461&Page=6
// ... etc
`;
// Make sure you include the g flag to find all the matches and not just one
const addresses = data.match(/https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b(?:[-a-zA-Z0-9@:%_\+.~#?&//=]*)/g);
// Get length of the matched array
// - In this example: 5
// - In your case: 4815
const addressesCount = addresses.length;
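One thing to watch for with the snippet above: with the g flag, match() returns null rather than an empty array when nothing matches, so reading .length directly would throw on an empty list. Below is a minimal sketch of a guard, assuming a hypothetical helper named countMatches and a simplified urlPattern that is only for illustration (it is not the regex from this answer):

```javascript
// Hypothetical helper (not part of the original answer): counts regex matches safely.
function countMatches(text, pattern) {
  // With the g flag, match() returns an array of all matches, or null if there are none.
  const matches = text.match(pattern);
  return matches === null ? 0 : matches.length;
}

// Simplified URL pattern used only for this illustration.
const urlPattern = /https?:\/\/\S+/g;

console.log(countMatches("http://www.gaba.fi/fi_FI/725/Suche.htm?Page=42", urlPattern)); // 1
console.log(countMatches("no addresses here", urlPattern)); // 0
```

You could pass the longer regex from the answer as the second argument instead; the guard works the same way.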
Edit:
Based on your comments, I made some adjustments to the code:
// Your original list of addresses
const data = `
http://www.gaba.ch/fr_CH/519/Netuschil-L-et-al-Eur-J-Oral-Sci-103-1995-355-361.htm?Subnav2=ResearchProducts&Article=17516
http://www.gaba.fi/fi_FI/725/Suche.htm?Page=42
http://www.gaba.ch/fr_CH/538/Recomend-Page.htm?LinkID=576&Brand=meridolHalitosis&Subnav=&Product=312435
http://www.gaba.com/en/1071/Professor-Edwin-G-Winkel.htm
http://www.gaba.ch/fr_CH/580/Congress-Calendar.htm?CongressId=289461&Page=6
// ... etc
`;
// Find all valid domains (note: with the g flag, each match still includes the "http://www." prefix)
const addresses = data.match(/https?:\/\/(?:www)?\.((?:.+?)\.[\w\.]{2,5})/g);
// Filter the addresses to only unique ones
const unique = addresses.reduce((acc, cur) => acc.indexOf(cur) > -1 ? acc : acc.concat(cur), []);
// Get number of unique addresses found
// - In this example: 3
// - In your case: 28
const length = unique.length;
Note: addresses like http:/www.bnf.org/bnf/bnf/54/%3C will not be matched, because they are invalid.
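If you also want to know how often each domain occurs, one possible variation is to capture only the host part and tally it in a Map. This is a rough sketch under my own assumptions: the helper name countDomains, the hostPattern regex, and the sample list are all made up for illustration and are not the regex used above.

```javascript
// Hypothetical helper: tallies how often each domain occurs in the text.
function countDomains(text) {
  // Capture group 1 is the host, without the scheme and without an optional "www." prefix.
  const hostPattern = /https?:\/\/(?:www\.)?([^\/\s:?#]+)/g;
  const counts = new Map();
  // matchAll() yields one match object per URL, giving access to the capture group.
  for (const match of text.matchAll(hostPattern)) {
    const host = match[1].toLowerCase();
    counts.set(host, (counts.get(host) || 0) + 1);
  }
  return counts;
}

const sample = `
http://www.gaba.ch/fr_CH/580/Congress-Calendar.htm?CongressId=289461&Page=6
http://www.gaba.fi/fi_FI/725/Suche.htm?Page=42
http://www.gaba.ch/fr_CH/538/Recomend-Page.htm
http://www.gaba.com/en/1071/Professor-Edwin-G-Winkel.htm
`;

const domainCounts = countDomains(sample);
console.log(domainCounts.size);           // 3 unique domains
console.log(domainCounts.get("gaba.ch")); // 2 occurrences
```

Because the pattern requires "//" after the scheme, malformed addresses such as the http:/www.bnf.org one are skipped here as well.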