Get part of substring before character
I have a URL like this:
https://www.example.com/exampletitle21sep11oct2020/index.html
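The part I need sits between the last and the second-to-last "/" characters. I don't need the whole segment, though; I specifically need the last date before the final "/". As you can see, the two dates are right next to each other with no separator, which makes this hard to do with the substring or indexOf methods. To make it harder, the first date contains only the day and month, while the last date is a full date.
Is there a way to extract the last date before the final '/' character from this URL?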
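You can find and parse the path segment that matches the following pattern:
^ Start of string
.+ One or more of anything
(\d{2}) 2-digit day
(\w{3}) 3-character month (\w also matches digits and underscores)
(\d{2}) 2-digit day
(\w{3}) 3-character month (\w also matches digits and underscores)
(\d{4}) 4-digit year
$ End of string
Example
I use moment to handle the date parsing.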
const expression = /^.+(\d{2})(\w{3})(\d{2})(\w{3})(\d{4})$/;
const format = 'DD MMM YYYY';
const toTitleCase = (str) => str.charAt(0).toUpperCase() + str.slice(1);

const parseDates = (path) => {
  // Split the URL path into segments and pick the first one that matches.
  const url = new URL(path),
    tokens = url.pathname.split('/'),
    found = tokens.find(token => token.match(expression));
  if (!found) return null;
  // Skip the full match; keep the five captured groups.
  const [
    , startDate, startMonth, endDate, endMonth, year
  ] = found.match(expression);
  // The first date has no year of its own, so both dates share the trailing year.
  return {
    start : moment(`${startDate} ${toTitleCase(startMonth)} ${year}`, format),
    end : moment(`${endDate} ${toTitleCase(endMonth)} ${year}`, format)
  };
};

const dates = parseDates('https://www.example.com/exampletitle21sep11oct2020/index.html');
console.log(dates);
<script src="https://cdnjs.cloudflare.com/ajax/libs/moment.js/2.29.1/moment.min.js"></script>
Try this (updated):
const url = "https://www.example.com/exampletitle21sep11oct2020/index.html";
const urlData = url.split('/');
const datePart = urlData[urlData.length-2];
const res = datePart.slice(-9); // "11oct2020" (assumes a 2-digit day and a 4-digit year)
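The fixed-width slice above breaks when the day has a single digit. A minimal sketch that tolerates one- or two-digit days, assuming the date always ends the second-to-last path segment and the month is a three-letter abbreviation; `lastDateOf` is a hypothetical helper name, not part of the original answer:

```javascript
// Hypothetical helper: returns the trailing date text of the folder segment, or null.
function lastDateOf(urlString) {
  const segments = new URL(urlString).pathname.split('/');
  // For ".../folder/index.html" the folder is the second-to-last segment.
  const folder = segments[segments.length - 2] || '';
  // Anchored at the end: 1-2 digit day, 3-letter month, 4-digit year.
  const match = folder.match(/\d{1,2}[a-z]{3}\d{4}$/i);
  return match ? match[0] : null;
}

console.log(lastDateOf('https://www.example.com/exampletitle21sep11oct2020/index.html')); // "11oct2020"
console.log(lastDateOf('https://www.example.com/exampletitle21sep9oct2020/index.html'));  // "9oct2020"
```

Anchoring the pattern with `$` means the title and the first date never need to be described at all, which is why the variable day width stops being a problem.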
With a regular expression you can get the second date like this:
const regex = /\/(?:.*?(\d{1,2}\w{3}\d{0,4}))\/.*?$/;
// Two-digit day with a year
console.log(regex.exec("https://www.example.com/exampletitle21sep11oct2020/index.html")[1]); // "11oct2020"
// One-digit day with a year
console.log(regex.exec("https://www.example.com/exampletitle21sep9oct2020/index.html")[1]); // "9oct2020"
// One-digit day without a year
console.log(regex.exec("https://www.example.com/exampletitle21sep9oct/index.html")[1]); // "9oct"
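One thing to watch: `exec` returns `null` when nothing matches, so destructuring or indexing its result directly throws a `TypeError` for URLs without a date. A small sketch of a null-safe wrapper around the same pattern; the names `dateRegex` and `extractDate` are hypothetical:

```javascript
// Same pattern as in the answer above, under a different (hypothetical) name.
const dateRegex = /\/(?:.*?(\d{1,2}\w{3}\d{0,4}))\/.*?$/;

// Hypothetical wrapper: returns the captured date text, or null when nothing matches.
function extractDate(url) {
  const match = dateRegex.exec(url);
  return match ? match[1] : null;
}

console.log(extractDate("https://www.example.com/exampletitle21sep9oct2020/index.html")); // "9oct2020"
console.log(extractDate("https://www.example.com/index.html")); // null
```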
With a single regular expression, everything becomes much simpler:
var url = 'https://www.example.com/exampletitle21sep11oct2020/index.html'
var res = url.match( /.*?(\d+[a-z]+\d{4})\/.*?$/i );
// res === [ "https://www.example.com/exampletitle21sep11oct2020/index.html", "11oct2020" ]
var endDate = res[1];
// endDate === "11oct2020"
Or (but "exampletitle" must not end with a digit):
var res = url.match( /.*?(\d+[a-z]+)(\d+[a-z]+)(\d{4})\/.*?$/i );
// [ "https://www.example.com/exampletitle21sep11oct2020/index.html", "21sep", "11oct", "2020" ]
Or:
var res = url.match( /.*?(\d+)([a-z]+)(\d+)([a-z]+)(\d{4})\/.*?$/i );
// [ "https://www.example.com/exampletitle21sep11oct2020/index.html", "21", "sep", "11", "oct", "2020" ]
However, if you know the day is always 2 digits (always "01", never "1"), then "exampletitle" can be any string:
var res = url.match( /.*?(\d{2}[a-z]+\d{4})\/.*?$/i );
var res = url.match( /.*?(\d{2}[a-z]+)(\d+[a-z]+)(\d{4})\/.*?$/i );
var res = url.match( /.*?(\d{2})([a-z]+)(\d+)([a-z]+)(\d{4})\/.*?$/i );
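If the goal is an actual date value rather than the text "11oct2020", the captured pieces can be converted without a library. This is only a sketch, assuming English three-letter month abbreviations and that midnight UTC is an acceptable interpretation; `MONTHS` and `parseEndDate` are hypothetical names, and named capture groups require an ES2018-capable engine:

```javascript
const MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
                'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];

// Hypothetical helper: parses the last date in the folder name into a UTC Date, or null.
function parseEndDate(url) {
  // Day, month and year directly before the final "/" of the URL.
  const m = url.match(/(?<day>\d{1,2})(?<month>[a-z]{3})(?<year>\d{4})\/[^\/]*$/i);
  if (!m) return null;
  const monthIndex = MONTHS.indexOf(m.groups.month.toLowerCase());
  if (monthIndex === -1) return null; // three letters that are not a month
  return new Date(Date.UTC(Number(m.groups.year), monthIndex, Number(m.groups.day)));
}

const endDate = parseEndDate('https://www.example.com/exampletitle21sep11oct2020/index.html');
console.log(endDate.toISOString()); // "2020-10-11T00:00:00.000Z"
```

Note that `Date.UTC` does not reject impossible days such as "31feb"; it rolls them over into the next month, so add a range check if the input is not trusted.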