Get part of substring before character

I have a URL like this:

https://www.example.com/exampletitle21sep11oct2020/index.html

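The part I need is between the last and the second-to-last "/" characters. But I don't need that whole part; specifically, I need the last date before the last "/" character. As you can see, the two dates sit right next to each other with no separator, which makes it hard to use the substring and indexOf methods. To make it harder, the first date contains only the day and month, while the last date is a full date (day, month and year).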

Is there any way to extract the last date before the last "/" character from this URL?
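The first half of the task, isolating the segment between the second-to-last and the last "/", only needs lastIndexOf and substring. A minimal sketch, assuming the URL always ends in a file name such as index.html (lastSegment is a hypothetical helper name):

```javascript
// Hypothetical helper: returns the text between the second-to-last
// and the last "/" of a URL string.
function lastSegment(url) {
  const end = url.lastIndexOf("/");            // index of the last "/"
  const start = url.lastIndexOf("/", end - 1); // search backwards from just before it
  return url.substring(start + 1, end);
}

console.log(lastSegment("https://www.example.com/exampletitle21sep11oct2020/index.html"));
// exampletitle21sep11oct2020
```

The hard part is splitting the two run-together dates inside that segment.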

You can find and parse the path segment that matches the following pattern:

^         Start of string
.+        One or more characters (the title)
(\d{2})   2-digit day
(\w{3})   3-letter month abbreviation (lowercase in the URL)
(\d{2})   2-digit day
(\w{3})   3-letter month abbreviation (lowercase in the URL)
(\d{4})   4-digit year
$         End of string
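Applied to the path segment from the question, the five capture groups pick out day, month, day, month and year in order. A quick check (the variable names are only illustrative):

```javascript
// The pattern described above, as a JavaScript regex literal.
const datePattern = /^.+(\d{2})(\w{3})(\d{2})(\w{3})(\d{4})$/;

// Skip the full match (index 0) and keep the five captured groups.
const [, ...parts] = "exampletitle21sep11oct2020".match(datePattern);

console.log(parts); // [ '21', 'sep', '11', 'oct', '2020' ]
```

Because the five groups have a fixed total width of 14 characters and the pattern is anchored with $, they can only match the last 14 characters of the segment, so digits at the end of the title do not shift the split.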

Example

I use Moment.js to handle the date parsing.

// Title, then day+month, day+month and a 4-digit year at the end of a path segment
const expression = /^.+(\d{2})(\w{3})(\d{2})(\w{3})(\d{4})$/;
const format = 'DD MMM YYYY';
// "sep" -> "Sep"
const toTitleCase = (str) => str.charAt(0).toUpperCase() + str.slice(1);

const parseDates = (path) => {
  const url    = new URL(path),
        tokens = url.pathname.split('/'),                      // path segments
        found  = tokens.find(token => token.match(expression)); // first segment holding the dates
  if (!found) return null;
  const [
    , startDate, startMonth, endDate, endMonth, year
  ] = found.match(expression);
  // Both dates use the year found at the end of the segment
  return {
    start : moment(`${startDate} ${toTitleCase(startMonth)} ${year}`, format),
    end   : moment(`${endDate} ${toTitleCase(endMonth)} ${year}`, format)
  };
};

const dates = parseDates('https://www.example.com/exampletitle21sep11oct2020/index.html');

console.log(dates);
// Requires Moment.js, e.g. <script src="https://cdnjs.cloudflare.com/ajax/libs/moment.js/2.29.1/moment.min.js"></script>
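If you would rather not depend on Moment.js, the same two dates can be built with the built-in Date. A minimal sketch, assuming English three-letter month abbreviations and that both dates share the year at the end of the segment (as the Moment example also assumes). MONTHS, SEGMENT_PATTERN and parseDatesNative are hypothetical names, and the month groups use [a-z]{3} instead of \w{3}:

```javascript
// Assumed: English three-letter month abbreviations.
const MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"];

// Same shape as the pattern above, with letters only for the months.
const SEGMENT_PATTERN = /^.+(\d{2})([a-z]{3})(\d{2})([a-z]{3})(\d{4})$/i;

// Hypothetical Moment-free variant: returns { start, end } as UTC Dates, or null.
function parseDatesNative(href) {
  const segment = new URL(href).pathname
    .split("/")
    .find((s) => SEGMENT_PATTERN.test(s));
  if (!segment) return null;

  const [, startDay, startMonth, endDay, endMonth, year] = segment.match(SEGMENT_PATTERN);
  const startIndex = MONTHS.indexOf(startMonth.toLowerCase()); // 0-based, as Date.UTC expects
  const endIndex = MONTHS.indexOf(endMonth.toLowerCase());
  if (startIndex === -1 || endIndex === -1) return null; // not a known month abbreviation

  // Both dates use the year at the end of the segment.
  return {
    start: new Date(Date.UTC(Number(year), startIndex, Number(startDay))),
    end: new Date(Date.UTC(Number(year), endIndex, Number(endDay))),
  };
}

const result = parseDatesNative("https://www.example.com/exampletitle21sep11oct2020/index.html");
console.log(result.start.toISOString()); // 2020-09-21T00:00:00.000Z
console.log(result.end.toISOString());   // 2020-10-11T00:00:00.000Z
```

Date.UTC is used so that toISOString does not shift the day depending on the local time zone.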

Try this (updated):

const url = "https://www.example.com/exampletitle21sep11oct2020/index.html";
const urlData = url.split('/');
const datePart = urlData[urlData.length-2];
const res = datePart.slice(-9); // "11oct2020" (only works if the day always has 2 digits)
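slice(-9) relies on the end date always being exactly 9 characters long. With a one-digit day such as 9oct2020 (8 characters) it returns p9oct2020 instead. A minimal sketch that keeps the split-based approach but matches the trailing date with a regex anchored at the end of the segment, assuming a 1- or 2-digit day, a 3-letter month and a 4-digit year:

```javascript
const url = "https://www.example.com/exampletitle21sep9oct2020/index.html";
const parts = url.split("/");
const segment = parts[parts.length - 2]; // "exampletitle21sep9oct2020"

// Anchored at the end of the segment: 1-2 digit day, 3-letter month, 4-digit year.
const match = segment.match(/(\d{1,2}[a-z]{3}\d{4})$/i);
const endDate = match ? match[1] : null;

console.log(endDate); // 9oct2020
```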

With a regular expression you can get the second date like this:

const regex = /\/(?:.*?(\d{1,2}\w{3}\d{0,4}))\/.*?$/;

const [, date] = regex.exec("https://www.example.com/exampletitle21sep11oct2020/index.html");
console.log({ date }); // { date: '11oct2020' }

// It also handles a 1-digit day or a missing year:
console.log(regex.exec("https://www.example.com/exampletitle21sep9oct2020/index.html")[1]); // 9oct2020
console.log(regex.exec("https://www.example.com/exampletitle21sep9oct/index.html")[1]);     // 9oct
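regex.exec returns null when nothing matches, and destructuring null throws a TypeError. If some URLs may not contain a date at all, a small wrapper can return null instead (extractDate is a hypothetical name):

```javascript
const regex = /\/(?:.*?(\d{1,2}\w{3}\d{0,4}))\/.*?$/;

// Hypothetical wrapper: returns the captured date, or null when the URL has none.
function extractDate(url) {
  const match = regex.exec(url);
  return match ? match[1] : null;
}

console.log(extractDate("https://www.example.com/exampletitle21sep11oct2020/index.html")); // 11oct2020
console.log(extractDate("https://www.example.com/about/index.html")); // null
```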

It's much simpler with a single regular expression:

var url = 'https://www.example.com/exampletitle21sep11oct2020/index.html'

var res = url.match( /.*?(\d+[a-z]+\d{4})\/.*?$/i );
// res === [ "https://www.example.com/exampletitle21sep11oct2020/index.html", "11oct2020" ]
var endDate = res[1];
// endDate === "11oct2020"

Or (but "exampletitle" must not end with a digit):

var res = url.match( /.*?(\d+[a-z]+)(\d+[a-z]+)(\d{4})\/.*?$/i );
// [ "https://www.example.com/exampletitle21sep11oct2020/index.html", "21sep", "11oct", "2020" ]
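To see why the title must not end with a digit: .*? is lazy, so the match starts at the earliest digit that lets the rest of the pattern succeed, and the greedy \d+ then takes the whole run of digits. A quick check with a made-up title ending in 3:

```javascript
// Made-up URL whose title ends in a digit ("exampletitle3").
const url3 = "https://www.example.com/exampletitle321sep11oct2020/index.html";

const res3 = url3.match(/.*?(\d+[a-z]+)(\d+[a-z]+)(\d{4})\/.*?$/i);

// The trailing "3" of the title ends up in the first date.
console.log(res3.slice(1)); // [ '321sep', '11oct', '2020' ]
```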

Or:

var res = url.match( /.*?(\d+)([a-z]+)(\d+)([a-z]+)(\d{4})\/.*?$/i );
// [ "https://www.example.com/exampletitle21sep11oct2020/index.html", "21", "sep", "11", "oct", "2020" ]
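Since ES2018, JavaScript regexes also support named capture groups, which saves counting group positions. A sketch of the same five-group pattern with names (the group names are arbitrary):

```javascript
const url = "https://www.example.com/exampletitle21sep11oct2020/index.html";

// Same pattern as above, with each group named.
const pattern = /.*?(?<startDay>\d+)(?<startMonth>[a-z]+)(?<endDay>\d+)(?<endMonth>[a-z]+)(?<year>\d{4})\/.*?$/i;

const { startDay, startMonth, endDay, endMonth, year } = url.match(pattern).groups;
console.log(endDay, endMonth, year); // 11 oct 2020
```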

However, if you know the day is always 2 digits (always "01", never "1"), then "exampletitle" can be any string:

var res = url.match( /.*?(\d{2}[a-z]+\d{4})\/.*?$/i );
var res = url.match( /.*?(\d{2}[a-z]+)(\d+[a-z]+)(\d{4})\/.*?$/i );
var res = url.match( /.*?(\d{2})([a-z]+)(\d+)([a-z]+)(\d{4})\/.*?$/i );
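With the two-digit assumption, a made-up title ending in 3 is split correctly: a match starting at that 3 would take "32" for \d{2} and then find "1" where [a-z]+ needs a letter, so it fails and the match starts one character later.

```javascript
const url = "https://www.example.com/exampletitle321sep11oct2020/index.html";

// The first day is assumed to be exactly two digits.
const res = url.match(/.*?(\d{2}[a-z]+)(\d+[a-z]+)(\d{4})\/.*?$/i);

console.log(res.slice(1)); // [ '21sep', '11oct', '2020' ]
```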