F# CsvTypeProvider: extracting the same columns from slightly different CSV files
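I'm creating a program that reads football matches from different CSV files. The columns I'm interested in are present in all the files, but the files have a varying number of columns.
This left me having to create a separate mapping function for each variation of the file, with a different sample for each type: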
type GamesFile14 = CsvProvider<"./data/sample_14.csv">
type GamesFile15 = CsvProvider<"./data/sample_15.csv">
type GamesFile1617 = CsvProvider<"./data/sample_1617.csv">

let mapRows14 (rows: seq<GamesFile14.Row>) =
    rows |> Seq.map (fun c ->
        { Division = c.Div; Date = DateTime.Parse c.Date
          HomeTeam = { Name = c.HomeTeam; Score = c.FTHG; Shots = c.HS; ShotsOnTarget = c.HST; Corners = c.HC; Fouls = c.HF }
          AwayTeam = { Name = c.AwayTeam; Score = c.FTAG; Shots = c.AS; ShotsOnTarget = c.AST; Corners = c.AC; Fouls = c.AF }
          Odds = { H = float c.B365H; U = float c.B365D; B = float c.B365A } })

let mapRows15 (rows: seq<GamesFile15.Row>) =
    rows |> Seq.map (fun c ->
        { Division = c.Div; Date = DateTime.Parse c.Date
          HomeTeam = { Name = c.HomeTeam; Score = c.FTHG; Shots = c.HS; ShotsOnTarget = c.HST; Corners = c.HC; Fouls = c.HF }
          AwayTeam = { Name = c.AwayTeam; Score = c.FTAG; Shots = c.AS; ShotsOnTarget = c.AST; Corners = c.AC; Fouls = c.AF }
          Odds = { H = float c.B365H; U = float c.B365D; B = float c.B365A } })

let mapRows1617 (rows: seq<GamesFile1617.Row>) =
    rows |> Seq.map (fun c ->
        { Division = c.Div; Date = DateTime.Parse c.Date
          HomeTeam = { Name = c.HomeTeam; Score = c.FTHG; Shots = c.HS; ShotsOnTarget = c.HST; Corners = c.HC; Fouls = c.HF }
          AwayTeam = { Name = c.AwayTeam; Score = c.FTAG; Shots = c.AS; ShotsOnTarget = c.AST; Corners = c.AC; Fouls = c.AF }
          Odds = { H = float c.B365H; U = float c.B365D; B = float c.B365A } })
These are in turn consumed by the loadGames function:
let loadGames season resource =
    if season.Year = 14 then GamesFile14.Load(resource).Rows |> mapRows14
    else if season.Year = 15 then GamesFile15.Load(resource).Rows |> mapRows15
    else GamesFile1617.Load(resource).Rows |> mapRows1617
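It seems to me that there must be a better way to get around this problem.
Is there any way I could make my mapping function more generic, so that I don't have to repeat the same function over and over?
Is it possible to create the CsvProvider on the fly based on the resource, or do I need to explicitly declare a sample for each variation of my CSV files, as in the code above?
Other suggestions?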
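In your scenario, you might get better results from FSharp.Data's CsvFile type. It takes a more dynamic approach to CSV parsing, using the dynamic ? operator for data access. You lose some of the type-safety guarantees that the type provider gives you, because every CSV file is loaded into the same CsvRow type. That means you can't guarantee at compile time that any given column will be in a file, and you have to be prepared for runtime errors. But in your case that's just what you want, because it would allow your three functions to be rewritten like this: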
// The dynamic ? operator on CsvRow comes from this module.
open FSharp.Data.CsvExtensions

let mapRows14 rows =
    rows |> Seq.map (fun c ->
        { Division = c?Div; Date = DateTime.Parse c?Date
          HomeTeam = { Name = c?HomeTeam; Score = c?FTHG; Shots = c?HS; ShotsOnTarget = c?HST; Corners = c?HC; Fouls = c?HF }
          AwayTeam = { Name = c?AwayTeam; Score = c?FTAG; Shots = c?AS; ShotsOnTarget = c?AST; Corners = c?AC; Fouls = c?AF }
          Odds = { H = float c?B365H; U = float c?B365D; B = float c?B365A } })
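Since every file now loads into the same CsvRow type, one mapping function should be enough for all three variants. What follows is a minimal sketch under stated assumptions, not a drop-in replacement: the record types Team, Odds and Game are guesses (the question does not show the real ones), mapRows and the one-line loadGames are hypothetical names, and the inline sample row contains invented values purely so the script runs without any data files. Note that ? returns strings, so if the numeric fields are ints (as CsvProvider would probably have inferred them) they need an explicit conversion.

```fsharp
#r "nuget: FSharp.Data"

open System
open FSharp.Data
open FSharp.Data.CsvExtensions   // brings the dynamic ? operator into scope

// Hypothetical record types; the question does not show the originals.
type Team = { Name: string; Score: int; Shots: int; ShotsOnTarget: int; Corners: int; Fouls: int }
type Odds = { H: float; U: float; B: float }
type Game = { Division: string; Date: DateTime; HomeTeam: Team; AwayTeam: Team; Odds: Odds }

// One mapper for every file variant: columns are looked up by name at run time,
// so extra columns in a file are simply ignored.
let mapRows (rows: seq<CsvRow>) =
    rows
    |> Seq.map (fun c ->
        { Division = c?Div
          Date = DateTime.Parse(c?Date)
          HomeTeam = { Name = c?HomeTeam; Score = int c?FTHG; Shots = int c?HS
                       ShotsOnTarget = int c?HST; Corners = int c?HC; Fouls = int c?HF }
          AwayTeam = { Name = c?AwayTeam; Score = int c?FTAG; Shots = int c?AS
                       ShotsOnTarget = int c?AST; Corners = int c?AC; Fouls = int c?AF }
          Odds = { H = float c?B365H; U = float c?B365D; B = float c?B365A } })

// The season is no longer needed to pick a provided type.
let loadGames (resource: string) =
    CsvFile.Load(resource).Rows |> mapRows

// Usage with an invented in-memory sample (note the unused "Extra" column).
let sample = """Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,HS,AS,HST,AST,HF,AF,HC,AC,B365H,B365D,B365A,Extra
E0,2016-08-13,TeamA,TeamB,0,1,10,17,3,9,10,14,7,4,2.4,3.3,3.25,x"""

let games = CsvFile.Parse(sample).Rows |> mapRows |> List.ofSeq
```

The trade-off is the one described above: a misspelled or missing column name is no longer caught by the compiler and only fails when the row is mapped.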
Try CsvFile and see whether it solves your problem.
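The answer warns that a missing column only shows up as a runtime error. One way to make that failure earlier and more readable is to compare the file's header row against the columns the mapper reads before mapping anything. The sketch below assumes the column list used in this thread; requiredColumns, missingColumns and loadChecked are hypothetical helper names, not part of FSharp.Data.

```fsharp
#r "nuget: FSharp.Data"

open FSharp.Data

// Columns read by the mapping code in this thread.
let requiredColumns =
    [ "Div"; "Date"; "HomeTeam"; "AwayTeam"; "FTHG"; "FTAG"
      "HS"; "AS"; "HST"; "AST"; "HC"; "AC"; "HF"; "AF"
      "B365H"; "B365D"; "B365A" ]

// Returns the required columns that are absent from the file's header row.
// Headers is an option because a CSV file can be parsed without headers.
let missingColumns (csv: CsvFile) =
    let headers = csv.Headers |> Option.defaultValue [||] |> Set.ofArray
    requiredColumns |> List.filter (fun name -> not (headers.Contains name))

// Hypothetical wrapper: load a file and report missing columns up front
// instead of failing halfway through the mapping.
let loadChecked (resource: string) =
    let csv = CsvFile.Load(resource)
    match missingColumns csv with
    | [] -> Ok csv
    | missing -> Error (sprintf "%s is missing columns: %s" resource (String.concat ", " missing))

// Usage with an invented two-column sample: every other required column is reported.
let partial = CsvFile.Parse("Div,Date\nE0,2016-08-13")
let missing = missingColumns partial
```

Returning a Result keeps the decision about how to handle a bad file (skip it, log it, abort) with the caller.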