Adding a regression line in Vega-Lite

I am trying to add a regression line and an R-squared value to a scatter plot. I know I should use the layer feature and

"transform": [
  {
    "regression": "GDP per capita",
    "on": "Educationalattainment"
  }
]

but after a million attempts I still can't figure out where to insert these lines. Here is the code for my chart:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "title": {
    "text": "GDP per capita and Education Attainment",
    "subtitle": "From 2015-2020. Sources: World Bank",
    "subtitleFontStyle": "italic",
    "subtitleFontSize": 10,
    "anchor": "start",
    "color": "black"
  },
  "height": 300,
  "width": 300,
  "data": {
    "url": "https://raw.githubusercontent.com/jamieprince/jamieprince.github.io/main/correlation.csv"
  },

  "transform": [
    {"calculate": "datum.Educationalattainment/100", "as": "percent"},
    {"filter": {"field": "Educationalattainment", "gt": 0}}
  ],

  "selection": {
    "paintbrush": {
      "type": "multi",
      "on": "mouseover",
      "nearest": true
    },

    "grid": {
      "type": "interval",
      "bind": "scales"
    }
  },

  "mark": {
    "type": "circle",
    "opacity": 0.5,
    "color": "#EC9D3E"
  },

  "encoding": {
    "x": {
      "field": "GDP per capita",
      "type": "quantitative",
      "axis": {
        "title": "GDP per capita",
        "grid": false,
        "tickCount": 10,
        "labelOverlap": "greedy"
      }
    },

    "y": {
      "field": "percent",
      "type": "quantitative",
      "axis": {
        "title": "Educational Attainment",
        "grid": false,
        "format": "%"
      }
    },

    "size": {
      "condition": {
        "selection": "paintbrush",
        "value": 300,
        "init": {
          "value": 70
        }
      },
      "value": 70
    },


    "tooltip": [
       {
        "field": "Year",
        "type": "nominal",
        "title": "Year"
      },
      {
        "field": "Country",
        "type": "ordinal",
        "title": "Country"
      },
      {
        "field": "GDP per capita",
        "type": "nominal",
        "title": "GDP per capita"
      },
      {
        "field": "Educationalattainment",
        "type": "nominal",
        "title": "Educational attainment at least completed short-cycle tertiary population 25+ total (%) (cumulative)"
      }
    ]
  }
}

Here is the chart I am using as a reference:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Figure 5: Plotting a regression of Social Mobility Index on Global Entrepreneurship Index, equation acquired via Python",
  "data": {
    "url": "https://raw.githubusercontent.com/marinabrts/marinabrts.github.io/main/GEIxSMI.csv",
    "format": {"type": "csv"}
  },
  "background": "#E0E0E0",
  "config": {"axis": {"grid": true, "gridColor": "#FFFFFF"}},
  "title": {
    "text": "Figure 5: Regressing SMI on Global Entrepreneurship Index",
    "subtitle": "Source: World Economic Forum (2020), Global Entrepreneurship & Development Institute (2019)",
    "subtitleFontStyle": "italic",
    "subtitleFontSize": 10,
    "anchor": "start"
  },
  "height": 300,
  "width": 370,
  "layer": [
    {
      "mark": {"type": "point", "size": 30, "color": "#FF3399"},
      "encoding": {
        "x": {
          "field": "GEI",
          "type": "quantitative",
          "title": "Global Entrepreneurship Index (GEI)"
        },
        "y": {
          "field": "Index Score",
          "type": "quantitative",
          "title": "Social Mobility Index (SMI)",
          "scale": {"domain": [30, 90]}
        },
        "tooltip": [
          {"field": "Country", "type": "nominal", "title": "Country"},
          {"field": "GEI", "type": "quantitative", "title": "GEI"},
          {"field": "Index Score", "type": "quantitative", "title": "SMI"}
        ]
      }
    },
    {
      "mark": {"type": "line", "color": "#7F00FF", "size": 3},
      "transform": [{"regression": "Index Score", "on": "GEI"}],
      "encoding": {
        "x": {"field": "GEI", "type": "quantitative"},
        "y": {"field": "Index Score", "type": "quantitative"}
      }
    },
    {
      "transform": [
        {"regression": "Index Score", "on": "GEI", "params": true},
        {"calculate": "'R²= '+format(datum.rSquared, '.2f')", "as": "R2"}
      ],
      "mark": {
        "type": "text",
        "color": "red",
        "size": 14,
        "x": "width",
        "align": "center",
        "y": -5
      },
      "encoding": {"text": {"type": "nominal", "field": "R2"}}
    }
  ]
}

Any help would be greatly appreciated. Thanks!

Edit:

Code for the R-squared value:

{
      "transform": [
        {
          "regression": "GDP per capita",
          "on": "percent",
          "params": true
        },
        {"calculate": "'R²: '+format(datum.rSquared, '.2f')", "as": "R2"}
      ],
      "mark": {
        "type": "text",
        "color": "black",
        "x": "width",
        "align": "right",
        "y": -5
      },
      "encoding": {
        "text": {"type": "nominal", "field": "R2"}
      }
    }

The complete chart code, in which the value does not appear:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "title": {
    "text": null,
    "subtitle": null,
    "subtitleFontStyle": "italic",
    "subtitleFontSize": 10,
    "anchor": "start",
    "color": "black"
  },
  "height": 100,
  "width": 100,
  "data": {
    "url": "https://raw.githubusercontent.com/jamieprince/jamieprince.github.io/main/correlation.csv"
  },
  "transform": [
    {"calculate": "datum.Educationalattainment/100", "as": "percent"},
    {"filter": {"field": "Educationalattainment", "gt": 0}}
  ],
  "layer": [
    {
      "selection": {
        "paintbrush": {"type": "multi", "on": "mouseover", "nearest": true},
        "grid": {"type": "interval", "bind": "scales"}
      },
      "mark": {"type": "circle", "opacity": 0.5, "color": "#EC9D3E"},
      "encoding": {
        "x": {
          "field": "GDP per capita",
          "type": "quantitative",
          "axis": {
            "title": "GDP per capita",
            "grid": false,
            "tickCount": 10,
            "labelOverlap": "greedy"
          }
        },
        "y": {
          "field": "percent",
          "type": "quantitative",
          "axis": {
            "title": "Educational Attainment",
            "grid": false,
            "format": "%"
          }
        },
        "size": {
          "condition": {
            "selection": "paintbrush",
            "value": 300,
            "init": {"value": 70}
          },
          "value": 70
        },
        "tooltip": [
          {"field": "Year", "type": "nominal", "title": "Year"},
          {"field": "Country", "type": "ordinal", "title": "Country"},
          {
            "field": "GDP per capita",
            "type": "nominal",
            "title": "GDP per capita"
          },
          {
            "field": "Educationalattainment",
            "type": "nominal",
            "title": "Educational attainment at least completed short-cycle tertiary population 25+ total (%) (cumulative)"
          }
        ]
      }
    },
    {
      "mark": {"type": "line", "color": "#347DB6", "size": 3},
      "transform": [{"regression": "GDP per capita", "on": "percent"}],
      "encoding": {
        "x": {"field": "GDP per capita", "type": "quantitative"},
        "y": {"field": "percent", "type": "quantitative"}
      }
    },
    {
      "transform": [
        {
          "regression": "GDP per capita",
          "on": "percent",
          "params": true
        },
        {"calculate": "'R²: '+format(datum.rSquared, '.2f')", "as": "R2"}
      ],
      "mark": {
        "type": "text",
        "color": "black",
        "x": "width",
        "align": "right",
        "y": -5
      },
      "encoding": {
        "text": {"type": "nominal", "field": "R2"}
      }
    }
  ]
}

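You just need to put the scatter plot and a line mark in separate layers of one chart; the layers are drawn on top of each other. Then apply the regression transform to the line mark. The transform in your question looks wrong, because neither the x nor the y encoding uses Educationalattainment, so I ran the regression on the percent field instead, since it is calculated from the Educationalattainment field. For reference, this is the transform from your question: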

"transform": [
  {
    "regression": "GDP per capita",
    "on": "Educationalattainment"
  }
]
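
Stripped down to the essentials, the layering pattern might look roughly like the sketch below. It is only an illustration: the inline data and the field names x and y are made up, not taken from your CSV. In the regression transform, "regression" names the dependent variable to predict and "on" names the predictor, so this sketch fits y against x and draws the fitted line over the points.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "description": "Illustrative sketch only: inline data and the field names x and y are placeholders.",
  "data": {
    "values": [
      {"x": 1, "y": 2.1},
      {"x": 2, "y": 3.9},
      {"x": 3, "y": 6.2},
      {"x": 4, "y": 7.8},
      {"x": 5, "y": 10.1}
    ]
  },
  "layer": [
    {
      "mark": {"type": "circle", "opacity": 0.5},
      "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {"field": "y", "type": "quantitative"}
      }
    },
    {
      "transform": [{"regression": "y", "on": "x"}],
      "mark": {"type": "line"},
      "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {"field": "y", "type": "quantitative"}
      }
    }
  ]
}
```

The transform sits inside the line layer, so only that layer sees the fitted points; the scatter layer keeps the raw rows. Swapping the two field names in the transform fits the relationship the other way round and will generally give a different line.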

Below is the modified config (or refer to the editor):

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "title": {
    "text": null,
    "subtitle": null,
    "subtitleFontStyle": "italic",
    "subtitleFontSize": 10,
    "anchor": "start",
    "color": "black"
  },
  "height": 100,
  "width": 100,
  "data": {
    "url": "https://raw.githubusercontent.com/jamieprince/jamieprince.github.io/main/correlation.csv"
  },
  "transform": [
    {"calculate": "datum.Educationalattainment/100", "as": "percent"},
    {"filter": {"field": "Educationalattainment", "gt": 0}}
  ],
  "layer": [
    {
      "selection": {
        "paintbrush": {"type": "multi", "on": "mouseover", "nearest": true},
        "grid": {"type": "interval"}
      },
      "mark": {"type": "circle", "opacity": 0.5, "color": "#EC9D3E"},
      "encoding": {
        "x": {
          "field": "GDP per capita",
          "type": "quantitative",
          "axis": {
            "title": "GDP per capita",
            "grid": false,
            "tickCount": 10,
            "labelOverlap": "greedy"
          }
        },
        "y": {
          "field": "percent",
          "type": "quantitative",
          "axis": {
            "title": "Educational Attainment",
            "grid": false,
            "format": "%"
          }
        },
        "size": {
          "condition": {
            "selection": "paintbrush",
            "value": 300,
            "init": {"value": 70}
          },
          "value": 70
        },
        "tooltip": [
          {"field": "Year", "type": "nominal", "title": "Year"},
          {"field": "Country", "type": "ordinal", "title": "Country"},
          {
            "field": "GDP per capita",
            "type": "nominal",
            "title": "GDP per capita"
          },
          {
            "field": "Educationalattainment",
            "type": "nominal",
            "title": "Educational attainment at least completed short-cycle tertiary population 25+ total (%) (cumulative)"
          }
        ]
      }
    },
    {
      "mark": {"type": "line", "color": "#347DB6", "size": 3},
      "transform": [{"regression": "GDP per capita", "on": "percent"}],
      "encoding": {
        "x": {"field": "GDP per capita", "type": "quantitative"},
        "y": {"field": "percent", "type": "quantitative"}
      }
    },
    {
      "transform": [
        {"regression": "GDP per capita", "on": "percent", "params": true},
        {"calculate": "'R²: '+format(datum.rSquared, '.2f')", "as": "R2"}
      ],
      "mark": {
        "type": "text",
        "color": "black",
        "x": "width",
        "align": "right",
        "y": -5
      },
      "encoding": {"text": {"type": "nominal", "field": "R2"}}
    }
  ]
}
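
The text layer works because "params": true makes the regression transform return the model parameters (one object with a coef array and an rSquared value) instead of trend-line points. As a rough sketch of that idea in isolation, assuming made-up inline data and the placeholder fields x and y, the spec below builds a label from both rSquared and the slope (coef[1], since coef starts with the intercept for a linear fit):

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "description": "Illustrative sketch only: inline data and the field names x, y and label are placeholders.",
  "data": {
    "values": [
      {"x": 1, "y": 2.1},
      {"x": 2, "y": 3.9},
      {"x": 3, "y": 6.2},
      {"x": 4, "y": 7.8},
      {"x": 5, "y": 10.1}
    ]
  },
  "transform": [
    {"regression": "y", "on": "x", "params": true},
    {
      "calculate": "'R²: ' + format(datum.rSquared, '.2f') + ', slope: ' + format(datum.coef[1], '.2f')",
      "as": "label"
    }
  ],
  "mark": {"type": "text", "align": "center"},
  "encoding": {"text": {"field": "label", "type": "nominal"}}
}
```

Rendering this on its own is a quick way to check that the transform returns a value before placing the text layer in the full chart.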

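Edit

To get the text to show, I removed the bind config from your grid selection. With it removed, the text is displayed correctly; this may be a bug, or there may be some reason behind it. I updated the following lines in the snippet above: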

"selection": {
        "paintbrush": {"type": "multi", "on": "mouseover", "nearest": true},
        "grid": {"type": "interval"}
      },