---
title: "An overview of package plot3logit"
author: "Flavio Santi, Maria Michela Dickson, Giuseppe Espa, Diego Giuliani"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{An overview of package plot3logit}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: ../inst/REFERENCES.bib
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 4,
  fig.align = "center"
)
```




## Introduction

The package permits the covariate effects of trinomial regression models to be
represented graphically by means of a ternary plot. The aim of the plots is
helping the interpretation of regression coefficients in terms of the effects
that a change in regressors' values has on the probability distribution of the
dependent variable. Such changes may involve either a single regressor, or a
group of them (composite changes), and the package permits both cases to be
handled in a user-friendly way. Methodological details are illustrated and
discussed in @santi2019.

The package can read the results of **both categorical and ordinal trinomial
logit** regression fitted by various functions (see the next section) and
creates a `field3logit` object which may be represented by means of functions
`gg3logit` and `stat_field3logit`.

The `plot3logit` package inherits graphical classes and methods from the package
`ggtern` [@hamilton2018] which, in turn, is based on the package `ggplot2`
[@wickham2016a].

Graphical representation based on **standard graphics** is made available
through the package `Ternary` [@smith2017] by functions `plot3logit` and
`TernaryField`, and by the `plot` method of `field3logit` objects.

See the help of `field3logit` for representing composite effects and
`multifield3logit` for drawing multiple fields.

See @santi2022 for a detailed presentation of the package and its features, or
read the main vignette of the package which is based on it.



## Compatibility

Function `field3logit` of package `plot3logit` can read trinomial regression
estimates from the output of the following functions:

* `polr` of package `MASS` (ordinal logit regression);
* `mlogit` of package `mlogit` (logit regression);
* `multinom` of package `nnet` (logit regression);
* `clm` and `clm2` of package `ordinal` (ordinal logit regression);
* `vgam` and `vglm` of package `VGAM` (logit regression).

Explicit estimates can be passed to `field3logit()` by means of a named list too
(see the help of `field3logit` or the main vignette of the package); moreover,
users can easily extend the range of objects that can be processed by
`field3logit()` by implementing the S3 methods of the generic `extract3logit`
(see the help).


In the following, an example is provided for each object `field3logit` can
work with. First, however, package `plot3logit` must be attached and dataset
`cross_1year` must be loaded:


```{r, message=FALSE}
library(plot3logit)
data(cross_1year)
str(cross_1year)
```



### Package `MASS`

Function `polr` of package `MASS` [@venables2002] fits a *proportional odds
logistic regression* by means of a fairly simple syntax, however the order of
labels of the dependent variable must be explicitly stated:
```{r, message=FALSE, results='hide'}
library(MASS)

mydata <- cross_1year
mydata$finalgrade <- factor(
  x = mydata$finalgrade,
  levels = c('Low', 'Average', 'High'),
  ordered = TRUE
)

mod0 <- polr(finalgrade ~ gender + irregularity, data = mydata)
field3logit(mod0, 'genderFemale')
```



### Package `mlogit`

Package `mlogit` of package `mlogit` [@mlogit] can fit a wide range of models.
The current implementation of `plot3logit` only works with pure trinomial
models, where only specific coefficients are considered:
```{r, message=FALSE, results='hide'}
library(mlogit)
mydata <- mlogit.data(cross_1year, choice = 'employment_sit', shape = 'wide')
mod0 <- mlogit(employment_sit ~ 0 | gender + finalgrade, data = mydata)
field3logit(mod0, 'genderFemale')
```



### Package `nnet`

Function `multinom` of package `nnet` [@venables2002] fits a multinomial model
by means of a very simple syntax:
```{r, message=FALSE, results='hide'}
library(nnet)
mod0 <- multinom(employment_sit ~ gender + finalgrade, data = cross_1year)
field3logit(mod0, 'genderFemale')
```



### Package `ordinal`

Package `ordinal` [@ordinal] permits ordinal multinomial logistic models to be
fitted by means of two functions: `clm` and `clm2`. The latter is "a new
improved implementation" of the former (see the help of `clm2`), in both cases
the syntax is simple and standard. Unlike `polr` of package `MASS`
[@venables2002], the dependent variable may be an unordered factor, and in that
case both `clm` and `clm2` consider the vector of the labels as ordered: if this
is not the case, the dependent variable should be properly redefined.
```{r, message=FALSE, results='hide'}
library(ordinal)

mydata$finalgrade <- factor(mydata$finalgrade, c('Low', 'Average', 'High'))

mod0 <- clm(finalgrade ~ gender + irregularity, data = mydata)
field3logit(mod0, 'genderFemale')

mod0 <- clm2(finalgrade ~ gender + irregularity, data = mydata)
field3logit(mod0, 'genderFemale')
```



### Package `VGAM`

Package `VGAM` [@yee2010] permits multinomial logistic models to be
fitted by means of two functions: `vgam` and `vglm`. In case of multinomial
logistic models they share the same syntax:
```{r, message=FALSE, results='hide'}
library(VGAM)

mod0 <- vgam(
  formula = employment_sit ~ gender + finalgrade,
  family = multinomial(),
  data = cross_1year
)
field3logit(mod0, 'genderFemale')

mod0 <- vglm(
  formula = employment_sit ~ gender + finalgrade,
  family = multinomial(),
  data = cross_1year
)
field3logit(mod0, 'genderFemale')
```



### Reading from `list`

Point estimates and other information on a fitted model may be passed as a named
list to `field3logit`. In case of a non-ordinal trinomial model, the matrix of
coefficients (component `B`) should be a matrix with named rows and two columns,
whereas possible levels of the dependent variable should be included as a
separate component (`levels`) which should be a character vector of length
three, whose first component is the reference level of the model (see the help
of `extract3logit` for details):
```{r, message=FALSE, results='hide'}
mod0 <- list(
  B = matrix(
    data = c(-2.05, 0.46, -2.46, 0.37, -1.2, 0.8, 0.7, 0.4),
    ncol = 2,
    dimnames = list(c('Intercept', 'X1', 'X2', 'X3'))
  ),
  levels = c('Class A', 'Class B', 'Class C')
)
field3logit(mod0, 'X1')
```


## A brief example


Fit a trilogit model by means of package `nnet` where the student's employment
situation is analysed with respect to all variables in the dataset
`cross_1year`:
```{r, message = FALSE}
library(plot3logit)
data(cross_1year)
library(nnet)
mod0 <- multinom(employment_sit ~ ., data = cross_1year)
```




The gender effect is analysed by means of a ternary plot which is generated in
two steps.

Firstly, the vector field is computed:
```{r}
field0 <- field3logit(mod0, 'genderFemale')
```

Secondly, the field is represented on a ternary plot, using either
`gg`-graphics:
```{r}
gg3logit(field0) + stat_field3logit()
```

or standard graphics:
```{r, fig.width=6, fig.asp=1}
plot(field0)
```



## How to specify the covariate changes

Ternary plots represent the effect of a change in covariate values on the
probability distribution of the dependent variable. The function `field3logit`
permits such change to be specified in three different ways: explicitly, as a
numeric vector, or implicitly, either by means of a named numeric vector, or by
means of an R expressions. All three methods are briefly illustrated below.

As an example, the following subsections refer to this trinomial logistic
regression model:
```{r}
library(plot3logit)
library(nnet)
mod0 <- multinom(employment_sit ~ finalgrade + irregularity + hsscore, cross_1year)
mod0
```




### Explicit specification of $\Delta x$

This method for specifying the change in the covariate values requires the
vector $\Delta x$ to be explicitly defined, thus it may be suitable when
$\Delta x$ results from some calculations. On the other hand, it is less
user-friendly than implicit syntax, as it depends on the order of regressors
in the design matrix.



#### First example

If the effect of a *high final grade* has to be assessed, the vector of changes
$\Delta x$ can be set according to the position of the dummy variable
`finalgradeHigh` in the matrix of coefficients of the model `mod0`:
```{r}
coef(mod0)
```
in this case, we have that
\[
\Delta x=[0, 0, 1, 0, 0, 0]'
\]
since `finalgradeHigh` is the fourth coefficient (including the intercept) of
the matrix of coefficients.

It follows that the function `field3logit` can be invoked as it follows:
```{r}
field3logit(mod0, c(0, 0, 1, 0, 0, 0))
```



#### Second example

It is also possible to set $\Delta x$ so as to consider changes involving more
than one regressor, as well as fractional changes. In such cases, $\Delta x$
will consist in a vector where there are several non-zero elements which may 
take any positive or negative value.

Assume, for example, that we want to study the effect of a decrease by 10 in the
high school final score, associated to an high final grade. In such a case,
we have that:
\[
\Delta x =[0, 0, 1, 0, 0, -10]'\,,
\]
thus:
```{r}
field3logit(mod0, c(0, 0, 1, 0, 0, -10))
```



### Implicit specifications of $\Delta x$

Unlike the explicit method, the implicit syntaxes allows the user to initialise
the vector $\Delta x$ by specifying only the covariates which should vary. There
are two possible syntaxes: one is based on named numeric vectors, the other is
based on `R` code. In the following, both of them are illustrated.



#### First example

If the effect of a *high final grade* has to be assessed, the implicit syntaxes
which allow to assess the effect of a unitary change of `finalgradeHigh` are the
following:
```{r}
# Named numeric vectors:
field3logit(mod0, c(finalgradeHigh = 1))

# R code:
field3logit(mod0, 'finalgradeHigh')
```

Note that the console output produced by printing the output of `field3logit`
shows both the implicit effect (line `Effect`) and the associated vector
$\Delta x$ (line `Explicit effect`).




#### Second example

If we want to study the effect of a decrease by 10 in the high school final
score, associated to an high final grade, the implicit syntaxes are:
```{r}
# Named numeric vectors:
field3logit(mod0, c(finalgradeHigh = 1, hsscore = -10))

# R code:
field3logit(mod0, 'finalgradeHigh - 10 * hsscore')
```

Compare the line `Explicit effect` of this output to the line `Effect` of the
same example in the previous section: as expected, they are the same.






## Comparing multiple effects

When effects of multiple changes have to be compared at a time, multiple fields
should be computed and represented on the same plot. This task can be easily
done by creating a `multifield3logit` object and directly representing it.

Since objects `multifield3logit` result by putting together two or more
`field3logit` objects, the package `plot3logit` allows the user to create a
`multifield3logit` object by adding up two or more `filed3logit` or
`multifield3logit` objects using standard sum operator `+`.

Here it is an example. The following command fit a trilogit model where all
available variables are used as regressors. Then four `fields3logit` objects are
computed for assessing the effects of a some combined changes in the duration of
studies and in students' final degree score.

Note that each field is computed just with respect to a single probability
distribution (`refpoint`) of the dependent variable, and only one arrow is
computed. The reason of this is that we have to represent four fields on the
same plot, thus olny a small number of arrows can be drawn in order to preserve
the clarity of the graph.

```{r, results='hide'}
data(cross_1year)
mod0 <- nnet::multinom(employment_sit ~ ., data = cross_1year)

refpoint <- list(c(0.7, 0.15, 0.15))

field_Sdur <- field3logit(mod0, 'durationShort', label = 'Short duration', p0 = refpoint, narrows = 1)
field_Ldur <- field3logit(mod0, 'durationLong', label = 'Long duration', p0 = refpoint, narrows = 1)
field_Hfgr <- field3logit(mod0, 'finalgradeHigh', label = 'High final grade', p0 = refpoint, narrows = 1)
field_Lfgr <- field3logit(mod0, 'finalgradeLow', label = 'Low final grade', p0 = refpoint, narrows = 1)
```

Now the `multifield3logit` object can be created by adding all the `field3logit`
objects up together:
```{r}
mfields <- field_Sdur + field_Ldur + field_Lfgr + field_Hfgr
mfields
```

and the `multifield3logit` object `mfield` can be represented in a graph:
```{r, message = FALSE}
gg3logit(mfields) +
  stat_field3logit(aes(colour = label)) +
  theme_zoom_L(0.45)
```

The code needed for generating the object `mfields` may be conveniently made
shorter in this way (see the main vignette of the package or the help of
`field3logit` for details on syntax):
```{r}
depo <- list(
  list(delta = 'durationShort', label = 'Short duration'),
  list(delta = 'durationLong', label = 'Long duration'),
  list(delta = 'finalgradeHigh', label = 'High final grade'),
  list(delta = 'finalgradeLow', label = 'Low final grade')
)

mfields <- field3logit(mod0, delta = depo, p0 = refpoint, narrows = 1)
mfields
```


It is also possible to rely on syntax based on delimiters `<<...>>` (see
the help of `field3logit`, section **Details** or the main vignette of the
package) in order to create a `multifield3logit` object consisting of fields
based on the effect of each dummy variable generated by the same `factor`
regressor:
```{r}
field3logit(mod0, delta = '<<finalgrade>>', p0 = refpoint, narrows = 1)
```



## Confidence regions

The package `plot3logit` allows also to draw the confidence regions associated
to each effect, both in case of `field3logit` and `multifield3logit` objects.

The confidence regions can be computed when the function `field3logit` is
called by setting the argument `conf`. Otherwise, they can be added later
through the function `add_confregions` as it follows:
```{r}
field0 <- add_confregions(field0, conf = 0.95)
field0
```
and through the same syntax in case of `multifield3logit` objects:
```{r}
mfields <- add_confregions(mfields, conf = 0.95)
```

The statistic `stat_conf3logit` permits confidence regions to be drawn,
if available:
```{r, eval=FALSE}
gg3logit(field0) + stat_field3logit() + stat_conf3logit()
```
and
```{r, message = FALSE}
gg3logit(mfields) +
  stat_field3logit(aes(colour = label)) +
  stat_conf3logit(aes(fill = label)) +
  theme_zoom_L(0.45)
```




## References


