
STATISTICS AND DATA SCIENCE II

1. ML Projects: types and steps


a. Supervised learning
Labeled training data: there is a target variable, also called label.

b. Unsupervised learning
Unlabeled data: no specific output is provided → no target variable. Thus, we cannot assess whether the outcome is right or not; there is no correct value to compare with.

Supervised learning learns patterns from labeled data to make predictions, while
unsupervised learning finds patterns in unlabeled data.
ML projects: Common steps
0. Collecting data

EDA and feature engineering

- Descriptive statistics
- Analysis of target variable
- Univariate and bivariate analysis + visualizations
- Missing values
- Correlation among explanatory variables
- Hypothesis validation
  - Relationships target - explanatory
  - Hypothesis testing
- Feature engineering
  - Feature generation
  - Encoding
  - Feature selection

RECAP
- How to compare a variable between 2 groups → t-test
  - We want to check whether differences between groups are statistically significant.
  - In particular, we want to compare means: the general mean vs. one mean value per group.
  - Both the t-test (which assumes normality) and the Wilcoxon test (which does not require normally distributed data) are based on the idea of analyzing the residual distribution between a base linear model and 2 adapted linear models, one per group.
- How to compare a variable between more than 2 groups → ANOVA
  - We want to check whether differences between groups are statistically significant.
  - In particular, we want to compare means: the general mean vs. one mean per group.
  - ANOVA checks whether statistically significant differences exist; Kruskal-Wallis is the alternative for non-normal data.
  - Post-hoc comparisons: Tukey's Honest Significant Differences, or Dunn's test for non-normal data. (A short R sketch of these tests follows.)
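A minimal R sketch of these tests, assuming a data frame df with a numeric column value and a grouping factor group (all names are placeholders):

```r
# Two groups: compare locations
t.test(value ~ group, data = df)        # assumes (approximate) normality
wilcox.test(value ~ group, data = df)   # no normality assumption

# More than two groups
fit_aov <- aov(value ~ group, data = df)
summary(fit_aov)                        # ANOVA F-test
kruskal.test(value ~ group, data = df)  # non-parametric alternative

# Post-hoc pairwise comparisons
TukeyHSD(fit_aov)                       # after ANOVA
# Dunn's test is available in add-on packages, e.g. FSA::dunnTest(value ~ group, data = df)
```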

MODELLING DATA
What is a model?
It is a mathematical formula that defines a variable y as a function of other variables x:
- One dependent variable (y) → target
- Several independent ones (xs)

In some contexts, we have discovered exact analytical formulas that fit the data almost
perfectly.
Data is not that perfect, though. For a given value of x, we may find several values for y. It is
not possible to build an exact equation fitting our data. So, as perfection does not exist, we
use statistical approaches.
Whenever we model a dataset, we think about the process that could have generated the
observed data. This is the data generation process.

Some concepts
- Predictions or fitted values: what our model predicts for y when x is substituted by
concrete values. (Regression line, blue)
- True Y values (black dots)
- Residuals or prediction error: difference between our prediction and the real value
(red lines).

Our goal is to minimize the prediction error → making our predictions as close as possible to the real data.

Linear relationship between variables


- Correlation Coefficient (Pearson)
  - A quantity that measures the degree of linear covariation of 2 variables.
  - It ranges from -1 to +1.
  - On its own, it is not enough to understand the data.
  - The linear correlation coefficient is very sensitive to outliers.
  - There are other, more robust ways to compute correlation (see the short R sketch below).
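For example, in R (x and y are placeholder numeric vectors):

```r
cor(x, y)                       # Pearson: linear correlation, between -1 and +1
cor(x, y, method = "spearman")  # rank-based alternative, more robust to outliers
```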

Linear Regression Model


Linear regression is used to predict a quantitative outcome variable, y, on the basis of one or
more predictor variables, x. The data generation process for linear regression involves
simulating a linear relationship between x and y.

Assumptions:
1. Linearity
2. Normality
3. Homoscedasticity
4. Independence
5. No multicollinearity
6. Exogeneity
Coefficients
There are two ways of computing them:
- Search for coefficients minimizing MSE → ordinary least squares, OLS
- Search for them maximizing log-likelihood
In the case of a linear regression, it can be shown that both methods return the same results.
Significance metrics:
- P-value of each coefficient → t-test to check if they are not 0
- F-statistic → overall model significance
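A minimal sketch of fitting a linear model and inspecting these quantities in R, assuming a data frame df with target y and predictors x1, x2 (placeholder names):

```r
fit <- lm(y ~ x1 + x2, data = df)   # OLS fit
summary(fit)   # coefficients with t-tests and p-values, R-squared, F-statistic
confint(fit)   # confidence intervals for the coefficients
```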

Interpretation

- Ceteris paribus, for each one-unit increase in x, y changes by β1 (the coefficient of x).


- Dummy variables have a different interpretation.
- If you transform variables to make them “Normal”, interpretation also changes.

Results
Obviously, if our data does not show a linear relationship between target and independent
variables, our model is not going to properly fit the data.
- Residuals vs Fitted plot (patterns indicate potential non-linearity)

We must always check to what extent our model represents data robustly. To do so we check:
- Coefficient of determination (R2) & Adjusted R2.
- Mean squared error (MSE) & Root mean squared error (RMSE)
- Residual standard error (RSE) = RMSE adjusted by # predictors
- Mean absolute error (MAE)

- Residuals vs Leverage: check whether some predictors have extreme values.
- Influential points: those that, if omitted, would significantly change the fit of the model.
- Cook's distance: check whether some points influence predictions too much. Not all outliers are influential. (A short R sketch of these diagnostics follows.)
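These diagnostics can be obtained from the fitted model of the earlier sketch (fit is the lm object defined above):

```r
par(mfrow = c(2, 2))
plot(fit)   # Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage

cooks.distance(fit)   # influence of each observation on the fitted model
hatvalues(fit)        # leverage of each observation
```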
Residual analysis
- Normality
Residuals should have zero mean and follow a normal distribution. There are ways to
check this, such as the QQ plot, or the Shapiro-Wilk normality test.
- Independence
We assume every row in our data is drawn independently, so there is no auto-correlation between different rows of data. This is the typical case when we analyze customers, surveys, etc. Once we have trained a model, we can check this assumption on the residuals using the Durbin-Watson test.
It might become a problem when analyzing temporal data (i.e.: distinct surveys on the same
sample of population). → time series analysis
- Homoscedasticity
For a given value of X, we assume the underlying distribution of the target variable exhibits the same variance. (R sketches of these residual checks follow below.)
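A short sketch of these residual checks in R, assuming the car and lmtest packages are installed and fit is a fitted lm object (the Breusch-Pagan test is one common choice for the homoscedasticity check):

```r
library(car)      # durbinWatsonTest()
library(lmtest)   # bptest()

shapiro.test(residuals(fit))   # normality of residuals
durbinWatsonTest(fit)          # independence: autocorrelation of residuals
bptest(fit)                    # Breusch-Pagan test for homoscedasticity
```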

Multicollinearity
- Structural multicollinearity: this type occurs when we create a model feature from other features. In other words, it is a byproduct of the model we specify rather than being present in the data itself. For example, if you square a term X to model curvature, there is clearly a correlation between X and X².
- Data multicollinearity: this type of multicollinearity is present in the data itself rather than being an artifact of our model. Observational studies are more likely to exhibit this kind of multicollinearity.
- How to assess multicollinearity for a given predictor? With the variance inflation factor (VIF): a score that measures how much the variance of a coefficient is inflated due to multicollinearity.
  - VIF = 1 → no multicollinearity
  - VIF > 5 → multicollinearity problems (see the R sketch below)
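In R, the VIF can be computed with the car package (fit is an lm or glm object with several predictors):

```r
library(car)
vif(fit)   # one VIF per predictor; values near 1 are fine, > 5 signals problems
```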
IMPROVING LINEAR MODELS

BEYOND STANDARD LINEAR MODELS


There are two problems that typically emerge when analyzing data:
- Too many variables
- Multicollinearity
We have seen some techniques to measure and try to avoid these problems; however, sometimes it is impossible to manage them completely. A good alternative is shrinkage methods, also called penalized regression: a linear model penalized by adding a constraint to the equation. The penalty shrinks the coefficients of the less contributive variables towards 0.

In standard linear models, the error to minimize is the residual sum of squares: RSS = Σ (y_i − ŷ_i)².

Penalized regression adds a penalty term to this error, RSS + λ·[(1 − α)·Σ β_j²/2 + α·Σ |β_j|], where:
- alpha = 1 → Lasso Regression (L1)
- alpha = 0 → Ridge Regression (L2)
- alpha between 0 and 1 → ElasticNet model

We need to choose alpha and lambda, the penalty constants. How do we select optimal values for them? With cross-validation.
Choosing the right regularization depends on your data's feature relevance and correlation structure.
CROSS-VALIDATION (CV)
- Simple validation approach:
We cannot just fit the model to our training data and hope it will work accurately on real data it has never seen before. We need some assurance that our model has captured most of the patterns in the data. Therefore, we need to divide our data into 2 sets: training and testing.
CV is a set of methods to evaluate the performance of a model by testing it on new, unseen data (test data).
The basic idea is:
- Reserve a small sample of the data by randomly splitting it into 2 parts.
- Build the model on the remaining part.
- Test the performance on the reserved sample.
CV is also known as a resampling method, as it involves fitting the same model multiple times using different subsets of the data.

- K-Fold Cross-validation:
1. Split the dataset into multiple parts or folds, where some parts are used for training the
model and the remaining for testing (checking performance).
2. The process is performed as many times as folds are.
3. Results are averaged over folds to give a reliable estimate of how the model will
perform on new data, so then we can select the best performing model, parameter, etc.
How to select optimal values for lambda and alpha?
In ElasticNet, first we need to establish a range of candidate values for each parameter. For every (lambda, alpha) combination, the model is trained and evaluated across all K folds (e.g., K = 5). We compute the average error of every combination across the folds, and the combination providing the minimum error is selected as the best one (see the glmnet sketch below).
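A hedged sketch of this search with glmnet, assuming x is a numeric predictor matrix and y the target (placeholders). cv.glmnet() cross-validates over a lambda path for a fixed alpha, so alpha is searched with a simple loop; a shared foldid keeps the comparison fair:

```r
library(glmnet)

alphas <- seq(0, 1, by = 0.25)
foldid <- sample(rep(1:5, length.out = nrow(x)))   # same 5 folds for every alpha

cv_fits <- lapply(alphas, function(a) {
  cv.glmnet(x, y, alpha = a, foldid = foldid)      # CV error over a lambda path
})

cv_errors   <- sapply(cv_fits, function(f) min(f$cvm))
best        <- which.min(cv_errors)                # (alpha, lambda) with lowest CV error
best_alpha  <- alphas[best]
best_lambda <- cv_fits[[best]]$lambda.min
```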

- Cross-validation
It is used for:
- Model comparison and selection
- Hyperparameters tuning
- Avoiding overfitting
2. HYPOTHESIS TESTING RECAP

SO FAR (FAST SUMMARY)


- Linear regression: a method to model and predict a continuous target variable based on independent variables (which may be continuous or discrete).
- Goodness of fit: R2, MSE, MAE…
- Coefficients: effect of X on Y. If data is scaled, they can be used to compare the
importance of different features in our model.
- p-value: some coefficients might not be statistically significant, because the variable is not relevant to the target or due to multicollinearity.
- We can also induce some desirable properties into the coefficients with regularization
methods, like Ridge, Lasso and ElasticNet.
- We search for the best hyperparameters by doing Cross-Validation.
Model Validation Framework
Testing the model's generalization ability:
1. A training set is used to train the machine learning model(s).
2. A validation set is used to estimate the generalization error of the models created from the training set → select the best model specification.
3. A test set is used to estimate the generalization error of the final/chosen model.
GENERALIZED LINEAR MODELS (GLM)
GLMs are a flexible extension of regular linear models that allow you to model a wider
variety of data types. They consist of three main components:
1. Linear relationship: the model still assumes a linear relationship between the input variables (features) and the target, but in a transformed space. That is, in GLMs we transform the target variable (using a function called the link function) so we can still model it using a linear combination of the features.
2. Link function: a function that connects the linear predictor (the combination
of input variables) to the expected value of the target. This allows the model to
handle non-normal distributions of the output.
3. Distribution of the outcome: instead of just modeling continuous data with normal distributions, GLMs can handle other types of data distributions, such as binomial or Poisson.

BINARY TARGET VARIABLES


Binary target variable: yes = 1, no = 0 as an outcome. This implies a classification problem.
How do we model it? How do we predict it? Remember the probability distributions of binary random variables, Bernoulli(p) and Binomial(n, p), where p is the probability of success.

BUT we cannot fit a linear regression to model a probability, because a probability is NOT linear: it is bounded between 0 and 1, while the outcome of a linear regression ranges from minus infinity to infinity.

PROBABILITY, ODDS AND LOG-ODDS


The asymmetry of raw odds makes them very difficult to compare. Remember that, with p as the probability of success, the odds are p / (1 − p), so the log-odds can be written as:

log-odds = log(p / (1 − p))

The log of the ratio of the probabilities is called the logit function, which is the basis of logistic regression. Why? The log-odds of p can be modeled linearly because the logit transformation expands the probability space to cover all real numbers. This linearity simplifies the interpretation and computation of the model parameters (allowing us to use linear regression methods to estimate them) and ensures that p remains between 0 and 1.

MERGING ODDS AND LINEAR MODELS


DERIVING LOGISTIC REGRESSION
First, we model the logit function as a linear model:

logit(p) = log(p / (1 − p)) = β0 + β1·x1 + … + βk·xk

Solving for p gives the logistic function:

p = 1 / (1 + exp(−(β0 + β1·x1 + … + βk·xk)))
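A minimal sketch of fitting this in R, assuming a data frame df with a 0/1 target y and predictors x1, x2 (placeholder names):

```r
logr <- glm(y ~ x1 + x2, data = df, family = binomial(link = "logit"))
summary(logr)   # coefficients are on the log-odds scale; exp(coef(logr)) gives odds ratios

p_hat <- predict(logr, newdata = df, type = "response")   # predicted probabilities of class 1
```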


INTERPRETING COEFFICIENTS
Exponentiating a coefficient, exp(βj), gives the multiplicative change in the odds of the positive class for a one-unit increase in xj, ceteris paribus.

MEASURING CLASSIFICATION MODEL PERFORMANCE


Since LogR returns probabilities, if the predicted probability is > X, then we assign the point to class 1; otherwise, to class 0.
Confusion matrix and derived metrics
Let's fix X = 0.5 for now, so if the predicted probability is > 0.5, the class is "positive", and "negative" otherwise.
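Continuing the sketch above (p_hat are the predicted probabilities and df$y the true labels):

```r
pred_class <- ifelse(p_hat > 0.5, 1, 0)          # threshold X = 0.5
table(Predicted = pred_class, Actual = df$y)     # confusion matrix

# Accuracy, precision, recall, etc. can be derived from this table
# (or computed directly with e.g. caret::confusionMatrix()).
```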

CLASSIFICATION MODEL PERFORMANCE


It is key to analyze all performance metrics in order to fully understand how the model is
performing.
Since LogR returns probabilities, if the score is > X, then we assign class 1; otherwise, we assign class 0.
ROC - AUC
Every different value of the score threshold X provides a different rule for deciding whether to assign the positive class or not. For a given model, we can evaluate many different thresholds, which change the performance metrics, and choose the best one using the Receiver Operating Characteristic (ROC) curve.
It is a graph to evaluate the performance of a binary classifier. It shows how well it can
separate two classes as you change the decision threshold:
- y axis: TPR (true positive rate; sensitivity or recall)
- x axis: FPR (false positive rate)
- Threshold (each value provides a TPR-FPR combination)

The AUC, Area Under the ROC Curve, is a single number that summarizes the
performance of the model over all possible thresholds. It measures the total area under the
ROC curve:
- 0.5 = a random classification
- 1 = a perfect classifier

The points on a ROC curve closest to (0, 1) represent the range of best-performing thresholds for the given model. However, there is an important assumption here: that our errors are symmetric, i.e. a false positive is as costly as a false negative. Is this always the case?
In many cases, the two types of error are not symmetric, so we must take this into account when fixing the score threshold, reducing the most costly error. (A pROC sketch follows below.)
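A sketch with the pROC package, reusing p_hat and df$y from the logistic regression example above:

```r
library(pROC)

roc_obj <- roc(response = df$y, predictor = p_hat)
plot(roc_obj)             # ROC curve: TPR vs FPR over all thresholds
auc(roc_obj)              # 0.5 = random classifier, 1 = perfect classifier
coords(roc_obj, "best")   # threshold closest to the top-left corner (0, 1)
```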
MULTICLASS CLASSIFICATION
Imagine we had 3 classes in our target data, A, B and C, which do not follow a natural order (the data is not ordinal). How do we design a method based on logistic regression to do multinomial classification?
- Option 1: several logistic regressions
1 regression for each category → the chosen class takes value 1, the rest the value 0.

We run the 3 models and register results for each observation.

Thus, we have 3 probabilities for each new observation, one for each possible class. We
identify the maximum value of those 3.
Given new data, we evaluate all models and assign the class with the highest score.

- Option 2: Multinomial Logistic Regression


As with the binomial logit model, we need to link the probabilities p to the predictors x, while
ensuring that the probabilities are restricted between 0 and 1.
The multinomial logistic regression is an extension of the logistic regression. It is intended for nominal target variables with more than 2 categories. It can be used for ordinal data, but the information about the order will not be used.
We use the logit function here, but instead of one single equation, we have an equation for each category relative to a reference category. That is, one class acts as a baseline or reference, and the model predicts the log-odds of each other category relative to this baseline.

Once we have the logit, the model estimates the probability of each class using the softmax
function, which ensures the probabilities sum up to 1.
So, in our example of the 3 classes, we have:

The final output of the multinomial logistic regression is a probability per class for each observation (many packages also report the highest probability and its class). Thus, as before, given new data we evaluate the three equations and assign the class with the highest score (see the nnet sketch below).
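A minimal sketch with nnet::multinom, assuming df has a 3-level factor target class (A, B, C) and predictors x1, x2 (placeholder names):

```r
library(nnet)

mfit <- multinom(class ~ x1 + x2, data = df)   # one equation per non-reference class
summary(mfit)

probs <- predict(mfit, newdata = df, type = "probs")   # one probability per class
pred  <- predict(mfit, newdata = df, type = "class")   # class with the highest probability
```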

- Option 3: Ordered Categorical Regression


What if I have ordinal data (more than 2 ordered categories)?
Very common situation in different contexts:
- Measuring severity of damages: “low”, “medium”, “high”
- Reported interest: “not at all”, “medium”, “very interested”
- Consumer satisfaction: “excellent”, “good”, “neutral”, “fair”, “poor”

We proceed similarly to logistic regression, but we introduce the concept of cumulative odds, that is, the odds of being in category k or below:

logit(P(Y ≤ k)) = log(P(Y ≤ k) / P(Y > k)) = α_k + β·x

Ordinal regression model:
It assumes proportional odds. This means the effect of each predictor is consistent across categories (= the impact of the features does not change for different categories). Because the relationship between all pairs of groups is the same, there is only one set of coefficients. The Brant test can be used to check whether this assumption holds.
Coefficient interpretation: exp(alpha_k) relates to the increase in the odds of being in category k or below.
Model evaluation: mixed methods for regression and classification. (A MASS::polr sketch follows below.)
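A hedged sketch with MASS::polr and the brant package, assuming df has an ordered factor target rating and predictors x1, x2 (placeholder names):

```r
library(MASS)    # polr()
library(brant)   # brant(): test of the proportional-odds assumption

ofit <- polr(rating ~ x1 + x2, data = df, Hess = TRUE)
summary(ofit)
exp(coef(ofit))   # odds-ratio style interpretation of the coefficients

brant(ofit)       # checks whether the proportional-odds assumption holds
```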

CLASS 3:
SUMMARY

Target variable | Type of modeling | E(Y|X) | Link function | Algorithms
Continuous | Regression | Normal | Identity | Linear Regression, ElasticNet, Lasso Regression, Ridge Regression
Binary | Classification | Binomial | Logit | Logistic Regression
Non-ordered categorical variables | Classification | Multinomial | Softmax | Multinomial Classification
Ordered categorical variables | Ordinal Regression | Ordinal distribution | Logit | Ordered Logistic Regression

POISSON REGRESSION MODEL


Now, suppose our target variable only takes non-negative integer values (counts). How do we model it?
BUILDING POISSON REGRESSION MODEL
In logistic regression, we modeled the log-odds with a linear model. In Poisson regression, we model log(λ), where λ is the expected value of Y, as a linear function of the independent variables: λ is related to the predictors through the log link function, which ensures non-negative predictions for λ.

The model outputs the expected number of occurrences (events) given the predictors, which is always non-negative (a short R sketch follows).
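A minimal sketch in R, assuming a data frame df with a count target counts and predictors x1, x2 (placeholder names):

```r
pfit <- glm(counts ~ x1 + x2, data = df, family = poisson(link = "log"))
summary(pfit)

predict(pfit, newdata = df, type = "response")   # expected (non-negative) number of events
```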

TWO POTENTIAL PROBLEMS


1. In our data, the variance is significantly higher than the mean → overdispersion.
2. The number of 0s is much higher than expected under a Poisson distribution → zero-inflation.
We need to adapt the Poisson regression model to deal with these problems.
a. Overdispersion
The data has more variability than expected under a Poisson distribution: V[Y] >> E[Y]. As a consequence, the model underestimates the variability, resulting in underestimated standard errors and a poor fit.
SOLUTIONS
- Quasi-Poisson regression: an extension of the Poisson model that assumes the variance can be modeled as a linear function of the mean, i.e. the variance is proportional to the mean, providing robust standard errors.
This model computes an overdispersion factor 𝜙 relating mean and variance, V[Y] = 𝜙 · E[Y].
However, it is important to know that this 𝜙 factor is not introduced as a new parameter in the model: it only modifies the way standard errors are computed after fitting the model.
The model structure and link function remain the same as in Poisson regression, but the standard errors of the estimated coefficients are adjusted based on 𝜙, ensuring that hypothesis tests and confidence intervals account for the overdispersion.

- Negative Binomial regression: based on the Negative Binomial distribution. The dispersion parameter is part of the likelihood function, meaning it directly impacts the predicted values as well as the variance, and it allows the variance to exceed the mean.
The Negative Binomial distribution models a variable indicating the number of failures until reaching a given number of successes (r).
In count data modeling, we use a reparameterized form where the interpretation of "failures before successes" translates into a model that counts events with a flexible mean and variance, which can grow independently, making it suitable for overdispersed data.
This model explicitly includes the overdispersion by adding an additional parameter 𝜙, so that the variance grows faster than the mean (commonly V[Y] = E[Y] + 𝜙 · E[Y]²).
𝜙 is estimated along with the other parameters through maximum likelihood (MLE), meaning it is part of the core model fitting process, but the model structure and link function remain the same.

HOW TO CHOOSE THE RIGHT MODEL?


- Check the dispersion: check whether the variance of the count data significantly exceeds the mean, V[Y] >> E[Y].
- Estimate the dispersion factor: the ratio of the residual deviance to the residual degrees of freedom. A value greater than 1 indicates overdispersion.
  - If the overdispersion is mild, quasi-Poisson might be enough.
  - If the overdispersion is moderate to severe, the Negative Binomial model is likely a better fit. (A short R sketch follows.)
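A short sketch of this workflow, reusing pfit from the Poisson example above (the MASS package is assumed for the Negative Binomial fit):

```r
# Rough dispersion factor: residual deviance / residual degrees of freedom
deviance(pfit) / df.residual(pfit)   # values well above 1 suggest overdispersion

# Mild overdispersion: quasi-Poisson (same structure, adjusted standard errors)
qfit <- glm(counts ~ x1 + x2, data = df, family = quasipoisson(link = "log"))

# Moderate to severe overdispersion: Negative Binomial
library(MASS)
nbfit <- glm.nb(counts ~ x1 + x2, data = df)
summary(nbfit)
```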
b. Zero-inflation
Zero inflation occurs when there are more zeros in the data than a standard Poisson model
can account for. The Poisson model assumes that zeros occur randomly according to the rate
(λ), but in some cases, there might be an excess number of zeros due to structural reasons
(certain conditions where events simply don’t happen).
As a consequence, the model will underestimate the number of 0s and overestimate the counts.

SOLUTIONS
- Zero-Inflated Poisson Model (ZIP)
This model assumes that there are 2 processes involved and handles them separately:
  - A logistic regression models the probability of an excess (structural) zero.
  - A Poisson regression models the counts for the remaining data (and can also produce 0s).
This is our first example of mixed modelling.
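A hedged sketch with pscl::zeroinfl, using the same placeholder names as above (the left part of the formula is the count model, the right part the zero-inflation model):

```r
library(pscl)

zfit <- zeroinfl(counts ~ x1 + x2 | x1 + x2, data = df, dist = "poisson")
summary(zfit)   # Poisson coefficients for the counts, logit coefficients for excess zeros
```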
GLM FINAL REMARKS
UNDERSTANDING E(Y|X)
In Linear models, the focus is on understanding and predicting averages for the response
variable.

LMs VS GLMs
In a standard LM, the relationship between the target variable and the predictors is modeled as:

Y = Xβ + ε

- Xβ = E[Y|X]: the mean of Y given X is a linear combination of the predictors.
- ε: the error term is assumed to follow a normal distribution with zero mean and constant variance.
- It assumes a linear relationship between the predictors and the mean of Y on its original scale.

In a GLM, the mean of Y is also modeled, but in a more flexible way:

g(E[Y|X]) = Xβ

- g(⋅): the link function transforms the mean to a scale where it is linearly related to the predictors.
- E[Y|X]: the mean of Y is modeled indirectly via the link function.
- The mean of Y may have a nonlinear relationship with the predictors on its original scale.

FAST SUMMARY
GLMs relate the linear combination of predictors Xβ to the expected value of the response
variable E(Y|X) through a link function (glmnet + family in R).

Target variable | Type of modeling | E(Y|X) | Link function | Algorithms
Continuous | Regression | Normal | Identity | Linear Regression, ElasticNet, Lasso Regression, Ridge Regression
Binary | Classification | Binomial | Logit | Logistic Regression
Non-ordered categorical variables | Classification | Multinomial | Softmax | Multinomial Classification
Ordered categorical variables | Ordinal Regression | Ordinal distribution | Logit | Ordered Logistic Regression
Non-negative & discrete | Regression | Discrete and positive | Logarithmic | Poisson regression, Quasi-Poisson regression, Negative-Binomial regression, Zero-inflated Poisson regression

GLMs AND REGULARIZATION METHODS


- Regularization methods can be applied to models within the GLM family. The penalty is applied to the coefficients, making them smaller or even exactly 0.
- Scaling the data is key when applying regularization: unscaled features can lead to biased penalization.
- If we apply a very strong penalty (high lambda), some important predictors might be excluded (excessive regularization). For optimal performance, we use cross-validation to select the best hyperparameters, i.e. those that most reduce the error. [In R: the caret package and glmnet's cv.glmnet(); see the sketch below.]
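For instance, a Lasso-penalized logistic regression with cv.glmnet (x is a numeric predictor matrix and y a binary target; both are placeholders):

```r
library(glmnet)

cv_log <- cv.glmnet(x, y, family = "binomial", alpha = 1,
                    standardize = TRUE)   # internal scaling of the features
coef(cv_log, s = "lambda.min")            # some coefficients shrunk exactly to 0
```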
Key considerations for modelling
1. Choose the Right GLM: Match the distribution of the target variable to the
appropriate GLM (Logistic for binary, Poisson for counts,…).
2. Need for regularization? Understand your data and whether you need regularization
or not
- Prevent overfitting.
- Handle high-dimensional data and/or multicollinearity.
- Perform feature selection (Lasso).

GAMMA REGRESSION (EXTRA)


A Gamma Generalized Linear Model is a type of GLM used when the response variable is
continuous, positive and skewed.
- Survival Analysis and Reliability Engineering: modeling time until an event
occurs, like time to failure of machine parts.
- Healthcare and Biology: modeling times to healing, times to infection, or other
duration-based phenomena.
- Finance and Insurance: modeling incomes or financial returns that are positive and
skewed.

The Gamma distribution fits continuous random variables well, since it is bounded between zero and infinity and exhibits right skewness, a typical phenomenon in many situations.

- If our continuous target variable is fitted appropriately by a Gamma distribution, we can use Gamma regression to model and predict it.
- In this case, the variance is proportional to the square of the mean, so we can handle data with high skewness.
- Idea: more likely situations tend to appear earlier in time, while less probable situations tend to appear later.
- Link: inverse link function (a short sketch follows).
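A minimal sketch, assuming a positive, right-skewed target cost in a data frame df (placeholder names):

```r
gfit <- glm(cost ~ x1 + x2, data = df, family = Gamma(link = "inverse"))   # canonical link
summary(gfit)

# A log link is also common when multiplicative effects are easier to interpret
gfit_log <- glm(cost ~ x1 + x2, data = df, family = Gamma(link = "log"))
```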
Sometimes, observations are not independent:
Clustered data → the target variable is measured once for each subject (the unit of analysis), and the units of analysis are grouped into clusters.
Suppose we want to study the relationship between average income (y) and educational level in a town comprising four fully segregated neighborhoods, sampling 1,000 individuals irrespective of their area of origin. If you ignore the dependencies among observations (individuals from the same neighborhood are not independent), you will obtain residuals that correlate within each block.

Longitudinal data → the target variable is measured more than once for each unit of analysis, with the repeated measures likely to be correlated.
Suppose you want to study the relationship between sleep quality (y) and the level of daily stress, with 1,000 people recording their stress level and sleep quality each day for 30 days.
If you model the data as such, you ignore the fact that repeated measurements are nested within each individual: observations within the same person are not independent, leading to correlated residuals.

Not independent samples


When working with data where observations are grouped, we often want to:
1. Capture overall patterns (fixed effects): trends or effects that apply to the whole population.
2. Account for group-specific differences (random effects): variations specific to each group that we don't explicitly model but still want to control for.

LMMs are statistical models for continuous outcome variables in which the residuals are normally distributed but may not be independent or have constant variance. LMMs extend linear models by adding flexibility to handle grouped data structures, where observations aren't entirely independent.

LINEAR MIXED MODELS (LMMs)


Why use Mixed Models? LMMs address two core issues:
1. Hierarchical Data Structure: when data has a natural grouping, assuming all
observations are independent leads to biased parameter estimates. Mixed models
capture this structure by allowing for correlations within clusters.
2. Partial Pooling: instead of treating each group entirely separately (no pooling = one
model for one group), or ignoring group differences (complete pooling = one model
for all observations), LMMs provide a middle ground. Partial pooling “shares
strength” across groups, using data from all of them to inform the estimates for each,
making the model more robust, especially when some groups have small sample sizes.

LMMs combine the benefits of both approaches, as they account for non-independence while retaining individual-level data. They model the relationship at both levels (see the lme4 sketch below):
1. Fixed effects: like coefficients in standard linear models, these capture population-level effects, representing the overall influence of predictors that are assumed to be the same across all groups or clusters.
  a. These are the explanatory variables of main interest in our study.
  b. They estimate general trends.
2. Random effects: these capture group-level deviations from the fixed-effect trends, allowing different groups to have their own intercepts or slopes. Random effects introduce an additional layer of flexibility, allowing the model to account for intra-group correlations and variation across groups.
  a. They model group-specific deviations or variability around the fixed effects.
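A minimal lme4 sketch for the income/neighborhood example above (variable names are illustrative):

```r
library(lme4)

# Fixed effect of education; random intercept per neighborhood (cluster)
m1 <- lmer(income ~ education + (1 | neighborhood), data = df)

# Random intercept and random slope: the effect of education may vary by neighborhood
m2 <- lmer(income ~ education + (1 + education | neighborhood), data = df)

summary(m1)
```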
Model specification -- General
Fixed or Random effect?

Complex effects: Nested
When we have more than 2 levels of grouping factors, LMMs get more complex. Some effects may be nested within others: for instance, subgroups 1 and 2 may be nested inside a higher-level group.
Complex effects: Crossed
When our data has multiple levels, first-level variables may vary across units of the other levels. Similarly, second-level variables may vary across units of the third level, so we need to include cross-level interactions between variables. Sometimes we cannot set a strict hierarchy between random effects; this is called crossed effects. (The lme4 formula syntax for both cases is sketched below.)
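In lme4 formula syntax (continuing the sketch above; group, subgroup, factor_a and factor_b are illustrative names):

```r
# Nested random effects: subgroups nested within groups
m_nested  <- lmer(y ~ x + (1 | group / subgroup), data = df)

# Crossed random effects: no strict hierarchy between the grouping factors
m_crossed <- lmer(y ~ x + (1 | factor_a) + (1 | factor_b), data = df)
```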
REGULARIZED LMMs AND EXTENSIONS
Can we apply regularization methods such as Ridge, Lasso, or Elastic Net to LMMs? Yes.
