Potential Exam Questions Flashcards

1
Q

Why can't models that use industrial data be very strong?

A

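Because replicate data points are not typical of industrial process data, so they cannot be used in building the data models. Without replicates, measurement and sampling errors are difficult to evaluate.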

2
Q

Define MLR. What is the goal of using it, and what does it assume?

A

Multiple linear regression (MLR), also known simply as multiple regression, is a statistical technique that uses several explanatory variables to predict the outcome of a response variable.
The goal of multiple linear regression (MLR) is to model the linear relationship between the explanatory (independent) variables and response (dependent) variable.
MLR assumes the following:
1- There is a linear relationship between the dependent variable and the independent variables.
2- The independent variables are not too highly correlated with each other.
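The following minimal Python sketch illustrates the two assumptions above. It is not taken from any particular statistics package. It fits an MLR model by ordinary least squares through the normal equations (X'X)b = X'y. The names `fit_mlr` and `solve` are hypothetical. The sketch raises an error when X'X is singular, which is what happens when predictors are perfectly collinear.

```python
def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: predictors are (nearly) collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(rows, y):
    """Return [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk (ordinary least squares)."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data generated from y = 1 + 2*x1 - 0.5*x2
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 - 0.5 * x2 for x1, x2 in rows]
print(fit_mlr(rows, y))  # approximately [1.0, 2.0, -0.5]
```

The normal equations are used here only because they map directly onto the X'X notation. For real work, a QR- or SVD-based solver is numerically safer.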

3
Q

Define PLS and state when it is useful.

A

Partial least squares regression (PLS regression) is a statistical method related to principal components regression.
It finds a linear regression model by projecting the predicted variables (responses, Y) and the observable variables (predictors, X) to a new space. PLS is used to find the fundamental relations between two matrices (X and Y).
PLS regression reduces the predictors to a smaller set of uncorrelated components and performs least squares regression on these components instead of on the original data.
PLS is useful when:
1) the predictors are highly collinear,
2) there are more predictors than observations,
3) ordinary least-squares regression either produces coefficients with high standard errors or fails completely.
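Below is a minimal sketch of PLS regression for a single response (PLS1). It uses the NIPALS algorithm, which is one common way of computing PLS. The names `fit_pls1` and `predict_pls1` are hypothetical. In the example the two predictor columns are identical. X'X is then singular and ordinary least squares has no unique solution, but one PLS component still recovers the relationship.

```python
import math


def fit_pls1(X, y, n_components):
    """Fit single-response PLS (PLS1) with NIPALS; returns means and per-component (w, p, q)."""
    n, k = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(k)] for row in X]  # centered X, deflated below
    f = [v - y_mean for v in y]                                # centered y, deflated below
    comps = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(k)]  # weights ~ X'y
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(k)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(k)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                         # y loading
        E = [[E[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                              # deflate y
        comps.append((w, p, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "comps": comps}


def predict_pls1(model, x):
    """Predict one observation by repeating the scoring and deflation steps on x."""
    e = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in model["comps"]:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


# Two perfectly collinear predictors (x2 == x1); y = 2*x1 + 1
X = [[v, v] for v in (1.0, 2.0, 3.0, 4.0, 5.0)]
y = [2 * row[0] + 1 for row in X]
model = fit_pls1(X, y, n_components=1)
print(predict_pls1(model, [6.0, 6.0]))  # close to 13.0
```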

4
Q

Define ANOVA. When is it used?

A

ANOVA determines whether the difference between samples is due to chance or to a systematic treatment effect.

ANOVA is used in the analysis of comparative experiments, in which only the difference in outcomes is of interest. The statistical significance of the experiment is determined by a ratio of two variances. This ratio is unaffected by several possible alterations to the experimental observations. Adding a constant to all observations does not alter the significance, and neither does multiplying all observations by a constant. The statistical significance result of ANOVA is therefore independent of constant bias and scaling errors, as well as of the units used to express the observations.
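The claim that the variance ratio is unaffected by adding a constant or by rescaling can be checked numerically with a minimal one-way ANOVA sketch. The function name `one_way_anova_f` and the group values are hypothetical illustration data.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F ratio: between-group mean square / within-group mean square."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


groups = [[4.1, 3.9, 4.0], [4.6, 4.4, 4.5], [5.0, 5.2, 5.1]]  # hypothetical observations
shifted = [[v + 10.0 for v in g] for g in groups]  # constant bias added
scaled = [[v * 3.0 for v in g] for g in groups]    # unit/scale change
for name, data in (("original", groups), ("shifted", shifted), ("scaled", scaled)):
    print(name, round(one_way_anova_f(data), 6))  # about 91 in every case
```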

5
Q

What steps should be followed to check model suitability?

A

To check model suitability, the following steps should be taken:

1) Evaluation of the raw data: data scatter and any replicate points in the process data
2) Regression analysis and model interpretation: R2 and Q2 values, coefficient plot, confidence limits of weight factors, …
3) Use of the regression model: response contour plot, equation fit for other models

6
Q

What is the importance of raw data quality?

A

Raw data quality is essential for obtaining a reliable data model. Such a model can explain process disturbances or can be used to predict process performance under different conditions.

7
Q

How can low-quality data be differentiated from better quality data?

A

Low-quality data can be distinguished from better-quality data by two main characteristics:

1) The data should be well dispersed spatially. This means that the amount of data is sufficient (more than 20-25 data points to obtain distributed data). The data should not be confined to a small region, and they should preferably be uniformly dispersed.
2) If replicate points are available, their responses should lie within a reasonable limit, since the spread between replicates reflects measurement accuracy.
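The two criteria can be turned into a rough screening check, sketched below. The 20-point minimum comes from the text. The other settings are assumptions that the user must choose:
- the number of bins used to judge spatial coverage,
- the required coverage fraction `min_coverage`,
- the replicate standard-deviation limit `max_replicate_sd`.

The function names are hypothetical.

```python
import statistics


def coverage(values, lo, hi, bins=5):
    """Fraction of equal-width bins in [lo, hi] that contain at least one value."""
    width = (hi - lo) / bins
    occupied = {min(int((v - lo) / width), bins - 1) for v in values if lo <= v <= hi}
    return len(occupied) / bins


def screen_data(values_by_variable, windows, replicate_groups, max_replicate_sd,
                min_points=20, min_coverage=0.6):
    """Hypothetical screening of (1) amount and spread of data and (2) replicate agreement."""
    n_points = len(next(iter(values_by_variable.values())))
    spread_ok = all(
        coverage(vals, *windows[name]) >= min_coverage  # windows[name] = (lo, hi)
        for name, vals in values_by_variable.items()
    )
    replicate_sds = [statistics.stdev(g) for g in replicate_groups if len(g) >= 2]
    return {
        "enough_points": n_points >= min_points,
        "well_dispersed": spread_ok,
        "replicates_ok": all(sd <= max_replicate_sd for sd in replicate_sds),
    }


temps = [60 + 0.5 * i for i in range(25)]  # hypothetical values from 60.0 to 72.0
print(screen_data({"temperature": temps}, {"temperature": (60, 80)},
                  [[5.0, 5.1, 4.9]], max_replicate_sd=0.2))
```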

8
Q

What is the use of the predicted vs. observed plot?

A

It is useful for finding outliers and for assessing how widely the data are dispersed.
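The predicted vs. observed plot is normally inspected visually. A numeric stand-in is to flag points that lie far from the y = x line. This sketch assumes a cutoff of k = 2 residual standard deviations, which is an arbitrary illustrative choice, and `flag_outliers` is a hypothetical name.

```python
import statistics


def flag_outliers(observed, predicted, k=2.0):
    """Return indices of points lying more than k residual SDs from the y = x line."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    sd = statistics.stdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > k * sd]


observed = [float(v) for v in range(1, 11)]
predicted = list(observed)
predicted[4] += 5.0  # one hypothetical badly predicted point
print(flag_outliers(observed, predicted))  # [4]
```

A single large outlier also inflates the standard deviation. For small data sets, a robust spread estimate such as the median absolute deviation may therefore be preferable.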

9
Q

What is the use of the residuals N-plot?

A

The residuals N-plot (normal probability plot of the residuals) is also useful for finding outliers in the data.
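A normal probability plot pairs the sorted residuals with the quantiles expected under a normal distribution. Points far from the straight line, especially at the ends, are outlier candidates. This sketch uses Python's `statistics.NormalDist` and Blom plotting positions, (i - 3/8)/(n + 1/4). The choice of Blom positions is an assumption, since other plotting-position formulas exist. The residual values are hypothetical.

```python
from statistics import NormalDist


def normal_plot_points(residuals):
    """Pair each sorted residual with its expected standard-normal quantile (Blom positions)."""
    n = len(residuals)
    z = NormalDist()  # standard normal: mean 0, sd 1
    ordered = sorted(residuals)
    return [(z.inv_cdf((i + 1 - 0.375) / (n + 0.25)), r) for i, r in enumerate(ordered)]


residuals = [0.12, -0.05, 0.03, -0.10, 1.40]  # hypothetical; the last one looks suspicious
for q, r in normal_plot_points(residuals):
    print(f"{q:+.3f}  {r:+.2f}")
```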

10
Q

Describe a good model fit when an experimental plan can be conducted and when process data are used. Why are the values lower in the second case?

A

A good model fit has R2 > 0.9 and Q2 > 0.7, especially when an experimental plan can be conducted. When process data are used, R2 and Q2 can be weaker: R2 > 0.8 and Q2 > 0.6. One obvious reason is that replicate points may not be available. Another is that the variable values may be concentrated within too tight specifications.
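The sketch below computes R2 and Q2 for a model with one predictor, which keeps it short. R2 = 1 - SS_res/SS_tot and Q2 = 1 - PRESS/SS_tot. Here PRESS is taken from leave-one-out cross-validation. That scheme is an assumption, since software packages may use other cross-validation schemes. The thresholds in `fit_is_good` are the ones stated above. All function names and data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1


def r2_q2(x, y):
    """R2 from the full fit; Q2 from leave-one-out predictions (PRESS)."""
    my = sum(y) / len(y)
    ss_tot = sum((v - my) ** 2 for v in y)
    b0, b1 = fit_line(x, y)
    ss_res = sum((v - (b0 + b1 * a)) ** 2 for a, v in zip(x, y))
    press = 0.0
    for i in range(len(x)):
        c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])  # refit without point i
        press += (y[i] - (c0 + c1 * x[i])) ** 2
    return 1 - ss_res / ss_tot, 1 - press / ss_tot


def fit_is_good(r2, q2, process_data=False):
    """Apply the rule-of-thumb limits: 0.9/0.7 for designed experiments, 0.8/0.6 for process data."""
    r2_min, q2_min = (0.8, 0.6) if process_data else (0.9, 0.7)
    return r2 > r2_min and q2 > q2_min


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.1, 4.8, 7.2, 9.1, 10.8, 13.2]  # hypothetical measurements
r2, q2 = r2_q2(x, y)
print(round(r2, 3), round(q2, 3), fit_is_good(r2, q2), fit_is_good(r2, q2, process_data=True))
```

For ordinary least squares, each leave-one-out residual is at least as large as the corresponding fitted residual. Q2 therefore never exceeds R2.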

11
Q

How can a process window that is too narrow be widened?

A

1) Include laboratory test results together with the actual process data
2) Intentionally run data points outside the process variable specifications (to confirm a hypothesis)
3) Use the process development data history along with the actual process data

12
Q

Define the coefficient plot.

A

It is an important tool for identifying the most meaningful process parameters. The parameter weight (coefficient) is primarily used to estimate how significant the parameter is to the response.
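To rank parameters in a coefficient plot, the coefficients must be comparable. This is commonly achieved by expressing the factors in coded units (-1/+1). For a balanced 2^2 factorial in coded units, the model columns are orthogonal. Each coefficient is then simply the sum of column × response divided by the number of runs. The sketch assumes such a design and a model with an interaction term. The function name and response values are hypothetical.

```python
def coded_coefficients(runs, y):
    """Coefficients of y = b0 + b1*x1 + b2*x2 + b12*x1*x2 for a balanced 2^2 factorial in coded units.

    Valid only because the model columns of this design are orthogonal and each has n entries of +/-1.
    """
    n = len(runs)
    columns = {
        "constant": [1] * n,
        "x1": [a for a, _ in runs],
        "x2": [b for _, b in runs],
        "x1*x2": [a * b for a, b in runs],
    }
    return {name: sum(c * v for c, v in zip(col, y)) / n for name, col in columns.items()}


runs = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
y = [8.5, 12.5, 5.5, 13.5]  # hypothetical responses
coefs = coded_coefficients(runs, y)
ranking = sorted((k for k in coefs if k != "constant"), key=lambda k: abs(coefs[k]), reverse=True)
print(coefs)    # {'constant': 10.0, 'x1': 3.0, 'x2': -0.5, 'x1*x2': 1.0}
print(ranking)  # ['x1', 'x1*x2', 'x2']
```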

13
Q

What do the confidence intervals of the weight factors (error bars) determine?

A

They also determine parameter significance: a parameter whose error bar includes zero is not significant.
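The sketch below computes 95% confidence intervals for a duplicated 2^2 factorial with a main-effects model. It assumes coded (-1/+1) factors, so that the standard error of each coefficient is s/sqrt(n). The caller passes in the critical t value. For example, 2.571 is the two-sided 95% value for 5 degrees of freedom, which matches 8 runs and 3 coefficients. The names, noise and data are hypothetical.

```python
import math


def coefficient_intervals(runs, y, t_crit):
    """Confidence intervals for b1, b2 in y = b0 + b1*x1 + b2*x2 (orthogonal coded design assumed)."""
    n = len(runs)
    columns = {"x1": [a for a, _ in runs], "x2": [b for _, b in runs]}
    b0 = sum(y) / n
    coefs = {name: sum(c * v for c, v in zip(col, y)) / n for name, col in columns.items()}
    fitted = [b0 + coefs["x1"] * a + coefs["x2"] * b for a, b in runs]
    dof = n - 3  # residual degrees of freedom: n runs minus 3 coefficients
    s = math.sqrt(sum((v - f) ** 2 for v, f in zip(y, fitted)) / dof)
    half_width = t_crit * s / math.sqrt(n)  # standard error of each coefficient is s / sqrt(n)
    return {
        name: (b - half_width, b + half_width, abs(b) > half_width)  # (low, high, significant?)
        for name, b in coefs.items()
    }


runs = [(-1, -1), (1, -1), (-1, 1), (1, 1)] * 2  # duplicated 2^2 factorial
noise = [0.1, -0.1, -0.1, 0.1, -0.1, 0.1, 0.1, -0.1]  # hypothetical measurement noise
y = [10 + 2 * a + e for (a, _), e in zip(runs, noise)]  # x2 has no real effect
for name, (low, high, significant) in coefficient_intervals(runs, y, t_crit=2.571).items():
    print(f"{name}: [{low:.3f}, {high:.3f}] significant={significant}")
```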

14
Q

What does the contour plot indicate, and what does its shape describe?

A

It indicates the goodness of the regression model. The shape of the contour plot describes the significance of the process parameters. It also shows how well the model reflects reality.

15
Q

What are the uses of regression modeling?

A

Regression modeling can be used for several purposes:

1) Interpretation of experimental plan results
- Statistical analysis software supports Design of Experiments (DOE). It is used for screening significant process parameters and initial process conditions, and also for optimizing process performance.
- The models are relatively weak because physical phenomena are not included in them. Physical phenomena can, however, be included to some extent by transforming process variables into, e.g., dimensionless numbers: Reynolds, Nusselt, Froude or Damköhler numbers, dimensionless length, dimensionless time.
2) Process troubleshooting
- Troubleshooting problems are relatively easy to solve, but data-analysis reveals the following limitations to the data:
  - the lack of replicates makes it difficult to evaluate measurement- and sampling-based errors;
  - the model may suffer from a narrow parameter window, which weakens its predictive capability in troubleshooting.
3) Creation of a simplistic mathematical model for simulating process performance
- With large amounts of data, it is relatively fast to set up a model based on a set of linear equations. The calculated weight factors are easy to pick up from the results and to include in the selected set of equations.
- The resulting model can then be used, e.g., to predict process performance under different process run scenarios.
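Items 1 and 3 above can be sketched together. First, a process variable is transformed into a dimensionless number, here the Reynolds number Re = ρvL/μ. Then a linear-equation model built from weight factors is evaluated for different run scenarios. The weight factors, intercept and scenario values are hypothetical placeholders, not results from any real model.

```python
def reynolds_number(density, velocity, length, viscosity):
    """Re = rho * v * L / mu; all inputs in consistent SI units, result is dimensionless."""
    return density * velocity * length / viscosity


def simulate(intercept, weights, scenario):
    """Evaluate a linear-equation model y = intercept + sum(weight_i * x_i) for one scenario."""
    return intercept + sum(weights[name] * value for name, value in scenario.items())


# Hypothetical weight factors, as they might be picked up from a regression result
intercept = 2.0
weights = {"Re": 1.0e-5, "temperature": 0.03}

scenarios = {
    "current": {"Re": reynolds_number(1000.0, 0.5, 0.05, 1.0e-3), "temperature": 60.0},
    "faster flow": {"Re": reynolds_number(1000.0, 0.8, 0.05, 1.0e-3), "temperature": 60.0},
    "hotter": {"Re": reynolds_number(1000.0, 0.5, 0.05, 1.0e-3), "temperature": 70.0},
}
for name, scenario in scenarios.items():
    print(name, round(simulate(intercept, weights, scenario), 3))
```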

16
Q

How do you check the model?

A

1) Check the Summary of Fit

2) Then check the ANOVA

17
Q

What is a D-optimal design?

A

Computer-generated designs that maximize the volume spanned by the design points, i.e. that maximize the determinant of the X'X matrix. The design can be selected based on G-efficiency (selected by default), on the determinant, or on the condition number. G-efficiency compares the D-optimal design to a fractional factorial design.
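The D-optimal criterion can be illustrated with a brute-force sketch. The candidate set is a 3 × 3 grid in coded units, and the assumed model is y = b0 + b1x1 + b2x2 + b12x1x2. The search chooses the runs that maximize det(X'X). Exhaustive search is only feasible for tiny candidate sets; design software typically uses exchange algorithms instead. All names are hypothetical.

```python
from itertools import combinations, product


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            d = -d  # a row swap flips the sign of the determinant
        d *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return d


def model_row(x1, x2):
    return [1.0, x1, x2, x1 * x2]  # intercept, main effects, interaction


def d_optimal(candidates, n_runs):
    """Exhaustively pick the n_runs candidate points that maximize det(X'X)."""
    best, best_det = None, -1.0
    for subset in combinations(candidates, n_runs):
        X = [model_row(*pt) for pt in subset]
        p = len(X[0])
        xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
        d = det(xtx)
        if d > best_det:
            best, best_det = subset, d
    return best, best_det


grid = list(product([-1, 0, 1], repeat=2))  # 9 candidate points in coded units
design, value = d_optimal(grid, 4)
print(design, value)
```

For this model and four runs, the search picks the four corner points of the grid, i.e. the 2^2 full factorial.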

18
Q

Why are biological processes in wastewater treatment plants (WWTPs) complex?

A

1) The inlet contains an extremely large variety of compounds, and the composition changes as a function of time (seasonal changes, etc.).
2) The purification reactions are based partly on chemical purification and partly on bacterial metabolism. The biochemical reactions have several reaction steps that occur in parallel and in series, and the mechanisms even for the main impurities are complex.
3) The bacterial population can vary widely, e.g. due to inlet composition changes or yearly seasonal effects.

19
Q

Why is modeling wastewater treatment difficult?

A

The models are very complex and non-linear, which makes their use and optimization time-consuming. Additionally, several model parameters need to be defined before simulations can be started. Model parameters must also be established in order to use WWTP-specific simulation software.