In seminar 4, held on 30 March, we focused on:
• Recoding of variables
• Calculation of measures of central tendency and dispersion for quantitative variables
• Frequency distribution and confidence interval calculations
• Calculation and plotting of pie charts, bar charts and
histograms
• Hypothesis testing.
A) Chi-square
B) Student's t
C) ANOVA test
D) Linear regression
It is true
that this seminar ran ahead of the syllabus covered in the large-group classes,
because we were a little behind, but this helped us to better understand what
we would soon be studying. It will therefore be enough to mention certain terms
here, because they appear again, in more depth, in other related posts.
To begin with,
we have the measures of position, which as we know are the percentiles, deciles
and quartiles.
Among the measures of central tendency, by contrast, we have the median, the
mode and the mean; and finally, the measures of dispersion include the
variance, the standard deviation and the range (sample range).
We also reviewed inferential
statistics and the types of variables... but we can say that the most
interesting part was the hypothesis tests:
• First, CHI-SQUARE: we learned to do it on paper but also in Epi Info, so as
to use it in the research project. Depending on the results observed, we accept
or reject the null hypothesis. We use it for QUALITATIVE-QUALITATIVE comparisons.
• Secondly, STUDENT'S T, used in QUALITATIVE-QUANTITATIVE cases; as in the
previous case, we can also produce this table in our program.
• THE ANOVA TEST, for comparing more than two groups.
• LINEAR REGRESSION, where we draw the point clouds (scatter plots) and, as in
all the other cases, we also do it in our working program. QUANTITATIVE-QUANTITATIVE.
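Since ANOVA only appears in this list, a small hand-calculation may help fix the idea (the numbers below are invented, not data from the seminar): the F statistic compares the variability between the group means with the variability within the groups.

```python
# One-way ANOVA "on paper": F = between-group variance / within-group variance.
# Hypothetical data: a quantitative outcome measured in three groups.
groups = [
    [5.1, 4.9, 5.4, 5.0],
    [5.8, 6.1, 5.9, 6.2],
    [4.2, 4.5, 4.1, 4.4],
]

k = len(groups)                          # number of groups
n = sum(len(g) for g in groups)          # total number of observations
grand_mean = sum(x for g in groups for x in g) / n

# Between-group sum of squares: variation of the group means around the grand mean
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: variation inside each group
ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

f_stat = (ssb / (k - 1)) / (ssw / (n - k))
print(round(f_stat, 2))
```

A large F (compared with the F-table value for k − 1 and n − k degrees of freedom) leads us to reject the null hypothesis that all group means are equal.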
And that was
everything we saw in seminar 4. In barely two hours we had time to skim each
example, but it was not until the large-group classes that we finished
consolidating the terms and formulas and, above all, did more examples on
paper, which in the end is what serves us best in this subject: practice.
UNIT 10: STATISTICAL HYPOTHESES. HYPOTHESIS TESTS.
1.
HYPOTHESIS TESTS
To control random errors, in
addition to the calculation of confidence intervals, we have a second tool in
the process of statistical inference: hypothesis tests or contrasts.
With hypothesis tests, the strategy is as follows:
- We establish a priori a hypothesis about the value of the parameter.
- We collect the data.
- We analyze the coherence between the prior hypothesis and the data obtained.
They are statistical tools for answering research questions: they allow us to
quantify the compatibility between a previously established hypothesis and the
results obtained.
Whatever the researchers' wishes, a hypothesis test always tests the null
hypothesis.
Type of statistical analysis
according to the type of variables involved in the study
2.
HYPOTHESIS ERRORS.
The hypothesis test measures the
probability of error that I make if I reject the null hypothesis.
With the same sample we can accept
or reject the null hypothesis. Everything depends on an error, which we call α.
• The error α is the probability of
mistakenly rejecting the null hypothesis.
• The smallest error at which we can reject H0 is the p error. (p is
synonymous with the minimized α.)
We usually reject H0 at a maximum α level of 5% (p < 0.05). Above 5% error, we
accept the null hypothesis. This is what we call "statistical significance".
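The decision rule above can be summed up in a tiny sketch (a generic helper written for these notes, not part of any statistics package):

```python
# Decision rule from the notes: reject H0 when p < 0.05 (the maximum alpha level).
def decide(p_value, alpha=0.05):
    """Return the decision on the null hypothesis for a given p-value."""
    return "reject H0" if p_value < alpha else "accept H0"

print(decide(0.03))  # p below 5% -> reject H0
print(decide(0.20))  # p above 5% -> accept H0
```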
3.
TYPES OF ERRORS IN HYPOTHESIS TEST.
The most important error for us is
the alpha type. We accept that we can be mistaken up to 5%.
4.
CHI-SQUARE HYPOTHESIS TEST.
To compare qualitative variables
(dependent and independent).
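As a sketch with invented counts (not data from the course), the chi-square statistic for a 2x2 table of two qualitative variables can be computed "on paper" style and compared with the usual critical value of 3.84 (1 degree of freedom, α = 0.05):

```python
# Chi-square test by hand for a 2x2 table.
# Hypothetical counts: exposure (rows) vs disease (columns).
table = [[30, 10],
         [20, 40]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# chi2 = sum over cells of (observed - expected)^2 / expected,
# where expected = row total * column total / n
chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (observed - expected) ** 2 / expected

# For a 2x2 table (1 degree of freedom), the critical value at alpha = 0.05 is 3.84
print(round(chi2, 2), "reject H0" if chi2 > 3.84 else "accept H0")
```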
5.
STUDENT TEST (comparison of means)
It is used when the independent
variable is qualitative (dichotomous) and the dependent variable is continuous
quantitative. It only serves to compare two groups.
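A hand-calculation sketch of Student's t for two groups, using the pooled-variance form and invented measurements (any resemblance to real data is accidental):

```python
from statistics import mean, variance

# Student's t by hand: comparing a continuous outcome between two groups
# (hypothetical values, e.g. a measurement in treated vs control patients).
a = [5.2, 5.8, 6.1, 5.5, 5.9]
b = [4.6, 4.9, 5.1, 4.8, 5.0]

na, nb = len(a), len(b)
# Pooled variance: the two sample variances weighted by their degrees of freedom
sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
t = (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

print(round(t, 2))
```

The resulting t is compared against the t-table value with na + nb − 2 degrees of freedom.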
6.
JOINT STUDY OF TWO VARIABLES.
For this we collect the data in
some tables:
· In each row we have the data of one individual. Each column represents the
values that one variable takes for those individuals. The individuals are not
displayed in any particular order.
· These observations can be
represented in a scatter diagram. In them each individual is a point whose
coordinates are the values of the variables.
7.
DISPERSION AND POINT CLOUD DIAGRAM.
If we have the heights and weights of x individuals, we place them on a
scatter diagram in order to observe their distribution, since there is a
RELATIONSHIP BETWEEN BOTH VARIABLES.
8.
PREDICTION OF ONE VARIABLE AS A FUNCTION OF ANOTHER.
Apparently, weight increases by X kg for each Y cm of height.
9.
SIMPLE LINEAR REGRESSION: CORRELATION AND DETERMINATION.
· The aim is to study the linear association between two quantitative variables.
· Example: influence of age on
systolic blood pressure
· Deterministic linear models: the
independent variable determines the value of the dependent variable. Then for
each value of the independent variable there would be only one value of the
dependent.
· Probabilistic linear models: for
each value of the independent variable there is a probability distribution of
values of the dependent, with a probability between 0 and 1.
· There is no deterministic model:
there is a cloud of points and we look for the line that best explains the
behavior of the dependent variable as a function of the independent variable.
· Correlation coefficient (Pearson and Spearman): a dimensionless number
(between -1 and 1) that measures the strength and direction of the linear
relationship between the variables,
r = β1 · Sx / Sy
· Coefficient of determination: a dimensionless number (between 0 and 1) that
gives an idea of how much of the relationship between the linearly related
variables is explained; it is r².
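These quantities can be checked with a small hand calculation (invented age/blood-pressure figures, echoing the example above); note how r comes out of the formula r = β1 · Sx / Sy from these notes:

```python
from statistics import mean, stdev

# Simple linear regression by hand (hypothetical data: age vs systolic BP).
x = [30, 40, 50, 60, 70]        # age (years)
y = [120, 126, 135, 140, 149]   # systolic blood pressure (mmHg)

mx, my = mean(x), mean(y)

# Least-squares slope and intercept
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
b1 = sxy / sxx                  # slope: mmHg gained per year of age
b0 = my - b1 * mx               # intercept

# Correlation coefficient via the formula from the notes: r = b1 * Sx / Sy
r = b1 * stdev(x) / stdev(y)
print(round(b1, 3), round(r, 3), round(r ** 2, 3))
```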
UNIT 9: INFERENTIAL STATISTICS: SAMPLING AND ESTIMATION
1. STATISTICAL
INFERENCE
When we propose a study in the health field to establish
relations between variables, our interest is usually not exclusively in the
specific patients to whom we have access, but rather in all patients similar to
these.
When we infer, we never have certain data on the entire population: we
extrapolate the results of a study carried out on part of the population that
interests us, so inference always carries random error.
• The group of patients about whom we want to study some question (draw
conclusions) is called the study population.
• The set of concrete individuals who take part in the study is called the
sample.
• The number of individuals in the sample is called the sample size.
• The set of statistical procedures that allow us to pass from the particular
(the sample) to the general (the population) is called statistical inference.
• The set of procedures that allow us to choose samples in such a way that
they reflect the characteristics of the population is called sampling
techniques; this is done to avoid bias.
Whenever we work with samples, even if they are
representative, we must assume a certain error.
• If the sample is chosen by a random procedure, that error can be evaluated.
The sampling technique in this case is called probabilistic or random
sampling, and the error associated with a sample chosen at random is called
random error.
• In non-probabilistic sampling it is not possible to evaluate the error.
• The larger the sample size, the more the random error is reduced.
2. STATISTICAL
INFERENCE PROCESS
We have a study population, and the measure we want to obtain from it is
called the parameter.
We make a random selection and obtain a sample; the measure of the study
variable obtained in the sample is called the estimator.
The process by which, starting from the estimator, we approximate the
parameter is called inference.
3. STANDARD ERROR.
It is the measure that tries to capture the variability of
the values of the estimator.
The standard error of any estimator measures the degree of variability
in the estimator values in the different samples of a given size that we
could take from a population.
The smaller the standard error of an estimator, the more we can rely on the
value obtained from a particular sample.
STANDARD ERROR CALCULATION
It depends on each estimator:
- Standard error of a mean: SE = s / √n
- Standard error of a proportion (relative frequency): SE = √(p(1 − p) / n)
From both formulas, it follows that the larger the sample
size, the lower the standard error.
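A minimal sketch of both formulas with invented figures (a small sample of measurements and a hypothetical proportion):

```python
from statistics import stdev

# Standard errors from the two formulas above (hypothetical data).
sample = [4.8, 5.2, 5.5, 4.9, 5.1, 5.6, 5.0, 5.3]
n = len(sample)

se_mean = stdev(sample) / n ** 0.5           # SE of a mean: s / sqrt(n)

p = 0.30                                     # hypothetical sample proportion
n_prop = 100                                 # hypothetical sample size
se_prop = (p * (1 - p) / n_prop) ** 0.5      # SE of a proportion: sqrt(p(1-p)/n)

print(round(se_mean, 3), round(se_prop, 3))
```

Increasing n in either formula shrinks the standard error, as the text states.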
4. THE CENTRAL LIMIT THEOREM
For estimators that can be expressed as a sum of sample values, the
distribution of their values follows a normal distribution with mean equal to
the population mean and standard deviation equal to the standard error of the
estimator in question. Since it follows a normal distribution, it obeys the
basic rules of the normal:
± 1S 68.26% of observations.
± 1.96S 95% of observations.
± 2S 95.45% of observations.
± 2.58S 99% of observations.
± 3S 99.73% of observations.
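A quick simulation sketch of the theorem (an invented setup, not from the course): sample means drawn from a non-normal (uniform) population should still fall within ±1.96 standard errors of the population mean about 95% of the time.

```python
import random
from statistics import mean

# Means of repeated samples from a Uniform(0, 1) population cluster normally.
random.seed(42)
population_mean = 0.5                 # mean of Uniform(0, 1)
n = 50                                # size of each sample
se = (1 / 12) ** 0.5 / n ** 0.5       # SE = population sd / sqrt(n)

means = [mean(random.random() for _ in range(n)) for _ in range(2000)]
within = sum(abs(m - population_mean) <= 1.96 * se for m in means)
print(within / len(means))            # close to 0.95
```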
5.
CONFIDENCE INTERVALS:
·They are a means of knowing the parameter in a
population by measuring the error that has to do with chance (random error).
·They are a pair of numbers such that, with a certain confidence level, we can
assert that the value of the parameter lies between them.
·They are calculated by considering that the sample estimator follows a normal
distribution, as established by the central limit theorem.
The greater the confidence we want to give to the interval, the wider it will
be, i.e. the lower and upper ends of the interval will be further apart, and
therefore the interval will be less precise.
You can calculate confidence intervals for any
parameter: arithmetic means, proportions, relative risks, odds ratio ...
In the formulas, whenever we use a proportion we express it as a value out of
1, not as a percentage (out of 100).
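For example, a 95% interval for a mean can be built as estimator ± 1.96 × standard error, following the central limit theorem (the measurements below are hypothetical):

```python
from statistics import mean, stdev

# 95% confidence interval for a mean (invented measurements).
data = [7.2, 6.8, 7.5, 7.0, 6.9, 7.3, 7.1, 6.7, 7.4, 7.0]
n = len(data)
m = mean(data)                 # the estimator
se = stdev(data) / n ** 0.5    # its standard error

lower, upper = m - 1.96 * se, m + 1.96 * se
print(round(lower, 2), "-", round(upper, 2))
```

Replacing 1.96 with 2.58 gives a 99% interval: more confidence, but a wider (less precise) interval, as noted above.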
6.
SAMPLING PROCEDURE. (Sampling technique.)
- Sampling is a method such that, when choosing a small group from a
population, we can state with a known degree of probability that this small
group has the characteristics of the population we are studying.
- From the general population about which we want to draw conclusions, we
select individuals at random to obtain the sample, and from this sample we
make inferences about the entire population.
7.
TYPES OF SAMPLING.
·PROBABILISTIC
SAMPLING.
It is the method of extracting a part (or sample) from a population in such a
way that all possible samples of a fixed size have the same chance of being
selected.
- Simple
Random. (It is the most reliable and equitable)
1. It is characterized by each unit having an equal probability of being
included in the sample:
• Lottery or raffle: we assign a number to each member of the population,
calculate the sample size and randomly draw that many numbers. This method is
not easy when the population is very large.
• Random number table: more economical and less time-consuming. It is used
when we have a computerized list of the study population in a database.
-
Systematic Random.
Similar to simple random sampling: each individual has the same probability of
being selected.
-
Stratified.
It is characterized by the subdivision of the
study population into subgroups or strata, since the main variables to be
studied have some known variability or distribution that may affect the
results.
-
Cluster (conglomerate).
1. It is used when there is no detailed, enumerated list of each of the units
that make up the sample, and producing one would be very complex. In selecting
the sample, subgroups or sets of units (clusters) are taken.
2. In this type of sampling the researcher does
not know the distribution of the variable.
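A simple random sampling sketch (with a hypothetical patient list): `random.sample` plays the role of the lottery, giving every unit the same chance of selection without replacement.

```python
import random

# Simple random sampling: every individual in the (hypothetical) computerized
# list has the same chance of ending up in the sample.
random.seed(1)
population = [f"patient_{i}" for i in range(1, 501)]   # enumerated list of 500
sample_size = 25

sample = random.sample(population, sample_size)        # draw without replacement
print(len(sample), len(set(sample)))                   # 25 distinct individuals
```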
NON-PROBABILISTIC SAMPLING.
- The random process is not followed.
- The sample cannot be considered representative of a population.
- It is characterized by the researcher selecting the sample according to
criteria defined for the purposes of the study being carried out.
-Types:
1. By quotas: the researcher selects the sample considering some phenomena or
variables to be studied, such as sex, race, religion, etc. (There is no
randomness.)
2. Accidental: consists of using for the study the people available at a given
moment, depending on what is of interest to study. Of the three, it is the
most deficient.
3. For convenience, or intentional: the researcher decides, according to the
study objectives, which elements make up the sample.