
Monday, May 29, 2017

SEMINAR 4



In seminar 4, held on March 30, we focused on:

• Recoding variables
• Calculating measures of central tendency and dispersion for quantitative variables
• Calculating frequency distributions and confidence intervals
• Calculating and drawing pie charts, bar charts and histograms
• Hypothesis testing.

A) Chi-square
B) Student's t
C) ANOVA test
D) Linear regression


It is true that this seminar ran ahead of the large-group classes, because we were a little behind, but this helped us better understand what we were soon going to cover. Mentioning certain terms will therefore suffice here, because they appear again, and in more depth, in other related posts.


To begin with we have the measures of position, which as we know are the percentiles, deciles and quartiles.

The measures of central tendency, in turn, are the median, the mode and the mean; and finally, among the measures of dispersion we find the variance, the standard deviation and the range (or sample range).

We also reviewed inferential statistics, the types of variables and so on, but the most interesting part was the hypothesis tests:



• First, the CHI-SQUARE test, which we learned to do on paper but also in Epi Info, in order to use it in the research project. Looking at the results, we decide whether or not to accept the null hypothesis. We use it for QUALITATIVE-QUALITATIVE pairs of variables.

• Second, STUDENT'S T, used in QUALITATIVE-QUANTITATIVE cases; as in the previous case, we can also build this table in our program.

• The ANOVA TEST, for comparisons involving more than two groups.

• LINEAR REGRESSION, where we draw the cloud of points and, as in all the other cases, also do it in our working program. QUANTITATIVE-QUANTITATIVE.

And that was everything we saw in seminar 4. In barely two hours we had time to glance over each example, but it was not until the large-group classes that we finished consolidating the terms and formulas and, above all, did more examples on paper, which in the end is what this subject is really about: practice.




Tuesday, May 23, 2017

LESSON 10

UNIT 10: HYPOTHESIS STATISTICS. HYPOTHESIS TEST.


1. HYPOTHESIS TESTS

To control random errors, in addition to the calculation of confidence intervals, we have a second tool in the process of statistical inference: hypothesis tests or contrasts.

With hypothesis tests (contrasts), the strategy is as follows:
- We establish, a priori, a hypothesis about the value of the parameter.
- We collect the data.
- We analyse the coherence between the prior hypothesis and the data obtained.

They are statistical tools for answering research questions: they allow us to quantify the compatibility between a previously established hypothesis and the results obtained.
Whatever the researchers' wishes, the hypothesis test always tests the null hypothesis.
The type of statistical analysis depends on the type of variables involved in the study.



2. ERRORS IN HYPOTHESIS TESTING.

A hypothesis test measures the probability of error I make if I reject the null hypothesis.
With the same sample we can accept or reject the null hypothesis. Everything depends on an error, which we call α.

• The error α is the probability of mistakenly rejecting the null hypothesis.
• The smallest α at which we can reject H0 is the p-value (p is the smallest possible α for rejection).
We usually reject H0 for a maximum α level of 5% (p < 0.05). Above 5% error, we accept the null hypothesis. This is what we call "statistical significance".


3. TYPES OF ERRORS IN HYPOTHESIS TEST.



The most important error for us is the type α (alpha) error. We accept that we may be mistaken up to 5% of the time.


4. CHI-SQUARE HYPOTHESIS TEST.

To compare qualitative variables (dependent and independent).
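As a rough illustration of how this test works (not part of the class material), here is a minimal Python sketch, using only the standard library, that computes the chi-square statistic for a hypothetical 2×2 table of counts by hand; the function name `chi_square_2x2` and the counts are invented for the example:

```python
import math

def chi_square_2x2(table):
    """Chi-square statistic and p-value (df = 1) for a 2x2 contingency table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    # Expected counts under the null hypothesis of independence.
    expected = [[(a + b) * (a + c) / n, (a + b) * (b + d) / n],
                [(c + d) * (a + c) / n, (c + d) * (b + d) / n]]
    chi2 = sum((obs - exp) ** 2 / exp
               for row_obs, row_exp in zip(table, expected)
               for obs, exp in zip(row_obs, row_exp))
    # With 1 degree of freedom, the chi-square survival function has a
    # closed form: p = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

chi2, p = chi_square_2x2([[30, 10], [20, 40]])
print(round(chi2, 2), p < 0.05)  # prints: 16.67 True
```

With p below 0.05 we would reject the null hypothesis of independence between the two qualitative variables.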





5. STUDENT'S T-TEST (comparison of means)

It is used when the independent variable is qualitative (dichotomous) and the dependent variable is continuous quantitative. It only serves to compare two groups.
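As a sketch of the statistic behind this comparison of two group means (assuming equal variances; the data and names below are invented, not from the notes):

```python
import math
import statistics

def t_statistic(group_a, group_b):
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se

# Hypothetical example: systolic pressure (quantitative) in two groups
# defined by a dichotomous qualitative variable (treated vs. control).
treated = [118, 122, 120, 119, 121]
control = [128, 131, 127, 130, 129]
t = t_statistic(treated, control)
print(round(t, 2))  # prints: -9.0
```

The statistic is then compared with a Student's t distribution with na + nb − 2 degrees of freedom to obtain the p-value.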


6. JOINT STUDY OF TWO VARIABLES.

For this we collect the data in tables:
· Each row holds the data of one individual. Each column represents the values that one variable takes for those individuals. The individuals are not listed in any particular order.
· These observations can be represented in a scatter diagram, in which each individual is a point whose coordinates are the values of the variables.


7. DISPERSION AND POINT CLOUD DIAGRAM.

If we have the heights and weights of x individuals, we place them on a scatter diagram to observe how they are distributed, since there is a RELATIONSHIP BETWEEN BOTH VARIABLES.




8. PREDICTION OF ONE VARIABLE AS A FUNCTION OF ANOTHER.

Apparently, the weight increases by X kg for each Y cm of height.


9. SIMPLE LINEAR REGRESSION: CORRELATION AND DETERMINATION.

· The aim is to study the linear association between two quantitative variables.
· Example: the influence of age on systolic blood pressure.
· Deterministic linear models: the independent variable determines the value of the dependent variable, so for each value of the independent variable there is only one value of the dependent.
· Probabilistic linear models: for each value of the independent variable there is a probability distribution of values of the dependent, with probabilities between 0 and 1.
· When there is no deterministic model, we have a cloud of points and we look for the line that best explains the behaviour of the dependent variable as a function of the independent variable.

· Correlation coefficient (Pearson and Spearman): a dimensionless number (between -1 and 1) that measures the strength and direction of the linear relationship between the variables:
r = β1 · Sx / Sy

· Coefficient of determination: a dimensionless number (between 0 and 1) that measures how much of the variation in the dependent variable the linear relationship explains; it is r².
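A minimal least-squares sketch (the height/weight numbers are invented for illustration) that fits the regression line and then recovers Pearson's r through the relation r = β1 · Sx / Sy given in the notes:

```python
import statistics

def linear_fit(x, y):
    """Least-squares slope and intercept for y = b0 + b1*x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Hypothetical data: height (cm) as independent, weight (kg) as dependent.
height = [150, 160, 170, 180, 190]
weight = [52, 58, 67, 74, 79]
b0, b1 = linear_fit(height, weight)

# Pearson r via the relation from the notes: r = b1 * Sx / Sy.
r = b1 * statistics.stdev(height) / statistics.stdev(weight)
r2 = r ** 2  # coefficient of determination
print(round(b1, 3), round(r, 3))  # prints: 0.7 0.996
```

Here the slope says weight rises about 0.7 kg per cm of height, and r close to 1 indicates a strong positive linear relationship.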









Sunday, May 21, 2017

LESSON 9

UNIT 9: INFERENTIAL STATISTICS: SAMPLING AND ESTIMATION


1. STATISTICAL INFERENCE
When we propose a study in the health field to establish relations between variables, our interest is usually not exclusively in the specific patients to whom we have access, but rather in all patients similar to these.
When we infer, we never have exact data for the entire population: we deduce results for the population of interest from a study carried out on a sample, so inference always carries random error.

• The group of patients about whom we want to study some question (draw conclusions) is called the study population.
• The set of concrete individuals who take part in the study is called the sample.
• The number of individuals in the sample is called the sample size.
• The set of statistical procedures that allow us to go from the particular (the sample) to the general (the population) is called statistical inference.
• The set of procedures that allow us to choose samples in such a way that they reflect the characteristics of the population is called sampling techniques; this is done to avoid bias.

Whenever we work with samples, even if they are representative, we must assume a certain error.

• If the sample is chosen by a random procedure, that error can be evaluated. The sampling technique in this case is called probabilistic or random sampling, and the error associated with a sample chosen at random is called random error.
• In non-probabilistic sampling, the error cannot be evaluated.

• The larger the sample size, the smaller the random error tends to be.



2. STATISTICAL INFERENCE PROCESS

We have a study population, and the measure we want to obtain from it is called a parameter.
We make a random selection and obtain a sample; the measure of the study variable obtained in the sample is called the estimator.
The process by which I approach the parameter from the estimator is called inference.


3. STANDARD ERROR.
It is the measure that tries to capture the variability of the values of the estimator.
The standard error of any estimator measures the degree of variability of the estimator's values across the different samples of a given size that we could take from a population.
The smaller the standard error of an estimator, the more we can rely on the value obtained from a particular sample.


STANDARD ERROR CALCULATION

It depends on the estimator:

- Standard error of a mean: SE = s / √n

- Standard error of a proportion (relative frequency): SE = √(p(1 − p) / n)

From both formulas, it follows that the larger the sample size, the lower the standard error.
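As a sketch (not from the notes) of how these two formulas behave in practice, in Python with only the standard library; the sample values are invented:

```python
import math
import statistics

def se_mean(sample):
    """Standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def se_proportion(p, n):
    """Standard error of a proportion: sqrt(p*(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

sample = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2, 4.3, 3.7]
print(round(se_mean(sample), 3))
print(round(se_proportion(0.3, 100), 3))  # a larger n would shrink this
```

Quadrupling the sample size halves the standard error, since n appears under a square root in both formulas.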


4. THE CENTRAL LIMIT THEOREM

For estimators that can be expressed as a sum of sample values, the distribution of their values follows a normal distribution whose mean is the population value and whose standard deviation equals the standard error of the estimator in question. Since it follows a normal distribution, it obeys the basic rules of that distribution:

± 1S → 68.26% of observations.
± 1.96S → 95% of observations.
± 2S → 95.45% of observations.
± 2.58S → 99% of observations.
± 3S → 99.73% of observations.
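The theorem can be checked by simulation (a sketch with invented data, not part of the notes): even when the population is skewed, roughly 95% of the sample means fall within 1.96 standard errors of the population mean.

```python
import random
import statistics

random.seed(42)

# Population: a skewed (non-normal) distribution.
population = [random.expovariate(1.0) for _ in range(100_000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

n = 50  # sample size
se = sigma / n ** 0.5  # standard error predicted by the theorem
means = [statistics.mean(random.sample(population, n)) for _ in range(2000)]

# Fraction of sample means within mu +/- 1.96 standard errors (~95% expected).
inside = sum(abs(m - mu) <= 1.96 * se for m in means) / len(means)
print(round(inside, 2))
```

The sample means cluster normally around the population mean even though individual values do not.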



5. CONFIDENCE INTERVALS:

·         They are a means of estimating a parameter of a population while measuring the error due to chance (random error).
·         A confidence interval is a pair of numbers such that, with a certain confidence level, we can assert that the value of the parameter lies between them.
·         It is calculated by assuming that the sample estimator follows a normal distribution, as established by the central limit theorem.
The greater the confidence we want to give the interval, the wider it will be, i.e. the lower and upper ends will be further apart, and therefore the interval will be less precise.
Confidence intervals can be calculated for any parameter: arithmetic means, proportions, relative risks, odds ratios...
In the formulas, proportions are always expressed as fractions of 1, not as percentages (out of 100).
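A minimal sketch of both calculations (invented values; note the proportion is written as 0.30, not 30%):

```python
import math
import statistics

def ci_mean(sample, z=1.96):
    """95% confidence interval for a mean (normal approximation)."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - z * se, m + z * se

def ci_proportion(p, n, z=1.96):
    """95% CI for a proportion; p expressed as a fraction of 1, not a percent."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

# Hypothetical: 30% prevalence observed in a sample of 200 patients.
low, high = ci_proportion(0.30, 200)
print(round(low, 3), round(high, 3))
```

Using z = 2.58 instead of 1.96 would give a 99% interval: more confidence, but a wider and less precise interval, as stated above.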

6. SAMPLING PROCEDURE. (Sampling technique.)

- Sampling is a method for choosing a small group from a population in such a way that, with a known degree of probability, this small group has the characteristics of the population we are studying.

- From the general population about which we want to draw conclusions, we select individuals at random to obtain the sample, and from it we make inferences about the entire population.

7. TYPES OF SAMPLING.

  • PROBABILISTIC SAMPLING.

It is the method of drawing a part (or sample) from a population in such a way that all possible samples of a fixed size have the same probability of being selected.

- Simple random sampling. (The most reliable and equitable.)

1. It is characterised by the fact that each unit has the same probability of being included in the sample:
• Lottery or raffle: we assign a number to each member of the population, calculate the sample size and randomly draw that many numbers. This method is not easy when the population is very large.
• Random number table: cheaper and less time-consuming. It is used when we have a computerised list of the study population in a database.

- Systematic random sampling.
Similar to simple random sampling: each individual has the same probability of being selected.

- Stratified sampling.
It is characterised by subdividing the study population into subgroups or strata, because the main variables under study have some known variability or distribution that may affect the results.

- Cluster ("conglomerate") sampling.
1. It is used when there is no detailed, enumerated list of each of the units that make up the sample and producing one would be very complex. When selecting the sample, subgroups or sets of units (clusters) are taken.
2. In this type of sampling the researcher does not know the distribution of the variable.
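The first two probabilistic techniques can be sketched in a few lines of Python (the numbered patient list is hypothetical):

```python
import random

random.seed(1)
population = list(range(1, 101))  # hypothetical numbered patient list

# Simple random sampling: every unit has the same probability of selection,
# like drawing 10 numbers in a lottery.
simple = random.sample(population, 10)

# Systematic random sampling: a random start, then every k-th unit.
k = len(population) // 10
start = random.randrange(k)
systematic = population[start::k]

print(sorted(simple))
print(systematic)
```

Both give each individual the same selection probability; the systematic version only needs one random draw, which is why it is convenient with long ordered lists.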


NON-PROBABILISTIC SAMPLING.

- The random process is not followed.
- The sample cannot be considered representative of a population.
- It is characterised by the researcher selecting the sample according to criteria identified for the purposes of the study being carried out.

-Types:

1. Quota sampling: the researcher selects the sample considering some phenomena or variables to study, such as sex, race, religion, etc. (There is no randomness.)

2. Accidental sampling: using for the study whoever is available at a given moment, depending on what is of interest. Of the three, it is the most deficient.

3. Convenience or intentional sampling: the researcher decides, according to the study's objectives, which elements make up the sample.




8. SAMPLE SIZE.

The size of the sample to be taken depends on:

- The standard error.
- The variability of the variable to be studied.
- The size of the study population.
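A common sketch of the first two factors at work (assuming a large population, so no finite-population correction; the prevalence and margin are invented):

```python
import math

def sample_size_proportion(p, margin, z=1.96):
    """Sample size to estimate a proportion with a given margin of error
    at 95% confidence: n = z^2 * p * (1 - p) / margin^2."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Hypothetical: expected prevalence 50%, desired precision +/- 5%.
print(sample_size_proportion(0.5, 0.05))  # prints: 385
```

A smaller acceptable error or a more variable outcome (p near 0.5 maximises p(1 − p)) both push the required sample size up; for small populations a finite-population correction would reduce it.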