**SECTION – A**

## 1. Explain the meaning of descriptive statistics and describe organisation of data.

**Ans.** Descriptive statistics are brief informational coefficients that summarize a given data set, which can be either a representation of the entire population or a sample of a population. Descriptive statistics are broken down into measures of central tendency and measures of variability (spread).

Measures of central tendency include the mean, median, and mode, while measures of variability include standard deviation, variance, minimum and maximum values, kurtosis, and skewness.

Descriptive statistics summarizes or describes the characteristics of a data set.

Descriptive statistics consists of three basic categories of measures: measures of central tendency, measures of variability (or spread), and frequency distribution.

Measures of central tendency describe the center of the data set (mean, median, mode).

Measures of variability describe the dispersion of the data set (variance, standard deviation).

Measures of frequency distribution describe the occurrence of data within the data set (count).

Descriptive statistics, in short, help describe and understand the features of a specific data set by giving short summaries about the sample and measures of the data. The most recognized types of descriptive statistics are measures of center: the mean, median, and mode, which are used at almost all levels of math and statistics. The mean, or the average, is calculated by adding all the figures within the data set and then dividing by the number of figures within the set.

For example, the sum of the following data set is 20: (2, 3, 4, 5, 6). The mean is 4 (20/5). The mode of a data set is the value appearing most often, and the median is the figure situated in the middle of the data set. It is the figure separating the higher figures from the lower figures within a data set. However, there are less common types of descriptive statistics that are still very important.
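The three measures of center described above can be checked with a minimal sketch using Python's standard `statistics` module. The list `data` is the data set from the text; `with_repeat` is a hypothetical data set added only to illustrate the mode, because every value in (2, 3, 4, 5, 6) appears exactly once.

```python
from statistics import mean, median, multimode

data = [2, 3, 4, 5, 6]           # data set from the text

data_mean = mean(data)           # sum of the figures / number of figures = 20 / 5
data_median = median(data)       # middle figure of the sorted data
data_modes = multimode(data)     # every value tied for "most frequent"

# Hypothetical data set with one repeated value, used only to show a single mode.
with_repeat = [2, 3, 3, 5, 6]
repeat_modes = multimode(with_repeat)

print(data_mean, data_median, data_modes, repeat_modes)
```

`multimode` is used rather than `mode` so that a data set with no repeated value reports all of its values as tied instead of silently picking one.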

People use descriptive statistics to repurpose hard-to-understand quantitative insights across a large data set into bite-sized descriptions. A student’s grade point average (GPA), for example, provides a good understanding of descriptive statistics. The idea of a GPA is that it takes data points from a wide range of exams, classes, and grades, and averages them together to provide a general understanding of a student’s overall academic performance. A student’s personal GPA reflects their mean academic performance.

Measures of central tendency focus on the average or middle values of data sets, whereas measures of variability focus on the dispersion of data. These two measures use graphs, tables and general discussions to help people understand the meaning of the analyzed data.

Measures of central tendency describe the center position of a distribution for a data set. A person analyzes the frequency of each data point in the distribution and describes it using the mean, median, or mode, which measures the most common patterns of the analyzed data set.

Measures of Variability

Measures of variability (or measures of spread) aid in analyzing how dispersed the distribution is for a set of data. For example, while the measures of central tendency may give a person the average of a data set, they do not describe how the data are distributed within the set.

So while the average of the data may be 65 out of 100, there can still be data points at both 1 and 100. Measures of variability help communicate this by describing the shape and spread of the data set. Range, quartiles, absolute deviation, and variance are all examples of measures of variability.

Consider the following data set: 5, 19, 24, 62, 91, 100. The range of that data set is 95, which is calculated by subtracting the lowest number (5) in the data set from the highest (100).
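As a rough illustration, the range of this data set, together with its variance and standard deviation, can be computed as follows. Treating the six values as a whole population (hence `pvariance` and `pstdev`) is an assumption made for the sketch; the text does not say whether they are a sample.

```python
from statistics import pvariance, pstdev

data = [5, 19, 24, 62, 91, 100]       # data set from the text

data_range = max(data) - min(data)    # highest value minus lowest value
variance = pvariance(data)            # mean squared deviation from the mean
std_dev = pstdev(data)                # square root of the variance

print(data_range, round(variance, 2), round(std_dev, 2))
```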

In descriptive statistics, univariate data analyzes only one variable. It is used to identify characteristics of a single trait and is not used to analyze any relationships or causations.

For example, imagine a room full of high school students. Say you wanted to find the average age of the individuals in the room. This univariate data depends on only one factor: each person’s age. By gathering this one piece of information from each person, adding the ages together, and dividing by the total number of people, you can determine the average age.

Bivariate data, on the other hand, attempts to link two variables by searching for correlation. Two types of data are collected, and the relationship between the two pieces of information is analyzed together. Because multiple variables are analyzed, this approach may also be referred to as multivariate.

Let’s say each high school student in the example above takes a college assessment test, and we want to see whether older students are testing better than younger students. In addition to gathering the age of the students, we need to gather each student’s test score. Then, using data analytics, we mathematically or graphically depict whether there is a relationship between student age and test scores.
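One common way to quantify such a relationship is the Pearson correlation coefficient. The sketch below computes it by hand; the `ages` and `scores` lists are hypothetical values invented for illustration, and the choice of Pearson's r (rather than another measure) is an assumption, since the text does not name a specific method.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical data: one age and one test score per student.
ages = [14, 15, 15, 16, 17, 18]
scores = [52, 55, 61, 60, 68, 71]

r = pearson_r(ages, scores)
print(round(r, 3))   # a value near +1 would suggest older students score higher
```

A coefficient close to 0 would indicate little linear relationship, and a negative value would indicate that older students tend to score lower.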

Descriptive statistics have a different function from inferential statistics, which use a data set to make decisions or to apply the characteristics of one data set to another.

Imagine another example where a company sells hot sauce. The company gathers data such as the count of sales, average quantity purchased per transaction, and average sale per day of the week. All of this information is descriptive, as it tells a story of what actually happened in the past. In this case, it is not being used beyond being informational.

Let’s say the same company wants to roll out a new hot sauce. It gathers the same sales data as above, but it uses the information to make predictions about what the sales of the new hot sauce will be. Taking descriptive statistics and applying their characteristics to a different data set is what makes the analysis inferential. We are no longer simply summarizing data; we are using it to predict what will happen regarding an entirely different body of data (the new hot sauce product).

## 2. Explain the concept of normal curve with help of a diagram. Explain the characteristics of normal probability curve.

**Ans.** **Computation of Normal Probability Curve:**

If an unbiased coin is tossed, it will fall either head (H) or tail (T). Thus the probability of a head appearing is one chance in two. So the probability ratio of H is ½ and that of T is ½.

Likewise, if we toss two coins, coin x and coin y, there are four possible ways of falling.

The four possible ways are: both x and y may fall H, x may fall T and y H, x may fall H and y T, or both may fall T.

Expressed in ratios

Probability of two heads = ¼

Probability of two tails = ¼

Probability of one H and one T = ¼

Probability of one T and one H = ¼

Thus the ratio is ¼ + ½ + ¼ = 1.00
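The same ratios can be reproduced by listing every possible arrangement and counting heads. This is a small sketch; the function name `head_count_probabilities` is made up for illustration, and it accepts any number of coins.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def head_count_probabilities(n_coins):
    """Map each possible number of heads to its probability for n fair coins."""
    outcomes = list(product("HT", repeat=n_coins))    # every arrangement of H and T
    counts = Counter(outcome.count("H") for outcome in outcomes)
    total = len(outcomes)
    return {heads: Fraction(c, total) for heads, c in sorted(counts.items())}

two_coins = head_count_probabilities(2)
print(two_coins)                 # probabilities of 0, 1 and 2 heads
print(sum(two_coins.values()))   # the ratios add up to 1
```

The two mixed arrangements (H then T, and T then H) are counted together under "one head", which is why that outcome has probability ½.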

**The expected appearance of heads and tails of two coins can be expressed as:**

(H + T)^{2} = H^{2} + 2HT + T^{2}

If we increase the number of coins to three, i.e. x, y and z, there are eight possible arrangements.

**The expected appearance of heads and tails of three coins can be expressed as:**

(H + T)^{3} = H^{3} + 3H^{2}T + 3HT^{2} + T^{3}

In this way we can determine the probability of different combinations of heads and tails of any number of coins. We can obtain probability of any number of coins by binomial expansion. An expression containing two terms is called a binomial expression. Binomial theorem is an algebraic formula which expands the power of a binomial expression in the form of a series.

**The formula reads like this:**

(H + T)^{n} = C(n, 0) H^{n} + C(n, 1) H^{n-1} T + C(n, 2) H^{n-2} T^{2} + … + C(n, r) H^{n-r} T^{r} + … + C(n, n) T^{n} … (11.1)

Where C = Possible combinations.

C(n, r) = n! / [r! (n – r)!]

n! means 1 x 2 x 3 x …. x n

n = Total number of observations or persons.

r = Number of observations or persons taken at a time.
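The coefficients of the expansion can be generated directly from the formula for C(n, r). The sketch below uses Python's built-in `math.comb`; the helper name `binomial_coefficients` is an illustrative choice, and n = 10 is used to match a toss of ten coins.

```python
from math import comb

def binomial_coefficients(n):
    """Coefficients C(n, 0), C(n, 1), ..., C(n, n) of the expansion of (H + T)^n."""
    return [comb(n, r) for r in range(n + 1)]

coeffs = binomial_coefficients(10)
total = sum(coeffs)                          # number of equally likely arrangements
probabilities = [c / total for c in coeffs]  # term r corresponds to (10 - r) heads and r tails

print(coeffs)
print(total)
```

The coefficients rise to a single peak in the middle and fall away symmetrically, which is the shape the histogram of these frequencies takes.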

**If the above data are plotted on a graph as a histogram and frequency polygon, the result is as shown below:**

Thus the figure obtained from the toss of 10 coins, (H + T)^{10}, is a symmetrical many-sided polygon.

**If we go on increasing the number of coins, with each increase the polygon comes closer to a perfectly smooth curve, like figure 11.2 given below:**

This bell-shaped curve is called the ‘Normal Probability Curve’. Formally, the **“graph of the probability density function of the normal distribution is a continuous bell shaped curve, symmetrical about the mean”**, and this graph is called the normal probability curve.

**In statistics it is important because:**

(a) It is the distribution of many naturally occurring variables, such as the intelligence of 8th grade students, the height of 10th grade students, etc.

(b) The distribution of the means of samples drawn from most parent populations is normal or approximately so when the samples are sufficiently large.
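Point (b) can be illustrated with a small simulation. The sketch below is built on assumptions chosen only for illustration: a uniform parent population on (0, 1), 2000 samples, 30 observations per sample, and a fixed random seed. It draws repeated samples, records each sample mean, and checks what share of those means lies within one standard deviation of their centre, which is roughly 68% for a normal distribution.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_means(n_samples, sample_size):
    """Means of repeated samples drawn from a uniform(0, 1) parent population."""
    return [
        mean(random.random() for _ in range(sample_size))
        for _ in range(n_samples)
    ]

means = sample_means(n_samples=2000, sample_size=30)
centre = mean(means)      # should sit close to the parent mean of 0.5
spread = pstdev(means)    # standard deviation of the sample means

# Share of sample means within one standard deviation of their centre.
within_one_sd = sum(abs(m - centre) <= spread for m in means) / len(means)
print(round(centre, 3), round(within_one_sd, 3))
```

Even though the parent population here is flat rather than bell-shaped, the sample means cluster around the parent mean in an approximately normal pattern.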

Therefore the normal curve has great significance in the social and behavioural sciences. In behavioural measurement, most characteristics approximate the normal distribution, so the Normal Probability Curve, popularly known as the NPC, is used as a reference curve. To understand the utility of the NPC, we must first understand its properties.

**Some of the major characteristics of normal probability curve are as follows:**

**The curve is bilaterally symmetrical.**

The curve is symmetrical about the ordinate at its central point. This means that the size, shape and slope of the curve on one side are identical to those on the other side. If the curve is bisected, its right-hand side matches its left-hand side exactly.

**The curve is asymptotic:**

The Normal Probability Curve approaches the horizontal axis and extends from −∞ to +∞. This means that the extreme ends of the curve come ever closer to the base line but never touch it.

**The Mean, Median and Mode:**

The mean, median and mode fall at the middle point and are numerically equal.

**The Points of inflection occur at ± 1 Standard deviation unit:**

The points of inflection of the NPC occur at ± 1σ, that is, one standard deviation unit above and below the mean. At these points the curve changes from convex to concave in relation to the horizontal axis.

**The total area of NPC is divided into ± standard deviations:**

The total area of the NPC is divided into six standard deviation units. From the center, it is divided into three positive (+ve) standard deviation units and three negative (−ve) standard deviation units.

Thus successive ± σ limits of the NPC include different proportions of cases. Between ± 1σ lie the middle two-thirds of cases, or 68.26%; between ± 2σ lie 95.44% of cases; between ± 3σ lie 99.73% of cases; and beyond ± 3σ only 0.27% of cases fall.
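These areas can be checked numerically. The sketch below relies on the standard identity that the proportion of a normal distribution lying within ± k standard deviations of the mean equals erf(k/√2); the function name `area_within` is an illustrative choice. The computed values round to 68.27% and 95.45%, so the 68.26% and 95.44% quoted above differ only in the last digit, as is common in printed tables.

```python
from math import erf, sqrt

def area_within(k):
    """Proportion of a normal distribution within +/- k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    inside = area_within(k) * 100
    print(f"within ±{k}σ: {inside:.2f}%   beyond: {100 - inside:.2f}%")
```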

**The Y ordinate represents the height of the Normal Probability Curve:**

The Y ordinate of the NPC represents the height of the curve. At the center the maximum ordinate occurs. The height of the curve at the mean or mid point is denoted as Y_{0}.

**In order to determine the height of the curve at any point we use the following formula:**
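The formula itself does not appear in the text at this point. The standard equation of the normal curve, given here as a likely reconstruction rather than the author's exact notation, is shown below. It assumes x is the deviation of a score from the mean (X − M), N is the total number of cases, and σ is the standard deviation; setting x = 0 gives the maximum ordinate Y_{0}.

```latex
Y = \frac{N}{\sigma\sqrt{2\pi}} \, e^{-\frac{x^{2}}{2\sigma^{2}}},
\qquad
Y_{0} = \frac{N}{\sigma\sqrt{2\pi}}
```

With N = 1 this reduces to the probability density function of the normal distribution.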

**It is unimodal:**

The curve has only one peak, because the maximum frequency occurs at only one point.

**The height of the curve symmetrically declines:**

The height of the curve declines symmetrically in both directions from the central point. This means that the ordinates at M + σ and M − σ are equal, as are the ordinates at any two points equally distant from the mean.

**The Mean of NPC is µ and the standard deviation is σ:**

As the mean of the NPC represents the population mean, it is denoted by µ (mu). The standard deviation of the curve is denoted by the Greek letter σ (sigma).

**In the Normal Probability Curve the standard deviation is about 50% larger than Q:**

In the NPC, Q (the quartile deviation) is generally called the probable error or PE.

**The relationship between PE and σ can be stated as follows:**

1 PE = .6745σ

1σ = 1.4826PE.
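Both constants follow from the fact that, in a normal distribution, the middle 50% of cases lie within ± 1 PE of the mean, so 1 PE is the distance from the mean to the third quartile. A minimal sketch using the standard library's `NormalDist` (the variable names are illustrative):

```python
from statistics import NormalDist

# z-value below which 75% of a standard normal distribution falls,
# i.e. the distance from the mean to the third quartile in sigma units.
pe_in_sigma = NormalDist().inv_cdf(0.75)

# The reciprocal expresses one standard deviation in PE units.
sigma_in_pe = 1 / pe_in_sigma

print(round(pe_in_sigma, 4), round(sigma_in_pe, 4))
```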

**Q can be used as a unit of measurement in determining the area within a given part:**

**The Average Deviation about the mean of NPC is .798σ:**

There is a constant relationship between standard deviation and average deviation in a NPC.
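The constant 0.798 is the rounded value of √(2/π), the ratio of the mean absolute deviation to the standard deviation for any normal distribution. A one-line check (the variable name is illustrative):

```python
from math import pi, sqrt

# Average (mean absolute) deviation of a normal distribution, in sigma units.
ad_in_sigma = sqrt(2 / pi)

print(round(ad_in_sigma, 3))
```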

**The modal ordinate varies inversely with the standard deviation:**

In a Normal Probability Curve the modal ordinate varies inversely with the standard deviation: as the standard deviation of the curve increases, the modal ordinate decreases, and vice versa.
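The effect can be seen by evaluating the height of the curve at its mean for several standard deviations. This is a sketch using the standard library's `NormalDist`, which works with unit total area; the helper name `modal_ordinate` and the σ values tried are illustrative choices.

```python
from statistics import NormalDist

def modal_ordinate(sigma, mean=0.0):
    """Height of the normal curve at its mean (the mode) for a given sigma."""
    return NormalDist(mean, sigma).pdf(mean)

for sigma in (0.5, 1, 2):
    print(sigma, round(modal_ordinate(sigma), 4))
```

Doubling σ halves the peak height, so a more spread-out distribution is also a flatter one.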

**Applications of Normal Probability Curve:**

**Some of the most important applications of normal probability curve are as follows:**

The principles of Normal Probability Curve are applied in the behavioural sciences in many different areas.

**NPC is used to determine the percentage of cases in a normal distribution within given limits:**

**The Normal Probability Curve helps us to determine:**

- What percent of cases fall between two scores of a distribution.
- What percent of scores lie above a particular score of a distribution.
- What percent of scores lie below a particular score of a distribution.
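All three questions can be answered from the cumulative area under the curve. The sketch below assumes a hypothetical test whose scores are normally distributed with mean 50 and standard deviation 10; these figures and the cut-off scores of 40 and 60 are invented for illustration.

```python
from statistics import NormalDist

# Hypothetical normally distributed test scores: mean 50, standard deviation 10.
scores = NormalDist(mu=50, sigma=10)

between = scores.cdf(60) - scores.cdf(40)   # share of cases between 40 and 60
above = 1 - scores.cdf(60)                  # share of scores above 60
below = scores.cdf(40)                      # share of scores below 40

print(f"between 40 and 60: {between * 100:.2f}%")
print(f"above 60: {above * 100:.2f}%")
print(f"below 40: {below * 100:.2f}%")
```

Because 40 and 60 lie one standard deviation either side of the mean, the three shares correspond to the middle area within ± 1σ and the two equal tails beyond it.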

## 3. The scores obtained by four groups of employees on occupational stress are given below. Compute ANOVA for the same.

**SECTION – B**

## 4. Discuss the assumptions of parametric and nonparametric statistics.

## 5. Using Spearman’s rank order correlation for the following data:

**6. Describe various levels of measurement with suitable examples.**

**7. Explain the Kruskal-Wallis ANOVA test and compare it with ANOVA.**

**8. Compute Chi-square for the following data:**

**SECTION – C**

**9. Type I and type II errors.**

**10. Skewness and kurtosis.**

**11. Point and interval estimations.**

**12. Null hypothesis**

**13. Scatter diagram**

**14. Outliers**

**15. Biserial correlation**

**16. Variance**

**17. Interactional effect**

**18. Wilcoxon matched pair signed rank test.**

