One Inflated Binomial Distribution and its Real-Life Applications

Objective: To introduce a one-inflated binomial distribution (OIBD) and discuss its applications. Methods: Its distributional properties and reliability characteristics are studied, and its parameters are estimated using the method of moments (MM) and maximum likelihood estimation (MLE). A simulation study is conducted to examine the behaviour of the MLEs. Two real-life examples are used to examine the pertinence of the proposed distribution. Findings: The proposed OIBD provides a better fit in terms of AIC, BIC, and the KS test than the other known distributions considered. Novelty: A new statistical distribution is developed for count data with an inflated frequency at count one, along with its statistical properties. The practical utility of the distribution is also discussed with real-life examples.


Introduction
Count data with a surplus of zeros, ones, twos, threes, etc. are common in studies related to health, insurance, agriculture, etc. (1,2). These inflated frequencies may create trouble in data analysis due to over-dispersion at a particular count. Existing distributions developed for count data may not be suitable in the presence of an inflated frequency. Hence, inflated models are developed to handle such situations and to overcome possible irregularities in parameter estimation (1). An inflated distribution is a mixture of a point mass at a particular count and another count distribution supported on the non-negative integers (3).
To handle surplus zeros, Neyman (4) and Feller (5) first introduced the idea of zero inflation. The structural properties and MLEs of discrete distributions inflated at zero were studied by Gupta et al. (6). Their results were extended by Murat and Szynal (7), who showed that discrete distributions may be inflated at any point. Lambert (8) proposed a zero-inflated Poisson (ZIP) regression model for data with surplus zeros, with an example on manufacturing defects. Ridout et al. (9) discussed zero-inflated counts in terms of the mechanism generating them and suitable modelling frameworks, citing examples from sexual behaviour and species abundance. Adopting the methodology of Lambert's (8) ZIP regression model, Hall (10) derived a zero-inflated binomial model. A two-inflated binomial distribution was used by Singh et al. (11) to investigate the mechanism of son preference by modelling the pattern of male children in Uttar Pradesh, where family size and sex composition are dominated by strong son preference. The parameters of a zero-inflated Poisson distribution were estimated by Beckett et al. (12), who used them to model data on natural calamities. Beckett et al. (12) also compared the MLEs and MMEs in terms of standardized bias and standardized mean squared error. The zero-inflated binomial distribution was characterized by Najundan et al. (13), the zero-inflated negative binomial distribution by Suresh et al. (14), and the zero-inflated Poisson distribution by Najundan et al. (15). Alshkaki (16) extended the zero-inflated Poisson distribution to the zero-one-inflated Poisson distribution, studied its structural properties, and estimated its parameters by the method of maximum likelihood and the method of moments. Mwalili et al. (17) studied a zero-inflated negative binomial model, an extension of the negative binomial distribution, to accommodate excess zeros.
Alshkaki (18) studied the structural properties of the zero-one-inflated negative binomial distribution and estimated its parameters by the method of moments. Sakthivel and Rajitha (19) proposed a probability-based inflation estimator for the zero-inflated Poisson model, which is helpful for inference about the inflation parameter. Jornsation and Bodhisuwan (20) proposed a zero-one-inflated negative binomial-beta exponential distribution, studied its distributional properties, and estimated its parameters by the method of maximum likelihood.
The binomial distribution is a well-known discrete distribution on the non-negative integers. When sampling binomial data, it is often observed that the count zero, one or two has a higher frequency than expected, which can be explained by an appropriate inflated distribution, say a zero-inflated, one-inflated or two-inflated binomial distribution.
The probability distribution of the number of successes in n independent trials, each with success probability p, is called the binomial probability distribution. The binomial distribution is a discrete distribution, as X can take only the integral values 0, 1, 2, 3, 4, …, n. A random variable X is said to follow the binomial distribution if it assumes only non-negative values and its probability mass function is

$$P(X = x) = \binom{n}{x} p^{x} q^{n-x}, \quad x = 0, 1, 2, \dots, n, \quad 0 < p < 1, \quad q = 1 - p. \tag{1}$$

It is denoted by B(n, p).
The Zero-Inflated Binomial Distribution (ZIBD) was proposed by Hall (10). Let X ∼ B(n, p) as given in (1), and let α ∈ (0, 1) be an extra proportion added to the proportion of zeros of the random variable X. Then the random variable X defined by

$$P(X = x) = \begin{cases} \alpha + (1-\alpha)\, q^{n}, & x = 0 \\ (1-\alpha)\binom{n}{x} p^{x} q^{n-x}, & x = 1, 2, \dots, n \end{cases} \tag{2}$$

where q = 1 − p, is said to have a zero-inflated binomial distribution, denoted by X ∼ ZIBD(n, p, α).
Note: If α → 0, then (2) reduces to the standard binomial distribution (1).

The Two-Inflated Binomial Distribution (TIBD) was proposed by Singh et al. (11). Let X ∼ B(n, p) as given in (1), and let α ∈ (0, 1) be an extra proportion added to the proportion of twos of the random variable X. Then the random variable X defined by

$$P(X = x) = \begin{cases} \alpha + (1-\alpha)\binom{n}{2} p^{2} q^{n-2}, & x = 2 \\ (1-\alpha)\binom{n}{x} p^{x} q^{n-x}, & x = 0, 1, 3, \dots, n \end{cases} \tag{3}$$

where 0 < α < 1, 0 < p < 1 and q = 1 − p, is said to have a two-inflated binomial distribution, denoted by X ∼ TIBD(n, p, α).
Note: If α → 0, (3) reduces to the standard binomial distribution (1).
In this paper, the researchers propose a one-inflated binomial distribution, study its distributional properties and reliability characteristics, and estimate its parameters by the method of moments (MM) and maximum likelihood estimation (MLE). A simulation study is conducted to examine the behaviour of the MLEs. Two real-life data sets are used to examine the pertinence of the proposed distribution. https://www.indjst.org/

One Inflated Binomial Distribution (OIBD)
Let X ∼ B(n, p) as given in (1), and let α ∈ (0, 1) be an extra proportion added to the proportion of ones of the random variable X. Then the random variable X defined by

$$P(X = x) = \begin{cases} \alpha + (1-\alpha)\, n p\, q^{n-1}, & x = 1 \\ (1-\alpha)\binom{n}{x} p^{x} q^{n-x}, & x = 0, 2, 3, \dots, n \end{cases} \tag{4}$$

where 0 < α < 1, 0 < p < 1 and q = 1 − p, is said to have a one-inflated binomial distribution, and in the rest of the article it will be denoted by OIBD(n, p, α).
Some particular cases:
1. When α → 0, OIBD(n, p, α) reduces to B(n, p).
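As a minimal sketch of the inflated pmfs defined above, the function below evaluates a binomial distribution with an extra mass α at an arbitrary count k, so that k = 1 gives the OIBD and k = 0 or k = 2 give the ZIBD and TIBD. The function names and the argument `k` are illustrative assumptions, not notation from the paper.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def inflated_binom_pmf(x, n, p, alpha, k=1):
    """Binomial pmf with extra mass alpha at count k.

    k = 1 corresponds to the OIBD; the parameter k is an illustrative
    generalisation so the same code covers the ZIBD (k=0) and TIBD (k=2).
    """
    base = (1 - alpha) * binom_pmf(x, n, p)
    return alpha + base if x == k else base

# Example: OIBD(5, 0.3, 0.2); the probabilities should sum to 1.
probs = [inflated_binom_pmf(x, 5, 0.3, 0.2) for x in range(6)]
print([round(v, 5) for v in probs], round(sum(probs), 10))
```

Setting `alpha=0` returns the plain binomial probabilities, which mirrors the particular case α → 0.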
The pmf plots of OIBD(n, p, α) for different choices of the parameters n, p and α, illustrating the variety of shapes, are provided in Figure 1.

Moments
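Theorem 1: If X ∼ OIBD(n, p, α), then its r-th order moment about zero is

$$\mu'_r = E(X^{r}) = \alpha + (1-\alpha)\sum_{x=0}^{n} x^{r}\binom{n}{x} p^{x} q^{n-x}, \quad q = 1 - p. \tag{5}$$

Proof: If X ∼ OIBD(n, p, α), then the r-th order moment about zero is

$$\mu'_r = \sum_{x=0}^{n} x^{r} P(X = x) = \alpha \cdot 1^{r} + (1-\alpha)\sum_{x=0}^{n} x^{r}\binom{n}{x} p^{x} q^{n-x}.$$

In particular, the first four moments of OIBD(n, p, α) can be obtained as

$$\mu'_1 = \alpha + (1-\alpha)\, np \tag{6}$$

$$\mu'_2 = \alpha + (1-\alpha)\left[np + n(n-1)p^{2}\right] \tag{7}$$

$$\mu'_3 = \alpha + (1-\alpha)\left[np + 3n(n-1)p^{2} + n(n-1)(n-2)p^{3}\right]$$

$$\mu'_4 = \alpha + (1-\alpha)\left[np + 7n(n-1)p^{2} + 6n(n-1)(n-2)p^{3} + n(n-1)(n-2)(n-3)p^{4}\right]$$

Therefore,

$$\mathrm{Var}(X) = \mu'_2 - (\mu'_1)^{2} = \alpha + (1-\alpha)\left[np + n(n-1)p^{2}\right] - \left[\alpha + (1-\alpha)\, np\right]^{2}.$$

From Figure 2 it is clear that as α decreases and p and n increase, the mean of the proposed distribution increases. From Figure 3 it is clear that as α decreases, the variance of the proposed distribution decreases.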
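The moment expressions can be checked numerically. The sketch below, with illustrative function names, computes raw moments of OIBD(n, p, α) by direct summation over the pmf and compares the first two with the closed forms α + (1 − α)np and α + (1 − α)[np + n(n − 1)p²], which follow from the mixture definition of the distribution.

```python
from math import comb

def oibd_pmf(x, n, p, alpha):
    """OIBD pmf: binomial scaled by (1 - alpha) plus mass alpha at x = 1."""
    base = (1 - alpha) * comb(n, x) * p**x * (1 - p) ** (n - x)
    return alpha + base if x == 1 else base

def raw_moment(r, n, p, alpha):
    """r-th moment about zero, by direct summation over the support."""
    return sum(x**r * oibd_pmf(x, n, p, alpha) for x in range(n + 1))

n, p, alpha = 20, 0.3, 0.2                      # arbitrary illustrative values
mean_closed = alpha + (1 - alpha) * n * p
second_closed = alpha + (1 - alpha) * (n * p + n * (n - 1) * p**2)
variance = second_closed - mean_closed**2

print(raw_moment(1, n, p, alpha), mean_closed)
print(raw_moment(2, n, p, alpha), second_closed)
print(variance)
```

For these values the mean is 5 and the variance is 7.36, compared with 6 and 4.2 for the uninflated B(20, 0.3).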

Coefficient of Kurtosis
If X ∼ OIBD(n, p, α), Pearson's β 2 coefficient is β 2 = µ 4 / µ 2 ², where µ 2 and µ 4 are the second and fourth central moments of X. The plots of the coefficient of kurtosis of the proposed distribution for different choices of parameters are shown in Figure 5. From Figure 5 it is observed that
1. For p ≥ 0.5 and α = 0.1, β 2 > 3 for large n.
2. For α > 0.1, β 2 < 3 for large n.
Remark 2
1. As α → 0 with 0 < p < 1, the coefficient of kurtosis β 2 → 3 for large n, i.e. the proposed distribution tends to normal.

Remark 3
Putting S = e^t in equation (13), the moment generating function (m.g.f.), M_X(t), of OIBD(n, p, α) is as follows

$$M_X(t) = \alpha e^{t} + (1-\alpha)\left(q + p e^{t}\right)^{n}, \quad q = 1 - p.$$

Remark 4
Putting S = e^{it} in equation (13), the characteristic function, φ_X(t), of OIBD(n, p, α) is as follows

$$\varphi_X(t) = \alpha e^{it} + (1-\alpha)\left(q + p e^{it}\right)^{n}, \quad q = 1 - p.$$
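A quick numerical check of the generating functions in Remarks 3 and 4 can be sketched as follows. It assumes the closed forms α e^t + (1 − α)(q + p e^t)^n for the m.g.f. and the same expression with e^{it} for the characteristic function, and compares each with the defining sum over the pmf. Function names and parameter values are illustrative.

```python
import cmath
from math import comb, exp

def oibd_pmf(x, n, p, alpha):
    """OIBD pmf: binomial scaled by (1 - alpha) plus mass alpha at x = 1."""
    base = (1 - alpha) * comb(n, x) * p**x * (1 - p) ** (n - x)
    return alpha + base if x == 1 else base

def oibd_mgf(t, n, p, alpha):
    """Assumed closed form: alpha*e^t + (1 - alpha)*(q + p*e^t)^n."""
    return alpha * exp(t) + (1 - alpha) * (1 - p + p * exp(t)) ** n

def oibd_cf(t, n, p, alpha):
    """Same expression with e^{it} in place of e^t."""
    z = cmath.exp(1j * t)
    return alpha * z + (1 - alpha) * (1 - p + p * z) ** n

n, p, alpha, t = 10, 0.4, 0.25, 0.3             # arbitrary illustrative values
mgf_sum = sum(exp(t * x) * oibd_pmf(x, n, p, alpha) for x in range(n + 1))
cf_sum = sum(cmath.exp(1j * t * x) * oibd_pmf(x, n, p, alpha) for x in range(n + 1))

# Both differences should be at the level of floating-point rounding.
print(abs(oibd_mgf(t, n, p, alpha) - mgf_sum))
print(abs(oibd_cf(t, n, p, alpha) - cf_sum))
```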

Cumulative Distribution Function (CDF)
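Theorem 3: If X ∼ OIBD(n, p, α), then the CDF of X is as follows

$$F(x) = P(X \le x) = \begin{cases} (1-\alpha)\, q^{n}, & x = 0 \\ \alpha + (1-\alpha)\sum_{k=0}^{x}\binom{n}{k} p^{k} q^{n-k}, & x = 1, 2, \dots, n \end{cases}$$

The plots of the CDF of OIBD(n, p, α) for different choices of the parameters n, p and α are provided in Figure 6.

If X ∼ OIBD(n, p, α), then its survival function (SF) is as follows

$$s(x) = P(X \ge x) = \begin{cases} \alpha + (1-\alpha)\, I_p(1, n), & x = 1 \\ (1-\alpha)\, I_p(x, n - x + 1), & x = 2, 3, \dots, n \end{cases} \tag{14}$$

with s(0) = 1, where I p (x, n − x + 1) is the incomplete beta function.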
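The CDF and survival function can also be evaluated by direct summation, which avoids the incomplete beta function altogether. The sketch below is a minimal illustration; the convention s(x) = P(X ≥ x) for the survival function is an assumption made here (it is the one consistent with the argument I p (x, n − x + 1)), and the function names are hypothetical.

```python
from math import comb

def oibd_pmf(x, n, p, alpha):
    """OIBD pmf: binomial scaled by (1 - alpha) plus mass alpha at x = 1."""
    base = (1 - alpha) * comb(n, x) * p**x * (1 - p) ** (n - x)
    return alpha + base if x == 1 else base

def oibd_cdf(x, n, p, alpha):
    """F(x) = P(X <= x), summed over 0..x."""
    return sum(oibd_pmf(k, n, p, alpha) for k in range(0, x + 1))

def oibd_sf(x, n, p, alpha):
    """s(x) = P(X >= x); the 'at least x' convention is an assumption."""
    return sum(oibd_pmf(k, n, p, alpha) for k in range(x, n + 1))

n, p, alpha = 10, 0.4, 0.25                     # arbitrary illustrative values
for x in range(n + 1):
    print(x, round(oibd_cdf(x, n, p, alpha), 5), round(oibd_sf(x, n, p, alpha), 5))
```

Under this convention F(x − 1) + s(x) = 1 for every x ≥ 1, and the CDF jumps by at least α at x = 1.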

Failure Rate (FR)
Using the pmf P(X = x) from equation (4) and s(x) from equation (14), the failure rate (FR) of OIBD(n, p, α) is given by

$$r(x) = \frac{P(X = x)}{s(x)}.$$

The plots of the failure rate (FR) of OIBD(n, p, α) for different choices of the parameters n, p and α are provided in Figure 8.
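A small sketch of the failure rate computation follows. It assumes the discrete convention r(x) = P(X = x) / P(X ≥ x), which keeps the rate between 0 and 1; the function names are illustrative.

```python
from math import comb

def oibd_pmf(x, n, p, alpha):
    """OIBD pmf: binomial scaled by (1 - alpha) plus mass alpha at x = 1."""
    base = (1 - alpha) * comb(n, x) * p**x * (1 - p) ** (n - x)
    return alpha + base if x == 1 else base

def oibd_failure_rate(x, n, p, alpha):
    """r(x) = P(X = x) / P(X >= x); the 'at least x' survival convention is assumed."""
    survival = sum(oibd_pmf(k, n, p, alpha) for k in range(x, n + 1))
    return oibd_pmf(x, n, p, alpha) / survival

n, p, alpha = 10, 0.4, 0.25                     # arbitrary illustrative values
rates = [oibd_failure_rate(x, n, p, alpha) for x in range(n + 1)]
print([round(r, 4) for r in rates])
```

The inflation shows up as a spike in the failure rate at x = 1, and the rate equals 1 at x = n because no mass lies beyond the last support point.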

Method of Moment Estimation (MM)
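The parameters p and α of (4) can be estimated by the method of moments as follows. Consider the first two moments from Equations (6) and (7), with the population moments µ′ 1 and µ′ 2 replaced by the corresponding sample moments. From Equation (6),

$$\alpha = \frac{\mu'_1 - np}{1 - np}. \tag{15}$$

From Equation (7),

$$\mu'_2 = \alpha + (1-\alpha)\left[np + n(n-1)p^{2}\right]. \tag{16}$$

Then, eliminating α between (15) and (16),

$$n(n-1)(1-\mu'_1)\, p^{2} + n(\mu'_2 - \mu'_1)\, p - (\mu'_2 - \mu'_1) = 0. \tag{17}$$

Solving the quadratic equation (17) gives the estimate of p. This value of p is then used in equation (15) to estimate α.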
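The moment equations can be solved in a few lines. The sketch below assumes the mean α + (1 − α)np and second raw moment α + (1 − α)[np + n(n − 1)p²]; eliminating α gives the quadratic n(n − 1)(1 − m1)p² + n(m2 − m1)p − (m2 − m1) = 0 in p, where m1 and m2 are the sample moments. The function name and the rule of keeping only roots that give 0 < p < 1 and 0 < α < 1 are illustrative choices, not taken from the paper.

```python
from math import sqrt

def oibd_mm(m1, m2, n):
    """Method-of-moments sketch for OIBD(n, p, alpha).

    m1, m2: first two raw sample moments; n: known binomial index.
    Returns the admissible (p, alpha) pairs, possibly none.
    """
    a = n * (n - 1) * (1 - m1)
    b = n * (m2 - m1)
    c = -(m2 - m1)
    disc = b * b - 4 * a * c
    if a == 0 or disc < 0:
        return []
    solutions = []
    for sign in (1, -1):
        p = (-b + sign * sqrt(disc)) / (2 * a)
        if 0 < p < 1 and abs(1 - n * p) > 1e-12:
            alpha = (m1 - n * p) / (1 - n * p)   # from the mean equation
            if 0 < alpha < 1:
                solutions.append((p, alpha))
    return solutions

# Moments implied by n = 20, p = 0.3, alpha = 0.2 (mean 5, second moment 32.36).
print(oibd_mm(5.0, 32.36, 20))
```

In this example the quadratic has a second root p = 0.06, but it implies a negative α and is discarded, so only the pair close to (0.3, 0.2) is returned.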

Maximum Likelihood Estimation (MLE)
The parameters p and α of (4) can be obtained using the method of maximum likelihood as follows. Let x 1 , x 2 , x 3 , . . . , x n be a random sample from the one-inflated binomial distribution given by (4), and let a i = 1 if x i = 1 and a i = 0 otherwise. Then equation (4) can be written as

$$P(X = x_i) = \left[\alpha + (1-\alpha)\, n p\, q^{n-1}\right]^{a_i}\left[(1-\alpha)\binom{n}{x_i} p^{x_i} q^{n-x_i}\right]^{b_i},$$

where b i = 1 − a i and n 0 = ∑ a i , the sum being taken over the sample. Note that n 0 represents the number of ones in the sample. The likelihood L is the product of these terms over the sample, and differentiating log L with respect to α and p gives Equations (18) and (19). Setting ∂ log L / ∂ α = 0 in Equation (18) gives Equation (20). Setting ∂ log L / ∂ p = 0 in Equation (19) and using Equation (21) gives Equation (22). Now, if we replace p 1 , the probability of count one, by its sample estimate, the proportion of ones in the sample, i.e. p̂ 1 = n 0 /n with n here denoting the sample size, then Equations (20) and (22) reduce to Equations (23) and (24). Using Equation (23), Equation (24) reduces to Equation (25). Therefore, the maximum likelihood estimates (MLEs) of the parameters p and α are obtained by solving Equation (25) numerically to find p̂, with α̂ then given by Equation (26).
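One way to carry out the numerical step, not necessarily the route taken in the paper, is to profile out α. Setting the α-derivative of the log-likelihood to zero forces the fitted probability of count one to equal the sample proportion of ones, after which p maximises the likelihood of the non-one observations under a binomial with the count one removed. The sketch below implements this with a golden-section search; the function name, the frequency-table input and the example data are all hypothetical.

```python
from math import comb, log, sqrt

def fit_oibd_mle(freq, n):
    """Profile-likelihood sketch for OIBD(n, p, alpha).

    freq: dict mapping each count x in 0..n to its observed frequency.
    n: known binomial index. Returns (p_hat, alpha_hat); a negative
    alpha_hat signals that the data show no excess of ones.
    """
    N = sum(freq.values())          # sample size
    n0 = freq.get(1, 0)             # number of ones in the sample

    def f(x, p):                    # binomial pmf
        return comb(n, x) * p**x * (1 - p) ** (n - x)

    def profile(p):                 # log-likelihood in p after eliminating alpha
        ll = sum(c * log(f(x, p)) for x, c in freq.items() if x != 1 and c > 0)
        return ll - (N - n0) * log(1 - f(1, p))

    lo, hi = 1e-6, 1 - 1e-6
    g = (sqrt(5) - 1) / 2           # golden-section ratio
    while hi - lo > 1e-10:
        a, b = hi - g * (hi - lo), lo + g * (hi - lo)
        if profile(a) < profile(b):
            lo = a
        else:
            hi = b
    p_hat = (lo + hi) / 2
    f1 = f(1, p_hat)
    alpha_hat = (n0 / N - f1) / (1 - f1)   # fitted P(X = 1) equals n0 / N
    return p_hat, alpha_hat

# Hypothetical frequency table for n = 4 (not data from the paper).
example = {0: 12, 1: 55, 2: 20, 3: 10, 4: 3}
print(fit_oibd_mle(example, 4))
```

The golden-section search relies on the profile being unimodal in p, which holds here because the binomial with one support point removed is still an exponential family.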
When the likelihood function has a complicated form, the Expectation-Maximization (EM) algorithm can also be used as a simple alternative method for finding the MLEs of the parameters.
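A minimal EM sketch for this mixture is given below, treating the unknown origin of each observed one (point mass or binomial component) as the missing data. It is an illustration under these assumptions rather than the authors' R implementation; the function name, starting values and example frequencies are hypothetical.

```python
def oibd_em(freq, n, p=0.5, alpha=0.5, iters=500):
    """EM sketch for OIBD(n, p, alpha).

    freq: dict mapping each count x in 0..n to its observed frequency.
    n: known binomial index. Returns (p, alpha) after `iters` iterations.
    """
    N = sum(freq.values())                       # sample size
    n1 = freq.get(1, 0)                          # number of ones in the sample
    total = sum(x * c for x, c in freq.items())  # sum of all observations
    for _ in range(iters):
        f1 = n * p * (1 - p) ** (n - 1)          # binomial P(X = 1)
        # E-step: probability that an observed one came from the point mass
        w = alpha / (alpha + (1 - alpha) * f1)
        # M-step: expected number of point-mass observations is n1 * w
        alpha = n1 * w / N
        # Remaining observations are treated as binomial draws
        p = (total - n1 * w) / (n * (N - n1 * w))
    return p, alpha

# Hypothetical frequency table for n = 4 (not data from the paper).
example = {0: 12, 1: 55, 2: 20, 3: 10, 4: 3}
print(oibd_em(example, 4))
```

Each iteration only needs the frequency table, so the cost does not grow with the sample size once the counts are tabulated.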
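To compute the asymptotic variance-covariance matrix of the estimates, the second-order derivatives of the log-likelihood function are required. The asymptotic variance-covariance matrix of the maximum likelihood estimators of p and α for OIBD(n, p, α) can be obtained by inverting the Fisher information matrix (I), given by

$$I = -E\begin{pmatrix} \dfrac{\partial^{2}\log L}{\partial p^{2}} & \dfrac{\partial^{2}\log L}{\partial p\,\partial\alpha} \\[2ex] \dfrac{\partial^{2}\log L}{\partial\alpha\,\partial p} & \dfrac{\partial^{2}\log L}{\partial\alpha^{2}} \end{pmatrix}.$$

The elements of the above Fisher information matrix are obtained from these second-order derivatives. The maximum likelihood estimator (p̂, α̂) is asymptotically bivariate normal with mean (p, α) and variance-covariance matrix I⁻¹, as n → ∞.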

Simulation study
In this section a simulation study is conducted to assess the performance of the estimated parameters. To generate random numbers X from OIBD(n, p, α) = OIBD(20, p, α), acceptance-rejection sampling (21) is applied. Using this method, random samples of size 100 and 200 are generated for different combinations of the true values of the parameters p and α, and the MLEs are then computed using the EM algorithm in R. The bias and MSE of the parameters given in Table 1 are calculated using the following formulae:

$$\mathrm{Bias}(\hat{\theta}) = \frac{1}{r}\sum_{i=1}^{r}\left(\hat{\theta}_i - \theta\right), \qquad \mathrm{MSE}(\hat{\theta}) = \frac{1}{r}\sum_{i=1}^{r}\left(\hat{\theta}_i - \theta\right)^{2},$$

where θ denotes p or α, θ̂ i is its estimate in the i-th replication, and r (= number of replications) = 1000. From the values of the MSE and bias given in Table 1, it is observed that as the sample size increases, the estimated bias and MSE gradually decrease, as expected.
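The structure of such a simulation can be sketched as follows. Note the swapped-in technique: instead of acceptance-rejection sampling, this sketch draws from the OIBD through its mixture representation (return 1 with probability α, otherwise a binomial draw), which yields the same distribution. To stay short it estimates the probability of count one by the sample proportion of ones rather than running the full EM fit, and it uses 200 replications instead of 1000; the seed, function names and these choices are illustrative.

```python
import random

def sample_oibd(n, p, alpha, rng):
    """One OIBD draw via the mixture form: 1 w.p. alpha, else a B(n, p) draw."""
    if rng.random() < alpha:
        return 1
    return sum(rng.random() < p for _ in range(n))

def bias_and_mse(estimates, true_value):
    """Average error and average squared error over the replications."""
    r = len(estimates)
    bias = sum(e - true_value for e in estimates) / r
    mse = sum((e - true_value) ** 2 for e in estimates) / r
    return bias, mse

rng = random.Random(2024)                        # arbitrary seed for reproducibility
n, p, alpha = 20, 0.3, 0.2
true_p1 = alpha + (1 - alpha) * n * p * (1 - p) ** (n - 1)   # P(X = 1)

estimates = []
for _ in range(200):                             # replications
    sample = [sample_oibd(n, p, alpha, rng) for _ in range(100)]
    estimates.append(sample.count(1) / 100)      # sample proportion of ones

bias, mse = bias_and_mse(estimates, true_p1)
print(bias, mse)
```

Replacing the proportion-of-ones estimator with a full fit of p and α in the inner loop gives the layout of a bias and MSE table for the two parameters.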

Real-life examples
The researchers illustrate the application of the OIBD with real data sets having an inflated count of one and compare it with the binomial distribution and the ZIBD. The data sets consist of the sex composition of children of mothers with parity two in Assam and Meghalaya, India. The data used to fit the models are taken from NFHS-IV, conducted in all the states and Union territories of India during 2015-2016 (dhsprogram.com). The prime reason for selecting these two states is the variation in the prevailing social system. Assam being a patriarchal society, a strong son preference exists (22), whereas the social system of Meghalaya is matriarchal and a preference for daughters is observed (23). Past research showed that sex preference has a positive effect on fertility and contraceptive practices (24)(25)(26). Studies also showed that, in India's patriarchal family system, having one son is considered imperative to continue the family (27).

Considering these facts, the births of male and female children to mothers of parity 2 are studied for Assam and Meghalaya. Singh et al. (11) show that sex preference does not exist among mothers with higher parity, which is why mothers with lower parity are considered in this study of gender preference. The study considers ever-married women in the age group 15-49 whose birth interval is greater than five years (i.e. who have completed their family size). Tables 2 and 3 show the number of sons and daughters born to mothers in the states of Assam and Meghalaya. The count one is inflated in both states, so the proposed one-inflated binomial distribution (OIBD) is fitted and compared with the binomial distribution (BD) and the zero-inflated binomial distribution (ZIBD) (10).
The values of the log-likelihood, Akaike information criterion (AIC) (28), Bayesian information criterion (BIC) (28) and the Kolmogorov-Smirnov test (KS test) with p-values are summarized in Tables 2 and 3 for the two states. AIC and BIC are the model selection criteria, and the KS test is used to assess goodness of fit.
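For reference, the two information criteria can be computed from a fitted log-likelihood as sketched below, using the standard definitions AIC = 2k − 2 log L and BIC = k log N − 2 log L, where k is the number of estimated parameters and N the number of observations. The log-likelihood values in the example are hypothetical placeholders, not results from Tables 2 and 3, and the parameter counts (k = 1 for the BD, k = 2 for the OIBD with n treated as known) are an assumption.

```python
from math import log

def aic(loglik, k):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n_obs):
    """Bayesian information criterion: k log(N) - 2 log L."""
    return k * log(n_obs) - 2 * loglik

# Hypothetical log-likelihoods for two fitted models on 100 observations.
models = {"BD": (-105.0, 1), "OIBD": (-100.0, 2)}
for name, (ll, k) in models.items():
    print(name, aic(ll, k), round(bic(ll, k, 100), 3))
```

Smaller values indicate the preferred model, with BIC penalising the extra inflation parameter more heavily than AIC once N exceeds about 7.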
From the tables it is seen that the KS statistic, AIC and BIC of the OIBD are smaller than those of the BD and ZIBD, and the expected frequencies of the OIBD are close to the observed frequencies. From Table 2, it can be observed that the proposed OIBD provides a better fit to the son preference data of Assam with an inflated count of one (p-value > 0.05), whereas the binomial distribution and the ZIBD do not fit the data. In the case of Meghalaya, all three distributions fit the daughter preference data with an inflated count of one (Table 3), but the OIBD has the highest KS test p-value, indicating that it fits better than the other distributions.

Conclusion
In this study, the one-inflated binomial distribution (OIBD) is introduced and its distributional properties and reliability characteristics are studied. The parameters are estimated using the method of maximum likelihood. A simulation study is conducted to examine the behaviour of the MLEs. The appropriateness of the fitted distribution is assessed using a goodness-of-fit test and information criteria. For the real-life data sets having a higher frequency of count one, the proposed one-inflated binomial distribution (OIBD) provides a better fit than the competing distributions.
A future prospect of this study may be to propose an OIBD regression model. This study is restricted to the mixture of a point mass at one and the binomial distribution, i.e. the one-inflated binomial distribution (OIBD). Many other count distributions are suitable for a one-inflated version, which can be taken up in future research.