Linear Regression
Linear regression is a statistical method for summarizing and studying the relationship between two or more continuous (quantitative) variables. It describes how a change in a dependent variable is associated with a change in one or more independent variables, so we can predict scores on one variable from scores on another. The variable being predicted is usually called the response variable, criterion variable or dependent variable. The variable or variables used to make the prediction are called predictor or independent variables.
The relationship is expressed by the following linear regression equation:
y = bx + a + ε
where:
·        x is an independent variable.
·        y is a dependent variable.
·        a is the Y-intercept, which is the expected mean value of y when all x variables are equal to 0. On a regression graph, it’s the point where the line crosses the Y-axis.
·        b is the slope of the regression line, which is the rate of change in y as x changes.
·        ε is the random error term, which is the difference between the actual value of the dependent variable and its predicted value.
When there is only one predictor variable, the prediction method is called simple regression. In simple linear regression, the topic of this section, the predictions of Y when plotted as a function of X form a straight line.
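As a rough illustration of the equation above, here is a minimal Python sketch that estimates a and b by ordinary least squares and then computes the residuals, which are the sample estimates of ε. The function names and the five data points are made up for this example; least squares is the usual way to fit the line, but the post itself does not name a fitting method.

```python
from statistics import mean


def fit_simple_regression(xs, ys):
    """Estimate (a, b) in y = b*x + a by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more (x, y) pairs of equal length")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    b = sxy / sxx          # slope: change in y per unit change in x
    a = y_bar - b * x_bar  # intercept: predicted y when x is 0
    return a, b


def predict(a, b, x):
    """Predicted value of y on the fitted line."""
    return b * x + a


def residuals(a, b, xs, ys):
    """Observed y minus predicted y for each point."""
    return [y - predict(a, b, x) for x, y in zip(xs, ys)]


# Hypothetical data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

a, b = fit_simple_regression(xs, ys)
errors = residuals(a, b, xs, ys)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
```

With an intercept in the model, the least squares residuals always sum to zero (up to rounding), which is a quick sanity check on any fit.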
A Statistician’s Ten Steps for Data Quality Management
Every time you receive data, identify and agree on which metadata is implemented in the system and which metadata supports the business logic. Always ask for the data dictionary managed by the IT department. Also ask for the first and last 10 records of the data being delivered.
1.     Ask for the data to be delivered in a format (CSV, TXT with a special separator character, Excel, or other database forms such as SAS, SPSS, DB2, …) that you are comfortable handling. Over long experience, I have found it easier if the data is delivered as fixed-format text. It is easier still if an automated process creates a ‘Data Audit Report’ that lets analysts take a quick look at the delivered data and discuss its quality with the data delivery team.
2.     Make sure you can read the data and output the top 10 and bottom 10 records. Visually inspect the sample data for each variable and make sure it matches what the IT department promised to deliver.
3.     Check whether the total number of observations sent by the provider and the total number of observations received are the same.
4.     How are the numeric elements coded? Numeric or character?
5.     If a field is a numeric element, find out (1) whether it is an integer, (2) its minimum, (3) its maximum, and (4) its number of missing values. For alpha (character) variables, do the equivalent: check the full list of values along with the number of missing values.
6.     Run the consistency checks that should hold among variables. For example, if there is a total revenue and also revenue by product group, make sure the sum of the product group revenues is the same as the total revenue, after confirming with business/IT managers that such a relationship is supposed to hold. This is the tricky part, because there are so many possible consistency checks. Identify the major ones that are quick to run and check them.
7.     The Data Audit Report should also show the distribution of each variable. For a numeric variable, use quintiles or deciles to see the distribution. For a character variable, use the number of occurrences of each value.
8.     Make sure weights are provided if the data comes from a sample survey or from a sample taken from a population. If weights are not provided, create a weighting system using an auxiliary variable that is available for the full population.
9.     If the data is provided for a predictive model, make sure you select the right reference population when modeling the target population. Whether the application is B2B or B2C, the reference population is not the whole US population list.
10. Missing value distributions (missing or not) should also be covered in any communication with the IT department, so that processes can be re-oriented to capture data better.
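Several of the steps above (reading the top and bottom records, comparing record counts, profiling each field, and checking a revenue total against its parts) could be automated along the following lines. This is only a sketch of what a ‘Data Audit Report’ might contain, using Python's standard library. The function names, the column names and the sample records are all hypothetical, and it assumes a rectangular CSV delivery with a header row.

```python
import csv
import io
from collections import Counter
from statistics import quantiles


def audit(rows, expected_count=None):
    """Summarize a delivery given as a list of dict records with string values."""
    report = {
        "n_received": len(rows),
        # Step 3: compare with the count the provider reports, if known.
        "count_matches": None if expected_count is None else len(rows) == expected_count,
        # Step 2: first and last 10 records for visual inspection.
        "head": rows[:10],
        "tail": rows[-10:],
        "fields": {},
    }
    columns = [c for c in rows[0] if c is not None] if rows else []
    for col in columns:
        raw = [(r.get(col) or "").strip() for r in rows]
        present = [v for v in raw if v != ""]
        info = {"n_missing": len(raw) - len(present)}  # steps 5 and 10
        try:
            nums = [float(v) for v in present]
        except ValueError:
            nums = None  # at least one value is not a number
        if nums:
            # Steps 4, 5 and 7: numeric profile with quintile cut points.
            info["kind"] = "numeric"
            info["is_integer"] = all(n.is_integer() for n in nums)
            info["min"], info["max"] = min(nums), max(nums)
            info["quintiles"] = quantiles(nums, n=5) if len(nums) >= 2 else None
        else:
            # Steps 5 and 7: character profile (also used for all-missing fields).
            info["kind"] = "character"
            info["counts"] = dict(Counter(present))
        report["fields"][col] = info
    return report


def total_mismatches(rows, total_field, part_fields, tol=0.01):
    """Step 6: indexes of rows where the parts do not add up to the total."""
    bad = []
    for i, r in enumerate(rows):
        try:
            total = float(r[total_field])
            parts = sum(float(r[f]) for f in part_fields)
        except (KeyError, TypeError, ValueError):
            bad.append(i)  # missing or non-numeric values cannot be verified
            continue
        if abs(total - parts) > tol:
            bad.append(i)
    return bad


# Hypothetical delivery, invented for illustration.
SAMPLE = """id,region,total_revenue,rev_a,rev_b
1,East,100,60,40
2,West,250,200,50
3,East,90,50,30
4,,120,70,50
5,West,,10,20
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = audit(rows, expected_count=5)
mismatches = total_mismatches(rows, "total_revenue", ["rev_a", "rev_b"])
```

Treating a row with a missing total as a mismatch is a design choice made here so that unverifiable rows are not silently passed. Which consistency rules actually apply should still be confirmed with the business/IT managers, as step 6 says.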
A sure future through a career in Big Data / Data Analytics / Data Science!
Every business decision involves key information. Hitherto, information on the various functional aspects of an organization was sought from the respective departments, and MIS played a key role in the decision-making process. With the advent of advanced computerization and Information Technology, these functions were integrated at the organization level through ERP, at the supply level (both procurement and distribution) through SCM, and at the customer level through CRM solutions. The internet and mobile technology also provided a great platform for e-commerce and internet-based transactions. This generated a huge volume of data in great variety, both structured and unstructured (text, image, audio, video and sensor data), all arriving at great velocity. Data with these three attributes of volume, variety and velocity came to be called Big Data.
Essentially, the study of Big Data enables one to become a Big Data Scientist, who helps solve problems and supports decision making using Data Science or Data Analytics. Students and professionals who want to make it big in their careers should consider the option of becoming a Big Data Scientist. For this purpose, one may enroll in a Data Science course or undergo Data Analytics training in Chennai, Mumbai, Bengaluru and other places. A Data Science course or Data Analytics training in Chennai or elsewhere in India provides a good opportunity for career growth. Most business houses today, and government agencies as well, look for skilled people to employ as Big Data Scientists who have completed a Data Science or Data Analytics course in Chennai or elsewhere. The future is sure through a career in Big Data / Data Analytics / Data Science.