Data science has recently become a hot field in computing. It uses computing power to process data, extract information from it, and turn that information into "knowledge". It has influenced branches of computing such as computer vision, signal processing, and natural language processing, and it is widely used in IT, finance, medicine, autonomous driving, and other fields. (If you're familiar with the leaked NSA PRISM program, you'll see that data science is already widely used in intelligence work.)
In this series of articles, we hope to cover the entire chain of data analysis, from probability theory through statistics to machine learning. Data processing in the traditional sense is done with statistical methods, and probability theory is the foundation of statistics. As computers have become more powerful, data analysis methods that require heavy computation have developed rapidly. Machine learning is really a hybrid: it includes algorithms developed in computer science as well as methods that already existed in traditional statistics but were held back by limited computing power. At the same time, the main purpose of machine learning is to extract knowledge from data, which ties it closely to statistical inference. Starting from traditional probability and statistics therefore makes it easier to understand what machine learning is really about.
Of course, the difficulty with this approach is that there is a lot to cover, and a rigorous treatment can sometimes seem dry. We will try to include practical programming examples wherever possible so that readers can build better intuition. The programming tools are based on Python, with third-party packages such as NumPy, SciPy, Matplotlib, and scikit-learn. Statistics and machine learning can also be implemented in other languages, such as MATLAB and R. If you are familiar with those tools, it is not difficult to write code with similar functionality.
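As a small taste of the kind of example we have in mind, here is a minimal sketch that assumes only NumPy is installed. The distribution, its parameters, and the variable names are made up for illustration and do not come from any later article. It draws samples from a known normal distribution (probability) and then estimates the parameters back from the data (statistics).

```python
import numpy as np

# Hypothetical setup: these "true" parameters are chosen only for illustration.
true_mean, true_std = 5.0, 2.0

# A seeded random generator keeps the run reproducible.
rng = np.random.default_rng(seed=0)

# Probability: draw samples from a known normal distribution.
samples = rng.normal(loc=true_mean, scale=true_std, size=10_000)

# Statistics: estimate the parameters back from the observed data.
est_mean = samples.mean()
est_std = samples.std(ddof=1)  # ddof=1 gives the sample standard deviation

print(f"estimated mean: {est_mean:.3f}")
print(f"estimated std:  {est_std:.3f}")
```

With this many samples, the estimates should land close to the chosen parameters, which is the basic idea behind parameter estimation.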
Probability Theory
Variance and Standard Deviation
Covariance and Correlation Coefficient
Moments and Moment Generating Functions
Math and Programming: “Probability Theory” Summary
Statistics Basics
Parameter Estimation
Interval Estimation
Hypothesis Testing
Linear Regression
ANOVA
Nonparametric Estimation
Bayesian Methods
Multivariate Data
Linear Algebra 01: The Linear Brain
Principal Component Analysis (PCA)
Time Series Analysis
Machine Learning
Clustering Algorithm
Neural Networks
Markov Chains
Plotting Tools
1) Matplotlib:
Anatomy of the Core of Matplotlib
Reference Books
See the Douban list