Predictive Analytics For Dummies (Anasse Bari, 2013)
- Content-based filtering: credibility, sparsity, inconsistency
- Measuring a recommender system: precision - the share of recommended items that are actually relevant; recall - the share of all relevant items that the system recommended
- Groups of customers: persuadables, sure things, lost causes, do not disturb
- Structured data: organised, formally defined, easy to access and query, lower availability, efficient to analyse. Unstructured data: scattered and dispersed, free-form, hard to access and query, higher availability, additional preprocessing is needed
- Attitudinal data: how the customer feels about something. Behavioural data: what the customer does, e.g. sales transactions. Demographic data: personal information about the customer
- Data-driven analytics: no prior knowledge, broad use of data-mining tools, suited for large-scale data, open scope, needs verification of results, uncovers patterns and associations. User-driven analytics: in-depth domain knowledge, specific design for analysis and testing, can work on smaller datasets, limited scope, easier adoption of analysis results, may miss hidden patterns and associations
- Flocking patterns (as in bird flocks): separation, alignment, cohesion
- Fruit basket example: past patterns > categories > bias mode > percentage matching > confirm or deep search
- Identifying groups in data: k-means clustering algorithm
- Calculate similarity: Euclidean distance
- Density-based algorithm: density-based spatial clustering of applications with noise (DBSCAN)
- Data mining in association rules: Apriori algorithm
- Data classification to predict the future: decision trees
- Entropy decision formula
- Data classification algorithm: support vector machine (SVM)
- Neural networks - process past and current data to estimate future values
- Data classification based on probabilistic analysis: Naive Bayes classification algorithm & Bayes’ theorem
- Boost prediction accuracy: ensemble method
- Statistical model: Markov Model
- Linear regression - statistical model to analyse and find the relationship between two variables
- Underfitting: the model is too simple to detect the relationships in the data; overfitting: the model fits the noise in the training data, so it has little predictive power on new data
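
Precision and recall, from the recommender-system bullet above, can be made concrete with a small sketch. It assumes the usual set-based definitions for a single user; the function name and the item IDs are made up for illustration and do not come from the book.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one user's recommendation list.

    recommended: items the system suggested (hypothetical IDs)
    relevant:    items the user actually found useful (hypothetical IDs)
    """
    recommended = set(recommended)
    relevant = set(relevant)
    hits = recommended & relevant  # recommended items that were relevant

    # Precision: how much of what we recommended was good.
    precision = len(hits) / len(recommended) if recommended else 0.0
    # Recall: how much of the good material we managed to recommend.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


precision, recall = precision_recall(["a", "b", "c", "d"], ["b", "d", "e"])
print(precision, recall)
```

Here two of the four recommendations are relevant (precision 2/4) and two of the three relevant items were recommended (recall 2/3). The two measures usually trade off: recommending more items tends to raise recall and lower precision.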
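
The Euclidean distance bullet feeds directly into k-means: each point is assigned to the centroid it is closest to. The sketch below shows only the distance and that assignment step, not the full algorithm; the helper names and sample points are assumptions chosen for illustration.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to point (the k-means assignment step)."""
    return min(
        range(len(centroids)),
        key=lambda i: euclidean_distance(point, centroids[i]),
    )


centroids = [(0.0, 0.0), (10.0, 10.0)]  # hypothetical cluster centres
print(euclidean_distance((0, 0), (3, 4)))
print(nearest_centroid((1.0, 2.0), centroids))
```

A full k-means run would alternate this assignment step with recomputing each centroid as the mean of its assigned points until the assignments stop changing.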
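
The notes mention an entropy formula for decision trees without writing it out. A minimal sketch, assuming the standard Shannon entropy H = -sum(p_i * log2(p_i)) over class proportions and the usual information-gain criterion for choosing a split; the book's exact notation is not reproduced here, and the labels are invented.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(parent, subsets):
    """Entropy reduction from splitting parent into the given subsets."""
    total = len(parent)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]     # hypothetical class labels
split = [["yes", "yes"], ["no", "no"]]  # a split that separates them perfectly
print(entropy(parent), information_gain(parent, split))
```

An evenly mixed node has the highest entropy and a pure node has zero, so a decision tree built this way prefers the split with the largest information gain.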