DATA SCIENCE & MACHINE LEARNING

Extracting knowledge from noise. Data science utilizes mathematics, statistics, and computer algorithms to uncover hidden patterns in massive datasets, turning raw information into predictive power.

The Data Pipeline

Before a model can make predictions, data must be cleaned and transformed. Real-world data is messy, often incomplete, and frequently unstructured.
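
As a rough illustration of what cleaning and transforming can involve, here is a minimal sketch using only the Python standard library. The records, the field names (`city`, `temp_c`), and the rule of filling gaps with the median are all hypothetical choices made for this example, not a prescribed method.

```python
from statistics import median

# Hypothetical raw records: stray whitespace, inconsistent casing, missing values.
raw_rows = [
    {"city": " Boston ", "temp_c": "21.5"},
    {"city": "boston", "temp_c": ""},
    {"city": "DENVER", "temp_c": "18.0"},
    {"city": "Denver", "temp_c": "n/a"},
    {"city": "", "temp_c": "25.0"},
]

def parse_float(text):
    """Return a float, or None when the field is empty or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None

def clean(rows):
    # Step 1: normalize the text field and drop rows that have no city at all.
    parsed = []
    for row in rows:
        city = row["city"].strip().title()
        if not city:
            continue
        parsed.append({"city": city, "temp_c": parse_float(row["temp_c"])})

    # Step 2: fill missing temperatures with the median of the known ones.
    # (Assumes at least one known value; median() raises on an empty list.)
    known = [r["temp_c"] for r in parsed if r["temp_c"] is not None]
    fallback = median(known)
    for r in parsed:
        if r["temp_c"] is None:
            r["temp_c"] = fallback
    return parsed

cleaned = clean(raw_rows)
for row in cleaned:
    print(row)
```

Dropping a row and filling a gap are both judgment calls: each one changes what a model later sees, so the choice is worth recording alongside the cleaned data.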

[Image of the Data Science Lifecycle]
"Data is the new oil. It’s valuable, but if unrefined it cannot really be used." — Clive Humby

Unsupervised Learning

In Supervised Learning, we train models using labeled data (e.g., thousands of pictures explicitly tagged as "cats"). In Unsupervised Learning, we throw raw, unlabeled data at an algorithm and ask it to find the structure itself.

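The K-Means Clustering algorithm does this by scattering $k$ "centroids" into the data space. It then assigns each point to its nearest centroid, moves each centroid to the mean of the points assigned to it, and repeats until the assignments stop changing. "Nearest" is measured by Euclidean distance, which for two points $p$ and $q$ in the plane is: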

$$d(p, q) = \sqrt{(p_x - q_x)^2 + (p_y - q_y)^2}$$
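
The clustering procedure can be sketched in a few lines of Python. This is an illustrative version under stated assumptions, not a production implementation: points are 2-D tuples, the starting centroids are drawn from the data itself, and the names `euclidean`, `k_means`, `iterations`, and `seed` are chosen here for the example.

```python
import math
import random

def euclidean(p, q):
    """Euclidean distance between two 2-D points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def k_means(points, k, iterations=100, seed=0):
    rng = random.Random(seed)
    # Scatter step (assumption): use k distinct data points as starting centroids.
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean_x = sum(p[0] for p in cluster) / len(cluster)
                mean_y = sum(p[1] for p in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return centroids, clusters

# Two visibly separate groups of hypothetical points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, k=2)
print(sorted(centroids))
```

The result of K-Means depends on where the centroids start, so on less cleanly separated data it is common to run the algorithm from several starting positions and keep the tightest clustering.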
