Data Tool Kit - Glossary

If you are new to the data scene, it can be overwhelming to know where to start. Here we explain some foundational data concepts. If you do encounter jargon or terminology with which you aren’t familiar, please see the Glossary for definitions.

Do you think more terms should be added to this glossary? Let us know by taking the Water Board Data Tool Kit Feedback Survey!

  Glossary for definitions

Computer vision:
Techniques that gain understanding from digital images or videos, essentially transforming that imagery into data that can be analyzed and used to develop models and predictions.
Data:
A set of values representing a specific concept. Data become “information” when analyzed and interpreted in a way that extracts meaning and provides context. (Definition adapted from: https://www.data.gov/glossary)
Data dictionary:
A collection of definitions, attributes, and allowable values for each data element.
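To make the idea concrete, here is a minimal Python sketch of a data dictionary and a check that uses it. The element names, definitions, and allowable values are hypothetical and chosen only for illustration; a real data dictionary would typically describe many more attributes.

```python
# Hypothetical data dictionary: one entry per data element.
data_dictionary = {
    "station": {"definition": "Monitoring station identifier", "type": str},
    "result_flag": {
        "definition": "Quality flag for the measurement",
        "type": str,
        "allowed": {"valid", "estimated", "rejected"},
    },
}


def check_record(record, dictionary):
    """Return a list of problems found in one record."""
    problems = []
    for name, spec in dictionary.items():
        if name not in record:
            problems.append(f"missing element: {name}")
            continue
        value = record[name]
        if not isinstance(value, spec["type"]):
            problems.append(f"{name}: expected {spec['type'].__name__}")
        elif "allowed" in spec and value not in spec["allowed"]:
            problems.append(f"{name}: value {value!r} not allowed")
    return problems


# "maybe" is not one of the allowable values, so one problem is reported.
print(check_record({"station": "A1", "result_flag": "maybe"}, data_dictionary))
```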
Data science:
The process of obtaining information and insights from data.
Dataset:
The complete combination of data, metadata, and data dictionary.
Machine learning:
A field of study that gives computers the capability to learn without being explicitly programmed.
Human-readable:
Data that people can read but that a computer cannot automatically read and process. This could be non-digital material (e.g., printed documents or data sheets) or digital material that a computer cannot readily process (e.g., PDFs, unformatted Excel spreadsheets).
Machine-readable:
Data that can be automatically read and processed by a computer (e.g., CSV, JSON, XML).
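As a small illustration of what "automatically read and processed" can look like, the Python sketch below parses a short piece of CSV text using only the standard library. The column names and values are invented for this example.

```python
import csv
import io

# Hypothetical CSV content; the station names and values are invented.
csv_text = (
    "station,date,temperature_c\n"
    "A1,2020-01-01,12.5\n"
    "B2,2020-01-01,14.0\n"
)

# csv.DictReader turns each row into a dictionary keyed by the header row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Values arrive as text, so convert them before doing arithmetic.
mean_temp = sum(float(r["temperature_c"]) for r in rows) / len(rows)

print(rows[0]["station"], mean_temp)
```

The same table locked inside a scanned PDF would need manual retyping or extra extraction steps before a program could do this.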
Metadata:
Data about your data: the who, what, when, where, why, and how of your data.
Predictive modeling:
Techniques that gain understanding from tabular datasets in order to develop models and predictions.
Reinforcement learning:
A form of learning where the algorithm learns to react to its environment and is rewarded when it gets something correct.
Supervised learning:
A form of learning where we are tasked with finding patterns from a dataset where each observation has a label.
Tidy data:
Data that are structured such that the data are easy to manipulate, model, and visualize.
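One common convention for tidy data, assumed here rather than stated in the definition above, is that each variable gets its own column and each observation gets its own row. The Python sketch below reshapes a small "wide" table into that form. The site names, years, and pH values are hypothetical.

```python
# Hypothetical "wide" table: one row per site, one column per year.
wide = [
    {"site": "A1", "2019": 7.1, "2020": 7.3},
    {"site": "B2", "2019": 6.8, "2020": 6.9},
]

# Reshape so that every row is a single observation: (site, year, ph).
tidy = [
    {"site": row["site"], "year": int(year), "ph": value}
    for row in wide
    for year, value in row.items()
    if year != "site"
]

for record in tidy:
    print(record)
```

In the reshaped version, filtering by year or plotting pH over time no longer requires knowing which column names happen to be years.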
Unsupervised learning:
A form of learning where we are tasked with finding patterns when we don’t know what the “right answers” or labels for the outputs should be.
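The difference between supervised and unsupervised learning can be sketched with a toy Python example. The numbers and labels are made up. The supervised part uses a simple nearest-neighbor rule, and the unsupervised part groups values at the widest gap; both are deliberately simplified stand-ins chosen for illustration, not recommended methods.

```python
# Supervised: each observation comes with a label.
labeled = [(1.0, "low"), (1.2, "low"), (8.9, "high"), (9.4, "high")]


def predict(x):
    """Give x the label of its nearest labeled neighbor."""
    nearest = min(labeled, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


# Unsupervised: the same values, but with no labels attached.
unlabeled = sorted(value for value, _ in labeled)


def split_at_largest_gap(values):
    """Group sorted values into two clusters at the widest gap."""
    gaps = [values[i + 1] - values[i] for i in range(len(values) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return values[:cut], values[cut:]


print(predict(2.0))
print(split_at_largest_gap(unlabeled))
```

In the first case the known labels guide the answer. In the second, the algorithm finds two groups on its own, and it is up to a person to decide what those groups mean.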