
Data Tool Kit - Glossary

If you are new to the data scene, it can be overwhelming to know where to start. Here we explain some foundational data concepts. If you do encounter jargon or terminology with which you aren’t familiar, please see the Glossary for definitions.

Do you think more terms should be added to this glossary? Let us know by taking the Water Board Data Tool Kit Feedback Survey!

  Glossary

Term: Definition (Relevant Handbook)
Computer vision: Techniques that gain understanding from digital images or videos, essentially transforming that imagery into data that can be analyzed and used to develop models and predictions. (Machine Learning Handbook)
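As a toy illustration of the idea in the Computer vision entry above, the Python sketch below treats a tiny grayscale image as a grid of brightness values and reduces it to two numbers that could feed an analysis or a model. The pixel values, the 128 threshold, and the `image_features` name are hypothetical; real computer vision work usually relies on libraries such as OpenCV or on trained neural networks rather than hand-written loops.

```python
# A tiny hypothetical grayscale "image": each number is one pixel's
# brightness, from 0 (black) to 255 (white).
image = [
    [12, 30, 200, 220],
    [15, 25, 210, 230],
    [10, 20, 205, 240],
]

def image_features(pixels, threshold=128):
    """Turn a 2D grid of pixel intensities into a few numeric features."""
    flat = [value for row in pixels for value in row]   # flatten rows into one list
    bright = sum(1 for value in flat if value >= threshold)
    return {
        "mean_brightness": sum(flat) / len(flat),       # average intensity
        "bright_fraction": bright / len(flat),          # share of bright pixels
    }

feats = image_features(image)
print(feats)
```

Once an image is reduced to numbers like these, it can be analyzed with the same tools used for any other tabular data.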
Data: A set of values representing a specific concept. Data become “information” when analyzed and interpreted in a way that extracts meaning and provides context. Definition adapted from: https://www.data.gov/glossary (People's Data Science Handbook)
Data dictionary: A collection of definitions, attributes, and allowable values for each data element. (Data Management Handbook)
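The Data dictionary entry above can be sketched as a lookup table that stores, for each data element, its definition, data type, and allowable values, together with a check of incoming records against it. The column names, ranges, and the `validate_record` helper are hypothetical, invented only for illustration.

```python
# Hypothetical data dictionary for a water-quality table: each data element
# has a definition, a type, and (optionally) allowed values or a valid range.
DATA_DICTIONARY = {
    "station_id": {"definition": "Monitoring station code", "type": str},
    "sample_date": {"definition": "Collection date (YYYY-MM-DD)", "type": str},
    "ph": {"definition": "pH of the sample", "type": float, "min": 0.0, "max": 14.0},
    "result_qualifier": {"definition": "Qualifier for the result", "type": str,
                         "allowed": {"=", "<", ">"}},
}

def validate_record(record, dictionary=DATA_DICTIONARY):
    """Return a list of problems found when checking one record against the dictionary."""
    problems = []
    for name, spec in dictionary.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        if not isinstance(value, spec["type"]):
            problems.append(f"{name}: expected {spec['type'].__name__}")
            continue  # skip range checks when the type is already wrong
        if "allowed" in spec and value not in spec["allowed"]:
            problems.append(f"{name}: {value!r} is not an allowed value")
        if "min" in spec and value < spec["min"]:
            problems.append(f"{name}: {value} is below {spec['min']}")
        if "max" in spec and value > spec["max"]:
            problems.append(f"{name}: {value} is above {spec['max']}")
    return problems

good = {"station_id": "ST-01", "sample_date": "2023-05-01", "ph": 7.2, "result_qualifier": "="}
bad = {"station_id": "ST-02", "sample_date": "2023-05-01", "ph": 15.0, "result_qualifier": "~"}
print(validate_record(good))  # prints []
print(validate_record(bad))
```

Keeping the dictionary in a machine-readable form like this lets the same definitions document the data and check it.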
Data science: The process of obtaining information and insights from data. (People's Data Science Handbook)
Dataset: The complete combination of data, metadata, and a data dictionary. (Data Management Handbook)
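To make the Dataset entry above concrete, this minimal sketch bundles the three parts into one object and treats the dataset as complete only when all three are present and every column in the data appears in the data dictionary. The `Dataset` class and that completeness rule are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    """Hypothetical container pairing data with its metadata and data dictionary."""
    data: list                                            # rows, each a dict of column -> value
    metadata: dict = field(default_factory=dict)          # who, what, when, where, why, how
    data_dictionary: dict = field(default_factory=dict)   # column -> definition

    def is_complete(self):
        """True only if all three parts exist and every column is documented."""
        if not (self.data and self.metadata and self.data_dictionary):
            return False
        columns = set().union(*(row.keys() for row in self.data))
        return columns <= set(self.data_dictionary)

rows = [{"station_id": "ST-01", "ph": 7.2}]
complete = Dataset(rows, {"what": "Example pH readings"},
                   {"station_id": "Station code", "ph": "Sample pH"})
print(complete.is_complete())       # prints True
print(Dataset(rows).is_complete())  # prints False
```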
Datathon: Data engagement events that tend to involve more discussion than a hackathon about the question(s) at hand and the data available to answer them; questions may need to be refined, and data found, before participants can start answering them with data. (Data Engagement Handbook)
Hackathon: Data engagement events where participants use already-defined questions or objectives and available data to build prototypes of analyses, visualizations, or interactive tools that answer those refined questions. (Data Engagement Handbook)
Human-readable: Data that a person can read but a computer cannot automatically read and process. This could be non-digital material, such as printed documents or data sheets, or digital material that a computer cannot easily access, such as PDFs or unformatted Excel spreadsheets. (Data Management Handbook)
Machine learning: A field of study that gives computers the capability to learn without being explicitly programmed. (Machine Learning Handbook)
Machine-readable: Data that can be automatically read and processed by a computer, such as CSV, JSON, or XML files. (Data Management Handbook)
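The difference between the Human-readable and Machine-readable entries above shows up as soon as a program tries to use the data. This sketch, using only Python's standard library, reads the same two hypothetical measurements from CSV, JSON, and XML text and gets identical records each time; a scanned PDF of the same table would first need manual retyping or OCR.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same two hypothetical measurements in three machine-readable formats.
csv_text = "station_id,ph\nST-01,7.2\nST-02,6.8\n"
json_text = '[{"station_id": "ST-01", "ph": 7.2}, {"station_id": "ST-02", "ph": 6.8}]'
xml_text = """<samples>
  <sample station_id="ST-01" ph="7.2"/>
  <sample station_id="ST-02" ph="6.8"/>
</samples>"""

# CSV and XML store everything as text, so pH is converted to a number.
from_csv = [{"station_id": r["station_id"], "ph": float(r["ph"])}
            for r in csv.DictReader(io.StringIO(csv_text))]
from_json = json.loads(json_text)
from_xml = [{"station_id": s.get("station_id"), "ph": float(s.get("ph"))}
            for s in ET.fromstring(xml_text).findall("sample")]

print(from_csv == from_json == from_xml)  # prints True
```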
Metadata: Data about your data: the who, what, when, where, why, and how of your data. (Data Management Handbook)
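The who, what, when, where, why, and how in the Metadata entry above can be captured as a simple record. In this sketch the field values and the `missing_metadata` helper are hypothetical; actual metadata standards define their own, more detailed fields.

```python
# Hypothetical metadata record answering the six basic questions about a dataset.
metadata = {
    "who": "Example Monitoring Program",
    "what": "pH measurements of surface water samples",
    "when": "2023-01-01 to 2023-12-31",
    "where": "Example watershed monitoring stations",
    "why": "Track seasonal changes in water quality",
    "how": "Field pH meter, calibrated before each sampling run",
}

REQUIRED = ("who", "what", "when", "where", "why", "how")

def missing_metadata(record):
    """List the questions a metadata record leaves unanswered (missing or empty)."""
    return [key for key in REQUIRED if not record.get(key)]

print(missing_metadata(metadata))             # prints []
print(missing_metadata({"what": "pH data"}))  # prints ['who', 'when', 'where', 'why', 'how']
```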
Open data: Data released according to open data principles, which generally specify that datasets should be public, accessible, described, reusable, complete, timely, and managed post-release. (Open Data Handbook)
Open source code: Code that is made freely available for anyone to use, modify, and share. Examples of open source software include MySQL, Firefox, and WordPress. (Open Source Code Handbook)
Predictive modeling: Techniques used to gain understanding and make predictions from tabular datasets. (Machine Learning Handbook)
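As one minimal example of the Predictive modeling entry above, this sketch fits a straight line to a small hypothetical table of water temperature and dissolved oxygen using ordinary least squares, then predicts a value for a temperature that is not in the table. Linear regression is only one of many predictive modeling techniques, and the numbers are invented for illustration.

```python
# Hypothetical tabular data: (water_temperature_C, dissolved_oxygen_mg_per_L).
rows = [(10.0, 11.0), (15.0, 10.0), (20.0, 9.0), (25.0, 8.0)]

def fit_line(pairs):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)  # co-variation of x and y
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)            # variation of x
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(rows)

def predict(temperature_c):
    """Predict dissolved oxygen for a new temperature using the fitted line."""
    return slope * temperature_c + intercept

print(round(predict(18.0), 2))  # prints 9.4
```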
Proprietary source code: Source code that is copyrighted by a company or individual and not shared with others. Examples of proprietary software include Microsoft Office and Adobe Photoshop. (Open Source Code Handbook)
Reinforcement learning: A form of learning in which an algorithm learns to respond to its environment and is rewarded when it gets something correct. (Machine Learning Handbook)
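The reward loop in the Reinforcement learning entry above can be illustrated with a multi-armed bandit, the simplest reinforcement learning setting (the environment has no changing state). In this sketch the agent repeatedly picks one of three actions, receives a reward of 1 with a hidden probability, and uses an epsilon-greedy rule to balance trying actions at random against repeating the one that has paid off best so far. The reward probabilities and parameter values are assumptions.

```python
import random

def epsilon_greedy(reward_prob, steps=2000, epsilon=0.1, seed=0):
    """Learn by trial and error which action is rewarded most often."""
    rng = random.Random(seed)
    n_actions = len(reward_prob)
    counts = [0] * n_actions       # how often each action was tried
    values = [0.0] * n_actions     # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:  # explore: try a random action
            action = rng.randrange(n_actions)
        else:                       # exploit: pick the best-looking action so far
            action = max(range(n_actions), key=lambda a: values[a])
        # The environment rewards the action with probability reward_prob[action].
        reward = 1.0 if rng.random() < reward_prob[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]  # update the average
    return values, counts

values, counts = epsilon_greedy([0.2, 0.5, 0.8])
best = max(range(len(counts)), key=lambda a: counts[a])
print(best, [round(v, 2) for v in values])
```

Over many steps the agent should settle on the action with the highest hidden reward rate, even though it was never told which one that is.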
Supervised learning: A form of learning in which we find patterns in a dataset where each observation has a label. (Machine Learning Handbook)
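A k-nearest neighbors classifier is one small instance of the Supervised learning entry above: every training observation carries a label, and a new observation gets the label most common among its closest labeled neighbors. The points, the "low"/"high" labels, and k = 3 are hypothetical.

```python
import math
from collections import Counter

def knn_predict(training, point, k=3):
    """Predict a label for `point` by majority vote of its k nearest labeled neighbors."""
    by_distance = sorted(training, key=lambda item: math.dist(item[0], point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled observations: (feature_1, feature_2) -> label.
training = [
    ((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((2.0, 1.0), "low"),
    ((8.0, 8.0), "high"), ((9.0, 7.5), "high"), ((8.5, 9.0), "high"),
]

print(knn_predict(training, (1.2, 1.4)))  # prints low
print(knn_predict(training, (8.2, 8.1)))  # prints high
```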
Tidy data: Data structured so that they are easy to manipulate, model, and visualize. (Data Management Handbook)
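The Tidy data entry above is commonly summarized, following Hadley Wickham's description of tidy data, as: each variable forms a column, each observation forms a row, and each value has its own cell. This sketch reshapes a hypothetical "wide" table, where months are spread across column headers, into that tidy "long" form; in pandas the same reshape is usually done with `pandas.melt`. The column names and the `wide_to_long` helper are assumptions.

```python
# Hypothetical "wide" table: the month is stored in the column headers,
# so one variable (month) is spread across several columns.
wide = [
    {"station_id": "ST-01", "Jan": 7.1, "Feb": 7.3},
    {"station_id": "ST-02", "Jan": 6.9, "Feb": 7.0},
]

def wide_to_long(rows, id_column, var_name, value_name):
    """Reshape wide rows so each output row holds exactly one observation."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every output row instead
            long_rows.append({id_column: row[id_column], var_name: column, value_name: value})
    return long_rows

tidy = wide_to_long(wide, "station_id", "month", "ph")
for record in tidy:
    print(record)
```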
Unsupervised learning: A form of learning in which we find patterns when we don’t know what the “right answers” or labels for the outputs should be. (Machine Learning Handbook)
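Unlike the Supervised learning entry, the Unsupervised learning entry above starts from data with no labels. This minimal one-dimensional k-means sketch groups hypothetical readings into two clusters using only the values themselves; the starting centers, k = 2, and the iteration limit are assumptions.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Group unlabeled numbers into k clusters with a minimal 1-D k-means."""
    ordered = sorted(values)
    step = max(1, len(ordered) // k)
    centers = ordered[::step][:k]  # simple spread-out starting centers
    for _ in range(iterations):
        # Assign each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # assignments have stopped changing
        centers = new_centers
    return centers, clusters

readings = [0.8, 1.1, 0.9, 1.2, 5.0, 5.3, 4.8, 5.1]
centers, clusters = kmeans_1d(readings)
print(centers, clusters)
```

No reading is labeled in advance; the two groups emerge only from how close the values are to each other.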