Week 1 Notes
Databases Introduction
|
|
| Date |
09/06/2022 |
| Topic |
Introduction |
| Professor |
John DeMastri |
| Week |
1 |
Overview
- What is a database? A collection of information that has relevance.
Files, notes, to do list, patient records, play lists, etc.
DBMS
a database management system is a software package designed to store and manage databases
- Typically large, integrated collection of data
- Models real-world enterprise
- entities: students courses
- relationships
Files vs. DBMS
Why not just use files?
You can (and you are...) but:
- Files limit concurrent access
- Copies/sharded files improve access
Sharding
Breaking a data structure into multiple pieces
Why use a DBMS?
- Data independence
- Efficient access
- Reduced application development time
- Data integrity
- Data security
- Uniform data administration
- Concurrent access, recovery from crashes
- Availability (distribution and access), scalability
- Aggregate functions
Why study Databases?
-
Shift from computation to information
- At the low-end: scramble to web-space, big data
- At the high-end: scientific applications, data analytics, collective intelligence, eScience, machine learning
-
Shift to functional computing on large datasets
-
Datasets increasing in diversity & volume
-
Digital libraries, interactive video, Human Genome, the web...
-
Integration
-
DBMS encompass most of CS
- OS, languages, theory, computational complexity, data structures, algorithms, AI, multimedia, logic
- In business
Why should YOU study Databases
- Make more money:)
- Intellectual, science- data poor to data rich, know how to handle data!
Data models
- A data model is a collection of concepts for describing data
- The relational data model is one of the most popular models
- Main concept: relation, basically a table with rows (records) and columns (attributes)
- Every relation has a schema which describes the relation name, the name of the columns (or fields), and the field types
- A schema is a description of a particular collection of data, using the given data model
- A semantic data model is a more abstract high level representation that makes it easier to come up with an initial description
Levels of abstraction
- Many views (external schema)
- Describe how users see the data
- Single conceptual (logical) schema
- defines logical structure
- Single physical schema
- describes the files (pages, blocks) and indexes uses
Example: Unitversity Database
-
Conceptual schema:
- Students (sidstr, namestr, loginstr, ageint)
- Courses
- Enroll
-
Physical schema
- Relations stored as ordered/unordered files
- Index on first column of Students
-
External schema (View):
- Course_Info(cid: str, enrollement:int)
Data Independence
Applications insulated from how data is structured and stored.
- Logical data independence: Protection from changes in the logical structure (conceptual schema) of data.
- Physical data independence: Protection from changes in physical structure of data.
Concurrency Control
- Concurrent execution (threads) of user programs is required for good DBMS performance.
- Since disk access is frequent and slow, it is important to keep the CPU humming by working on several user programs concurrently.
- Interleaving actions of different user programs can lead to inconsistency
Transaction
-
A transaction is an execution of a DB program.
-
Key concept of transation: an atomic sequence of database actions (reads/writes).
-
Each transaction, executed completely, must leave the DB in a consistent state (provided DB was consistent when the transaction begins).
-
Users can specify some simple integrity constraints on the data and the DBMS will enforce these constraints.
-
The DBMS does not really understand the semantics (meaning) of the data.
- e.g. how interest is calculated on student overdue accounts.
-
So, ensuring that a transaction (runs alone) is ultimately the user's responsibility
Polyglot Persistence
Polyglot
Speaking, writing, written in, or composed of several languages.
- Using different data storage technologies to handle varying data storage needs.
- An application that talks to different databases using each for what they are best at to achieve an end goal
Data Modeling Entity Relationship Model
|
|
| Date |
09/07/2022 |
| Topic |
Entity Relationship Model |
| Professor |
John DeMastri |
| Week |
1 |
Model Basics
| Shape |
Meaning |
| Squares |
Entities |
| Diamonds |
Relationships |
| Ovals |
Attributes |
Relationships can have attributes
Entity Relationship (ER) Model
ER Data Model
- Widely used in data modeling and database design
- Describes data involved in terms of entities and relationships.
- Provides useful concepts that allow us to move from an informal description to a more detailed & precice description
- Independent of specific database models
- Can be used to implement data model in database
Purpose is to identify the specific data that has to be encompassed by a specific entity.
ER Model Basics
Entity
- "Real-world" object we seek to represent, and distinguish from other objects.
- Entities described (in DB) using a set of attributes.
- Think "objects" (class instances), or nouns
Entity Set:
- A _collection of similar entities
- All entities within the same entity set have the same attributes.
They all fit into the same schema
- Each entity has a key that uniquely identifies it within the entity set.
- Key can be a new attribute, or another attribute (or group of attributes) can also uniquely identify the entity.
- Each attribute has a domain (of acceptable values).
Example entities
Entity:
- Student
- Shared attributes: student id, name, major, etc.
- Attributes we don't care aabout: hair color, whether they sky dive, religion, political ideology. Outisde domain of concern.
Attribute Domains:
- Student id: The set of all integer numbers between 1 - 10000
- Etc.
Constrain- the more clearly they are defined, the easier it is to make sure they are complete.
Relationship:
- Association among two or more entities
- E.g. EF Codd teaches Database Systems; Jim Gray majors in SE.
- Think verb
Relationship Set:
- Collection of similar relationships
- An n-ary relationship set
R relates n entities (E1, ..., En)
- Same entity set could participate in different relationship sets, or in different roles in the same set. E.g. using first order logic:
- PHD-advisor
- Co-author
- Research-partner
Relationships
Keys (ER Model)
Key
One or more attributes that uniquely identifies an entity instance. I.e., an individual entity in an entity set
- Keys should be minimal and appropriate
- Minimal - minimum number of attributes
- Appropriate - make sense semantically
[Student]
-> name
-> phone number
-> address
Keys as defined by the relational model
Key
set of entity attributes that identifies one or more entity instances within an entity set
Superkey
distinctly identifies a record, e.g. name + phone_number
Candidate keys
set of keys that are minimal (attributes) superkeys.
Primary key
candidate key selected as primary
Foreign key
Attribute of entity or relationship that references another entity. (e.g. address)
Key Constraints
The arity (or cardinality) of a relationship is defined with a key constraint.
works_in relationship (m to m):
- An employee can work in many departments
- A department can have many employees
manages relationship (1 to m):
- Each department has one manager according to the key constraint.
- An employee manages 0, 1, or more than one department.
Participant Constraints
Does Every department have a manager?
Weak Entities
|
|
| Date |
09/09/2022 |
| Topic |
Weak Entities |
| Professor |
John DeMastri |
| Week |
1 |
Recap
Key
set of entity attributes that identifies one or more entity instances within an entity set
Superkey
distinctly identifies a record, e.g. name + phone_number
Candidate keys
set of keys that are minimal (attributes) superkeys.
Primary key
candidate key selected as primary
Foreign key
Attribute of entity or relationship that references another entity. (e.g. address)
Weak Entity
An entity that cannot be uniquely identified by its attributes alone.
- It must use a foreign key in conjunction with its attributes to create a primary key.
- foreign key in one entity is the primary key of another entity that it has a relationship/association
- Owner entity set and weak entity set must participate in a 1-m relationship set (one owner, many weak entities)
- Weak entity set must have total participation in this identifying set.
- Example: dependents on an insurance policy. The dependents can not exist without the primary beneficiary.
ISA ('isa') Hierarchies
- Inheritance: A isA B iff every A entity is also considered to be a B entity
- Issues:
- Overlap constraints: Can Paul be an hourly and a contract employee?
- Covering constraints: Does every Employees entity have to be an hourly or contract employee?
- When to use:
- To add descriptive attributes specific to a subclass
- Identify entities that participate in a relationship
- Typically solved with type field
Aggregation
- Relationship involving entity sets and a relationship set
- Allows us to treat a relationship set as an tity set for purposes of participation in (other) relationships.
- Can be n-ary relationship
Conceptual Design using ER Model
- Design Choices:
- Should a concept be modeled as an entity or an attribute?
- Should a concept be modeled as an entity or a relationship?
- Identifying relationship: Binary, Ternary, or n-ary?
Constraints in ER Model
- A lot of data semantics can (and should) be captured
Entity versus attribute
- Should address be an attribute of Employee, or an entity (connected to Employees by a relationship)?
- Depends on use and semantics (meaning) of the data:
- Several addresses per employeee => address must be entity.
- Several employees per address => address must be entity
- If the structure is important (city, street, etc.) => address must be an entity (since attribute values are atomic).
- 1 to 1 relationship can be tricky
Entity versus attribute - Issues?
- Employee as separate entity allows employee to work in Departments for multiple durations
Summary of Conceptual Design
-
Conceptual design (data modeling) follows requirements analysis.
-
Results in "High level" description of data to be stored.
-
ER model popular for conceptual design.
-
Constructs are expressive, and close to the way people think.
-
Basic constructs: entities, relationships, and attributes (of entities and relationships).
-
Several integrity constraints:
- Key (primary, foreign) and participation generally, and overlap/covering for ISA.
-
Note: many variations of ER model syntax!
-
Entity sets and relationship sets can be represented as classes in UML together with key constraints, weak entities, and class hierarchies.