by Sarah Mason
Databases come in two types, Relational Database or Non-Relational Databases. Relational Databases have set relationships between entities that can be represented as row and column with attributes. This is a set structure to the set of data, or facts, stored for repeated access. The access of the data to find information is a feature of the Database. The structure is defined to create a logical map of finding data by relationship.
Extraction is part of ETL. ETL stands for Extraction, Transform, and Load. This is a standard manipulation of a Relational Database. The process is removing data, changing it, then reinserting into the database for use. For Extraction to be useful, it must follow set logic to pull data that is needed. This is done by defining the facts required by location and relationship in the database.
Schema is the logical organization of the database. This is a store of facts and information by relationship and definition. An example is organizing a database for the weather over the past year. Logically, values for the temperature and amount of sunlight would be stored per date for the past 365 days. The relationship would be that the values would be distinct by date and time. This allows for easier extraction by generating rules to provide data expected.
Using schema in extraction as a first step of Extraction Transform and Load, is dependent on a logical map of the database. Finding the temperature by date can give insight to seasonal variance in temperature and correlation to sunlight, which proves the best practice of defined schema that are easy to use for data extraction for analysis or transformation and Loading.
Sign up for events, news, and updates by signing up for our Newsletter.
Sarah Mason is a Healthcare Data Analyst and Founder Sarah Mason Consulting LLC.