Data Conversion for Relational and Object Oriented Databases
By Donna Bridgham, Division of Aging and Rehabilitative Services, Commonwealth of Virginia
Abstract: When doing a database project, at some time, there will be the need to do data conversion. The data conversion will involve files that are to be converted to a Relational Database (RD) or an Object Oriented Database (OODB). There will have to be a business process in place to reduce the amount of errors during the process and a way to manage the data once it has been moved into the database. There are business cases that review the process for data conversion and data migration along with processes for using and storing objects even though many of the processes leave out essential steps in creating a quality process for the data conversion.
This article will examine the process of database design for data conversion, data migration, and data maintenance. The article will explore the conversion project for both types of databases and examine the techniques that need to be used to have a successful conversion project. The gap in many studies only explores the data conversion and it does not include the total process. It may be helpful to include some common terms first.
Common Terms
Boyce-Codd’s Normal Form - BCNF -- Find all the candidate keys and check if all the FD satisfies the definition.
MS Access—Single or multi-user database created by Microsoft
Object Oriented Database (OODB) -- The Object Oriented Database does not use the specific design parameters that a relational database uses. The Object database can be designed and adjusted to fit most data environments. The Object database stores objects rather than data and the objects usually consist of methods and attributes.
Query — An inquiry into the database using a statement to extract data from database to use in different scenarios.
Relational Database (RD) -- A relational database is a collection of related data that represents the miniworld or Universe of discoveries. A database can be small with a few records to an enterprise size database that has millions of records (Elmasri & Navathe, 2010).
SQL Server — Microsoft’s Enterprise Database systems
Third Normal Form (3NF) -- Candidate key is AB, and the FD’s left hand side is equal to that of the candidate key.
Data Conversion for Relational and Object Oriented Databases
Relational Databases and Object Oriented Databases are two types of databases that use data or data controls to store data in applications that are used for different operations. A flat file database is a database designed usually around a single table. All of the data in a flat file can be placed in one table such as Name, Address, State, and Phone. If another flat file is then created with email addresses, there is no way to connect the data between the two tables, and duplicate data may be created which would affect the integrity of the data. Therefore that collection of data would not be considered a relational database but could be an object oriented database depending if the tables became objects. (Database Dev, 2012).
A relational database is a collection of related data. A relational database can be small with a few records to an enterprise size database that has millions of records (Elmasri & Navathe, 2010). When a relational database is queried properly it can create items such as key reports to users. It is imperative in a relational database that there is a relationship between all the data. It is important not to leave orphan tables (a table that has no relation to any other table in the database) in a relational database.
The Object Oriented Database does not use the specific design parameters that a relational database uses. The Object database can be designed and adjusted to fit most data environments; therefore, it can have any type of data. The Object database stores objects rather than data, and the objects usually consist of methods and attributes. The other important factor about Object Databases is the use of classes (COMTECHProject, n.d). The classes are used to define the data and methods the object will have. The class does not contain the data or methods but defines data methods contained in the class. Languages such as JAVA, JQuery, and C++ use classes which define the data methods usually by making calls to a library. This can limit the use of finding relations on data.
In 1997, de Vries and Fleck wrote an article on data conversion and the transition from legacy systems to client server systems. de Vries and Fleck were one of the first to write down a formal process for reviewing if a company wanted or needed to convert an application from a legacy system to a client/server application (1997). The key to the process was benchmarking during the data conversion process. Legacy systems were popular since they could be programmed to perform many types of business operations, and much of the data would not be considered relational. de Vries and Fleck wrote about the need to understand the legacy system and then bring the technology of client/server convenience to build a new system (1997). The client/server system had to be analyzed, so it followed the design of a client/server system by introducing the three tier process associated with client/server systems.
Methodology
Data conversion, migration, and maintenance are a part of most upgrade projects. It is also a part of projects when a new system or application is built to replace an outdated system that is no longer working properly. The issue is how does data from the old system get transferred into the new system? The easy answer is build a tool similar to one that the Japanese just patented that will rewrite all your data, queries, and system data from one system to another. In 2011, a patent was granted by Hamamatsu and Hirayama of Tokyo that will automate this process (US Patent, 2011). The process would include translating a set of queries from the first database to be used in the second database by determining the query types of the parent and the child and then generating the translated query based on the type of query that will need to be run on the new database system. Even though the data and information has been put into queries that will recreate the data functions, the question remains whether the data is clean and all transactions have been accounted for. The question on any conversion or migration project is, "Will the integrity of the data hold, be improved, or not work?" This piece is missing from the automated process.
According to Hernandez, the first step in building a database is to see how data is collected (2003). Whether it is in paper form, web form, Excel spreadsheet, or another type of form, it is important to see what data is being collected and how it is being collected. Figure 1 is an example of a spreadsheet that shows a list of offices. In order for this to be useful, an identity field would have to be identified when the data is placed in a database. Figure 1 is an example of a web form that collects data for a project management system. According to Hernandez, all of the input fields on the screen must be incorporated into the database through the design or schema of the database (2003). A form can be used actually to create the preliminary schema of the database.
Figure 1: Input Form for Employee Data
Below is a screenshot of the Microsoft Excel interface, with a list of publicly available addresses.
Table 1: List of Offices in Microsoft Excel
In 1999, when Microsoft Access was no longer considered the future of relational databases, many industries moved to SQL Server because of the limitations of MS Access. Jones, who was a writer for SQL Server Magazine, a magazine that was published by Microsoft, created 15 steps for a conversion plan. Even though this was created for SQL Server, the steps can be applicable to other conversion projects since it has a very clear process to get from A to Z. Jones included several steps that included the automated tools from SQL Server, but other servers have similar tools. Even though this list was compiled several years ago, it provides a comprehensive list of what needs to be done on a conversion project. The 15 Steps are listed in the Table 2 with additional information:
Step | Action | Purpose Today |
Step 1 | Create the Database and Log Devices | The database must be created before any actions can be done and while creating the database, all associated logs should be created. |
Step 2 | Create the Database | Create the Database on the server and determine the size that for the database. |
Step 3 | Gather Data on Table Relationships | From the Access database, determine all the relationships that you can determine. |
Step 4 | Make the File Compliant | Make sure that the file can be linked to Access. In newer versions, this can be done in different ways. |
Step 5 | Remove or Determine an order for Recursive Relationships | If one table is dependent on another, during a data load, all dependent tables must be loaded in order according to the keys. |
Step 6 | Determine the Table Dependency Order | Make sure the dependency order of tables is clear and documented. |
Step 7 | Check All Default Values | Make sure if a field has a default value, the default value is consistent in the new environment, or it will need to be changed. |
Step 8 | Move the Data Tables into the Code File | This now should be handled in files that were created before the moving the database, with newer versions of Access VBA code included in import files. In older versions, some of the code will have changed and placed in the database before moving. |
Step 9 | Make a Backup Copy of Your Files | Always make a back up of all files you are using. |
Step 10 | Run the Upsizing Wizard | The upsizing wizard is used differently in different versions of SQL Server, so check if you have any versioning questions before converting databases. |
Step 11 | Check the Log | Check all logs to make sure they are set up properly. |
Step 12 | Back Up the New SQL Server Database | Back-up and make sure you name all versions with a name that makes sense. |
Step 13 | Load Access Data into SQL Server | Using scripts, available tools, and modeling, load the Access data. |
Step 14 | Write Triggers | Triggers are not used in all versions of SQL Server for conversions. |
Step 15 | Test the New SQL Server System | All tables need to be tested; all data should be tested; all transactions should be tested. A plan and script should be written to handle this step |
Table 2: Steps for Access Database to SQL Database Conversion
The database design is key since it drives the project. If the tables have not been checked to make sure that they contain all the data that is needed, the developers will have issues later during the design phase. Also all the relationships should be drawn out, so there are no orphan tables. Each table should have a purpose in the design.
Figure 2: Example of Tables with their Relationships
The database design is key since it drives the project. If the tables have not been checked to make sure that they contain all the data that is needed, the developers will have issues later during the design phase. Also all the relationships should be drawn out, so there are no orphan tables. Each table should have a purpose in the design.
Figure 2: Example of Tables with their Relationships
In Figure 2, all the tables have an identity field or primary key. Some of the tables also have a foreign key. For example the tblLURoles has a relationship with the tblAppRoles, The field IDLURole is in both tables. Therefore, the relationship between these tables is IDLURole. If you have a flat file with the LURoles and APPRoles, you will need to place an ID that will be used to create a relationship. Your database designer will create a query to create this relationship on the conversion and then keep the relationship on all data added in the future.
A conversion takes careful planning. It is imperative that all the data, data fields and relations are taken care of. Testing needs to be done throughout the process and the end user must have a test script to make sure the conversion was done correctly.
Conclusions and Recommendations
Data migration and data conversion are still one of the biggest challenges that face developers and programmers. Saracco, Roth, and Wolfson (2004) wrote that the biggest challenges are retrieving the data since the syntactic, semantic, and performance issues need to be addressed, so there are no run time errors, so there is efficient use of the resources and acceptable degrees of errors. If the data is just converted and migrated without a formal process in place, there are higher degrees of risks. Just designing a new system on an old design and old documentation can lead to the same problems that needed to be addressed in the new application.
Jones (1999) who wrote the article for 15 steps for converting MS Access to SQL Server had one of the most comprehensive plans for data conversion as outline in Table 1. This plan needs to be updated but would be a good start for developers and administrators to start their process on a conversion project. Many of the steps can be used for any type of data conversion.
Newer applications as discussed by Wendy, Wang, and Weerakkody in 2004 use both RD and OODB technology. Once again, this article needs to be updated to use the newer tools that come with database servers and web servers. Objects need to be assigned to be used by the web side of the application and well-designed databases need to be designed to connect to the web pages.
The future of database technology will encompass better ways to integrate data, test the data before it is fully migrated, and will run constant reports not only for the administrators but for the developers. Data mining, analysis, and structuring data will include some of the techniques that should be updated. Data mining will be important since scanning and image capabilities will become a more critical part of the database conversions in the future.
Many database programmers will tell you that the database is only as good as the data. If the database design is poor, if the data migration was not done properly, if objects and/or relations were not retained and if data management was not taken care of, the database will not do the job it was designed for. In order to have strong web applications, whether they are written in JAVA, ASPX, or XML, the database has to be strong. In order for the database to do other jobs that are assigned to it such as running jobs, analytics, sending emails, and automating reports, the design drives the database. It is essential to follow a well-designed plan for data design, data migration, data conversion, data testing and data maintenance.
References
de Vries, W. A., & Fleck, R. A. (1997). Client/server infrastructure: A case study in planning and conversion. Industrial Management + Data Systems, 97(6), 222-232. Retrieved from http://search.proquest.com/docview/234921669?accountid=8289.
Elmasri, R. & Navathe, S. B. (2011). Fundamentals of database systems (6th ed.). New York, NY: Addison-Wesley.
Hernandez, M. (2003). Database design for mere mortals. (2nd ed). New York, NY: Addison-Wesley.
Jones, A. (1999). 15 steps to convert access to SQL server. Retrieved from http://www.sqlmag.com/article/data-access/15-steps-to-convert-access-data-to-sql-server-6133.
Saracco, C. M., Roth, M. A., & Wolfson, D. C. (2004). Enabling distributed enterprise integration with WebSphere and DB2 information integrator. IBM Systems Journal, 43(2), 255-269. Retrieved from http://search.proquest.com/docview/222427877?accountid=8289.
Wendy, L. C., Wang, X., & Weerakkody, V. (2004). Developing web services using the microsoft.net platform: Technical and business challenges. Journal of Enterprise Information Management, 17(5), 335-350. Retrieved from http://search.proquest.com/docview/220026368?accountid=8289.
About the Author
Donna Bridgham is a Systems Analyst with the Commonwealth of Virginia, and more specifically, with the Division of Aging and Rehabilitative Services. She provides customer service to the Virginia Department for the Deaf and Hard of Hearing. Bridgham is proficient in breaking down high-level information into details for designing databases and prototypes for application design . She also has technical expertise in data analysis. She has a B.S. from Lyndon State College, an MBA from Northcentral University (NCU), and an MS in Information Technology from American Public University System (APUS). She may be reached at donnabridgham@comcast.net.
Previous page on path | Cover, page 10 of 27 | Next page on path |
Discussion of "Data Conversion for Relational and Object Oriented Databases"
Add your voice to this discussion.
Checking your signed in status ...