• References
  • Services
  • About
  • Blog
    • Access-SQL Articles
    • Excel Articles
    • Data Integration
    • General Data Management
  • Contact Michelle
1ST CONTACT DATABASES

How to Plan for a Data Integration Project

0 Comments

 
Picture
There was a time, before computers, when we literally could not get our hands on the data we needed to make decisions. And now, 25-30 years later, we have so much data that we’re drowning in it.

​As technology and software have evolved, the sheer amount of data we have to manage in our work place has increased exponentially. Anymore, it’s not uncommon to have multiple sources of data in any given office.
Common sources of data are:
  • Accounting, personnel and payroll systems (not always housed in the same software package).
  • CRM Software
    • Business Purpose related software – just a few examples include:
    • Job tracking software used in the construction industry
    • Event management software used in membership driven organizations
    • Inventory management software used in retail, manufacturing, etc..
      • Patient Records software used in clinics, hospitals and other health-care settings.
  • Grants management software used in non-profit settings… etc…
  • Excel spreadsheets &/or local databases that manage data unique to your office.

One major problem, with so much data, is integrating it. What do you do when you have related data spread across multiple sources? Integrating data from multiple containers can be very challenging. Following are some basic questions and guidelines to consider before starting the data integration process.

  • Outline your project objective. Knowing your end objective is very important. Answering these questions will help you stay on track, once you start delving into all the different data sources.
    • Types of reports you want to generate
    • Types of analysis you want to do
    • How often you want to access the data?
    • Who you will share data with?
    • Who will be running reports or doing data analysis?
​
  • Understand data source architecture. Related fields are essential to data integration. Best case scenario; you will be able to connect data sets using a common ID Number. But, most of the time you are going to have to use text fields, such as connecting First Name, Last Name fields across multiple record sets. It is important that these related fields be identified as soon as possible.
    • What sources of data do you need to integrate?
    • Who controls those data sources?
    • Do you have access to those data sources? How?
    • Can you do data exports?
      • Do those exports contain the necessary fields you’ll need for analysis?
      • Do those exports contain key fields?
    • Can you get direct access to the data (best case scenario)?
      • ODBC connections to tables.
      • Direct connection to spreadsheets, etc…
    • Document the data source architecture and how different sources are related. A picture is worth a 1000 words, take time to diagram the following:
      • Draw lines connecting related fields
      • Identify how those fields are related

  • Determine required fields. Sometimes it helps to actually mockup what you want your final output to look like. This exercise will help you determine which data fields are required in the integration project.
    • If two fields will be linked (for integration) then those fields will be required.
    • If fields are going to be used in calculation, then those fields will be required.
    • Do you need fields to help clarify results? If so, consider these fields required as well.
    • Do you need to create and populate new data fields, in addition to the fields you already have access to?

  • Pattern searching and analysis. This is where things can get tricky. The more data sources, and the more end-users entering data, the more issues you will have with data quality.
    • Outline the patterns you will search for to determine accuracy of data. For instance… if you need demographic information on populations served and 15% of the records lack age information, how will you handle this reality?
  • How do you determine the overall accuracy of those records? Knowing how you are going to account for missing data – before you start working with the data – is important.
  • When looking specifically at the subset of records with null age information, are the rest of the demographic fields populated (ie, gender, ethnicity, income range, etc…)
  • Is there a way to infer age from other fields (example a date of birth field)?
  • Is the age to date important, or is it the age at the time of service?
  • These types of questions are very important. As you go forward with your data integration project, it may become necessary for you to include some kind of explanation on how you managed these types of issues, in your final output.
    • How will you managed related fields, used for linking data. As mentioned earlier, the best case scenario is that you will be linking ID number fields. But, most of the time, my clients have to link text fields. For instance, they have to link two different data sets by first and last name fields. This can run into problems. A typical example is my name. Michelle Meyer is a pretty common name (even if it’s spelled correctly 100% of the time). If you’ve multiple data sets that need to be linked by first and last name – following are a few things you’ll need to take into consideration:
  • How comfortable are you that names are spelled correctly 100% of the time. Because once you link first and last name fields, if one recordset has the last name spelled “Meyer” and another recordset has the last name spelled “Myer”, then the two records won’t link.
  • Are your recordsets large enough that you have to include a 3rd field to manage multiple people in your database with the same name? For instance, will it be necessary to include an address field, or city field, in the data link?
  • Do you need to do some data clean-up BEFORE trying to link multiple data sets? If you’re not confident that data entry is accurate, then it may be necessary to spend some time cleaning the datasets, before integrating them.

Integrating data from multiple sources is possible. It’s the type of project I’m called in to tackle, on a regular basis. But, for the best and most accurate results, it takes planning and fore-thought. The more data sources you have, and the larger those recordsets are the more intentional one has to be in planning an integration project. But, the results of a good integration project are worth the time invested. Data integration projects, managed properly, can give you the data you need to make decisions.​

More Data Management Articles
  • Using Microsoft Access for Data Evolution Projects
  • Yes – Microsoft Access works in a Multi-User Environment
  • YES – Microsoft Access Can be Used Securely
  • Deploying Microsoft Access in a Remote Environment
  • Why You Should Care about RAD and How It Impacts Your Bottom Line ....
  • Do you have questions about your own data management project? Contact Michelle.
0 Comments



Leave a Reply.

    Michelle Meyer

    Articles discuss data management integration

    Archives

    November 2018
    April 2018
    January 2018

    Categories

    All
    Access
    Data Management
    Outlook

    RSS Feed

Count Began On: 02-17-2018


©1st Contact Databases - 2018
View my profile on LinkedIn
Picture
  • References
  • Services
  • About
  • Blog
    • Access-SQL Articles
    • Excel Articles
    • Data Integration
    • General Data Management
  • Contact Michelle