
C2C Digital Magazine (Fall 2017 / Winter 2018)

Colleague 2 Colleague, Author


An Exploratory Factor Analysis of IT Satisfaction Survey Data (from 2014)

By Shalin Hai-Jew, Kansas State University 

To see how an exploratory factor analysis might work with real-world data, anonymous IT satisfaction survey data from 2014 was used for this light walk-through.  The survey instrument itself was a revised version of an inherited open-source survey.    

First, before the data can be used, it has to be downloaded accurately.  One step was to recode the Likert-like scores with descending point values instead of ascending point values (Figure 1).  

Recoding values before data download from the Qualtrics Research Suite.  




Figure 1:  Recoding Values from the Survey 


When the data was downloaded, the numerical values were selected so that the variables (items) could be compared, to see if there were co-associations (Figure 2).  



 
Figure 2:   Downloading the Data Table with Numeric Values 


The basic data structure of the extracted information was to have variables in the columns, and the row data were the cases.  

The survey itself was exported, so that when the factor analysis was done, the questions and prompts related to the respective variables could be read directly (Figure 3), and text labels could be applied to the extracted factors (components or dimensions).  (What should also have been done was a renumbering of the questions, so that everything would be in order.  Surveys do not come together in perfect chronological order, so the renumbering would have brought increased clarity.)  


 

Figure 3:  Exporting the Survey to Word 


There are some data requirements before factor analyses may be run on the data effectively.  

Sufficiency of dataset and “some multicollinearity”.   Because exploratory factor analysis is a multivariate statistical analysis approach, there have to be a sufficient number of variables that have some similarities among themselves to enable correlations (co-relating).  In this survey, there were some 50 usable items for the analysis and 413 responses (but fewer fully completed cases).  

The dataset has to have numeric data for the variables.  This means that any string (text) data has to be recoded to numeric data or omitted.  (In this case, some open-ended questions were omitted as were some complex multi-response questions.)  

There have to be a sufficient number of cases (with complete response data per case or per respondent or per row) to enable the identification of factors, in a statistically significant way.  

Prior to the factor analysis, some tests were run on the data for “multi-collinearity.”  Some level of multicollinearity is desirable because if variables are unrelated (totally orthogonal), they are not likely to cluster around some related factors.  If they are too collinear, then the results might be muddy, with extracted factors being hard to define.  

A simple linear regression test for multicollinearity is sometimes run (albeit with switching out different variables to serve as the dependent variable).  For this process:     

Open SPSS.   
Open a new dataset (wait while it generates a preview).  

Work through the Text Import Wizard (for the .csv file).  Clarify how the variables are arranged, where the variable names are located (usually in Row 1), the symbol used for decimals, how the cases are arrayed (by rows), the delimiter identification (comma), how dates are represented in the dataset, and so on.  Sometimes, it takes several go-arounds to get the underlying data in shape (in Excel first) before it may be opened in SPSS.  
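For those working outside SPSS, the same import step can be sketched in Python with the standard csv module.  This is a minimal illustration (not the SPSS wizard's logic): it assumes a comma-delimited export with variable names in Row 1 and one case per row, and it simply skips rows that are not fully numeric.  The file contents and variable names here are hypothetical.

```python
import csv
import io

def load_numeric_cases(csv_text):
    """Parse a comma-delimited survey export: variable names in Row 1,
    one case per row, numeric cells converted to float."""
    reader = csv.reader(io.StringIO(csv_text))
    variables = next(reader)  # Row 1 holds the variable names
    cases = []
    for row in reader:
        # Keep only fully numeric rows; skip malformed or text rows
        try:
            cases.append([float(cell) for cell in row])
        except ValueError:
            continue
    return variables, cases

# Toy text standing in for a downloaded survey export
sample = "Q4,Q6,Q18\n1,2,3\n4,5,6\nx,?,!\n"
names, data = load_numeric_cases(sample)
```

In practice, the same cleanup decisions the wizard asks about (delimiters, decimal symbols, date formats) would still have to be handled before the numbers are trustworthy.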

Go to the Analyze tab in the menu.  Select Regression -> Linear (Figure 4).  


 

Figure 4:  Analyze -> Regression -> Linear 


Pick one variable to be the Dependent Variable (DV), and then load the other variables in the Independent Variables (IV) block (Figure 5).  
 



Figure 5:  Variables in the Linear Regression 

In the Statistics area, select the “Collinearity diagnostics” box (Figure 6).  



 
Figure 6:  Selection for Collinearity Diagnostics

In the Coefficients table, check the column for collinearity statistics.  “Tolerance” refers to the proportion of variance in one independent variable that is not explained by the other independent variables.  The desirable minimum tolerance is said to be between 0.10 and 0.25 ("Tolerance," 2017).  The “VIF” or Variance Inflation Factor “measures how much the variance (the square of the estimate’s standard deviation) of an estimated regression coefficient is increased because of collinearity” or correlation with other predictor variables ("Variance Inflation Factor," 2017).  The highest level of VIF that is acceptable is 10 (“Variance Inflation Factor,” 2017).  A few variables go over that in this dataset, at least in the one run of the linear regression (Table 1).  (As noted earlier, this should be run multiple times with different variables in the DV position.)  




Table 1:  Coefficients Table 

On the whole, though, this works.   There is some degree of multicollinearity but not excessive amounts.  
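The tolerance and VIF figures SPSS reports can also be reproduced directly from their definitions, as a check on the intuition: regress each independent variable on all the others, take tolerance as 1 minus that R-squared, and VIF as its reciprocal.  The sketch below does this with NumPy on synthetic data (the variable setup is hypothetical, not the survey data).

```python
import numpy as np

def tolerance_and_vif(X):
    """For each column of X (cases x variables), regress it on the
    remaining columns: tolerance = 1 - R^2, VIF = 1 / tolerance."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
        tol = 1.0 - r2
        out.append((tol, 1.0 / tol))
    return out

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
# Fourth variable nearly duplicates the first -> low tolerance, high VIF
X = np.column_stack([base, base[:, 0] + 0.05 * rng.normal(size=200)])
stats = tolerance_and_vif(X)
```

The near-duplicate column produces a VIF well over the cutoff of 10, while the independent columns stay near 1, which mirrors the pattern flagged in Table 1.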

The exploratory factor analysis.  Again, the assumption of an “exploratory” factor analysis is that the researcher does not have initial hypotheses about factors in the data. There are no applied theories or frameworks or other a priori assumptions. There are no hypotheses being tested with the factor analysis.  (If there were a priori assumptions, then the type of factor analysis being conducted would be a “confirmatory” factor analysis.  The process would be the same, but the analytical approaches would be different.)  Figure 7 shows how to start a factor analysis.  


 

Figure 7:  Analyze -> Dimension Reduction -> Factor   

In the related windows, it is important to include all variables in the factor analysis.  In the section for the Correlation Matrix, the Kaiser-Meyer-Olkin (KMO) test should be run in order to see how well the dataset is set up for the extraction of factors, and the related Bartlett’s test of sphericity (which tests for homogeneity of variances and how normal an underlying frequency distribution is for a set of data).  Extract a correlation matrix.  Extract a scree plot.  Set the factor extraction based on factors with eigenvalues greater than 1 (instead of selecting a desired number of factors, at least not initially).  Go with the default maximum of 25 iterations for convergence.  The factor analysis rotation may be set at Promax with a Kappa of 4.  Select any other desirable matrices.  Exclude cases listwise (if there is incomplete information in a case, then it is left off altogether from the factor analysis).  Select the coefficient display format to be sorted by size (from the biggest factors to the smallest), and suppress small coefficients at 0.30 to start.  (The default in SPSS is set to 0.10.)  

Resulting Factor Analysis

As noted earlier, the KMO indicates how likely it would be for this dataset to enable some factors (sub-groups) to be extracted.  A KMO score would ideally be 0.80 or higher, and in this case, rounded up, this dataset just makes the cutoff.  The statistical significance for Bartlett’s Test is sufficient (Table 2).  


 
Table 2:  KMO and Bartlett’s test of sphericity 
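The KMO statistic itself can be computed from its standard definition: it compares the squared correlations among variables against the squared partial correlations (obtained from the inverse of the correlation matrix).  The sketch below is a minimal illustration on synthetic two-factor data, not the SPSS routine or the survey data.

```python
import numpy as np

def kmo(X):
    """Kaiser-Meyer-Olkin measure of sampling adequacy: squared
    correlations versus squared partial correlations, summed over
    the off-diagonal entries of the correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / d  # partial correlation matrix
    off = ~np.eye(R.shape[0], dtype=bool)
    r2 = (R[off] ** 2).sum()
    p2 = (partial[off] ** 2).sum()
    return r2 / (r2 + p2)

rng = np.random.default_rng(1)
# Two latent factors driving six observed items -> factorable data
f = rng.normal(size=(500, 2))
load = np.array([[1, 0], [1, 0], [1, 0],
                 [0, 1], [0, 1], [0, 1]], dtype=float)
X = f @ load.T + 0.5 * rng.normal(size=(500, 6))
score = kmo(X)
```

Data with a genuine latent structure, as simulated here, yields a KMO comfortably above the conventional minimum, which is the pattern Table 2 reports for the survey dataset.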

A study of “communalities” shows how much an item correlates with all other items, and Table 3 shows some fairly high correlations, which suggests some communal coherence among the variables.  



Table 3: Communalities 

Figure 8 shows a scree plot.  This line graph plots the eigenvalues of the extracted factors in descending order.  In this particular dataset, 14 factors fell above an eigenvalue of 1 (where the red line is) and so had fairly significant loading by variables.  

 

Figure 8 :  Scree Plot 

Table 4, the “Total Variance Explained,” shows that this factor analysis can explain a cumulative 73.872% of the variance in this dataset.  
 
Table 4:  Total Variance Explained 
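The eigenvalue-greater-than-1 retention rule and the cumulative variance figure both come straight from the eigenvalues of the correlation matrix, so they can be sketched in a few lines of NumPy.  The data below is synthetic (a hypothetical two-factor setup), only to show the mechanics behind Figure 8 and Table 4.

```python
import numpy as np

def kaiser_and_variance(X):
    """Eigenvalues of the correlation matrix: count of factors kept
    under the Kaiser criterion (eigenvalue > 1) and the cumulative
    percent of variance those retained factors explain."""
    R = np.corrcoef(X, rowvar=False)
    eig = np.sort(np.linalg.eigvalsh(R))[::-1]  # descending order
    kept = int((eig > 1.0).sum())
    # Trace of R equals the number of variables, so eig.sum() is p
    cum_pct = 100.0 * eig[:kept].sum() / eig.sum()
    return kept, cum_pct

rng = np.random.default_rng(2)
f = rng.normal(size=(400, 2))
load = np.array([[1, 0], [1, 0], [1, 0],
                 [0, 1], [0, 1], [0, 1]], dtype=float)
X = f @ load.T + 0.6 * rng.normal(size=(400, 6))
kept, cum_pct = kaiser_and_variance(X)
```

With two latent factors driving six items, exactly two eigenvalues exceed 1, and those two account for most of the total variance, analogous to the 14 factors and 73.872% reported above.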

In Table 5, the Pattern Matrix, the fourteen “common” factors may be seen in the column headers in descending order.  The coefficients in the pattern matrix show how the variables in the left column load to the respective factors.  Cross-loadings, when variables load to two factors (with less than a .2 difference), are usually addressed by eliminating the variables sequentially and seeing if that may be done without ending up with a failure to converge in 25 rotations.   The variables with problematic cross-loadings are the following:  Q36_30, Q36_25, Q36_4, Q14_3, Q36_18, Q22_1, Q36_16, Q6, and Q36_10.  

If the variables cannot be removed without causing problems in the rotation, then another approach is to raise the threshold for inclusion of loading.  In this case, the loading coefficient was set to 0.4, which resulted in a much cleaner pattern matrix, with only three variables cross-loading.  Increased cumulative variance was explained (74.244%), with 14 factors.  

 
Table 5:  Pattern Matrix 
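The cross-loading screen described above (two loadings both past the display threshold, differing by less than .2) is mechanical enough to express as a short function.  The pattern matrix below is a toy stand-in with hypothetical values, not the actual Table 5.

```python
def crossloadings(pattern, threshold=0.3, gap=0.2):
    """Flag variables whose two largest absolute loadings both pass
    the display threshold and differ by less than `gap`."""
    flagged = []
    for name, row in pattern.items():
        mags = sorted((abs(v) for v in row), reverse=True)
        if mags[1] >= threshold and (mags[0] - mags[1]) < gap:
            flagged.append(name)
    return flagged

# Toy pattern matrix: 3 variables x 3 hypothetical factors
pattern = {
    "Q19_2":  [0.85, 0.05, 0.10],  # clean loading on factor 1
    "Q36_30": [0.45, 0.40, 0.02],  # cross-loads on factors 1 and 2
    "Q16_4":  [0.10, 0.12, 0.78],  # clean loading on factor 3
}
problems = crossloadings(pattern)
```

Raising the `threshold` argument to 0.4 reproduces the second strategy mentioned below: fewer variables clear the bar, so fewer cross-loadings remain.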

The next step, once a pattern matrix has been cleaned up, is to label the variables with human-readable text for sense-making.  In this case, one has to return to the survey to read the related questions and prompts for the loading items…in order to label them.  (Also, a component correlation matrix can be run to see how the factors correlate with each other.)  

So the variables that loaded to the respective factors were as follows:  

Factor 1:  Q19_2, Q19_10, Q19_5, Q19_4, Q19_8, Q19_3, Q19_1, and Q19_7
Factor 2:  Q36_29, Q36_27, Q36_31, Q36_19, Q36_11, Q36_7, Q36_6, Q36_12, Q36_8, and Q36_30 
Factor 3:  Q16_4, Q16_1, Q16_2, Q16_3, and Q36_3
Factor 4:  Q8_2, Q8_3, Q8_4, Q8_1, and Q36_25 (albeit with co-loading) 
Factor 5:  Q10_1, Q10_2, Q10_3, and Q36_5
Factor 6:  Q36_15, Q36_26, Q36_9, and Q36_4 
Factor 7:  Q14_1, Q14_2, and Q36_2
Factor 8:  Q12_3 and Q12_4
Factor 9:  Q4 and Q36_18
Factor 10:  Q22_1 and Q19_6
Factor 11:  Q22_2 and Q36_16
Factor 12:  Q12_1 and Q12_2
Factor 13:  Q18 and Q6
Factor 14:  Q17_2 and Q17_1 

The applied labels to the respective factors were as follows:  


1. Network Function and Reliability
2. IT Physical Spaces and Direct Services 
3. Telephone Services
4. Tech Services / Follow-through
5. Help Desk Support and Availability 
6. User Accounts, CONNECT Services, Survey Tool, Web Services 
7. Computer Technologies in Classrooms
8. Mobile Friendly Technologies
9. Gender and ? 
10. Availability of Network and Mobile  
11. File Storage Location and Network Speed
12. Mobile Friendly Technologies  
13. Wifi Availability on Campus
14. Strength of Signal, Availability of Connectivity on Campus

Table 6:  Named Factors 

Ideally, names for these factors (components or dimensions) should be much more parsimonious.  

Reliability of the factors.  Finally, some researchers test the reliability of the respective factors by bundling the respective variables underlying each factor and running a reliability analysis on them (to see how coherently the respective factors are describing a particular construct).  For this, Cronbach’s alpha is captured as a representation of reliability (Figure 9).



 
Figure 9:  Analyze -> Scale -> Reliability Analysis 

Figure 10 shows the selection of the variables that loaded to Factor 1.  


 

Figure 10:  Variables for Factor 1 Selected

The reliability analyses result in a Cronbach’s alpha score.  The general cutoffs suggest that .6 and over is acceptable, .7 to .9 is good, and .9 and above is excellent, in terms of how well the variables represent a coherent construct (or factor).  If the score is lower than .6, the set of elements representing the factor construct is unreliable and may not be indicating a coherent or valid component.  
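Cronbach's alpha has a simple closed form: k/(k-1) times (1 minus the sum of item variances over the variance of the summed scale).  The sketch below computes it on simulated single-construct data (hypothetical items, not the survey's), as a stand-in for SPSS's Reliability Analysis.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances /
    variance of the summed scale). `items` is cases x items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars / total_var)

rng = np.random.default_rng(3)
# Four items driven by one underlying construct plus noise -> high alpha
trait = rng.normal(size=300)
items = np.column_stack(
    [trait + 0.5 * rng.normal(size=300) for _ in range(4)]
)
alpha = cronbach_alpha(items)
```

Items that share a construct, as here, push alpha toward the "good" range; unrelated or miscoded items drive it toward zero or negative values, as seen with Factors 9, 12, and 13 below.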



Factor 1 (Network Function and Reliability):  Q19_2, Q19_10, Q19_5, Q19_4, Q19_8, Q19_3, Q19_1, and Q19_7 (8 items / variables).  Cronbach’s alpha = 0.93 [182 (44.1%) cases valid, 231 (55.9%) excluded, based on listwise deletion].

Factor 2 (IT Physical Spaces and Direct Services):  Q36_29, Q36_27, Q36_31, Q36_19, Q36_11, Q36_7, Q36_6, Q36_12, Q36_8, and Q36_30 (10 items / variables).  Cronbach’s alpha = 0.905 [312 (75.5%) cases valid, 101 (24.5%) excluded].

Factor 3 (Telephone Services):  Q16_4, Q16_1, Q16_2, Q16_3, and Q36_3 (5 items / variables).  Cronbach’s alpha = 0.927 [304 (73.6%) cases valid, 109 (26.4%) excluded].

Factor 4 (Tech Services / Follow-through):  Q8_2, Q8_3, Q8_4, Q8_1, and Q36_25 (albeit with co-loading) (5 items / variables).  Cronbach’s alpha = 0.819 [326 (78.9%) cases valid, 87 (21.1%) excluded].

Factor 5 (Help Desk Support and Availability):  Q10_1, Q10_2, Q10_3, and Q36_5 (4 items / variables).  Cronbach’s alpha = 0.914 [314 (76%) cases valid, 99 (24%) excluded].

Factor 6 (User Accounts, CONNECT Services, Survey Tool, Web Services):  Q36_15, Q36_26, Q36_9, and Q36_4 (4 items / variables).  Cronbach’s alpha = 0.720 [325 (78.7%) cases valid, 88 (21.3%) excluded].

Factor 7 (Computer Technologies in Classrooms):  Q14_1, Q14_2, and Q36_2 (3 items / variables).  Cronbach’s alpha = 0.779 [310 (75.1%) cases valid, 103 (24.9%) excluded].

Factor 8 (Mobile Friendly Technologies):  Q12_3 and Q12_4 (2 items / variables).  Cronbach’s alpha = 0.826 [225 (54.5%) cases valid, 188 (45.5%) excluded].

Factor 9 (Gender and ?):  Q4 and Q36_18 (2 items / variables).  Cronbach’s alpha = 0.113 [319 (77.2%) cases valid, 94 (22.8%) excluded].

Factor 10 (Availability of Network and Mobile):  Q22_1 and Q19_6 (2 items / variables).  Cronbach’s alpha = 0.669 [298 (72.2%) cases valid, 115 (27.8%) excluded].

Factor 11 (File Storage Location and Network Speed):  Q22_2 and Q36_16 (2 items / variables).  Cronbach’s alpha = 0.713 [305 (73.8%) cases valid, 108 (26.2%) excluded].

Factor 12 (Mobile Friendly Technologies):  Q12_1 and Q12_2 (2 items / variables).  Cronbach’s alpha = -0.254 [279 (67.6%) cases valid, 134 (32.4%) excluded].  (SPSS warning:  “The value is negative due to a negative average covariance among items. This violates reliability model assumptions. You may want to check item codings.”)

Factor 13 (Wifi Availability on Campus):  Q18 and Q6 (2 items / variables).  Cronbach’s alpha = 0.035 [295 (71.4%) cases valid, 118 (28.6%) excluded].

Factor 14 (Strength of Signal, Availability of Connectivity on Campus):  Q17_2 and Q17_1 (2 items / variables).  Cronbach’s alpha = 0.619 [305 (73.8%) cases valid, 108 (26.2%) excluded].


Table 7:  Measuring the Reliability of the Respective Factors by Variables with Cronbach’s Alpha 

Table 7 communicates the reliability of the respective factors.  Factors 9, 12, and 13 were not considered reliable.  



About the Author

Shalin Hai-Jew is an instructional designer at Kansas State University.  Her email is shalin@k-state.edu.  
