Compare Two Data Frames in R

In this tutorial, we will learn how to compare two data frames using the compare() function of the compare package.

There are several ways to compare two R data frames, such as the compare() function of the compare package or the sqldf() function of the sqldf package. In this article, we will use compare(). It is not part of base R, so the compare package must be installed and loaded before the function can be called.
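
Because compare() comes from an add-on package, a session needs that package available before any of the calls in this tutorial will run. A minimal setup sketch, assuming the package is installed from CRAN under the name compare and that a CRAN mirror is reachable:

```r
# Install the compare package once if it is not already available.
# Assumption: this machine can reach a CRAN mirror.
if (!requireNamespace("compare", quietly = TRUE)) {
  install.packages("compare", repos = "https://cloud.r-project.org")
}

# Attach the package so that compare() can be called directly.
library(compare)
```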

The syntax of the compare() function is

compare(model, comparison,
	equal = TRUE,
	coerce = allowAll,
	shorten = allowAll,
	ignoreOrder = allowAll,
	ignoreNameCase = allowAll,
	ignoreNames = allowAll,
	ignoreAttrs = allowAll,
	round = FALSE,
	ignoreCase = allowAll,
	trim = allowAll,
	dropLevels = allowAll,
	ignoreLevelOrder = allowAll,
	ignoreDimOrder = allowAll,
	ignoreColOrder = allowAll,
	ignoreComponentOrder = allowAll,
	colsOnly = !allowAll,
	allowAll = FALSE)

where

  • model: The “correct” object.
  • comparison: The object to be compared with the model.
  • equal: Test for equality if the test for identity fails.
  • coerce: If the objects are not the same, allow coercion of comparison to the class of model.
  • shorten: If the length of one object is less than the other, shorten the longer object.
  • ignoreOrder: Ignore the order of values when comparing.
  • ignoreNameCase: Ignore the case of names when comparing.
  • ignoreNames: Ignore names attributes altogether.
  • ignoreAttrs: Ignore attributes altogether.
  • round: If the objects are not the same, allow numbers to be rounded.
  • ignoreCase: Ignore the case of string values.
  • trim: Ignore leading and trailing spaces in string values.
  • dropLevels: If factors are not the same, allow unused levels to be dropped.
  • ignoreLevelOrder: Ignore the order of factor levels.
  • ignoreDimOrder: Ignore the order of dimensions when comparing matrices, arrays, or tables.
  • ignoreColOrder: Ignore the order of columns when comparing data frames.
  • ignoreComponentOrder: Ignore the order of components when comparing lists.
  • colsOnly: Only transform columns (not rows) when comparing data frames.
  • allowAll: Allow any sort of transformation (almost; see the Details section of the package documentation).

The list of arguments is long, but only a few of them are generally needed when comparing data frames.

Basic Comparison between two Data Frames

In this case, we keep the default values and provide only the original data frame (the model argument) and the data frame to compare against it (the comparison argument).

Consider two data frames, DF1 and DF2, shown below.

> DF1 = data.frame(id=c(1,2,3,4), name=c("John", "Manu", "Surya", "Amith"))
> DF2 = data.frame(id=c(1,2,3,4), name=c("John", "Manu", "Surya", "Tinu"))
> DF1
  id  name
1  1  John
2  2  Manu
3  3 Surya
4  4 Amith
> DF2
  id  name
1  1  John
2  2  Manu
3  3 Surya
4  4  Tinu
>

DF1 and DF2 differ in the name value of the fourth row.

Now, call the compare() function with DF1 as the model and DF2 as the comparison.

> compare(DF1, DF2)
FALSE [TRUE, FALSE]
>

The comparison returns FALSE, as expected. The values in brackets are the per-column results: the id columns match (TRUE) and the name columns do not (FALSE).
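
To see which rows are responsible for a FALSE result, the differing column can be checked element by element. The following is a base R sketch, not a feature of compare(). It assumes both data frames have the same number of rows in the same order, and it sets stringsAsFactors = FALSE so that the names are compared as plain character strings.

```r
# Same data as above; stringsAsFactors = FALSE keeps name as character
DF1 <- data.frame(id = c(1, 2, 3, 4),
                  name = c("John", "Manu", "Surya", "Amith"),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(id = c(1, 2, 3, 4),
                  name = c("John", "Manu", "Surya", "Tinu"),
                  stringsAsFactors = FALSE)

# Row positions where the name values disagree
diff_rows <- which(DF1$name != DF2$name)
diff_rows   # 4

# Put the mismatching rows from both data frames side by side
mismatches <- merge(DF1[diff_rows, ], DF2[diff_rows, ],
                    by = "id", suffixes = c(".model", ".comparison"))
mismatches
```

Here mismatches holds a single row for id 4, with Amith in name.model and Tinu in name.comparison.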

Now let us compare two identical data frames.

> DF1 = data.frame(id=c(1,2,3,4), name=c("John", "Manu", "Surya", "Amith"))
> DF2 = data.frame(id=c(1,2,3,4), name=c("John", "Manu", "Surya", "Amith"))
> compare(DF1, DF2)
TRUE
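
The arguments listed earlier can relax the comparison when two data frames hold the same data in a different layout. The sketch below assumes the compare package is installed and that the object returned by compare() exposes a logical result component, as the package documentation describes. DF3 is a hypothetical data frame introduced only for illustration.

```r
library(compare)

DF1 <- data.frame(id = c(1, 2, 3, 4),
                  name = c("John", "Manu", "Surya", "Amith"))

# DF3 (hypothetical): same data as DF1, but with the columns swapped
DF3 <- DF1[, c("name", "id")]

# Default comparison: column order matters
strict <- compare(DF1, DF3)

# Relaxed comparison: allow compare() to reorder columns before testing
relaxed <- compare(DF1, DF3, ignoreColOrder = TRUE)

# Assumption: the logical outcome is stored in the result component
strict$result
relaxed$result
```

With these inputs, the strict call should report FALSE and the relaxed call TRUE. Setting allowAll = TRUE switches on nearly all of the relaxations at once, which is convenient for a quick check but makes it harder to tell which difference was tolerated.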

Conclusion

In this R Tutorial, we have learnt how to compare two Data Frames.