SAS DATA Step Overview

A SAS DATA step is a group of SAS language statements that can read raw data and create a SAS data set. A DATA step always begins with the keyword DATA. So how is the DATA step used? It is the main tool for data access and data management, and it can also perform the following operations:

  • Retrieving information.
  • Checking for errors in, validating, and correcting data.
  • Creating new variables and computing their values.
  • Creating new data sets from existing SAS data sets.
  • Manipulating and reshaping data.
  • Generating and printing reports.
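
As a minimal sketch of several of these operations, the step below reads an existing data set, keeps a subset of its observations, and computes a new variable. It assumes the sample data set SASHELP.CLASS is available in your installation; the names work.class_bmi and bmi are hypothetical and chosen only for illustration.

```sas
/* Create a new data set from an existing one */
data work.class_bmi;
    set sashelp.class;                      /* read an existing SAS data set    */
    where age >= 13;                        /* keep only some observations      */
    /* SASHELP.CLASS stores height in inches and weight in pounds */
    bmi = (weight / (height ** 2)) * 703;   /* compute a new variable           */
run;

/* Print a simple report of the result */
proc print data=work.class_bmi;
run;
```

The DATA step does the data management, and a PROC step (here PROC PRINT) presents the result.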

DATA Step Process

A SAS program consists of two main building blocks: the DATA step and the procedure (PROC) step. In our previous SAS tutorial, we learned about SAS program basics. Within a SAS program, the DATA step is where data is created, imported, modified, merged, and calculated.

When a DATA step is submitted for execution, SAS first checks its syntax. If no errors are found, the DATA step is compiled and then executed. When a DATA step reads raw data, such as in-stream data, SAS creates the following three items:

Input Buffer

Each raw data record is read into this area of memory when an INPUT statement executes.

Program Data Vector

SAS builds the data set one observation at a time in this area of memory. As the program executes, values are read from the input buffer or created by programming statements and assigned to the corresponding variables in the PDV (Program Data Vector).
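
The flow from input buffer to PDV can be watched in the log. The sketch below is only an illustration: the data set name work.scores, its variables, and the two data lines are made up. It uses PUT _INFILE_ to write the current input buffer and PUT _ALL_ to write every variable currently in the PDV.

```sas
data work.scores;
    input name $ score;       /* load a record into the input buffer, then fill the PDV */
    put _infile_;             /* write the current input buffer contents to the log     */
    passed = (score >= 50);   /* a value created by a programming statement             */
    put _all_;                /* write all PDV variables, including automatic ones      */
    datalines;
Ann 72
Bob 45
;
run;
```

After the step runs, the log should show each raw record followed by the PDV values for that iteration.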

In addition to the data set variables, the PDV contains two automatic variables, _N_ and _ERROR_. Let us look at these two variables in detail.

  • _N_ : It indicates how many times the DATA step has iterated. Its initial value is 1, and it increases by 1 with each iteration. It is often used to count the observations processed, although strictly it counts iterations, not observations.
  • _ERROR_ : Its default value is 0. When an error is encountered during execution, SAS sets _ERROR_ = 1. Even if there are 200 errors, _ERROR_ is still 1, because it is a flag rather than a counter. _ERROR_ signals a logical (data) error, not a syntax error. Syntax errors do not set _ERROR_; they appear in the log in red, and the place where the error occurs is underlined.

[alert-note] Note: Syntax errors are program errors, and logical errors are data errors. [/alert-note]
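
To see both automatic variables at work, here is a small sketch with one deliberately invalid record. The data set name work.ages, the variables iteration and had_error, and the data lines are hypothetical. Because automatic variables are not written to the output data set, the step copies them into ordinary variables first.

```sas
data work.ages;
    input name $ age;       /* "abc" cannot be read as a number, so it is a data error */
    iteration = _n_;        /* copy the iteration counter into a regular variable      */
    had_error = _error_;    /* copy the error flag for this iteration                  */
    datalines;
Ann 34
Bob abc
Cid 29
;
run;

proc print data=work.ages;
run;
```

The step still runs to completion because the problem is in the data, not in the syntax. For the second record, age should be missing and had_error should be 1, with a note about invalid data in the log.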

Descriptor Information

For each SAS data set, SAS creates and maintains descriptor information about the data set and its variable attributes, such as length, label, format, informat, and data type. To view the descriptor information, use the CONTENTS procedure (PROC CONTENTS).

/* Replace DATASET_NAME with the name of your own data set */
PROC CONTENTS DATA=DATASET_NAME;
RUN;
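
For a concrete run, the sketch below points PROC CONTENTS at SASHELP.CLASS, assuming that sample data set is available in your installation. The second call adds the VARNUM option, which lists variables in the order they were created rather than alphabetically.

```sas
/* Show descriptor information for a sample data set */
proc contents data=sashelp.class;
run;

/* Same report, with variables listed in creation order */
proc contents data=sashelp.class varnum;
run;
```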