File organization
/ 3 min read
Table of Contents
Def: Refers to the data is stored in a file
Text file VS binary form
-
Text file: Data is stored in a file as a sequence of characters.
-
Binary file: The binary equivalent of data is stored as a sequence od bytes
Record stored in a text file
- The number of data items per line must be known and the number of characters per item must be known.
OR
-
Item seperator characters must be used
-
repeating lines defined be end-of-line character or characters
Drect access
-
Random access files
-
Can be any a
Serial files:
-
The records are put into the file chronologically, ie., according to when each record is produced. The one produced first is put into the file first, then the next one
-
Each new record is simply appended to the file so that the only
ordering in the file is the time order of data entry. -
Typical example:
-
For a bank to record transactions involving customer accounts
Sequential files
-
A sequential file have records that are ordered!
-
In order to allow a sequential file to be ordered, there has to be a key for which the values are unique and sequential but not necessarily consecutive.
-
When a new record is to be added what need to do?
-
Read the file sequentially and each record is written to a new file.
-
This is continued until the appropriated position for the new record is
reached. -
The new record is then written to the new file before the remain
records in the old file are copied in.
Direct-access files
-
Random-access files
-
Access can be to any record in the file without sequential reading of the file
-
Direct access can be achieved with a sequential file.
-
A separated index file is created which has two fields per record
-
The first field has the key field and the second field has a value for the position of this key field value in the main file.
-
The alternative is to use a hashing algorithm
-
What is hashing algorithm?
One simple hashing algorithm
If there is a numeric key field in each record
-
Choose a suitable number
-
Divide this number by the value in the key field
-
The remainder from this division then identifies the address in the file for storage of that record
-
The suitable number works best if it is a prime number of a similar size to the expected size of the file
-
collision: when the same address is calculated for different field values, it is usually referred to as collision.
-
Ways to handle collisions
-Use a sequential search to look for a vacant address following the calculated one.
-Keep a number of overflow addresses at the end of the file.
-Have a linked list accessible from each address