skip to content
ChrisLi-Tech

File organization

/ 3 min read

Table of Contents

Def: Refers to the data is stored in a file

Text file VS binary form

  • Text file: Data is stored in a file as a sequence of characters.

  • Binary file: The binary equivalent of data is stored as a sequence od bytes

Record stored in a text file

  • The number of data items per line must be known and the number of characters per item must be known.

OR

  • Item seperator characters must be used

  • repeating lines defined be end-of-line character or characters

Drect access

  • Random access files

  • Can be any a

Serial files:

  • The records are put into the file chronologically, ie., according to when each record is produced. The one produced first is put into the file first, then the next one

  • Each new record is simply appended to the file so that the only
    ordering in the file is the time order of data entry.

  • Typical example:

  • For a bank to record transactions involving customer accounts

Sequential files

  • A sequential file have records that are ordered!

  • In order to allow a sequential file to be ordered, there has to be a key for which the values are unique and sequential but not necessarily consecutive.

  • When a new record is to be added what need to do?

  • Read the file sequentially and each record is written to a new file.

  • This is continued until the appropriated position for the new record is
    reached.

  • The new record is then written to the new file before the remain
    records in the old file are copied in.

Direct-access files

  • Random-access files

  • Access can be to any record in the file without sequential reading of the file

  • Direct access can be achieved with a sequential file.

  • A separated index file is created which has two fields per record

  • The first field has the key field and the second field has a value for the position of this key field value in the main file.

  • The alternative is to use a hashing algorithm

  • What is hashing algorithm?

One simple hashing algorithm

If there is a numeric key field in each record

  • Choose a suitable number

  • Divide this number by the value in the key field

  • The remainder from this division then identifies the address in the file for storage of that record

  • The suitable number works best if it is a prime number of a similar size to the expected size of the file

  • collision: when the same address is calculated for different field values, it is usually referred to as collision.

  • Ways to handle collisions
    -Use a sequential search to look for a vacant address following the calculated one.
    -Keep a number of overflow addresses at the end of the file.
    -Have a linked list accessible from each address