Types of Text Files
The term "text file" generally means a human-readable file in which each line is terminated by a specific character or pair of characters. There are three main types of text files; the main characteristic that distinguishes them is the way that each line is terminated.
The Windows/DOS text file format ends each line with the characters "Carriage Return" and "Linefeed", in that order. The Linux/Unix format terminates each line with a "Linefeed" only. Finally, the classic Mac (short for "Macintosh") format, used before Mac OS X, puts a "Carriage Return" character at the end of each line. Modern Macs use the Linux/Unix convention.
"Carriage Return" is often abbreviated as CR. It has the decimal value of 13 which equates to the hexadecimal value $0D. The CR character is occasionally known as Ctrl-M or ^M.
"Linefeed" is often abbreviated as LF. It has the decimal value of 10 and the hex value $0A. The LF character is occasionally referred to as Ctrl-J or ^J.
You will sometimes see the abbreviation CRLF to mean "a carriage return followed by a linefeed".
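As a rough illustration of the three conventions, the following Python sketch counts each kind of terminator in the raw bytes of a file and guesses the format from whichever is most common. The function names count_line_endings and guess_format are hypothetical, chosen only for this example, and the "most common terminator wins" rule is an assumption rather than a standard test.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs and lone LFs in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair contains one CR and one LF, so subtract the pairs
    # to avoid counting those bytes twice.
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "CR": cr, "LF": lf}


def guess_format(data: bytes) -> str:
    """Guess the text file format from the most common terminator."""
    counts = count_line_endings(data)
    if not any(counts.values()):
        return "unknown"  # no line terminators at all
    kind = max(counts, key=counts.get)
    names = {"CRLF": "Windows/DOS", "LF": "Linux/Unix", "CR": "Mac"}
    return names[kind]
```

A file with mixed terminators will still get a single answer here, so a real tool would probably want to report the individual counts as well.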
The End-of-File Character
The text file itself is sometimes, but not always, terminated with the "Substitute" character, also known as SUB, Ctrl-Z or ^Z (character value: Decimal 26, Hex $1A). However, you are rarely aware of this, since the character is removed when you load the text file into a program.
Programs that use text files should address the fact that some text files end with Ctrl-Z and some do not. They can ask the operating system how long the file is, then after loading the file they can inspect the last character. If it is Ctrl-Z, it can be ignored. The upshot of this is that a text file may sometimes appear to be one byte longer than you think it should be.
In some cases, the end of a text file may be marked with the NUL (Null) character (decimal value 0, hex value $00).
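The check described above can be sketched in a few lines of Python. This is a minimal example, assuming the whole file has already been read into memory as bytes; the name strip_eof_marker is hypothetical, and it removes at most one trailing marker byte.

```python
CTRL_Z = 0x1A  # decimal 26, the Ctrl-Z end-of-file marker
NUL = 0x00     # decimal 0, occasionally used as an end marker


def strip_eof_marker(data: bytes) -> bytes:
    """Return data without a single trailing Ctrl-Z or NUL byte, if present."""
    # Indexing a bytes object yields an integer, so compare against ints.
    if data and data[-1] in (CTRL_Z, NUL):
        return data[:-1]
    return data
```

For example, a file read as b"hello\r\n\x1a" would come back as b"hello\r\n", which accounts for the "one byte longer" effect mentioned above.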
Editing a Text File
Text files can generally be loaded by a text editor program (such as Windows Notepad, or NoteTab from Fookes Software), and most word-processing programs can load them as well. However, when you save a text file loaded this way it may lose its original format.
For example, you might load a Mac text file (in which each line ends with CR), but if you edit and save it using a Windows text editor you might find that each line in the file now ends with CRLF. This might cause problems later, if the next program to use the file does not know how to deal with CRLF-delimited files. In such a case, the extra LF may appear in the program as a strange-looking character at the beginning of each line (starting with the second line).
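If a file has picked up the wrong terminators, it can usually be converted back. The Python sketch below rewrites every CRLF, lone CR or lone LF to a single chosen terminator. The function name convert_line_endings and its target parameter are hypothetical, and the sketch assumes the file fits in memory and uses a single-byte character set such as ASCII.

```python
import re

# Match CRLF first so that a pair is treated as one terminator,
# then fall back to a lone CR or a lone LF.
_ANY_EOL = re.compile(rb"\r\n|\r|\n")


def convert_line_endings(data: bytes, target: bytes = b"\n") -> bytes:
    """Replace every line terminator in data with target."""
    if target not in (b"\r\n", b"\r", b"\n"):
        raise ValueError("target must be CRLF, CR or LF")
    # A function is used as the replacement so the target bytes are
    # inserted literally, with no escape processing.
    return _ANY_EOL.sub(lambda match: target, data)
```

When using something like this, the file should be opened in binary mode (for example open(path, "rb")), because Python's text mode may translate line terminators on its own while reading.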
Worse problems can arise if you edit a text file in a word-processing program. When you save the file, you must ensure that you save it as a text file rather than a word-processing file. In Word 2002 you can select "File/Save As", and then select "Plain Text (*.txt)". If you should inadvertently save a text file in a word-processing format, it will now contain a lot of additional information it did not have before. This will probably render it useless to the next program that tries to use it, since it expected an ordinary text file. Fortunately, it will probably be easy to load the file back into the word-processing program and save it again, this time making sure to specify a text file format.
Some Examples of Text File Extensions
A file whose name ends with the characters .txt is almost certainly a text file. Other extensions typical of text files include .me (as in a file named Read.Me) and .htm, which is an HTML file, as used by web pages.
Windows files with the .ini extension are also text files, so they could be loaded into a text editor program. However, just because you can do this does not mean that you should. An ini file typically contains the settings for a program, and if you alter the file the program might stop working.
Files with the .csv extension are comma-separated-value files. These can be loaded into a text editor, but your operating system may be configured to open them in a spreadsheet if you double-click on them.
To summarize the foregoing: many programs save data in text files, but not all text files are supposed to be loaded into a text editor program.
EBCDIC Text Files
Some mainframe computers use the EBCDIC character set instead of ASCII. As it happens, the CR character is the same in EBCDIC as in ASCII. However, the Linefeed character in EBCDIC is hex $25. An EBCDIC text file might also use the "New Line" character (hex $15).
In any case, programs designed to work with ASCII text files generally cannot cope with EBCDIC files. So if you receive a file that you have been assured is "text", yet when you load it into your text editor it looks like nonsense, it might be an EBCDIC file. (Either that, or the file has been encrypted in some way.)
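If you suspect a file is EBCDIC, one way to check is to decode it with an EBCDIC codec and see whether readable text comes out. The Python sketch below assumes the file uses EBCDIC code page 037, which Python's standard library provides as the "cp037" codec; other mainframes may use a different code page, so the codec parameter is left adjustable. The name decode_ebcdic is hypothetical.

```python
def decode_ebcdic(data: bytes, codec: str = "cp037") -> str:
    """Decode EBCDIC bytes to a string with "\n" line terminators."""
    text = data.decode(codec)
    # With cp037, EBCDIC Linefeed ($25) decodes to "\n", while the
    # EBCDIC New Line character ($15) decodes to U+0085. Fold the
    # latter into "\n" so both kinds of file read the same way.
    return text.replace("\x85", "\n")
```

If the result is still nonsense, the file probably uses another code page, or it is not EBCDIC at all.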