
Commit 9e5ffa2

Mark Baker committed
Reader documentation markdown
1 parent c758748 commit 9e5ffa2


7 files changed (+560, -85 lines)


Documentation/markdown/ReadingSpreadsheetFiles/01-file-formats.md renamed to Documentation/markdown/ReadingSpreadsheetFiles/01-File-Formats.md

+24-14
@@ -9,42 +9,52 @@ Currently, PHPExcel supports the following File Types for Reading:
99

1010
### Excel5
1111

12-
The Microsoft Excel™ Binary file format (BIFF5 and BIFF8) is a binary file format that was used by Microsoft Excel™ between versions 95 and 2003. The format is supported (to various extents) by most spreadsheet programs. BIFF files normally have an extension of .xls. Documentation describing the format can be found online at [http://msdn.microsoft.com/en-us/library/cc313154(v=office.12).aspx][2] or from [http://download.microsoft.com/download/2/4/8/24862317-78F0-4C4B-B355-C7B2C1D997DB/[MS-XLS].pdf][3] (as a downloadable PDF).
12+
The Microsoft Excel™ Binary file format (BIFF5 and BIFF8) is a binary file format that was used by Microsoft Excel™ between versions 95 and 2003. The format is supported (to various extents) by most spreadsheet programs. BIFF files normally have an extension of .xls. Documentation describing the format can be found online at [http://msdn.microsoft.com/en-us/library/cc313154(v=office.12).aspx][1] or from [http://download.microsoft.com/download/2/4/8/24862317-78F0-4C4B-B355-C7B2C1D997DB/[MS-XLS].pdf][2] (as a downloadable PDF).
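BIFF5 and BIFF8 workbooks are stored inside an OLE2 Compound File container, so a cheap first check is to look for that container's fixed 8-byte signature. This is a minimal sketch in plain PHP. `looksLikeOle2()` is a hypothetical helper, not part of PHPExcel, and a match only suggests an .xls workbook, because other Office binary files (such as .doc) use the same container.

```php
<?php
// Every OLE2 Compound File (the container holding BIFF5/BIFF8 workbook
// streams) begins with this fixed 8-byte signature.
const OLE2_SIGNATURE = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

/**
 * Hypothetical helper: true when $header starts with the OLE2 signature.
 * This suggests, but does not prove, a BIFF .xls workbook.
 */
function looksLikeOle2(string $header): bool
{
    return strncmp($header, OLE2_SIGNATURE, 8) === 0;
}

// Usage: only the first 8 bytes of the file are needed.
// $header = file_get_contents('./sampleData/example1.xls', false, null, 0, 8);
// var_dump(looksLikeOle2($header));
```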
1313

1414
### Excel2003XML
1515

16-
Microsoft Excel™ 2003 included options for a file format called SpreadsheetML. This file is a zipped XML document. It is not very common, but its core features are supported. Documentation for the format can be found at [http://msdn.microsoft.com/en-us/library/aa140066%28office.10%29.aspx][4] though it’s sadly rather sparse in its detail.
16+
Microsoft Excel™ 2003 included an option to save workbooks in a file format called SpreadsheetML (XML Spreadsheet 2003). Unlike the later Office Open XML format, this file is a single, uncompressed XML document. It is not very common, but its core features are supported. Documentation for the format can be found at [http://msdn.microsoft.com/en-us/library/aa140066%28office.10%29.aspx][3], though it’s sadly rather sparse in its detail.
1717

1818
### Excel2007
1919

20-
Microsoft Excel™ 2007 shipped with a new file format, namely Microsoft Office Open XML SpreadsheetML, and Excel 2010 extended this still further with its new features such as sparklines. These files typically have an extension of .xlsx. This format is based around a zipped collection of eXtensible Markup Language (XML) files. Microsoft Office Open XML SpreadsheetML is mostly standardized in ECMA 376 ([http://www.ecma-international.org/news/TC45_current_work/TC45_available_docs.htm][5]) and ISO 29500.
20+
Microsoft Excel™ 2007 shipped with a new file format, namely Microsoft Office Open XML SpreadsheetML, and Excel 2010 extended this still further with its new features such as sparklines. These files typically have an extension of .xlsx. This format is based around a zipped collection of eXtensible Markup Language (XML) files. Microsoft Office Open XML SpreadsheetML is mostly standardized in ECMA 376 ([http://www.ecma-international.org/news/TC45_current_work/TC45_available_docs.htm][4]) and ISO 29500.
2121

2222
### OOCalc
2323

24-
aka Open Document Format (ODF) or OASIS, this is the OpenOffice.org XML File Format for spreadsheets. It comprises a zip archive including several components all of which are text files, most of these with markup in the eXtensible Markup Language (XML). It is the standard file format for OpenOffice.org Calc and StarCalc, and files typically have an extension of .ods. The published specification for the file format is available from the OASIS Open Office XML Format Technical Committee web page ([http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office#technical][6]). Other information is available from the OpenOffice.org XML File Format web page ([http://xml.openoffice.org/general.html][7]), part of the OpenOffice.org project.
24+
Also known as the OpenDocument Format (ODF) or OASIS format, this is the OpenOffice.org XML file format for spreadsheets. It comprises a zip archive containing several components, all of which are text files, most of them marked up in eXtensible Markup Language (XML). It is the standard file format for OpenOffice.org Calc and StarCalc, and files typically have an extension of .ods. The published specification for the file format is available from the OASIS Open Office XML Format Technical Committee web page ([http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office#technical][5]). Other information is available from the OpenOffice.org XML File Format web page ([http://xml.openoffice.org/general.html][6]), part of the OpenOffice.org project.
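Both .xlsx and .ods files are zip archives, so the zip signature alone cannot tell them apart. A minimal sketch, assuming a conforming ODF package in which the uncompressed `mimetype` entry is the first member of the archive and has no extra field, so its content starts at byte offset 38 (a 30-byte local file header followed by the 8-byte name `mimetype`). `classifyZipSpreadsheet()` is a hypothetical helper that returns the PHPExcel reader type names used in this document.

```php
<?php
/**
 * Hypothetical helper: classify the first bytes of a zip-based spreadsheet.
 * Returns 'OOCalc', 'Excel2007', or null when the bytes are not a zip archive.
 */
function classifyZipSpreadsheet(string $header): ?string
{
    // Zip local file headers start with "PK\x03\x04".
    if (strncmp($header, "PK\x03\x04", 4) !== 0) {
        return null;
    }
    // ODF: first entry named "mimetype", stored uncompressed, content at offset 38.
    $odsMime = 'application/vnd.oasis.opendocument.spreadsheet';
    if (substr($header, 30, 8) === 'mimetype'
        && substr($header, 38, strlen($odsMime)) === $odsMime) {
        return 'OOCalc';
    }
    // Any other zip might be Office Open XML; a real reader should still check
    // the archive contents (e.g. for a [Content_Types].xml part) before trusting this.
    return 'Excel2007';
}
```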
2525

2626
### SYLK
2727

2828
This is the Microsoft Multiplan Symbolic Link Interchange (SYLK) file format. Multiplan was a predecessor to Microsoft Excel™. Files normally have an extension of .slk. While not common, there are still a few applications that generate SYLK files as a cross-platform option, because (despite being limited to a single worksheet) it is a simple format to implement, and supports some basic data and cell formatting options (unlike CSV files).
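A SYLK file is plain text with one record per line and fields separated by semicolons; cell values live in `C` records, whose `Y` and `X` fields give the row and column and whose `K` field holds the value. The sketch below is deliberately simplified and hypothetical (it is not PHPExcel's SYLK reader): it ignores formatting records and the `;;` escape for a literal semicolon, and it carries the row and column over from the previous `C` record when they are omitted.

```php
<?php
/**
 * Simplified, hypothetical SYLK cell parser.
 * Returns [row => [column => value]] built from "C" records only.
 */
function parseSylkCells(string $sylk): array
{
    $cells = [];
    $row = 1;
    $col = 1;
    foreach (preg_split('/\r\n|\r|\n/', $sylk) as $line) {
        $fields = explode(';', $line);
        if ($fields[0] !== 'C') {
            continue; // skip ID, F (format), E (end) and any other records
        }
        $value = null;
        foreach (array_slice($fields, 1) as $field) {
            $code = substr($field, 0, 1);
            $data = substr($field, 1);
            if ($code === 'Y') {
                $row = (int) $data;
            } elseif ($code === 'X') {
                $col = (int) $data;
            } elseif ($code === 'K') {
                // String values are wrapped in double quotes; numbers are bare.
                $value = trim($data, '"');
            }
        }
        if ($value !== null) {
            $cells[$row][$col] = $value;
        }
    }
    return $cells;
}
```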
2929

3030
### Gnumeric
3131

32-
The Gnumeric file format is used by the Gnome Gnumeric spreadsheet application, and typically files have an extension of .gnumeric. The file contents are stored using eXtensible Markup Language (XML) markup, and the file is then compressed using the GNU project's gzip compression library. [http://projects.gnome.org/gnumeric/doc/file-format-gnumeric.shtml][8]
32+
The Gnumeric file format is used by the GNOME Gnumeric spreadsheet application, and files typically have an extension of .gnumeric. The file contents are stored as eXtensible Markup Language (XML) markup, and the file is then gzip-compressed. The format is documented at [http://projects.gnome.org/gnumeric/doc/file-format-gnumeric.shtml][7].
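Because a .gnumeric file is normally gzip-compressed XML, reading it starts with recognising the gzip magic bytes (0x1F 0x8B) and decompressing. This is a minimal sketch using PHP's zlib extension (`gzdecode()`); `gnumericXml()` is a hypothetical helper, and treating non-gzip input as already-plain XML is an assumption of the sketch.

```php
<?php
/**
 * Hypothetical helper: return the XML text held in a Gnumeric file's bytes.
 * Requires PHP's zlib extension.
 */
function gnumericXml(string $raw): string
{
    // Gzip streams always begin with the two bytes 0x1F 0x8B.
    if (strncmp($raw, "\x1F\x8B", 2) === 0) {
        $xml = gzdecode($raw);
        if ($xml === false) {
            throw new RuntimeException('Corrupt gzip stream');
        }
        return $xml;
    }
    // Assumption of this sketch: anything else is already uncompressed XML.
    return $raw;
}

// Usage (hypothetical file name):
// $xml = gnumericXml(file_get_contents('./sampleData/example1.gnumeric'));
```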
3333

3434
### CSV
3535

36-
Comma Separated Value (CSV) file format is a common structuring strategy for text format files. In CSV flies, each line in the file represents a row of data and (within each line of the file) the different data fields (or columns) are separated from one another using a comma (","). If a data field contains a comma, then it should be enclosed (typically in quotation marks ("). Sometimes tabs "\t" or the pipe symbol ("|") are used as separators instead of a comma. Because CSV is a text-only format, it doesn't support any data formatting options.
36+
The Comma Separated Value (CSV) file format is a common structuring strategy for text files. In CSV files, each line in the file represents a row of data and, within each line, the different data fields (or columns) are separated from one another by a comma (","). If a data field contains a comma, then it should be enclosed, typically in quotation marks ("). Sometimes a tab ("\t"), the pipe symbol ("|") or a semicolon (";") is used as the separator instead of a comma, although other symbols can also be used. Because CSV is a text-only format, it doesn't support any data formatting options.
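PHP's built-in `str_getcsv()` illustrates the separator and enclosure rules described above. This sketch parses the same record with a comma, a semicolon and a tab as the separator; the escape character is passed explicitly so the result does not depend on its default value.

```php
<?php
// Comma-separated: the quoted field keeps its embedded comma.
$commaLine = 'Smith,"Jones, Bob",42';
$fields = str_getcsv($commaLine, ',', '"', '\\');
// ['Smith', 'Jones, Bob', '42']

// Semicolon-separated: common where the comma is the decimal separator,
// so "3,5" stays a single field and no enclosure is needed.
$semicolonLine = 'Smith;Jones, Bob;3,5';
$fields2 = str_getcsv($semicolonLine, ';', '"', '\\');
// ['Smith', 'Jones, Bob', '3,5']

// Tab-separated.
$tabLine = "Smith\tJones, Bob\t42";
$fields3 = str_getcsv($tabLine, "\t", '"', '\\');
// ['Smith', 'Jones, Bob', '42']
```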
3737

38-
### HTML
38+
"CSV" is not a single, well-defined format (although see RFC 4180 for one definition that is commonly used). Rather, in practice the term "CSV" refers to any file that:
39+
- is plain text using a character set such as ASCII, Unicode, EBCDIC, or Shift JIS,
40+
- consists of records (typically one record per line),
41+
- with the records divided into fields separated by delimiters (typically a single reserved character such as a comma, semicolon, or tab),
42+
- where every record has the same sequence of fields.
43+
44+
Within these general constraints, many variations are in use. Therefore "CSV" files are not entirely portable. Nevertheless, the variations are fairly small, and many implementations allow users to glance at the file (which is feasible because it is plain text), and then specify the delimiter character(s), quoting rules, etc.
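When the delimiter is not known in advance, it can be guessed from a sample of lines. The sketch below is one simple heuristic, not PHPExcel's behaviour: it tries a few candidate delimiters and keeps the one that splits every sampled line into the same number of fields (more than one), preferring the candidate that yields the most fields. `guessDelimiter()` and its default candidate list are assumptions for illustration.

```php
<?php
/**
 * Hypothetical heuristic: pick the delimiter that gives every sampled line
 * the same field count (> 1), preferring the largest count.
 * Returns null when no candidate fits.
 */
function guessDelimiter(array $lines, array $candidates = [',', ';', "\t", '|']): ?string
{
    $best = null;
    $bestCount = 1;
    foreach ($candidates as $delimiter) {
        $counts = [];
        foreach ($lines as $line) {
            // Parse properly so delimiters inside quoted fields are not counted.
            $counts[] = count(str_getcsv($line, $delimiter, '"', '\\'));
        }
        // "Every record has the same sequence of fields": require one count.
        if (count(array_unique($counts)) === 1 && $counts[0] > $bestCount) {
            $best = $delimiter;
            $bestCount = $counts[0];
        }
    }
    return $best;
}
```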
3945

46+
**Warning:** Microsoft Excel™ will open .csv files, but depending on the system's regional settings, it may expect a semicolon as a separator instead of a comma, since in some languages the comma is used as the decimal separator. Also, many regional versions of Excel will not be able to deal with Unicode characters in a CSV file.
47+
48+
### HTML
4049

50+
HyperText Markup Language (HTML) is the main markup language for creating web pages and other information that can be displayed in a web browser. Files typically have an extension of .html or .htm. HTML markup provides a means to create structured documents by denoting structural semantics for text such as headings, paragraphs, lists, links, quotes and other items. Since 1996, the HTML specifications have been maintained, with input from commercial software vendors, by the World Wide Web Consortium (W3C). However, in 2000, HTML also became an international standard (ISO/IEC 15445:2000). HTML 4.01 was published in late 1999, with further errata published through 2001. In 2004 development began on HTML5 in the Web Hypertext Application Technology Working Group (WHATWG), which became a joint deliverable with the W3C in 2008.
4151

4252

4353

44-
[2]: http://msdn.microsoft.com/en-us/library/cc313154(v=office.12).aspx
45-
[3]: http://download.microsoft.com/download/2/4/8/24862317-78F0-4C4B-B355-C7B2C1D997DB/%5bMS-XLS%5d.pdf
46-
[4]: http://msdn.microsoft.com/en-us/library/aa140066%28office.10%29.aspx
47-
[5]: http://www.ecma-international.org/news/TC45_current_work/TC45_available_docs.htm
48-
[6]: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
49-
[7]: http://xml.openoffice.org/general.html
50-
[8]: http://projects.gnome.org/gnumeric/doc/file-format-gnumeric.shtml
54+
[1]: http://msdn.microsoft.com/en-us/library/cc313154(v=office.12).aspx
55+
[2]: http://download.microsoft.com/download/2/4/8/24862317-78F0-4C4B-B355-C7B2C1D997DB/%5bMS-XLS%5d.pdf
56+
[3]: http://msdn.microsoft.com/en-us/library/aa140066%28office.10%29.aspx
57+
[4]: http://www.ecma-international.org/news/TC45_current_work/TC45_available_docs.htm
58+
[5]: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
59+
[6]: http://xml.openoffice.org/general.html
60+
[7]: http://projects.gnome.org/gnumeric/doc/file-format-gnumeric.shtml
@@ -0,0 +1,21 @@
1+
# PHPExcel User Documentation – Reading Spreadsheet Files
2+
3+
4+
## Loading a Spreadsheet File
5+
6+
The simplest way to load a workbook file is to let PHPExcel's IO Factory identify the file type and load it, calling the static load() method of the PHPExcel_IOFactory class.
7+
8+
```php
9+
$inputFileName = './sampleData/example1.xls';
10+
11+
/** Load $inputFileName to a PHPExcel Object **/
12+
$objPHPExcel = PHPExcel_IOFactory::load($inputFileName);
13+
```
14+
> See Examples/Reader/exampleReader01.php for a working example of this code.
15+
16+
The load() method will attempt to identify the file type and instantiate a loader for that file type, then use it to load the file and store the data and any formatting in a PHPExcel object.
17+
18+
The method makes an initial guess at the loader to instantiate based on the file extension, but it tests the file before actually executing the load. So if, for example, the file is actually a CSV file or contains HTML markup but has been given a .xls extension (quite a common practice), it will reject the Excel5 loader that it would normally use for a .xls file, test the file against the other loaders until it finds the appropriate one, and then use that loader to read the file.
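The extension-first, content-second strategy can be sketched in a few lines of plain PHP. This is not PHPExcel_IOFactory's actual code: `chooseReaderType()`, the extension map and the deliberately crude validators are hypothetical, and a real implementation would inspect each file far more carefully.

```php
<?php
/**
 * Hypothetical sketch: try the reader type suggested by the extension first,
 * then fall back to the remaining types until one validator accepts the bytes.
 */
function chooseReaderType(string $fileName, string $header, array $validators, array $extensionMap): ?string
{
    $extension = strtolower(pathinfo($fileName, PATHINFO_EXTENSION));
    $order = array_keys($validators);
    if (isset($extensionMap[$extension])) {
        $guess = $extensionMap[$extension];
        // Move the extension-based guess to the front of the queue.
        $order = array_merge([$guess], array_diff($order, [$guess]));
    }
    foreach ($order as $type) {
        if (isset($validators[$type]) && $validators[$type]($header)) {
            return $type;
        }
    }
    return null;
}

// Crude stand-in validators, checked against the first bytes of the file.
$validators = [
    'Excel5' => fn(string $h): bool => strncmp($h, "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8) === 0,
    'HTML'   => fn(string $h): bool => stripos($h, '<table') !== false || stripos($h, '<html') !== false,
    'CSV'    => fn(string $h): bool => true, // last resort: accept any text
];
$extensionMap = ['xls' => 'Excel5', 'htm' => 'HTML', 'html' => 'HTML', 'csv' => 'CSV'];

// An HTML table saved with a .xls extension: Excel5 is tried first and
// rejected, then the HTML validator accepts it.
$type = chooseReaderType('report.xls', '<html><table><tr><td>1</td></tr></table></html>', $validators, $extensionMap);
```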
19+
20+
While this is easy to implement in your code, and you don't need to worry about the file type, it isn't the most efficient way to load a file, and it lacks the flexibility to configure the loader in any way before actually reading the file into a PHPExcel object.
21+
@@ -0,0 +1,56 @@
1+
# PHPExcel User Documentation – Reading Spreadsheet Files
2+
3+
4+
## Creating a Reader and Loading a Spreadsheet File
5+
6+
If you know the file type of the spreadsheet file that you need to load, you can instantiate a new reader object for that file type, then use the reader's load() method to read the file into a PHPExcel object. It is possible to instantiate the reader object for each of the supported file types by name. However, you may get unpredictable results if the file isn't of the right type (e.g. it is a CSV file with an extension of .xls), although this type of error should normally be trapped.
7+
8+
```php
9+
$inputFileName = './sampleData/example1.xls';
10+
11+
/** Create a new Excel5 Reader **/
12+
$objReader = new PHPExcel_Reader_Excel5();
13+
// $objReader = new PHPExcel_Reader_Excel2007();
14+
// $objReader = new PHPExcel_Reader_Excel2003XML();
15+
// $objReader = new PHPExcel_Reader_OOCalc();
16+
// $objReader = new PHPExcel_Reader_SYLK();
17+
// $objReader = new PHPExcel_Reader_Gnumeric();
18+
// $objReader = new PHPExcel_Reader_CSV();
19+
/** Load $inputFileName to a PHPExcel Object **/
20+
$objPHPExcel = $objReader->load($inputFileName);
21+
```
22+
> See Examples/Reader/exampleReader02.php for a working example of this code.
23+
24+
Alternatively, you can use the IO Factory's createReader() method to instantiate the reader object for you, simply telling it the file type of the reader that you want to instantiate.
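The idea behind createReader() (a type string in, a reader object out) is a name-based factory. The sketch below uses hypothetical stand-in classes (`DemoReader_CSV`, `DemoReader_Excel5`) rather than PHPExcel's readers, and PHPExcel's own factory may differ in detail; it maps the type string to a class name, checks that the class exists and implements the expected interface, and instantiates it.

```php
<?php
// Hypothetical stand-in readers; load() just reports what it would do.
interface DemoReader
{
    public function load(string $fileName): string;
}

class DemoReader_CSV implements DemoReader
{
    public function load(string $fileName): string { return "CSV:$fileName"; }
}

class DemoReader_Excel5 implements DemoReader
{
    public function load(string $fileName): string { return "Excel5:$fileName"; }
}

/** Hypothetical name-based factory: 'CSV' becomes DemoReader_CSV, and so on. */
function createDemoReader(string $type): DemoReader
{
    $className = 'DemoReader_' . $type;
    if (!class_exists($className) || !is_subclass_of($className, DemoReader::class)) {
        throw new InvalidArgumentException("No reader found for type '$type'");
    }
    return new $className();
}
```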
25+
26+
```php
27+
$inputFileType = 'Excel5';
28+
// $inputFileType = 'Excel2007';
29+
// $inputFileType = 'Excel2003XML';
30+
// $inputFileType = 'OOCalc';
31+
// $inputFileType = 'SYLK';
32+
// $inputFileType = 'Gnumeric';
33+
// $inputFileType = 'CSV';
34+
$inputFileName = './sampleData/example1.xls';
35+
36+
/** Create a new Reader of the type defined in $inputFileType **/
37+
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
38+
/** Load $inputFileName to a PHPExcel Object **/
39+
$objPHPExcel = $objReader->load($inputFileName);
40+
```
41+
> See Examples/Reader/exampleReader03.php for a working example of this code.
42+
43+
If you're uncertain of the file type, you can use the IO Factory's identify() method to identify the reader that you need, before using the createReader() method to instantiate the reader object.
44+
45+
```php
46+
$inputFileName = './sampleData/example1.xls';
47+
48+
/** Identify the type of $inputFileName **/
49+
$inputFileType = PHPExcel_IOFactory::identify($inputFileName);
50+
/** Create a new Reader of the type that has been identified **/
51+
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
52+
/** Load $inputFileName to a PHPExcel Object **/
53+
$objPHPExcel = $objReader->load($inputFileName);
54+
```
55+
> See Examples/Reader/exampleReader04.php for a working example of this code.
56+
