Reading jpeg data

Mike Hale · Post by **Mike Hale** » Sun Jan 22, 2006 11:41 am

This script is really not even beta, but I have run into some problems and I didn't know where else to post it.

The good news is I'm sure that this can be done in JS. And I think once I have this working it shouldn't be too hard to read tiffs as well.

But I have hit several hurdles and need help.

1 The jpeg part of the data in a jpeg file uses the Motorola byte order, but the metadata part can be either Motorola or Intel. How can I deal with the byte order without a bunch of if statements?

2 Some of the fields in the IFD are lookup values. I don't know how to code the lookup in JS. For example, the second part of every subIFD is a lookup for the type of data that subIFD contains. Here are the values with their meanings:

1 = BYTE An 8-bit unsigned integer.
2 = ASCII An 8-bit byte containing one 7-bit ASCII code. The final byte is terminated with NULL.
3 = SHORT A 16-bit (2-byte) unsigned integer.
4 = LONG A 32-bit (4-byte) unsigned integer.
5 = RATIONAL Two LONGs. The first LONG is the numerator and the second LONG expresses the denominator.
7 = UNDEFINED An 8-bit byte that can take any value depending on the field definition.
9 = SLONG A 32-bit (4-byte) signed integer (2's complement notation).
10 = SRATIONAL Two SLONGs. The first SLONG is the numerator and the second SLONG is the denominator.
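
One way to code this kind of lookup in JS is a plain object keyed by the type number. This is only a sketch: the names `IFD_TYPES` and `typeInfo` are made up for illustration, and the byte sizes are taken from the definitions in the list above.

```javascript
// Hypothetical lookup table: data type code -> name and size in bytes of one value.
var IFD_TYPES = {
  1:  { name: "BYTE",      size: 1 },
  2:  { name: "ASCII",     size: 1 },
  3:  { name: "SHORT",     size: 2 },
  4:  { name: "LONG",      size: 4 },
  5:  { name: "RATIONAL",  size: 8 },  // two LONGs
  7:  { name: "UNDEFINED", size: 1 },
  9:  { name: "SLONG",     size: 4 },
  10: { name: "SRATIONAL", size: 8 }   // two SLONGs
};

// Returns the table entry for a type code, or undefined for a code
// that is not in the list above (6 or 8, for example).
function typeInfo(code) {
  return IFD_TYPES[code];
}
```

With this, `typeInfo(2).name` is "ASCII" and `typeInfo(5).size` is 8, so the value read from the file can be used directly as the key.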

3 The last and biggest problem is that some of the fields are double words. Xbytor gave me functions to read bytes and words in Motorola byte order and I changed one to read Intel words. I'm not sure how to change those to read double words. The first subIFD in the test file I'm using is for the description. It's in Intel byte order, so the tag is 0E01H, the data type is 0200H which means it's ASCII, the length is 08000000H which means the string is 7 bytes long with a 00H termination, and the offset is 9E000000H which means the data is 158 bytes from the start of the header tag.
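
For problems 1 and 3, one option is a single reader that takes the byte order as a flag and the width as a parameter, so the same loop covers words and double words. The sketch below works on a plain array of byte values rather than a File object, so it can be tried outside Photoshop. `readUInt` and `readEntry` are hypothetical names, and the sample bytes are the entry described in problem 3.

```javascript
// Reads an unsigned integer of `size` bytes from an array of byte values.
// littleEndian is true for Intel order, false for Motorola order.
function readUInt(bytes, offset, size, littleEndian) {
  var value = 0;
  for (var i = 0; i < size; i++) {
    // Walk the bytes from most significant to least significant.
    var index = littleEndian ? offset + size - 1 - i : offset + i;
    // Multiply instead of shifting so 4-byte values never go negative.
    value = value * 256 + bytes[index];
  }
  return value;
}

// Decodes one 12-byte directory entry: tag, type, count, value/offset.
function readEntry(bytes, offset, littleEndian) {
  return {
    tag:   readUInt(bytes, offset,     2, littleEndian),
    type:  readUInt(bytes, offset + 2, 2, littleEndian),
    count: readUInt(bytes, offset + 4, 4, littleEndian),
    value: readUInt(bytes, offset + 8, 4, littleEndian)
  };
}

// The description entry from the test file, in Intel order.
var sample = [0x0E, 0x01, 0x02, 0x00, 0x08, 0x00, 0x00, 0x00, 0x9E, 0x00, 0x00, 0x00];
var entry = readEntry(sample, 0, true);
// entry.tag is 270, entry.type is 2, entry.count is 8, entry.value is 158
```

The byte order is tested in one place only, so the calling code just passes the flag along after reading it from the header.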

Here is what I have so far

Code:
File.prototype.readByte = function() {// from Xbytor as well as the basic outline
return this.read(1).charCodeAt(0);
};
File.prototype.readInt16 = function() {
var self = this;
var hi = this.readByte();
var lo = this.readByte();
return (hi << 8) + lo;
};
File.prototype.readIntelInt16 = function() {
var self = this;
var lo = this.readByte();
var hi = this.readByte();
return (hi << 8) + lo;
};
jpegInfo = function(file) {

getUnits = function(seek){
file.seek(seek,0);
var units = file.readByte();
if (units == 0) {
file.Resolution = undefined;
} else if (units == 1) {
units = " dpi";
} else if (units == 2) {
units = " dpcm";
}
if(units != 0) {
file.Resolution = file.readInt16() + units;
}
}

getNextTag = function(){
tagPos = nextTag;
file.seek(nextTag, 0);
tag = file.readInt16();
tagLength = file.readInt16();
nextTag = nextTag+2+tagLength;
return tag;
}

processEXIF = function() {
file.seek(tagPos + 10, 0);//byte order for this section
var EXIFOffset = tagPos + 10;//also start of directory, tags offset data from here
if(file.readInt16() == 18761){// 18761 = 0x4949 = "II", Intel?
   var endian = "Intel";
   }else{
   var endian = "Motorola";
   }
file.seek(EXIFOffset+8,0);
numberOfSubIFD = file.readIntelInt16();
var subIFDTags = new Array();
var subIFDnumber = new Array();
subIFDTags[0] = file.readIntelInt16();
subIFDTags.push(file.readIntelInt16());
subIFDTags.push(file.readIntelInt16());//this is a lookup value. 2 means ASCII how to handle
subIFDTags.push(file.readIntelInt16());//how to read double word? 00080000H in test jpeg
alert(endian+","+numberOfSubIFD+","+subIFDTags[0]+","+subIFDTags[1]);//Intel,12,270,2 with my test file
   }

var tagPos = 0;
file.open("r");
file.encoding = 'BINARY';
file.seek(tagPos, 0);// first tag, per the spec
var tag = file.readInt16();
if(tag!=65496){//if not FFD8 not valid jpeg
   return;
}
tagPos = tagPos + 2;
file.seek(tagPos, 0);// second tag, per the spec
var tag = file.readInt16();
if(tag!=65504){//if not FFE0 not valid jpeg
   return
}
var tagLength = file.readInt16();
var nextTag = tagPos+2+tagLength;
var unitsPos = tagPos+2+9;//unit offset from tag
var units = getUnits(unitsPos);
getNextTag();//next tag should be EXIF APP1 tag if present
if(tag == 65505){//it's there so process
processEXIF();
}

while(tag !=65472){
getNextTag();
}
var pixelsPos = tagPos+2+3;//unit offset from tag
file.bitDepth = file.readByte();
file.HPixels = file.readInt16();
file.WPixels = file.readInt16();
file.components = file.readByte();
file.seek(2,2);
if(file.readInt16() == 65497){
file.isValid = true;
}

return file;
};

function test () {
var fileRef = new File("/c/angie.jpg");
var newRef = jpegInfo(fileRef)
alert(newRef.Resolution);//240 dpi with my test file
alert(newRef.bitDepth); // 8
alert(newRef.HPixels); // 926
alert(newRef.WPixels); // 1179
alert(newRef.components);// 3
alert(newRef.isValid); // true
};
test();

Mike Hale · Post by **Mike Hale** » Sun Jan 29, 2006 10:49 am

I just recently saw a post by Xbytor on the Adobe site advising Stan that he shouldn't try to code his own image reader. I wish that I had seen that before I started on this project.

I can see why, tiff and jpeg files are hard enough. They have been around long enough that they more or less have a standard format. But raw files are a nightmare.

But I'm learning how to do a lot of things in Javascript that I can apply to other scripts so I'm going to keep at it until it becomes a chore.

For the most part I have solved the 3 problems in my last post. Here is how I am handling the tag lookup.

Code:
Exif = new Object()
Exif.tag = function(arg){
arg = arg.toString();
return Exif.tags.getByName(arg);
};
var tag = function(){// constructor, each tag gets its own fields
this.name = undefined;
this.dataSize = undefined;
this.hex = undefined;
this.value = undefined;
};

var _214 = new tag;{_214.name = "214";_214.hex = "D6";_214.dataSize = 1;};
var _215 = new tag;{_215.name = "215";_215.hex = "D7";_215.dataSize = 2;};
var _216 = new tag;{_216.name = "216";_216.hex = "D8";_216.dataSize = 4;};

Exif.Tags = [_214,_215,_216];
Exif.tags = Exif.Tags;

Exif.tags.getByIndex = function(i){
return Exif.tags[i];
};

Exif.tags.getByName = function(string){
for(var i=0;i<Exif.tags.length;i++){
   var res = Exif.tags.getByIndex(i);
   if(res.name == string){// compare, don't assign
      return res;
   }
}
};

Exif.currentTag = Exif.tags.getByName("214");
tagRef = Exif.currentTag;
alert(Exif.tag("214").name);
alert("name: "+Exif.currentTag.name+"\n"+"dataSize: "+tagRef.dataSize+"\n"+"hex: "+tagRef.hex);

I think that will work, but if someone has a better way let me know. I haven't been able to test this with a file yet.

Because the data size can vary, I need some kind of pointer system so the script will know where and how much to read. In the sample code of the first post that is hardwired in, as is the tag order. I have found that I can't count on all the tags being in the file or on them being in the same order.
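
A pointer of that kind can be worked out from each directory entry. The following is only a sketch under a few assumptions: the type sizes are the ones from the list in the first post, offsets are measured from the start of the byte-order header as described there, and, as in TIFF-style directories, data of 4 bytes or less sits in the entry's own value field instead of at an offset. `TYPE_SIZES` and `locateData` are names invented here.

```javascript
// Size in bytes of one value for each data type code.
var TYPE_SIZES = { 1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 7: 1, 9: 4, 10: 8 };

// entryPos:   file position of the 12-byte directory entry
// headerPos:  file position of the byte-order header that offsets count from
// type/count: the entry's data type code and number of values
// valueField: the entry's last 4 bytes read as an unsigned long
function locateData(entryPos, headerPos, type, count, valueField) {
  var size = TYPE_SIZES[type];
  if (size === undefined) {
    return undefined; // unknown type, nothing sensible to read
  }
  var length = size * count;
  if (length <= 4) {
    // Small values live inside the entry itself, at bytes 8..11.
    return { position: entryPos + 8, length: length };
  }
  // Larger values live elsewhere; the value field is an offset.
  return { position: headerPos + valueField, length: length };
}
```

For the description entry in the first post (type 2, count 8, value 158) this gives a length of 8 bytes starting 158 bytes after the header, whatever order the tags appear in.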

Mike Hale · Post by **Mike Hale** » Sun Jul 16, 2006 3:47 am

This did become a chore. I could find very little information on the different jpeg formats. Even less on 'private' tif tags and none at all on raw formats so I gave this up as not worth the effort.

But a user at the Adobe forum posted a question about extracting data from a tiff file. His tif files are images captured by a scanning electron microscope. The files have data about the image which Photoshop discards when saving. He needed a way to save that data.

I created a script that reads the data from the tif file and saves it as a sidecar file. It's not very useful as is if you don't have his brand of SEM, but it could be adapted to read other data. It's more a proof of concept that data can be extracted from a file even when Photoshop doesn't provide access to it.

Here is the script plus a sample output.

Code:
// =================================================================
// Robert Stucky's hashtable
// =================================================================
Hashtable = function( caseSensitive ) {
this.caseSensitive = caseSensitive || false;
this.keyList = new Array( 0 );
this.keyIndex = {};
this.processedKeyList = new Array( 0 );
this.count = 0;
}
Hashtable.prototype._getKey = function( name ) {
return this.caseSensitive ? new String( "htx%_" + name ) : new String( "htx%_" + name ).toLowerCase();
}
Hashtable.prototype.clone = function() {
var t = new Hashtable( this.caseSensitive );
for ( var i = 0; i < this.keyList.length; i++ ) {
t.putValue( this.keyList[ i ], this.getValue( this.keyList[ i ] ) );
}
return t;
}
Hashtable.prototype.merge = function( ht ) {
for ( var i = 0; i < ht.keyList.length; i++ ) {
this.putValue( ht.keyList[ i ], ht.getValue( ht.keyList[ i ] ) );
}
}
Hashtable.prototype.isUniqueKey = function( key ) {
return this.keyIndex[ key ] == undefined;
}
Hashtable.prototype.putValue = function( name, value ) {
if ( ( value instanceof Boolean ) || ( typeof value == "boolean" ) ) {
value = value ? "true" : "false";
}
var key = this._getKey( name );
this[ key ] = value;
if ( this.isUniqueKey( key ) ) {
this.keyIndex[ key ] = this.keyList.length;
this.keyList.push( name );
this.count++;
} else {
this.keyList[ this.keyIndex[ key ] ] = name;
}
}
Hashtable.prototype.keys = function() {
return this.keyList;
}
Hashtable.prototype.getValue = function( name ) {
var val = this[ this._getKey( name ) ];
if ( ( val == "true" ) || ( val == "false" ) ) {
val = eval( val );
}
return val;
}
Hashtable.prototype.getCount = function() {
return this.count;
}
Hashtable.prototype.remove = function( name ) {
if ( this.getValue( name ) ) {
var key = this._getKey( name );
var idx = this.keyIndex[ key ];
this.keyIndex[ key ] = undefined;
this[ key ] = undefined;
this.keyList.splice( idx, 1 );
this.count--;
}
}
Hashtable.prototype.toString = function() {
return "[object Hashtable] " + this.keyList;
}
// ========================================================================
// These File functions are adapted from functions
// in Xbytor's Stream.js bb/viewtopic.php?t=369
// ET is the byte order. Use 'BE' for Mac. The default byte order is Intel.
// Function names come from Tiff file format data types.
// ========================================================================
File.prototype.readByte = function() {//aka unsigned byte
return this.read(1).charCodeAt(0);
};

File.prototype.readSByte = function(){//signed byte -128 to 127
var sb = this.readByte();
if(sb>0x7F){
sb= 0xFFFFFF00^sb;
};
return sb;
};

File.prototype.readShort = function(ET) { //aka unsigned short, word = 2 bytes
var self = this;
var b1 = this.readByte();
var b2 = this.readByte();
if(ET == "BE"){
return (b1 << 8) + b2;
}else{
return (b2 << 8) + b1;
};
};

File.prototype.readSShort = function(ET){//signed short -32768 to 32767 = 2 bytes
var ss = this.readShort(ET);// pass the byte order through
if(ss>0x7FFF){
ss = 0xFFFF0000^ss;
};
return ss;
};

File.prototype.readLong = function(ET) {//aka unsigned long = 4 bytes
var self = this;
var s1 = self.readShort(ET);
var s2 = self.readShort(ET);
// multiply instead of << 16 so values above 0x7FFFFFFF stay positive
if(ET == "BE"){
return (s1 * 0x10000) + s2;
}else{
return (s2 * 0x10000) + s1;
}
};

File.prototype.readSLong = function(ET){//signed long -2147483648 to 2147483647 = 4 bytes
var sl = this.readLong(ET);
if(sl>0x7FFFFFFF){
sl = sl - 0x100000000;// 2's complement
};
return sl;
};
//==============================================================================================
// This gets all the tags in a tif file and stores them for later
// Tests for valid Tiff format
// getTifTags(File Object)
//==============================================================================================
function getTifTags(fileRef){
var tags = new Hashtable();
var tag = 0;
var dataType = 0
var dataLength = 0
var tagData = 0
var pointer = 0;

fileRef.open( "r" );
fileRef.encoding = 'BINARY'
var ET = fileRef.readShort();// First two bytes of tif file denotes encoding
if(ET != 0x4949 && ET != 0x4D4D ){// Needs to be "II" or "MM" to be valid
alert("Not a valid tiff file");
return;
}else{
if(ET == 0x4D4D){
ET = 'BE';
};
};
if(fileRef.readShort(ET) != 42){// Next two bytes must = 0x002A to be valid
alert('Not a valid tiff file');
return;
};
pointer = fileRef.readLong(ET);//get offset to first IFD
fileRef.seek(pointer,0);//move to IFD pointed to
var numberOfTags = fileRef.readShort(ET);
var tagsToRead = numberOfTags;
tags.putValue('filename',fileRef.name);
tags.putValue('Number of tags',numberOfTags);// for sanity check against tagDetails.getCount
while(tagsToRead >= 1 ){
var tagDetails = new Hashtable();
tag = fileRef.readShort(ET);
tagDetails.putValue('dataType',fileRef.readShort(ET));
tagDetails.putValue('dataLength',fileRef.readLong(ET));
tagDetails.putValue('tagData',fileRef.readLong(ET));
tags.putValue(tag,tagDetails);
tagsToRead = tagsToRead - 1;
};//end search
fileRef.close();
return tags;
};
// ================================================================================
// read ASCII data from tiff file. No file checking is done
// as it should only be called after file confirmed as tif
// getASCIIData(File Object, start of data as Number, Length to read as Number)
// =================================================================================
function getASCIIData(fileRef, pointer, dataLength){
fileRef.open('r');
fileRef.seek(pointer);
var str = '';
while (fileRef.tell() < pointer + dataLength){
var line = fileRef.readln();
str = str + line + '\n';
};
fileRef.close();
return str;
};
// =============================================================
// throwError from Xbytor
// =============================================================
throwError = function(e) {
throw e;
};

// =============================================================
// writeDat(File Object, Data as String);
// =============================================================
function writeDat(file, string){
file.open('w');
file.writeln('Created from ' + file.name.match(/([^\.]+)/)[1] + '.tif on ' + new Date());
file.writeln(string);
file.close();
file = null;
};
// =============================================================
// end of core functions
// =============================================================

// =============================================================
// End of Functions
// =============================================================

var fileRef = File.openDialog("Select a TIFF File to Read", "TIFF Files: *.tif,*.TIF");
if(fileRef == null){throwError('');};// User knows they canceled dialog

var fileTags = getTifTags(fileRef);
if(fileTags == undefined){throwError('Something went wrong');};

var FEITag = fileTags.getValue(34682);// tag number for FEI Scanning Electron Microscope metadata
if(FEITag == undefined){throwError('Tag not found');};

var str = getASCIIData(fileRef, FEITag.getValue('tagData'), FEITag.getValue('dataLength'));
if(str == undefined )
{throwError('Data not found');};

var datFile = new File( fileRef.path+'/'+fileRef.name.match(/([^\.]+)/)[1]+".dat" );
writeDat(datFile, str);

Code:
Created from 1016-2_kaol.tif on Sat Jul 15 2006 23:27:57 GMT-0400

[User]
User=supervisor
Usertext=MUN-IIC SEM/MLA
UsertextUnicode=4D0055004E002D004900490043002000530045004D002F004D004C004100
Date=03/22/2006
Time=11:28AM

[SYSTEM]
DNumber=D7989
Software=2.3
Source=W-Tetrode
Column=W-ESEM
FinalLens=W-ESEM
Chamber=Intermediate
Stage=Stage100x100
Pump=TMP
ESEM=yes
Aperture=Fixed
Scan=dispb1.0
Acq=ViperQuad1.0
EucWD=0.01

[Beam]
HV=15000
Spot=5
StigmatorX=-0.474284
StigmatorY=0.28995
BeamShiftX=-4.46332e-006
BeamShiftY=6.97833e-006
ScanRotation=0
ImageMode=HR

[Scan]
InternalScan=true
Dwelltime=1e-005
FrameTime=9.46746
PixelHeight=1.22039e-007
PixelWidth=1.22039e-007
Horfieldsize=0.000124968
Verfieldsize=0.000107883
Average=0
Integrate=0

[Stage]
StageX=0.00410234
StageY=0.0285764
StageZ=0.0110007
StageR=0.176366
StageT=-0.000855211
Spectilt=
WorkingDistance=0.0143798

[Vacuum]
UserMode=Highvacuum
CHPressure=61
Gas=

[Specimen]
Temperature=

[Detectors]
Number=1
Name=Etd

[Etd]
Contrast=23.0926
Brightness=47.33
Signal=SE
Mix=100
State=true
Active=true
Mode=0
Grid=300
ContrastDB=50.7492
BrightnessDB=47.33
Setting=300V
ContrastSpotsizeAlignment=20
MinimumDetectorDwellTime=1e-007
[PrivateFei]
BitShift=8
Databarheight=59
DataBarSelected=HV Det Mag WD HFW Spot Mode Pressure MicronBar Label
TimeOfCreation=22.03.2006 11:34:13

Larry Ligon · Post by **Larry Ligon** » Tue Jul 18, 2006 7:31 am

Mike,

Check out Independent JPEG Group: http://www.ijg.org/

Mike Hale · Post by **Mike Hale** » Tue Jul 18, 2006 12:14 pm

Thanks Larry,

Yes, I have looked there. The problem is that JPEG is a compression standard only and the ISO group in charge of the standard doesn't deal with file format.

The only standard for jpeg file formats I could find is JPEG JFIF. That is the format Photoshop uses when saving a jpg file using the Save or Save As menu.

But the file created with Save for Web doesn't conform to the JFIF standard, nor do some files created by some digital point and shoots. I think at one point I had 3 different formats that didn't follow the JFIF standard.
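
A script can at least tell which case it is looking at before giving up on a file. The sketch below only covers two well-known layouts: a JFIF file, whose first segment after the FFD8 start marker is APP0 (FFE0) with the identifier "JFIF", and a file whose first segment is APP1 (FFE1) with the identifier "Exif". Anything else is reported as "other" without guessing what wrote it. `firstSegmentKind` is a hypothetical name and it works on an array of byte values.

```javascript
// Classifies the first segment after the start-of-image marker.
// Returns "JFIF", "Exif", "other", or "not a jpeg".
function firstSegmentKind(bytes) {
  if (bytes[0] !== 0xFF || bytes[1] !== 0xD8) {
    return "not a jpeg"; // missing FFD8 start-of-image marker
  }
  var marker = (bytes[2] << 8) + bytes[3];
  // The identifier follows the 2-byte marker and the 2-byte segment length.
  var id = "";
  for (var i = 6; i < 10 && i < bytes.length; i++) {
    id += String.fromCharCode(bytes[i]);
  }
  if (marker === 0xFFE0 && id === "JFIF") {
    return "JFIF";
  }
  if (marker === 0xFFE1 && id === "Exif") {
    return "Exif";
  }
  return "other";
}
```

The script in the first post returns as soon as the second tag is not FFE0, so a check like this would let it branch instead of rejecting every non-JFIF file.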

But I'm glad that at least someone is reading this

Mike