Differences between two versions of the Hebrew text of Genesis
The table on the left-hand side shows all differences between two versions of Genesis.
The versions have been retrieved from two different sources on the web: the website of
Mechon Mamre and the website of the
J. Alan Groves Center.
The full Hebrew text of the 50 chapters of Genesis has been retrieved from both sites and compared
chapter by chapter, verse by verse and word by word.
The list of differences on the left is generated by comparing both texts
after removal of several peculiarities in the respective textual representations. The whole process
has been automated and implemented in a small Python script.
In short, only differences in consonants and vowels are shown.
Every line in the table displays a single difference, where
- Column 1 (verse) gives the verse number where the difference occurs
- Column 2 (Mechon Mamre) shows the word where the difference occurs as it appears in the Mechon Mamre text
- Column 3 (Leningrad Codex) shows the word where the difference occurs as it appears in the Westminster-Leningrad text
- Columns 4 (left) and 5 (right) show the difference itself when it can be represented as a Unicode string.
For larger differences or differences caused by HTML markup these columns remain blank. Column 4 (left) refers to
the word in column 2 (Mechon Mamre). Column 5 (right) refers to the word in column 3 (Leningrad Codex).
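The word-by-word comparison and the left/right columns described above could be sketched roughly as follows. This is not the original script: the function names, the threshold MAX_DIFF_CHARS and the assumption that each verse is already available as a cleaned string of space-separated words are all hypothetical, and Python's difflib is used here only as one plausible way to isolate the differing characters.

```python
from difflib import SequenceMatcher
from itertools import zip_longest

# Hypothetical cut-off: longer differences leave columns 4 and 5 blank.
MAX_DIFF_CHARS = 3


def char_difference(left_word, right_word):
    """Return the characters that differ on each side, or blanks."""
    left_only, right_only = [], []
    matcher = SequenceMatcher(None, left_word, right_word, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            left_only.append(left_word[i1:i2])
            right_only.append(right_word[j1:j2])
    left, right = "".join(left_only), "".join(right_only)
    # Blank out differences involving HTML markup and larger differences.
    if "<" in left_word + right_word or max(len(left), len(right)) > MAX_DIFF_CHARS:
        return "", ""
    return left, right


def compare_verse(verse, mm_text, wlc_text):
    """Build one table row per differing word pair in a verse."""
    rows = []
    pairs = zip_longest(mm_text.split(), wlc_text.split(), fillvalue="")
    for mm_word, wlc_word in pairs:
        if mm_word != wlc_word:
            left, right = char_difference(mm_word, wlc_word)
            rows.append((verse, mm_word, wlc_word, left, right))
    return rows
```

Each returned tuple corresponds to one table line: verse, Mechon Mamre word, Leningrad Codex word, left difference, right difference. Pairing words by position assumes both texts have the same number of words per verse after cleaning; a real comparison may need an alignment step when they do not.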
Cantillation marks and modern formatting characters that do not appear
in old handwritten sources were removed, and some conversions were performed
to make the texts comparable. Specifically, we removed:
- All cantillation marks
- All punctuation marks, such as: ', . ; :'
- The double maqaf '--' joining words, frequently occurring in the Mechon Mamre text.
- HTML paragraph markings <p></p>, because paragraph breaks are already marked in the Hebrew text (Peh).
We kept the HTML letter size markings <big></big> and <small></small> as they also
appear in several handwritten source texts.
- The slash '/' marking syllables
- The paseq '|', sometimes separating words
- All Unicode directional characters, because they are not needed in pure Hebrew text
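As an illustration, most of the removals listed above can be expressed with a few regular expressions. This is a minimal sketch under stated assumptions, not the original script: the function name is hypothetical, cantillation marks are taken to be the Unicode range U+0591 to U+05AF, the paseq is matched both as ASCII '|' and as U+05C0, and the double maqaf is left out because the list does not say whether the joined words should end up separated.

```python
import re

# Hebrew accents (cantillation marks) in Unicode: U+0591..U+05AF.
CANTILLATION = re.compile("[\u0591-\u05af]")
# Directional marks, embeddings, overrides and isolates.
DIRECTIONAL = re.compile("[\u200e\u200f\u202a-\u202e\u2066-\u2069]")
# Opening and closing paragraph tags only; <big> and <small> are kept.
PARAGRAPH_TAG = re.compile(r"</?p>", re.IGNORECASE)
# Punctuation from the list, plus paseq as ASCII '|' or U+05C0 (assumption).
PUNCTUATION = re.compile("[,.;:|\u05c0]")
# A slash that is not part of a closing tag such as </big>.
SLASH = re.compile(r"(?<!<)/")


def strip_removed(text):
    """Remove characters and markup that are ignored in the comparison."""
    text = PARAGRAPH_TAG.sub("", text)
    text = CANTILLATION.sub("", text)
    text = DIRECTIONAL.sub("", text)
    text = PUNCTUATION.sub("", text)
    text = SLASH.sub("", text)
    return text
```

The paragraph tags are stripped before the slash so that '</p>' is removed as a whole, and the lookbehind keeps the slash inside '</big>' and '</small>' intact.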
The conversions were:
- The legacy character set used by Mechon Mamre (Windows-1255) was converted to Unicode.
- The legacy HTML combination of zero-width joiner + holam (U+05B9) was replaced by the single
character U+05BA, which is the dedicated Unicode holam for this purpose.
- The order of shin dot and dagesh was made uniform (shin dot + dagesh <--> dagesh + shin dot).
- Various formats for ketiv/qere have been unified to a representation where every qere word is preceded
by a quotation mark ("). This was done to simplify text parsing and word by word comparison. The
Leningrad Codex contains more ketiv/qere variants than the Mechon Mamre text, where the ketiv is
often just annotated with the qere vowels.
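The first three conversions can be sketched in a few lines of Python. The function name is hypothetical, and several details are assumptions rather than facts about the original script: that the joiner appears in the HTML as the entity '&zwj;', that the joiner directly precedes the holam, and that dagesh-before-shin-dot is the order chosen as canonical. The ketiv/qere unification is omitted because the source formats are not described here.

```python
import html

ZWJ = "\u200d"                   # zero-width joiner
HOLAM = "\u05b9"                 # HEBREW POINT HOLAM
HOLAM_HASER_FOR_VAV = "\u05ba"   # HEBREW POINT HOLAM HASER FOR VAV
DAGESH = "\u05bc"
SHIN_DOT = "\u05c1"


def to_comparable_unicode(raw):
    """Decode Windows-1255 bytes and apply the point conversions."""
    # Python's "cp1255" codec implements Windows-1255.
    text = raw.decode("cp1255")
    # Turn HTML entities such as &zwj; into real characters (assumption:
    # the joiner is written as an entity, since cp1255 cannot encode it).
    text = html.unescape(text)
    # Joiner + holam becomes the single dedicated code point.
    text = text.replace(ZWJ + HOLAM, HOLAM_HASER_FOR_VAV)
    # Use one fixed order for dagesh and shin dot (direction is an assumption).
    text = text.replace(SHIN_DOT + DAGESH, DAGESH + SHIN_DOT)
    return text
```

Applying the same function to both texts (the decoding step aside) would make the order of dagesh and shin dot irrelevant to the comparison, whichever order the sources happen to use.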