bookmaker.py is a helper for optimizing PDFs for the production of small self-printed, self-bound physical books. Towards this goal, it offers various PDF manipulation options that may also be used independently and for other purposes.

OVERVIEW OF TARGET USAGE:

By cropping with -c and studying the results, define the areas of the input PDF's pages you want visible. Then, with --nup4, map those areas onto 4 input pages per 1 output page, arranged in such a way that a double-sided print-out of those output pages can be cut, folded, and bound (helped by the addition of stencils for small incisions to carry rubber bands or the like) into a small A6 book. Each unit of 8 pages from the input PDF is mapped by --nup4 onto two pages representing the two sides of one (no-tumble-duplex-printed) A4 sheet:

       (front)                 (back)
  +-------+-------+       +-------+-------+
  |   4   |   1   |       |   2   |   3   |
  |-------+-------|       |-------+-------|
  |   8   |   5   |       |   6   |   7   |
  +-------+-------+       +-------+-------+

To turn this paper into a small 8-page book, first cut it into two A5 papers along its horizontal middle. Fold both A5's by their vertical middles, with pages 2-3 and 6-7 on the folds' insides. You now have two 4-page A6 "books" of pages 1-4 and pages 5-8. Fold both closed and (counter-intuitively) stack the second one on top of the first one (creating a temporary page order of 5,6,7,8,1,2,3,4). This reveals a small stencil on the top left of page 5; cut it out, with all other pages folded and aligned under it, creating a small notch in the upper "inner" corner of all pages. Turn the stack around to find a mirror stencil on the bottom and repeat the cutting. Each page now has cuts on the top and bottom of its inner margin into which a rubber band can be hooked, or through which a string may be looped and tied, to bind the pages' inner margins into a kind of book spine. You may now swap the order of the 4-page books back into the proper final page order (1,2,3,4,5,6,7,8) and repeat the whole process for each further --nup4 output paper.
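
The page arrangement described above can be sketched as follows. This is a minimal illustration, not the script's own code: the helper name nup4_sheet_layout is hypothetical, and the tuple mirrors the script's PAGE_ORDER_FOR_NUP4 constant (0-based indices into each unit of 8 pages).

```python
# Order in which each unit of 8 input pages is placed onto the two sides of
# one sheet (0-based indices, mirroring the script's PAGE_ORDER_FOR_NUP4).
PAGE_ORDER_FOR_NUP4 = (3, 0, 7, 4, 1, 2, 5, 6)


def nup4_sheet_layout(first_page=1):
    """Return (front, back) for one sheet.

    Each side is a list of four page numbers in the order top-left,
    top-right, bottom-left, bottom-right. Hypothetical helper, for
    illustration only.
    """
    numbers = [first_page + i for i in PAGE_ORDER_FOR_NUP4]
    return numbers[:4], numbers[4:]


front, back = nup4_sheet_layout()
print("front:", front)  # front: [4, 1, 8, 5]
print("back:", back)    # back: [2, 3, 6, 7]
```

Calling nup4_sheet_layout(9) would give the layout of the second sheet, assuming the input has at least 16 pages.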
Concatenate two PDFs A.pdf and B.pdf to COMBINED.pdf:
    bookmaker.py --input_file A.pdf --input_file B.pdf --output_file COMBINED.pdf

Produce OUTPUT.pdf containing all pages of the (inclusive) page number range 3-7 from INPUT.pdf:
    bookmaker.py -i INPUT.pdf --page_range 3-7 -o OUTPUT.pdf

Produce COMBINED.pdf from A.pdf's first 7 pages, B.pdf's pages except its first two, and all pages of C.pdf:
    bookmaker.py -i A.pdf -p start-7 -i B.pdf -p 3-end -i C.pdf -o COMBINED.pdf

Crop each page 5cm from the left, 10cm from the bottom, 2cm from the right, and 0cm from the top:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --crops "5,10,2,0"

Include all pages from INPUT.pdf, but only crop pages 10-20 by 5cm each from bottom and top:
    bookmaker.py -i INPUT.pdf -c "10-20:0,5,0,5" -o OUTPUT.pdf

Same crops for pages 10-20, but also crop all pages 30 and later by 3cm each from left and right:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -c "10-20:0,5,0,5" -c "30-end:3,0,3,0"

Rotate pages 3, 5, 7 by 90°; rotate page 7 once more by 90° (i.e. 180° in total):
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --rotate_page 3 -r 5 -r 7 -r 7

Initially declare a 5cm crop from the left and a 1cm crop from the right, but alternate direction between even and odd pages:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -c "5,0,1,0" --symmetry

Quarter each OUTPUT.pdf page to carry 4 pages from INPUT.pdf, draw stencils into inner margins for cuts to carry binding strings:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --nup4

Same --nup4, but define a printable-region margin of 1.3cm to limit the space for the INPUT.pdf pages in OUTPUT.pdf page quarters:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -n --print_margin 1.3

Same --nup4, but draw lines marking printable-region margins, page quarters, spine margins:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -n --analyze

For arguments like -p, page numbers are assumed to start at 1 (not 0, which is treated as an invalid page number value).
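
How such a 1-based range string maps onto 0-based page indices can be sketched like this. The function name parse_page_range_sketch and its page_count parameter are hypothetical, and the validation is reduced to the essentials; it only illustrates the convention, not the script's exact code.

```python
def parse_page_range_sketch(range_string, page_count):
    """Translate a 1-based inclusive range such as '3-7', 'start-7' or
    '3-end' into a 0-based half-open (start, end) pair usable with range().

    Hypothetical stand-in for illustration only.
    """
    start, end = range_string.split("-")
    start_index = 0 if start == "start" else int(start) - 1
    end_index = page_count if end == "end" else int(end)
    if start_index < 0 or end_index < 1:
        raise ValueError(f"page numbers start at 1: {range_string}")
    if start_index >= end_index:
        raise ValueError(f"start is higher than end: {range_string}")
    if end_index > page_count:
        raise ValueError(f"range goes beyond last page: {range_string}")
    return start_index, end_index


# For a 10-page input, '3-7' selects the indices 2, 3, 4, 5, 6.
print(parse_page_range_sketch("3-7", 10))      # (2, 7)
print(parse_page_range_sketch("start-7", 10))  # (0, 7)
print(parse_page_range_sketch("3-end", 10))    # (2, 10)
```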

The target page shape so far is assumed to be A4 in portrait orientation; bookmaker.py normalizes all pages to this format before applying crops, and removes any source PDF /Rotate commands (as these would produce landscape orientations).
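
The A4 assumption fixes the page size in PDF points, and a crop given in cm then determines how far the remaining area is zoomed to fill the page again. The following is a minimal sketch of that arithmetic, assuming non-negative crops only; the function name crop_zoom is hypothetical, and the script itself also handles other cases.

```python
# PDF user space uses 72 points per inch; one inch is 25.4 mm.
POINTS_PER_CM = 10 * 72 / 25.4
A4_WIDTH = 21 * POINTS_PER_CM     # about 595.28 points
A4_HEIGHT = 29.7 * POINTS_PER_CM  # about 841.89 points


def crop_zoom(left_cm, bottom_cm, right_cm, top_cm):
    """Return the zoom factor that scales the area left after cropping back
    up as far as possible without exceeding A4 in either direction.

    Hypothetical helper; assumes all crops are >= 0.
    """
    remaining_width = A4_WIDTH - (left_cm + right_cm) * POINTS_PER_CM
    remaining_height = A4_HEIGHT - (bottom_cm + top_cm) * POINTS_PER_CM
    zoom_horizontal = A4_WIDTH / remaining_width
    zoom_vertical = A4_HEIGHT / remaining_height
    # The smaller factor keeps the whole visible area on the page.
    return min(zoom_horizontal, zoom_vertical)


# The "5,10,2,0" crop leaves 14cm of the 21cm width, so the horizontal
# factor 21/14 = 1.5 is the limiting one (the vertical one is 29.7/19.7).
print(round(crop_zoom(5, 10, 2, 0), 3))  # 1.5
```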
import argparse
import io
import os
from collections import namedtuple
67 def handled_error_exit(msg):
68 print(f"ERROR: {msg}")
74 handled_error_exit("Can't run at all without pypdf installed.")
76 # some general paper geometry constants
77 POINTS_PER_CM = 10 * 72 / 25.4
78 A4_WIDTH = 21 * POINTS_PER_CM
79 A4_HEIGHT = 29.7 * POINTS_PER_CM
80 A4 = (A4_WIDTH, A4_HEIGHT)
82 # constants specifically for --nup4
83 A4_HALF_WIDTH = A4_WIDTH / 2
84 A4_HALF_HEIGHT = A4_HEIGHT / 2
85 CUT_DEPTH = 1.95 * POINTS_PER_CM
86 CUT_WIDTH = 1.05 * POINTS_PER_CM
87 MIDDLE_POINT_DEPTH = 0.4 * POINTS_PER_CM
88 INNER_SPINE_MARGIN_PER_PAGE = 1 * POINTS_PER_CM
89 QUARTER_SCALE_FACTOR = 0.5
90 PAGE_ORDER_FOR_NUP4 = (3,0,7,4,1,2,5,6)
class PageCrop:

    def __init__(self, left_cm=0, bottom_cm=0, right_cm=0, top_cm=0):
        self.left_cm = left_cm
        self.bottom_cm = bottom_cm
        self.right_cm = right_cm
        self.top_cm = top_cm
        # crop sizes converted from cm to PDF points
        self.left = float(self.left_cm) * POINTS_PER_CM
        self.bottom = float(self.bottom_cm) * POINTS_PER_CM
        self.right = float(self.right_cm) * POINTS_PER_CM
        self.top = float(self.top_cm) * POINTS_PER_CM
        # factors by which the remaining area would have to grow to fill A4 again
        zoom_horizontal = A4_WIDTH / (A4_WIDTH - self.left - self.right)
        zoom_vertical = A4_HEIGHT / (A4_HEIGHT - self.bottom - self.top)
        if (zoom_horizontal > 1 and zoom_vertical < 1) or (zoom_horizontal < 1 and zoom_vertical > 1):
            raise HandledException("-c: crops would create opposing zoom directions")
        elif zoom_horizontal + zoom_vertical > 2:
            # zooming in: the smaller factor keeps the visible area within the page
            self.zoom = min(zoom_horizontal, zoom_vertical)
        else:
            self.zoom = max(zoom_horizontal, zoom_vertical)
114 return str(vars(self))
117 def format_in_cm(self):
118 return f"left {self.left_cm}cm, bottom {self.bottom_cm}cm, right {self.right_cm}cm, top {self.top_cm}cm"
121 def remaining_width(self):
122 return A4_WIDTH - self.left - self.right
125 def remaining_height(self):
126 return A4_HEIGHT - self.bottom - self.top
128 def give_mirror(self):
129 return PageCrop(left_cm=self.right_cm, bottom_cm=self.bottom_cm, right_cm=self.left_cm, top_cm=self.top_cm)
134 def __init__(self, margin_cm):
135 self.margin = margin_cm * POINTS_PER_CM
136 self.shrink_for_margin = (A4_WIDTH - 2 * self.margin)/A4_WIDTH
137 # NB: We define spine size un-shrunk, but .shrink_for_spine is used with values shrunk for the margin, which we undo here.
138 spine_part_of_page = (INNER_SPINE_MARGIN_PER_PAGE / A4_HALF_WIDTH) / self.shrink_for_margin
139 self.shrink_for_spine = 1 - spine_part_of_page
142 class HandledException(Exception):
147 parser = argparse.ArgumentParser(description=__doc__, epilog=help_epilogue, formatter_class=argparse.RawDescriptionHelpFormatter)
148 parser.add_argument("-i", "--input_file", action="append", required=True, help="input PDF file")
149 parser.add_argument("-o", "--output_file", required=True, help="output PDF file")
150 parser.add_argument("-p", "--page_range", action="append", help="page range, e.g., '2-9' or '3-end' or 'start-14'")
151 parser.add_argument("-c", "--crops", action="append", help="cm crops left, bottom, right, top – e.g., '10,10,10,10'; prefix with ':'-delimited page range to limit effect")
152 parser.add_argument("-r", "--rotate_page", type=int, action="append", help="rotate page of number by 90° (usable multiple times on same page!)")
153 parser.add_argument("-s", "--symmetry", action="store_true", help="alternate horizontal crops between odd and even pages")
154 parser.add_argument("-n", "--nup4", action='store_true', help="puts 4 input pages onto 1 output page, adds binding cut stencil")
155 parser.add_argument("-a", "--analyze", action="store_true", help="in --nup4, print lines identifying spine, page borders")
156 parser.add_argument("-m", "--print_margin", type=float, default=0.43, help="print margin for --nup4 in cm (default 0.43)")
157 return parser.parse_args()
160 def validate_inputs_first_pass(args):
161 for filename in args.input_file:
162 if not os.path.isfile(filename):
163 raise HandledException(f"-i: {filename} is not a file")
165 with open(filename, 'rb') as file:
166 pypdf.PdfReader(file)
167 except pypdf.errors.PdfStreamError:
168 raise HandledException(f"-i: cannot interpret {filename} as PDF file")
170 for p_string in args.page_range:
171 validate_page_range(p_string, "-p")
172 if len(args.page_range) > len(args.input_file):
173 raise HandledException("-p: more --page_range arguments than --input_file arguments")
175 for c_string in args.crops:
176 initial_split = c_string.split(':')
177 if len(initial_split) > 2:
178 raise HandledException(f"-c: cropping string has multiple ':': {c_string}")
179 page_range, crops = split_crops_string(c_string)
180 crops = crops.split(",")
182 validate_page_range(page_range, "-c")
184 raise HandledException(f"-c: cropping does not contain exactly three ',': {c_string}")
189 raise HandledException(f"-c: non-number crop in: {c_string}")
191 for r in args.rotate_page:
195 raise HandledException(f"-r: non-integer value: {r}")
197 raise HandledException(f"-r: value must not be <1: {r}")
199 float(args.print_margin)
        raise HandledException(f"-m: non-float value: {args.print_margin}")
204 def validate_page_range(p_string, err_msg_prefix):
205 prefix = f"{err_msg_prefix}: page range string"
206 if '-' not in p_string:
207 raise HandledException(f"{prefix} lacks '-': {p_string}")
208 tokens = p_string.split("-")
210 raise HandledException(f"{prefix} has too many '-': {p_string}")
211 for i, token in enumerate(tokens):
214 if i == 0 and token == "start":
216 if i == 1 and token == "end":
221 raise HandledException(f"{prefix} carries value neither integer, nor 'start', nor 'end': {p_string}")
223 raise HandledException(f"{prefix} carries page number <1: {p_string}")
227 start = int(tokens[0])
231 if start > 0 and end > 0 and start > end:
232 raise HandledException(f"{prefix} has higher start than end value: {p_string}")
235 def split_crops_string(c_string):
236 initial_split = c_string.split(':')
237 if len(initial_split) > 1:
238 page_range = initial_split[0]
239 crops = initial_split[1]
242 crops = initial_split[0]
243 return page_range, crops
246 def parse_page_range(range_string, pages):
248 end_page = len(pages)
250 start, end = range_string.split('-')
251 if not (len(start) == 0 or start == "start"):
252 start_page = int(start) - 1
253 if not (len(end) == 0 or end == "end"):
255 return start_page, end_page
258 def read_inputs_to_pagelist(args_input_file, args_page_range):
262 for i, input_file in enumerate(args_input_file):
263 file = open(input_file, 'rb')
264 opened_files += [file]
265 reader = pypdf.PdfReader(file)
267 if args_page_range and len(args_page_range) > i:
268 range_string = args_page_range[i]
269 start_page, end_page = parse_page_range(range_string, reader.pages)
270 if end_page > len(reader.pages): # no need to test start_page cause start_page > end_page is checked above
271 raise HandledException(f"-p: page range goes beyond pages of input file: {range_string}")
272 for old_page_num in range(start_page, end_page):
274 page = reader.pages[old_page_num]
275 pages_to_add += [page]
276 print(f"-i, -p: read in {input_file} page number {old_page_num+1} as new page {new_page_num}")
277 return pages_to_add, opened_files
280 def validate_inputs_second_pass(args, pages_to_add):
282 for c_string in args.crops:
283 page_range, _= split_crops_string(c_string)
285 start, end = parse_page_range(page_range, pages_to_add)
286 if end > len(pages_to_add):
287 raise HandledException(f"-c: page range goes beyond number of pages we're building: {page_range}")
289 for r in args.rotate_page:
290 if r > len(pages_to_add):
291 raise HandledException(f"-r: page number beyond number of pages we're building: {r}")
294 def rotate_pages(args_rotate_page, pages_to_add):
296 for rotate_page in args_rotate_page:
297 page = pages_to_add[rotate_page - 1]
298 page.add_transformation(pypdf.Transformation().translate(tx=-A4_WIDTH/2, ty=-A4_HEIGHT/2))
299 page.add_transformation(pypdf.Transformation().rotate(-90))
300 page.add_transformation(pypdf.Transformation().translate(tx=A4_WIDTH/2, ty=A4_HEIGHT/2))
301 print(f"-r: rotating (by 90°) page {rotate_page}")
304 def pad_pages_to_multiple_of_8(pages_to_add):
305 mod_to_8 = len(pages_to_add) % 8
307 old_len = len(pages_to_add)
308 for _ in range(8 - mod_to_8):
309 new_page = pypdf.PageObject.create_blank_page(width=A4_WIDTH, height=A4_HEIGHT)
310 pages_to_add += [new_page]
311 print(f"-n: number of input pages {old_len} not required multiple of 8, padded to {len(pages_to_add)}")
314 def normalize_pages_to_A4(pages_to_add):
315 for page in pages_to_add:
316 if "/Rotate" in page: # TODO: preserve rotation, but in canvas?
317 page.rotate(360 - page["/Rotate"])
318 page.mediabox.left = 0
319 page.mediabox.bottom = 0
320 page.mediabox.top = A4_HEIGHT
321 page.mediabox.right = A4_WIDTH
322 page.cropbox = page.mediabox
325 def collect_per_page_crops_and_zooms(args_crops, args_symmetry, pages_to_add):
326 crop_at_page = [PageCrop()] * len(pages_to_add)
328 for c_string in args_crops:
329 page_range, crops = split_crops_string(c_string)
330 start_page, end_page = parse_page_range(page_range, pages_to_add)
            prefix = "-c, -s" if args_symmetry else "-c"
332 suffix = " (but alternating left and right crop between even and odd pages)" if args_symmetry else ""
333 page_crop = PageCrop(*[x for x in crops.split(',')])
334 print(f"{prefix}: to pages {start_page + 1} to {end_page} applying crop: {page_crop.format_in_cm}{suffix}")
335 for page_num in range(start_page, end_page):
336 if args_symmetry and page_num % 2:
337 crop_at_page[page_num] = page_crop.give_mirror()
339 crop_at_page[page_num] = page_crop
343 def build_single_pages_output(writer, pages_to_add, crop_at_page):
344 print("building 1-input-page-per-output-page book")
346 for i, page in enumerate(pages_to_add):
347 page.add_transformation(pypdf.Transformation().translate(tx=-crop_at_page[i].left, ty=-crop_at_page[i].bottom))
348 page.add_transformation(pypdf.Transformation().scale(crop_at_page[i].zoom, crop_at_page[i].zoom))
349 page.mediabox.right = crop_at_page[i].remaining_width * crop_at_page[i].zoom
350 page.mediabox.top = crop_at_page[i].remaining_height * crop_at_page[i].zoom
351 writer.add_page(page)
352 odd_page = not odd_page
353 print(f"built page number {i+1} (of {len(pages_to_add)})")
356 def build_nup4_output(writer, pages_to_add, crop_at_page, args_print_margin, args_analyze, canvas_class):
357 print("-n: building 4-input-pages-per-output-page book")
358 print(f"-m: applying printable-area margin of {args_print_margin}cm")
360 print("-a: drawing page borders, spine limits")
361 nup4_geometry = Nup4Geometry(args_print_margin)
362 pages_to_add, new_i_order = resort_pages_for_nup4(pages_to_add)
366 for i, page in enumerate(pages_to_add):
368 new_page = pypdf.PageObject.create_blank_page(width=A4_WIDTH, height=A4_HEIGHT)
369 corrected_i = new_i_order[i]
370 nup4_inner_page_transform(page, crop_at_page[corrected_i], nup4_geometry, nup4_i)
371 nup4_outer_page_transform(page, nup4_geometry, nup4_i)
372 new_page.merge_page(page)
374 print(f"merged page number {page_count} (of {len(pages_to_add)})")
377 ornate_nup4(writer, args_analyze, is_front_page, new_page, nup4_geometry, canvas_class)
378 writer.add_page(new_page)
380 is_front_page = not is_front_page
383 def resort_pages_for_nup4(pages_to_add):
389 for page in pages_to_add:
396 for n in PAGE_ORDER_FOR_NUP4:
397 new_i_order += [8 * n_eights + n]
398 new_page_order += [eight_pack[n]]
400 return new_page_order, new_i_order
403 def nup4_inner_page_transform(page, crop, nup4_geometry, nup4_i):
404 page.add_transformation(pypdf.Transformation().translate(ty=(A4_HEIGHT / crop.zoom - (A4_HEIGHT - crop.top))))
405 if nup4_i == 0 or nup4_i == 2:
406 page.add_transformation(pypdf.Transformation().translate(tx=-crop.left))
407 elif nup4_i == 1 or nup4_i == 3:
408 page.add_transformation(pypdf.Transformation().translate(tx=(A4_WIDTH / crop.zoom - (A4_WIDTH - crop.right))))
409 page.add_transformation(pypdf.Transformation().scale(crop.zoom * nup4_geometry.shrink_for_spine, crop.zoom * nup4_geometry.shrink_for_spine))
410 if nup4_i == 2 or nup4_i == 3:
411 page.add_transformation(pypdf.Transformation().translate(ty=-2*nup4_geometry.margin/nup4_geometry.shrink_for_margin))
414 def nup4_outer_page_transform(page, nup4_geometry, nup4_i):
415 page.add_transformation(pypdf.Transformation().translate(ty=(1-nup4_geometry.shrink_for_spine)*A4_HEIGHT))
416 if nup4_i == 0 or nup4_i == 1:
417 y_section = A4_HEIGHT
418 page.mediabox.bottom = A4_HALF_HEIGHT
419 page.mediabox.top = A4_HEIGHT
420 if nup4_i == 2 or nup4_i == 3:
422 page.mediabox.bottom = 0
423 page.mediabox.top = A4_HALF_HEIGHT
424 if nup4_i == 0 or nup4_i == 2:
426 page.mediabox.left = 0
427 page.mediabox.right = A4_HALF_WIDTH
428 if nup4_i == 1 or nup4_i == 3:
429 page.add_transformation(pypdf.Transformation().translate(tx=(1-nup4_geometry.shrink_for_spine)*A4_WIDTH))
431 page.mediabox.left = A4_HALF_WIDTH
432 page.mediabox.right = A4_WIDTH
433 page.add_transformation(pypdf.Transformation().translate(tx=x_section, ty=y_section))
434 page.add_transformation(pypdf.Transformation().scale(QUARTER_SCALE_FACTOR, QUARTER_SCALE_FACTOR))
437 def ornate_nup4(writer, args_analyze, is_front_page, new_page, nup4_geometry, canvas_class):
440 packet = io.BytesIO()
441 c = canvas_class(packet, pagesize=A4)
443 c.line(0, A4_HEIGHT, A4_WIDTH, A4_HEIGHT)
444 c.line(0, A4_HALF_HEIGHT, A4_WIDTH, A4_HALF_HEIGHT)
445 c.line(0, 0, A4_WIDTH, 0)
446 c.line(0, A4_HEIGHT, 0, 0)
447 c.line(A4_HALF_WIDTH, A4_HEIGHT, A4_HALF_WIDTH, 0)
448 c.line(A4_WIDTH, A4_HEIGHT, A4_WIDTH, 0)
450 new_pdf = pypdf.PdfReader(packet)
451 new_page.merge_page(new_pdf.pages[0])
452 printable_offset_x = nup4_geometry.margin
453 printable_offset_y = nup4_geometry.margin * A4_HEIGHT / A4_WIDTH
454 new_page.add_transformation(pypdf.Transformation().scale(nup4_geometry.shrink_for_margin, nup4_geometry.shrink_for_margin))
455 new_page.add_transformation(pypdf.Transformation().translate(tx=printable_offset_x, ty=printable_offset_y))
456 x_left_spine_limit = A4_HALF_WIDTH * nup4_geometry.shrink_for_spine
457 x_right_spine_limit = A4_WIDTH - x_left_spine_limit
458 if args_analyze or is_front_page:
459 packet = io.BytesIO()
460 c = canvas_class(packet, pagesize=A4)
464 c.line(x_left_spine_limit, A4_HEIGHT, x_left_spine_limit, 0)
465 c.line(x_right_spine_limit, A4_HEIGHT, x_right_spine_limit, 0)
468 draw_cut(c, x_left_spine_limit, (1))
469 draw_cut(c, x_right_spine_limit, (-1))
470 if args_analyze or is_front_page:
472 new_pdf = pypdf.PdfReader(packet)
473 new_page.merge_page(new_pdf.pages[0])
476 def draw_cut(canvas, x_spine_limit, direction):
477 outer_start_x = x_spine_limit - 0.5 * CUT_WIDTH * direction
478 inner_start_x = x_spine_limit + 0.5 * CUT_WIDTH * direction
479 middle_point_y = A4_HALF_HEIGHT + MIDDLE_POINT_DEPTH * direction
480 end_point_y = A4_HALF_HEIGHT + CUT_DEPTH * direction
481 canvas.line(inner_start_x, A4_HALF_HEIGHT, x_spine_limit, end_point_y)
482 canvas.line(x_spine_limit, end_point_y, x_spine_limit, middle_point_y)
483 canvas.line(x_spine_limit, middle_point_y, outer_start_x, A4_HALF_HEIGHT)
488 validate_inputs_first_pass(args)
491 from reportlab.pdfgen.canvas import Canvas
493 raise HandledException("-n: need reportlab.pdfgen.canvas installed for --nup4")
494 pages_to_add, opened_files = read_inputs_to_pagelist(args.input_file, args.page_range)
495 validate_inputs_second_pass(args, pages_to_add)
496 rotate_pages(args.rotate_page, pages_to_add)
498 pad_pages_to_multiple_of_8(pages_to_add)
499 normalize_pages_to_A4(pages_to_add)
500 crop_at_page = collect_per_page_crops_and_zooms(args.crops, args.symmetry, pages_to_add)
501 writer = pypdf.PdfWriter()
503 build_nup4_output(writer, pages_to_add, crop_at_page, args.print_margin, args.analyze, Canvas)
505 build_single_pages_output(writer, pages_to_add, crop_at_page)
506 for file in opened_files:
508 with open(args.output_file, 'wb') as output_file:
509 writer.write(output_file)
512 if __name__ == "__main__":
515 except HandledException as e:
516 handled_error_exit(e)