"""
bookmaker.py is a helper for optimizing PDFs for the production of small self-printed, self-bound physical books. Towards this goal it offers various PDF manipulation options that may also be used independently and for other purposes.
"""
help_epilogue = """
OVERVIEW OF TARGET USAGE:

By cropping with -c and studying the results, define the areas of the input PDF's pages you want visible. Then, with --nup4, map those areas onto 4 input pages per 1 output page, arranged in such a way that a double-sided print-out of those output pages can be cut, folded, and bound (helped by the addition of stencils for small incisions to carry rubber bands or the like) into a small A6 book. Each unit of 8 pages from the input PDF is mapped by --nup4 onto two pages representing the two sides of a (no-tumble-duplex-printed) A4 paper:

                      +-------=-------+                  __________________
 (front) (back)       | 4 | 1 = 2 | 3 |        4        /=|===|============
+-------=-------+ ==> +-------=-------+ ===> _/|\_    v >=|===|============
| 4 | 1 = 2 | 3 |                           / | \_      \=|===|============
|-------=-------|     +-------=-------+ 1-> | 2 | 3 |     |  \ /  <- cut out!
| 8 | 5 = 6 | 7 | ==> | 8 | 5 = 6 | 7 |     | _/ \_ |     |   \  |
+-------=-------+     +-------=-------+     |/     \|     |    \|  (p. 5)

To turn this paper into a small 8-page book, first cut it into two A5 papers along its horizontal middle. Fold both A5s along their vertical middles, with pages 2-3 and 6-7 on the folds' insides. You now have two 4-page A6 "books" of pages 1-4 and pages 5-8. Fold both closed and (counter-intuitively) stack the second one on top of the first one (creating a temporary page order of 5,6,7,8,1,2,3,4). This reveals a small stencil on the top left of page 5. Cut it out, with all other pages folded and aligned under it, creating a small notch in the upper "inner" corner of all pages. Turn the stack around to find a mirrored stencil on the bottom and repeat the cutting. Each page now has cuts at the top and bottom of its inner margin into which a rubber band can be hooked, or through which a string may be looped and tied, to bind the pages' inner margins into a kind of book spine. You may now swap the order of the 4-page books back into the proper final page order (1,2,3,4,5,6,7,8) and repeat the whole process for each further --nup4 output paper.
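
As a rough illustration of the page mapping described above, the following minimal Python sketch lists which input pages land on the front and on the back of one --nup4 sheet. The tuple is the script's PAGE_ORDER_FOR_NUP4 constant; the function name nup4_sheet_layout is made up for this illustration and is not part of bookmaker.py.

```python
# Order in which each unit of 8 input pages (0-based) is placed into the
# quarters of the front side and then the back side of one A4 sheet.
PAGE_ORDER_FOR_NUP4 = (3, 0, 7, 4, 1, 2, 5, 6)


def nup4_sheet_layout(first_page_number):
    # Hypothetical helper: returns 1-based input page numbers per sheet side,
    # each side listed as top-left, top-right, bottom-left, bottom-right.
    pages = [first_page_number + n for n in PAGE_ORDER_FOR_NUP4]
    return pages[:4], pages[4:]


front, back = nup4_sheet_layout(1)
print(front)  # [4, 1, 8, 5]
print(back)   # [2, 3, 6, 7]
```

A second sheet starts at input page 9, so nup4_sheet_layout(9) gives the same pattern shifted by 8.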

EXAMPLES:

Concatenate two PDFs A.pdf and B.pdf to COMBINED.pdf:
    bookmaker.py --input_file A.pdf --input_file B.pdf --output_file COMBINED.pdf

Produce OUTPUT.pdf containing all pages of (inclusive) page number range 3-7 from INPUT.pdf:
    bookmaker.py -i INPUT.pdf --page_range 3-7 -o OUTPUT.pdf

Produce COMBINED.pdf from A.pdf's first 7 pages, B.pdf's pages except its first two, and all pages of C.pdf:
    bookmaker.py -i A.pdf -p start-7 -i B.pdf -p 3-end -i C.pdf -o COMBINED.pdf

Crop each page 5cm from the left, 10cm from the bottom, 2cm from the right, and 0cm from the top:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --crops "5,10,2,0"

Include all pages from INPUT.pdf, but only crop pages 10-20 by 5cm each from bottom and top:
    bookmaker.py -i INPUT.pdf -c "10-20:0,5,0,5" -o OUTPUT.pdf

Same crops for pages 10-20, but also crop all pages 30 and later by 3cm each from left and right:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -c "10-20:0,5,0,5" -c "30-end:3,0,3,0"

Rotate pages 3, 5, and 7 by 90°; rotate page 7 once more by 90° (i.e. 180° in total):
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --rotate_page 3 -r 5 -r 7 -r 7

Initially declare a 5cm crop from the left and a 1cm crop from the right, but alternate direction between even and odd pages:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -c "5,0,1,0" --symmetry

Quarter each OUTPUT.pdf page to carry 4 pages from INPUT.pdf, draw stencils into inner margins for cuts to carry binding strings:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --nup4

Same --nup4, but define a printable-region margin of 1.3cm to limit the space for the INPUT.pdf pages in OUTPUT.pdf page quarters:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -n --print_margin 1.3

Same --nup4, but draw lines marking printable-region margins, page quarters, spine margins:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -n --analyze
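
The page ranges given to -p and -c in the examples above are 1-based and inclusive. A minimal Python sketch of how such a string can be turned into a 0-based, end-exclusive index pair, assuming the string has already been validated; the function name page_range_to_indices is chosen for this illustration only:

```python
def page_range_to_indices(range_string, page_count):
    # Hypothetical helper: "3-7" -> (2, 7), usable as range(2, 7).
    start, end = range_string.split("-")
    start_index = 0 if start in ("", "start") else int(start) - 1
    end_index = page_count if end in ("", "end") else int(end)
    return start_index, end_index


print(page_range_to_indices("3-7", 10))      # (2, 7)
print(page_range_to_indices("start-7", 10))  # (0, 7)
print(page_range_to_indices("3-end", 10))    # (2, 10)
```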

NOTES:

For arguments like -p, page numbers are assumed to start with 1 (not 0, which is treated as an invalid page number value).

The target page shape so far is assumed to be A4 in portrait orientation; bookmaker.py normalizes all pages to this format before applying crops, and removes any source PDF /Rotate commands (for their production of landscape orientations).
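
Because every page is normalized to A4 and all cm values are converted into PDF points (72 per inch), the page dimensions the script works with can be checked with a small Python sketch. POINTS_PER_CM mirrors the script's constant; cm_to_points is a name invented for this illustration.

```python
POINTS_PER_CM = 10 * 72 / 25.4  # 72 points per inch, 25.4 mm per inch
A4_WIDTH_CM, A4_HEIGHT_CM = 21, 29.7


def cm_to_points(cm):
    # Hypothetical helper: convert a length in cm to PDF points.
    return cm * POINTS_PER_CM


print(round(cm_to_points(A4_WIDTH_CM), 2))   # 595.28
print(round(cm_to_points(A4_HEIGHT_CM), 2))  # 841.89
```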
"""
import argparse
import io
import os
import sys


def handled_error_exit(msg):
    print(f"ERROR: {msg}")
    sys.exit(1)


try:
    import pypdf
except ImportError:
    handled_error_exit("Can't run at all without pypdf installed.")

# some general paper geometry constants
POINTS_PER_CM = 10 * 72 / 25.4
A4_WIDTH = 21 * POINTS_PER_CM
A4_HEIGHT = 29.7 * POINTS_PER_CM
A4 = (A4_WIDTH, A4_HEIGHT)

# constants specifically for --nup4
A4_HALF_WIDTH = A4_WIDTH / 2
A4_HALF_HEIGHT = A4_HEIGHT / 2
CUT_DEPTH = 1.95 * POINTS_PER_CM
CUT_WIDTH = 1.05 * POINTS_PER_CM
MIDDLE_POINT_DEPTH = 0.4 * POINTS_PER_CM
INNER_SPINE_MARGIN_PER_PAGE = 1 * POINTS_PER_CM
QUARTER_SCALE_FACTOR = 0.5
PAGE_ORDER_FOR_NUP4 = (3, 0, 7, 4, 1, 2, 5, 6)


class PageCrop:

    def __init__(self, left_cm=0, bottom_cm=0, right_cm=0, top_cm=0):
        self.left_cm = left_cm
        self.bottom_cm = bottom_cm
        self.right_cm = right_cm
        self.top_cm = top_cm
        self.left = float(self.left_cm) * POINTS_PER_CM
        self.bottom = float(self.bottom_cm) * POINTS_PER_CM
        self.right = float(self.right_cm) * POINTS_PER_CM
        self.top = float(self.top_cm) * POINTS_PER_CM
        zoom_horizontal = A4_WIDTH / (A4_WIDTH - self.left - self.right)
        zoom_vertical = A4_HEIGHT / (A4_HEIGHT - self.bottom - self.top)
        if (zoom_horizontal > 1 and zoom_vertical < 1) or (zoom_horizontal < 1 and zoom_vertical > 1):
            raise HandledException("-c: crops would create opposing zoom directions")
        elif zoom_horizontal + zoom_vertical > 2:
            self.zoom = min(zoom_horizontal, zoom_vertical)
        else:
            self.zoom = max(zoom_horizontal, zoom_vertical)

    def __str__(self):
        return str(vars(self))

    @property
    def format_in_cm(self):
        return f"left {self.left_cm}cm, bottom {self.bottom_cm}cm, right {self.right_cm}cm, top {self.top_cm}cm"

    @property
    def remaining_width(self):
        return A4_WIDTH - self.left - self.right

    @property
    def remaining_height(self):
        return A4_HEIGHT - self.bottom - self.top

    def give_mirror(self):
        return PageCrop(left_cm=self.right_cm, bottom_cm=self.bottom_cm, right_cm=self.left_cm, top_cm=self.top_cm)


class Nup4Geometry:

    def __init__(self, margin_cm):
        self.margin = margin_cm * POINTS_PER_CM
        self.shrink_for_margin = (A4_WIDTH - 2 * self.margin) / A4_WIDTH
        # NB: We define spine size un-shrunk, but .shrink_for_spine is used with values shrunk for the margin, which we undo here.
        spine_part_of_page = (INNER_SPINE_MARGIN_PER_PAGE / A4_HALF_WIDTH) / self.shrink_for_margin
        self.shrink_for_spine = 1 - spine_part_of_page


class HandledException(Exception):
    pass


def parse_args():
    parser = argparse.ArgumentParser(description=__doc__, epilog=help_epilogue, formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument("-i", "--input_file", action="append", required=True, help="input PDF file")
    parser.add_argument("-o", "--output_file", required=True, help="output PDF file")
    parser.add_argument("-p", "--page_range", action="append", help="page range, e.g., '2-9' or '3-end' or 'start-14'")
    parser.add_argument("-c", "--crops", action="append", help="cm crops left, bottom, right, top – e.g., '10,10,10,10'; prefix with ':'-delimited page range to limit effect")
    parser.add_argument("-r", "--rotate_page", type=int, action="append", help="rotate page of number by 90° (usable multiple times on same page!)")
    parser.add_argument("-s", "--symmetry", action="store_true", help="alternate horizontal crops between odd and even pages")
    parser.add_argument("-n", "--nup4", action="store_true", help="puts 4 input pages onto 1 output page, adds binding cut stencil")
    parser.add_argument("-a", "--analyze", action="store_true", help="in --nup4, draw lines identifying spine, page borders")
    parser.add_argument("-m", "--print_margin", type=float, default=0.43, help="print margin for --nup4 in cm (default 0.43)")
    return parser.parse_args()


def validate_inputs_first_pass(args):
    for filename in args.input_file:
        if not os.path.isfile(filename):
            raise HandledException(f"-i: {filename} is not a file")
        try:
            with open(filename, 'rb') as file:
                pypdf.PdfReader(file)
        except pypdf.errors.PdfStreamError:
            raise HandledException(f"-i: cannot interpret {filename} as PDF file")
    if args.page_range:
        for p_string in args.page_range:
            validate_page_range(p_string, "-p")
        if len(args.page_range) > len(args.input_file):
            raise HandledException("-p: more --page_range arguments than --input_file arguments")
    if args.crops:
        for c_string in args.crops:
            initial_split = c_string.split(':')
            if len(initial_split) > 2:
                raise HandledException(f"-c: cropping string has multiple ':': {c_string}")
            page_range, crops = split_crops_string(c_string)
            crops = crops.split(",")
            if page_range:
                validate_page_range(page_range, "-c")
            if len(crops) != 4:
                raise HandledException(f"-c: cropping does not contain exactly three ',': {c_string}")
            for crop in crops:
                try:
                    float(crop)
                except ValueError:
                    raise HandledException(f"-c: non-number crop in: {c_string}")
    if args.rotate_page:
        for r in args.rotate_page:
            try:
                int(r)
            except ValueError:
                raise HandledException(f"-r: non-integer value: {r}")
            if r < 1:
                raise HandledException(f"-r: value must not be <1: {r}")
    try:
        float(args.print_margin)
    except ValueError:
        raise HandledException(f"-m: non-float value: {args.print_margin}")


def validate_page_range(p_string, err_msg_prefix):
    prefix = f"{err_msg_prefix}: page range string"
    if '-' not in p_string:
        raise HandledException(f"{prefix} lacks '-': {p_string}")
    tokens = p_string.split("-")
    if len(tokens) > 2:
        raise HandledException(f"{prefix} has too many '-': {p_string}")
    for i, token in enumerate(tokens):
        if token == "":
            continue
        if i == 0 and token == "start":
            continue
        if i == 1 and token == "end":
            continue
        try:
            int(token)
        except ValueError:
            raise HandledException(f"{prefix} carries value neither integer, nor 'start', nor 'end': {p_string}")
        if int(token) < 1:
            raise HandledException(f"{prefix} carries page number <1: {p_string}")
    start = end = -1
    try:
        start = int(tokens[0])
        end = int(tokens[1])
    except ValueError:
        pass
    if start > 0 and end > 0 and start > end:
        raise HandledException(f"{prefix} has higher start than end value: {p_string}")


def split_crops_string(c_string):
    initial_split = c_string.split(':')
    if len(initial_split) > 1:
        page_range = initial_split[0]
        crops = initial_split[1]
    else:
        page_range = None
        crops = initial_split[0]
    return page_range, crops


def parse_page_range(range_string, pages):
    start_page = 0
    end_page = len(pages)
    if range_string:
        start, end = range_string.split('-')
        if not (len(start) == 0 or start == "start"):
            start_page = int(start) - 1
        if not (len(end) == 0 or end == "end"):
            end_page = int(end)
    return start_page, end_page


def read_inputs_to_pagelist(args_input_file, args_page_range):
    pages_to_add = []
    opened_files = []
    new_page_num = 0
    for i, input_file in enumerate(args_input_file):
        file = open(input_file, 'rb')
        opened_files += [file]
        reader = pypdf.PdfReader(file)
        range_string = None
        if args_page_range and len(args_page_range) > i:
            range_string = args_page_range[i]
        start_page, end_page = parse_page_range(range_string, reader.pages)
        if end_page > len(reader.pages):  # no need to test start_page because start_page > end_page is checked above
            raise HandledException(f"-p: page range goes beyond pages of input file: {range_string}")
        for old_page_num in range(start_page, end_page):
            new_page_num += 1
            page = reader.pages[old_page_num]
            pages_to_add += [page]
            print(f"-i, -p: read in {input_file} page number {old_page_num+1} as new page {new_page_num}")
    return pages_to_add, opened_files


def validate_inputs_second_pass(args, pages_to_add):
    if args.crops:
        for c_string in args.crops:
            page_range, _ = split_crops_string(c_string)
            if page_range:
                start, end = parse_page_range(page_range, pages_to_add)
                if end > len(pages_to_add):
                    raise HandledException(f"-c: page range goes beyond number of pages we're building: {page_range}")
    if args.rotate_page:
        for r in args.rotate_page:
            if r > len(pages_to_add):
                raise HandledException(f"-r: page number beyond number of pages we're building: {r}")


def rotate_pages(args_rotate_page, pages_to_add):
    if args_rotate_page:
        for rotate_page in args_rotate_page:
            page = pages_to_add[rotate_page - 1]
            # rotate around the center of the A4 page rather than around its lower left corner
            page.add_transformation(pypdf.Transformation().translate(tx=-A4_WIDTH/2, ty=-A4_HEIGHT/2))
            page.add_transformation(pypdf.Transformation().rotate(-90))
            page.add_transformation(pypdf.Transformation().translate(tx=A4_WIDTH/2, ty=A4_HEIGHT/2))
            print(f"-r: rotating (by 90°) page {rotate_page}")


def pad_pages_to_multiple_of_8(pages_to_add):
    mod_to_8 = len(pages_to_add) % 8
    if mod_to_8 > 0:
        old_len = len(pages_to_add)
        for _ in range(8 - mod_to_8):
            new_page = pypdf.PageObject.create_blank_page(width=A4_WIDTH, height=A4_HEIGHT)
            pages_to_add += [new_page]
        print(f"-n: number of input pages {old_len} not required multiple of 8, padded to {len(pages_to_add)}")


def normalize_pages_to_A4(pages_to_add):
    for page in pages_to_add:
        if "/Rotate" in page:  # TODO: preserve rotation, but in canvas?
            page.rotate(360 - page["/Rotate"])
        page.mediabox.left = 0
        page.mediabox.bottom = 0
        page.mediabox.top = A4_HEIGHT
        page.mediabox.right = A4_WIDTH
        page.cropbox = page.mediabox


def collect_per_page_crops_and_zooms(args_crops, args_symmetry, pages_to_add):
    crop_at_page = [PageCrop()] * len(pages_to_add)
    if args_crops:
        for c_string in args_crops:
            page_range, crops = split_crops_string(c_string)
            start_page, end_page = parse_page_range(page_range, pages_to_add)
            prefix = "-c, -s" if args_symmetry else "-c"
            suffix = " (but alternating left and right crop between even and odd pages)" if args_symmetry else ""
            page_crop = PageCrop(*crops.split(','))
            print(f"{prefix}: to pages {start_page + 1} to {end_page} applying crop: {page_crop.format_in_cm}{suffix}")
            for page_num in range(start_page, end_page):
                if args_symmetry and page_num % 2:
                    crop_at_page[page_num] = page_crop.give_mirror()
                else:
                    crop_at_page[page_num] = page_crop
    return crop_at_page


def build_single_pages_output(writer, pages_to_add, crop_at_page):
    print("building 1-input-page-per-output-page book")
    for i, page in enumerate(pages_to_add):
        # move the cropped-away left and bottom areas off the page, then zoom what remains
        page.add_transformation(pypdf.Transformation().translate(tx=-crop_at_page[i].left, ty=-crop_at_page[i].bottom))
        page.add_transformation(pypdf.Transformation().scale(crop_at_page[i].zoom, crop_at_page[i].zoom))
        page.mediabox.right = crop_at_page[i].remaining_width * crop_at_page[i].zoom
        page.mediabox.top = crop_at_page[i].remaining_height * crop_at_page[i].zoom
        writer.add_page(page)
        print(f"built page number {i+1} (of {len(pages_to_add)})")


def build_nup4_output(writer, pages_to_add, crop_at_page, args_print_margin, args_analyze, canvas_class):
    print("-n: building 4-input-pages-per-output-page book")
    print(f"-m: applying printable-area margin of {args_print_margin}cm")
    if args_analyze:
        print("-a: drawing page borders, spine limits")
    nup4_geometry = Nup4Geometry(args_print_margin)
    pages_to_add, new_i_order = resort_pages_for_nup4(pages_to_add)
    nup4_i = 0
    page_count = 0
    is_front_page = True
    for i, page in enumerate(pages_to_add):
        if nup4_i == 0:
            new_page = pypdf.PageObject.create_blank_page(width=A4_WIDTH, height=A4_HEIGHT)
        corrected_i = new_i_order[i]
        nup4_inner_page_transform(page, crop_at_page[corrected_i], nup4_geometry, nup4_i)
        nup4_outer_page_transform(page, nup4_geometry, nup4_i)
        new_page.merge_page(page)
        page_count += 1
        print(f"merged page number {page_count} (of {len(pages_to_add)})")
        nup4_i += 1
        if nup4_i > 3:
            ornate_nup4(writer, args_analyze, is_front_page, new_page, nup4_geometry, canvas_class)
            writer.add_page(new_page)
            nup4_i = 0
            is_front_page = not is_front_page
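

def resort_pages_for_nup4(pages_to_add):
    new_page_order = []
    new_i_order = []
    # pages_to_add is expected to have been padded to a multiple of 8 before
    for n_eights in range(len(pages_to_add) // 8):
        eight_pack = pages_to_add[8 * n_eights:8 * n_eights + 8]
        for n in PAGE_ORDER_FOR_NUP4:
            new_i_order += [8 * n_eights + n]
            new_page_order += [eight_pack[n]]
    return new_page_order, new_i_order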


def nup4_inner_page_transform(page, crop, nup4_geometry, nup4_i):
    page.add_transformation(pypdf.Transformation().translate(ty=(A4_HEIGHT / crop.zoom - (A4_HEIGHT - crop.top))))
    if nup4_i == 0 or nup4_i == 2:
        page.add_transformation(pypdf.Transformation().translate(tx=-crop.left))
    elif nup4_i == 1 or nup4_i == 3:
        page.add_transformation(pypdf.Transformation().translate(tx=(A4_WIDTH / crop.zoom - (A4_WIDTH - crop.right))))
    page.add_transformation(pypdf.Transformation().scale(crop.zoom * nup4_geometry.shrink_for_spine, crop.zoom * nup4_geometry.shrink_for_spine))
    if nup4_i == 2 or nup4_i == 3:
        page.add_transformation(pypdf.Transformation().translate(ty=-2*nup4_geometry.margin/nup4_geometry.shrink_for_margin))


def nup4_outer_page_transform(page, nup4_geometry, nup4_i):
    page.add_transformation(pypdf.Transformation().translate(ty=(1-nup4_geometry.shrink_for_spine)*A4_HEIGHT))
    if nup4_i == 0 or nup4_i == 1:  # upper quarters
        y_section = A4_HEIGHT
        page.mediabox.bottom = A4_HALF_HEIGHT
        page.mediabox.top = A4_HEIGHT
    if nup4_i == 2 or nup4_i == 3:  # lower quarters
        y_section = 0
        page.mediabox.bottom = 0
        page.mediabox.top = A4_HALF_HEIGHT
    if nup4_i == 0 or nup4_i == 2:  # left quarters
        x_section = 0
        page.mediabox.left = 0
        page.mediabox.right = A4_HALF_WIDTH
    if nup4_i == 1 or nup4_i == 3:  # right quarters
        page.add_transformation(pypdf.Transformation().translate(tx=(1-nup4_geometry.shrink_for_spine)*A4_WIDTH))
        x_section = A4_WIDTH
        page.mediabox.left = A4_HALF_WIDTH
        page.mediabox.right = A4_WIDTH
    page.add_transformation(pypdf.Transformation().translate(tx=x_section, ty=y_section))
    page.add_transformation(pypdf.Transformation().scale(QUARTER_SCALE_FACTOR, QUARTER_SCALE_FACTOR))


def ornate_nup4(writer, args_analyze, is_front_page, new_page, nup4_geometry, canvas_class):
    if args_analyze:
        # borders of the output page and of its quarters
        packet = io.BytesIO()
        c = canvas_class(packet, pagesize=A4)
        c.line(0, A4_HEIGHT, A4_WIDTH, A4_HEIGHT)
        c.line(0, A4_HALF_HEIGHT, A4_WIDTH, A4_HALF_HEIGHT)
        c.line(0, 0, A4_WIDTH, 0)
        c.line(0, A4_HEIGHT, 0, 0)
        c.line(A4_HALF_WIDTH, A4_HEIGHT, A4_HALF_WIDTH, 0)
        c.line(A4_WIDTH, A4_HEIGHT, A4_WIDTH, 0)
        c.save()
        new_pdf = pypdf.PdfReader(packet)
        new_page.merge_page(new_pdf.pages[0])
    printable_offset_x = nup4_geometry.margin
    printable_offset_y = nup4_geometry.margin * A4_HEIGHT / A4_WIDTH
    new_page.add_transformation(pypdf.Transformation().scale(nup4_geometry.shrink_for_margin, nup4_geometry.shrink_for_margin))
    new_page.add_transformation(pypdf.Transformation().translate(tx=printable_offset_x, ty=printable_offset_y))
    x_left_spine_limit = A4_HALF_WIDTH * nup4_geometry.shrink_for_spine
    x_right_spine_limit = A4_WIDTH - x_left_spine_limit
    if args_analyze or is_front_page:
        packet = io.BytesIO()
        c = canvas_class(packet, pagesize=A4)
    if args_analyze:
        # spine limits
        c.line(x_left_spine_limit, A4_HEIGHT, x_left_spine_limit, 0)
        c.line(x_right_spine_limit, A4_HEIGHT, x_right_spine_limit, 0)
    if is_front_page:
        # stencils for the binding cuts
        draw_cut(c, x_left_spine_limit, 1)
        draw_cut(c, x_right_spine_limit, -1)
    if args_analyze or is_front_page:
        c.save()
        new_pdf = pypdf.PdfReader(packet)
        new_page.merge_page(new_pdf.pages[0])


def draw_cut(canvas, x_spine_limit, direction):
    outer_start_x = x_spine_limit - 0.5 * CUT_WIDTH * direction
    inner_start_x = x_spine_limit + 0.5 * CUT_WIDTH * direction
    middle_point_y = A4_HALF_HEIGHT + MIDDLE_POINT_DEPTH * direction
    end_point_y = A4_HALF_HEIGHT + CUT_DEPTH * direction
    canvas.line(inner_start_x, A4_HALF_HEIGHT, x_spine_limit, end_point_y)
    canvas.line(x_spine_limit, end_point_y, x_spine_limit, middle_point_y)
    canvas.line(x_spine_limit, middle_point_y, outer_start_x, A4_HALF_HEIGHT)


def main():
    args = parse_args()
    validate_inputs_first_pass(args)
    if args.nup4:
        try:
            from reportlab.pdfgen.canvas import Canvas
        except ImportError:
            raise HandledException("-n: need reportlab.pdfgen.canvas installed for --nup4")
    pages_to_add, opened_files = read_inputs_to_pagelist(args.input_file, args.page_range)
    validate_inputs_second_pass(args, pages_to_add)
    rotate_pages(args.rotate_page, pages_to_add)
    if args.nup4:
        pad_pages_to_multiple_of_8(pages_to_add)
    normalize_pages_to_A4(pages_to_add)
    crop_at_page = collect_per_page_crops_and_zooms(args.crops, args.symmetry, pages_to_add)
    writer = pypdf.PdfWriter()
    if args.nup4:
        build_nup4_output(writer, pages_to_add, crop_at_page, args.print_margin, args.analyze, Canvas)
    else:
        build_single_pages_output(writer, pages_to_add, crop_at_page)
    for file in opened_files:
        file.close()
    with open(args.output_file, 'wb') as output_file:
        writer.write(output_file)


if __name__ == "__main__":
    try:
        main()
    except HandledException as e:
        handled_error_exit(e)