bookmaker.py is a helper for optimizing PDFs of books for the production of small self-printed, self-bound physical books. Towards this goal it offers various PDF manipulation options that may also be used independently and for other purposes.
Concatenate two PDFs A.pdf and B.pdf to COMBINED.pdf:
bookmaker.py --input_file A.pdf --input_file B.pdf --output_file COMBINED.pdf

Produce OUTPUT.pdf containing all pages of the (inclusive) page number range 3-7 from INPUT.pdf:
bookmaker.py -i INPUT.pdf --page_range 3-7 -o OUTPUT.pdf

Produce COMBINED.pdf from A.pdf's first 7 pages, B.pdf's pages except its first two, and all pages of C.pdf:
bookmaker.py -i A.pdf -p start-7 -i B.pdf -p 3-end -i C.pdf -o COMBINED.pdf

Crop each page 5cm from the left, 10cm from the bottom, 2cm from the right, and 0cm from the top:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --crops "5,10,2,0"

Include all pages from INPUT.pdf, but crop pages 10-20 by 5cm each from bottom and top:
bookmaker.py -i INPUT.pdf -c "10-20:0,5,0,5" -o OUTPUT.pdf

Same crops for pages 10-20, but also crop all pages 30 and later by 3cm each from left and right:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -c "10-20:0,5,0,5" -c "30-end:3,0,3,0"

Rotate pages 3, 5, 7 by 90°; rotate page 7 once more by 90° (i.e. 180° in total):
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --rotate_page 3 -r 5 -r 7 -r 7

Initially declare a 5cm crop from the left and a 1cm crop from the right, but alternate direction between even and odd pages:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -c "5,0,1,0" -s

Quarter each OUTPUT.pdf page to carry 4 pages from INPUT.pdf, and draw stencils into inner margins for cuts to carry binding strings:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --nup4

Same --nup4, but define a printable-region margin of 1.3cm to limit the space for the INPUT.pdf pages in OUTPUT.pdf page quarters:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -n --print_margin 1.3

Same --nup4, but draw lines marking printable-region margins, page quarters, spine margins:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -n --analyze
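
The left/right alternation of the -s example above can be sketched in isolation. This is a minimal illustration rather than the script's own code: the function name crops_per_page and the plain-tuple representation are assumptions made for this example. It assumes that the declared crop applies to odd page numbers and the mirrored crop to even ones.

```python
def crops_per_page(crop, n_pages, symmetry=False):
    """Return one (left, bottom, right, top) crop tuple per page.

    With symmetry=True, pages with even page numbers (2, 4, ...) get
    their left and right crops swapped.
    """
    left, bottom, right, top = crop
    mirrored = (right, bottom, left, top)
    return [
        # index is 0-based, so an odd index means an even page number
        mirrored if symmetry and index % 2 else crop
        for index in range(n_pages)
    ]


print(crops_per_page((5, 0, 1, 0), 3, symmetry=True))
# prints [(5, 0, 1, 0), (1, 0, 5, 0), (5, 0, 1, 0)]
```

Swapping only the horizontal values keeps the wider crop on the same physical side of the spine once the pages face each other.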
For arguments like -p, page numbers are assumed to start with 1 (not 0, which is treated as an invalid page number value).
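
As a rough illustration of the 1-based convention, the following sketch converts a range string into 0-based slice bounds. The helper name page_range_to_indices and its use of ValueError are assumptions made for this example, not part of the script's interface.

```python
def page_range_to_indices(range_string, n_pages):
    """Convert a 1-based inclusive range like '3-7' into 0-based (first, stop).

    'start' (or an empty start) means the first page, 'end' (or an empty
    end) means the last page. The result can be used as pages[first:stop].
    """
    start, end = range_string.split("-")
    first = 0 if start in ("", "start") else int(start) - 1
    stop = n_pages if end in ("", "end") else int(end)
    if first < 0 or stop > n_pages or first >= stop:
        raise ValueError(f"invalid page range for {n_pages} pages: {range_string}")
    return first, stop


print(page_range_to_indices("3-7", 10))      # prints (2, 7)
print(page_range_to_indices("start-7", 10))  # prints (0, 7)
print(page_range_to_indices("3-end", 10))    # prints (2, 10)
```

Because the end page is inclusive, its 1-based number can serve directly as the exclusive stop of a 0-based slice.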
The target page shape so far is assumed to be A4 in portrait orientation; bookmaker.py normalizes all pages to this format before applying crops, and removes any source PDF /Rotate commands (which are commonly used to produce landscape orientations).
For --nup4, the -c cropping instructions do not so much erase content outside the cropped area as zoom into the page, so that the cropped area fills as much as possible of the available per-page area between the printable-area margins and the borders to the other quartered pages. If the zoomed cropped area does not fit neatly into its per-page area, additional page content outside the crop is preserved.
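
The zoom described above can be sketched as follows. This is a simplified stand-alone version: it works in cm instead of PDF points (the ratios are the same), and the function name crop_zoom and the use of ValueError are assumptions for this example.

```python
A4_WIDTH_CM = 21.0
A4_HEIGHT_CM = 29.7


def crop_zoom(left, bottom, right, top):
    """Return the zoom factor that fits the cropped area back into an A4 page."""
    zoom_horizontal = A4_WIDTH_CM / (A4_WIDTH_CM - left - right)
    zoom_vertical = A4_HEIGHT_CM / (A4_HEIGHT_CM - bottom - top)
    if (zoom_horizontal > 1 and zoom_vertical < 1) or (zoom_horizontal < 1 and zoom_vertical > 1):
        raise ValueError("crops would create opposing zoom directions")
    if zoom_horizontal + zoom_vertical > 2:
        # zooming in: the smaller factor keeps the whole cropped area visible
        return min(zoom_horizontal, zoom_vertical)
    # zooming out (or no crop at all): take the factor closer to 1
    return max(zoom_horizontal, zoom_vertical)


print(crop_zoom(5, 10, 2, 0))
```

For the crop "5,10,2,0", the horizontal factor (21 / 14 = 1.5) is smaller than the vertical one (29.7 / 19.7, roughly 1.51), so the horizontal factor wins. The cropped area then does not quite fill the page vertically, which is where extra content outside the crop stays visible.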
The --nup4 quartering puts pages into a specific order optimized for no-tumble duplex print-outs that can easily be folded and cut into pages of a small A6 book. Each unit of 8 pages from the source PDF is mapped thus onto two subsequent pages (i.e. front and back of a printed A4 paper):

 (front)      (back)
+-------+   +-------+
| 4 | 1 |   | 2 | 3 |
|-------|   |-------|
| 8 | 5 |   | 6 | 7 |
+-------+   +-------+

To facilitate this layout, --nup4 also pads the input PDF pages to a total number that is a multiple of 8, by adding empty pages if necessary.
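
The padding and reordering can be illustrated on plain page numbers. This is a sketch under assumptions: nup4_page_order is a hypothetical helper, None stands in for a blank padding page, and each group of four results is assumed to fill one output page in the order top-left, top-right, bottom-left, bottom-right. The order tuple is the script's PAGE_ORDER_FOR_NUP4 constant.

```python
PAGE_ORDER_FOR_NUP4 = (3, 0, 7, 4, 1, 2, 5, 6)


def nup4_page_order(page_numbers):
    """Pad to a multiple of 8 and reorder each unit of 8 pages for --nup4."""
    pages = list(page_numbers)
    pages += [None] * (-len(pages) % 8)  # None marks a blank padding page
    ordered = []
    for offset in range(0, len(pages), 8):
        unit = pages[offset:offset + 8]
        ordered += [unit[n] for n in PAGE_ORDER_FOR_NUP4]
    return ordered


print(nup4_page_order(range(1, 9)))
# prints [4, 1, 8, 5, 2, 3, 6, 7]
```

The first four entries land on the front of the sheet and the last four on its back, so that page 2 is printed behind page 1, page 3 behind page 4, and so on.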
To turn the double-sided example page above into a tiny 8-page book:

1. Cut the paper in two along its horizontal middle line.
2. Fold the two halves along their vertical middle lines, with pages 3-2 and 7-6 on the insides of the folds. This creates two 4-page books, of pages 1-4 and pages 5-8.
3. Fold them both closed and (counter-intuitively) put the book of pages 5-8 on top of the other one, creating a temporary page order of 5,6,7,8,1,2,3,4.
4. A binding cut stencil should be visible on the top left of this stack. Cut it out, with all pages folded together, to add the same upper inner-margin cut to each page.
5. Turn the 8-page stack around to find the mirror image of that stencil at the bottom of the stack's back, and cut that out too. Each page now has binding cuts at the top and bottom of its inner margin.
6. Swap the two books back into the final page order of 1,2,3,4,5,6,7,8. You now have an 8-page book that can be "bound" through its binding cuts with a rubber band or the like.

Repeat with the next 8-page double-sided sheet, et cetera. (With just 8 pages, the paper may curl under the pressure of a rubber band; at 32 pages or so, the result becomes quite stable.)
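import argparse
import io
import os
import sys
from collections import namedtuple


def handled_error_exit(msg):
    """Print an error message and terminate with a non-zero exit code."""
    print(f"ERROR: {msg}")
    sys.exit(1)


try:
    import pypdf
except ImportError:
    handled_error_exit("Can't run at all without pypdf installed.")
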
# some general paper geometry constants
POINTS_PER_CM = 10 * 72 / 25.4
A4_WIDTH = 21 * POINTS_PER_CM
A4_HEIGHT = 29.7 * POINTS_PER_CM
A4 = (A4_WIDTH, A4_HEIGHT)

# constants specifically for --nup4
A4_HALF_WIDTH = A4_WIDTH / 2
A4_HALF_HEIGHT = A4_HEIGHT / 2
CUT_DEPTH = 1.95 * POINTS_PER_CM
CUT_WIDTH = 1.05 * POINTS_PER_CM
MIDDLE_POINT_DEPTH = 0.4 * POINTS_PER_CM
INNER_SPINE_MARGIN_PER_PAGE = 1 * POINTS_PER_CM
QUARTER_SCALE_FACTOR = 0.5
PAGE_ORDER_FOR_NUP4 = (3, 0, 7, 4, 1, 2, 5, 6)


class PageCrop:
    """Per-side crop of an A4 page (given in cm) and the zoom that results from it."""

    def __init__(self, left_cm=0, bottom_cm=0, right_cm=0, top_cm=0):
        self.left_cm = left_cm
        self.bottom_cm = bottom_cm
        self.right_cm = right_cm
        self.top_cm = top_cm
        self.left = float(self.left_cm) * POINTS_PER_CM
        self.bottom = float(self.bottom_cm) * POINTS_PER_CM
        self.right = float(self.right_cm) * POINTS_PER_CM
        self.top = float(self.top_cm) * POINTS_PER_CM
        zoom_horizontal = A4_WIDTH / (A4_WIDTH - self.left - self.right)
        zoom_vertical = A4_HEIGHT / (A4_HEIGHT - self.bottom - self.top)
        if (zoom_horizontal > 1 and zoom_vertical < 1) or (zoom_horizontal < 1 and zoom_vertical > 1):
            raise HandledException("-c: crops would create opposing zoom directions")
        elif zoom_horizontal + zoom_vertical > 2:
            # zooming in: the smaller factor keeps the whole cropped area visible
            self.zoom = min(zoom_horizontal, zoom_vertical)
        else:
            self.zoom = max(zoom_horizontal, zoom_vertical)

    def __str__(self):
        return str(vars(self))

    @property
    def format_in_cm(self):
        return f"left {self.left_cm}cm, bottom {self.bottom_cm}cm, right {self.right_cm}cm, top {self.top_cm}cm"

    @property
    def remaining_width(self):
        return A4_WIDTH - self.left - self.right

    @property
    def remaining_height(self):
        return A4_HEIGHT - self.bottom - self.top

    def give_mirror(self):
        """Return a copy with left and right crops swapped."""
        return PageCrop(left_cm=self.right_cm, bottom_cm=self.bottom_cm, right_cm=self.left_cm, top_cm=self.top_cm)


class Nup4Geometry:
    """Shrink factors that --nup4 derives from the printable-area margin and the spine margin."""

    def __init__(self, margin_cm):
        self.margin = margin_cm * POINTS_PER_CM
        self.shrink_for_margin = (A4_WIDTH - 2 * self.margin) / A4_WIDTH
        # NB: We define spine size un-shrunk, but .shrink_for_spine is used with values shrunk for the margin, which we undo here.
        spine_part_of_page = (INNER_SPINE_MARGIN_PER_PAGE / A4_HALF_WIDTH) / self.shrink_for_margin
        self.shrink_for_spine = 1 - spine_part_of_page


class HandledException(Exception):
    pass


def parse_args():
    parser = argparse.ArgumentParser(description=__doc__, epilog=help_epilogue, formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument("-i", "--input_file", action="append", required=True, help="input PDF file")
    parser.add_argument("-o", "--output_file", required=True, help="output PDF file")
    parser.add_argument("-p", "--page_range", action="append", help="page range, e.g., '2-9' or '3-end' or 'start-14'")
    parser.add_argument("-c", "--crops", action="append", help="cm crops left, bottom, right, top – e.g., '10,10,10,10'; prefix with ':'-delimited page range to limit effect")
    parser.add_argument("-r", "--rotate_page", type=int, action="append", help="rotate page of number by 90° (usable multiple times on same page!)")
    parser.add_argument("-s", "--symmetry", action="store_true", help="alternate horizontal crops between odd and even pages")
    parser.add_argument("-n", "--nup4", action="store_true", help="puts 4 input pages onto 1 output page, adds binding cut stencil")
    parser.add_argument("-a", "--analyze", action="store_true", help="in --nup4, print lines identifying spine, page borders")
    parser.add_argument("-m", "--print_margin", type=float, default=0.43, help="print margin for --nup4 in cm (default 0.43)")
    return parser.parse_args()


def validate_inputs_first_pass(args):
    """Check everything that can be checked before the input PDFs are read in."""
    for filename in args.input_file:
        if not os.path.isfile(filename):
            raise HandledException(f"-i: {filename} is not a file")
        try:
            with open(filename, 'rb') as file:
                pypdf.PdfReader(file)
        except pypdf.errors.PdfStreamError:
            raise HandledException(f"-i: cannot interpret {filename} as PDF file")
    if args.page_range:
        for p_string in args.page_range:
            validate_page_range(p_string, "-p")
        if len(args.page_range) > len(args.input_file):
            raise HandledException("-p: more --page_range arguments than --input_file arguments")
    if args.crops:
        for c_string in args.crops:
            initial_split = c_string.split(':')
            if len(initial_split) > 2:
                raise HandledException(f"-c: cropping string has multiple ':': {c_string}")
            page_range, crops = split_crops_string(c_string)
            crops = crops.split(",")
            if page_range:
                validate_page_range(page_range, "-c")
            if len(crops) != 4:
                raise HandledException(f"-c: cropping does not contain exactly three ',': {c_string}")
            for crop in crops:
                try:
                    float(crop)
                except ValueError:
                    raise HandledException(f"-c: non-number crop in: {c_string}")
    if args.rotate_page:
        for r in args.rotate_page:
            try:
                int(r)
            except ValueError:
                raise HandledException(f"-r: non-integer value: {r}")
            if r < 1:
                raise HandledException(f"-r: value must not be <1: {r}")
    try:
        float(args.print_margin)
    except ValueError:
        raise HandledException(f"-m: non-float value: {args.print_margin}")


def validate_page_range(p_string, err_msg_prefix):
    prefix = f"{err_msg_prefix}: page range string"
    if '-' not in p_string:
        raise HandledException(f"{prefix} lacks '-': {p_string}")
    tokens = p_string.split("-")
    if len(tokens) > 2:
        raise HandledException(f"{prefix} has too many '-': {p_string}")
    for i, token in enumerate(tokens):
        if token == "":
            continue
        if i == 0 and token == "start":
            continue
        if i == 1 and token == "end":
            continue
        try:
            int(token)
        except ValueError:
            raise HandledException(f"{prefix} carries value neither integer, nor 'start', nor 'end': {p_string}")
        if int(token) < 1:
            raise HandledException(f"{prefix} carries page number <1: {p_string}")
    # only compare start and end if both are given as numbers
    start = -1
    end = -1
    try:
        start = int(tokens[0])
        end = int(tokens[1])
    except ValueError:
        pass
    if start > 0 and end > 0 and start > end:
        raise HandledException(f"{prefix} has higher start than end value: {p_string}")


def split_crops_string(c_string):
    """Split '[PAGE_RANGE:]CROPS' into (page_range, crops); page_range is None if absent."""
    initial_split = c_string.split(':')
    if len(initial_split) > 1:
        page_range = initial_split[0]
        crops = initial_split[1]
    else:
        page_range = None
        crops = initial_split[0]
    return page_range, crops


def parse_page_range(range_string, pages):
    """Translate a 1-based inclusive page range into 0-based (start, end) slice bounds."""
    start_page = 0
    end_page = len(pages)
    if range_string:
        start, end = range_string.split('-')
        if not (len(start) == 0 or start == "start"):
            start_page = int(start) - 1
        if not (len(end) == 0 or end == "end"):
            end_page = int(end)
    return start_page, end_page


def read_inputs_to_pagelist(args_input_file, args_page_range):
    pages_to_add = []
    opened_files = []
    new_page_num = 0
    for i, input_file in enumerate(args_input_file):
        # files must stay open until the output is written, so the caller closes them
        file = open(input_file, 'rb')
        opened_files += [file]
        reader = pypdf.PdfReader(file)
        range_string = None
        if args_page_range and len(args_page_range) > i:
            range_string = args_page_range[i]
        start_page, end_page = parse_page_range(range_string, reader.pages)
        if end_page > len(reader.pages):  # no need to test start_page, since start_page > end_page is checked earlier
            raise HandledException(f"-p: page range goes beyond pages of input file: {range_string}")
        for old_page_num in range(start_page, end_page):
            new_page_num += 1
            page = reader.pages[old_page_num]
            pages_to_add += [page]
            print(f"-i, -p: read in {input_file} page number {old_page_num+1} as new page {new_page_num}")
    return pages_to_add, opened_files


def validate_inputs_second_pass(args, pages_to_add):
    """Check arguments whose validity depends on the number of pages read in."""
    if args.crops:
        for c_string in args.crops:
            page_range, _ = split_crops_string(c_string)
            if page_range:
                start, end = parse_page_range(page_range, pages_to_add)
                if end > len(pages_to_add):
                    raise HandledException(f"-c: page range goes beyond number of pages we're building: {page_range}")
    if args.rotate_page:
        for r in args.rotate_page:
            if r > len(pages_to_add):
                raise HandledException(f"-r: page number beyond number of pages we're building: {r}")


def rotate_pages(args_rotate_page, pages_to_add):
    if args_rotate_page:
        for rotate_page in args_rotate_page:
            page = pages_to_add[rotate_page - 1]
            # rotate around the page center: move center to origin, rotate, move back
            page.add_transformation(pypdf.Transformation().translate(tx=-A4_WIDTH/2, ty=-A4_HEIGHT/2))
            page.add_transformation(pypdf.Transformation().rotate(-90))
            page.add_transformation(pypdf.Transformation().translate(tx=A4_WIDTH/2, ty=A4_HEIGHT/2))
            print(f"-r: rotating (by 90°) page {rotate_page}")


def pad_pages_to_multiple_of_8(pages_to_add):
    mod_to_8 = len(pages_to_add) % 8
    if mod_to_8 > 0:
        old_len = len(pages_to_add)
        for _ in range(8 - mod_to_8):
            new_page = pypdf.PageObject.create_blank_page(width=A4_WIDTH, height=A4_HEIGHT)
            pages_to_add += [new_page]
        print(f"-n: number of input pages {old_len} not required multiple of 8, padded to {len(pages_to_add)}")


def normalize_pages_to_A4(pages_to_add):
    for page in pages_to_add:
        if "/Rotate" in page:  # TODO: preserve rotation, but in canvas?
            page.rotate(360 - page["/Rotate"])
        page.mediabox.left = 0
        page.mediabox.bottom = 0
        page.mediabox.top = A4_HEIGHT
        page.mediabox.right = A4_WIDTH
        page.cropbox = page.mediabox


def collect_per_page_crops_and_zooms(args_crops, args_symmetry, pages_to_add):
    crop_at_page = [PageCrop()] * len(pages_to_add)
    if args_crops:
        for c_string in args_crops:
            page_range, crops = split_crops_string(c_string)
            start_page, end_page = parse_page_range(page_range, pages_to_add)
            prefix = "-c, -s" if args_symmetry else "-c"
            suffix = " (but alternating left and right crop between even and odd pages)" if args_symmetry else ""
            page_crop = PageCrop(*crops.split(','))
            print(f"{prefix}: to pages {start_page + 1} to {end_page} applying crop: {page_crop.format_in_cm}{suffix}")
            for page_num in range(start_page, end_page):
                # page_num is 0-based, so an odd page_num is an even page number
                if args_symmetry and page_num % 2:
                    crop_at_page[page_num] = page_crop.give_mirror()
                else:
                    crop_at_page[page_num] = page_crop
    return crop_at_page


def build_single_pages_output(writer, pages_to_add, crop_at_page):
    print("building 1-input-page-per-output-page book")
    odd_page = True
    for i, page in enumerate(pages_to_add):
        page.add_transformation(pypdf.Transformation().translate(tx=-crop_at_page[i].left, ty=-crop_at_page[i].bottom))
        page.add_transformation(pypdf.Transformation().scale(crop_at_page[i].zoom, crop_at_page[i].zoom))
        page.mediabox.right = crop_at_page[i].remaining_width * crop_at_page[i].zoom
        page.mediabox.top = crop_at_page[i].remaining_height * crop_at_page[i].zoom
        writer.add_page(page)
        odd_page = not odd_page
        print(f"built page number {i+1} (of {len(pages_to_add)})")


def build_nup4_output(writer, pages_to_add, crop_at_page, args_print_margin, args_analyze, canvas_class):
    print("-n: building 4-input-pages-per-output-page book")
    print(f"-m: applying printable-area margin of {args_print_margin}cm")
    if args_analyze:
        print("-a: drawing page borders, spine limits")
    nup4_geometry = Nup4Geometry(args_print_margin)
    pages_to_add, new_i_order = resort_pages_for_nup4(pages_to_add)
    nup4_i = 0  # position of the current input page within its output page (0-3)
    page_count = 0
    is_front_page = True
    for i, page in enumerate(pages_to_add):
        if nup4_i == 0:
            new_page = pypdf.PageObject.create_blank_page(width=A4_WIDTH, height=A4_HEIGHT)
        corrected_i = new_i_order[i]
        nup4_inner_page_transform(page, crop_at_page[corrected_i], nup4_geometry, nup4_i)
        nup4_outer_page_transform(page, nup4_geometry, nup4_i)
        new_page.merge_page(page)
        page_count += 1
        print(f"merged page number {page_count} (of {len(pages_to_add)})")
        nup4_i += 1
        if nup4_i > 3:
            ornate_nup4(writer, args_analyze, is_front_page, new_page, nup4_geometry, canvas_class)
            writer.add_page(new_page)
            nup4_i = 0
            is_front_page = not is_front_page

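
def resort_pages_for_nup4(pages_to_add):
    """Reorder each unit of 8 pages by PAGE_ORDER_FOR_NUP4; also return the original indices in the new order."""
    new_page_order = []
    new_i_order = []
    eight_pack = []
    n_eights = 0
    for page in pages_to_add:
        eight_pack += [page]
        if len(eight_pack) == 8:
            for n in PAGE_ORDER_FOR_NUP4:
                new_i_order += [8 * n_eights + n]
                new_page_order += [eight_pack[n]]
            eight_pack = []
            n_eights += 1
    return new_page_order, new_i_order
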

def nup4_inner_page_transform(page, crop, nup4_geometry, nup4_i):
    page.add_transformation(pypdf.Transformation().translate(ty=(A4_HEIGHT / crop.zoom - (A4_HEIGHT - crop.top))))
    if nup4_i == 0 or nup4_i == 2:
        page.add_transformation(pypdf.Transformation().translate(tx=-crop.left))
    elif nup4_i == 1 or nup4_i == 3:
        page.add_transformation(pypdf.Transformation().translate(tx=(A4_WIDTH / crop.zoom - (A4_WIDTH - crop.right))))
    page.add_transformation(pypdf.Transformation().scale(crop.zoom * nup4_geometry.shrink_for_spine, crop.zoom * nup4_geometry.shrink_for_spine))
    if nup4_i == 2 or nup4_i == 3:
        page.add_transformation(pypdf.Transformation().translate(ty=-2*nup4_geometry.margin/nup4_geometry.shrink_for_margin))


def nup4_outer_page_transform(page, nup4_geometry, nup4_i):
    page.add_transformation(pypdf.Transformation().translate(ty=(1-nup4_geometry.shrink_for_spine)*A4_HEIGHT))
    if nup4_i == 0 or nup4_i == 1:  # upper half
        y_section = A4_HEIGHT
        page.mediabox.bottom = A4_HALF_HEIGHT
        page.mediabox.top = A4_HEIGHT
    if nup4_i == 2 or nup4_i == 3:  # lower half
        y_section = 0
        page.mediabox.bottom = 0
        page.mediabox.top = A4_HALF_HEIGHT
    if nup4_i == 0 or nup4_i == 2:  # left half
        x_section = 0
        page.mediabox.left = 0
        page.mediabox.right = A4_HALF_WIDTH
    if nup4_i == 1 or nup4_i == 3:  # right half
        page.add_transformation(pypdf.Transformation().translate(tx=(1-nup4_geometry.shrink_for_spine)*A4_WIDTH))
        x_section = A4_WIDTH
        page.mediabox.left = A4_HALF_WIDTH
        page.mediabox.right = A4_WIDTH
    page.add_transformation(pypdf.Transformation().translate(tx=x_section, ty=y_section))
    page.add_transformation(pypdf.Transformation().scale(QUARTER_SCALE_FACTOR, QUARTER_SCALE_FACTOR))


def ornate_nup4(writer, args_analyze, is_front_page, new_page, nup4_geometry, canvas_class):
    if args_analyze:
        # draw borders of the output page and of its quarters
        packet = io.BytesIO()
        c = canvas_class(packet, pagesize=A4)
        c.line(0, A4_HEIGHT, A4_WIDTH, A4_HEIGHT)
        c.line(0, A4_HALF_HEIGHT, A4_WIDTH, A4_HALF_HEIGHT)
        c.line(0, 0, A4_WIDTH, 0)
        c.line(0, A4_HEIGHT, 0, 0)
        c.line(A4_HALF_WIDTH, A4_HEIGHT, A4_HALF_WIDTH, 0)
        c.line(A4_WIDTH, A4_HEIGHT, A4_WIDTH, 0)
        c.save()
        new_pdf = pypdf.PdfReader(packet)
        new_page.merge_page(new_pdf.pages[0])
    # shrink everything merged so far into the printable area
    printable_offset_x = nup4_geometry.margin
    printable_offset_y = nup4_geometry.margin * A4_HEIGHT / A4_WIDTH
    new_page.add_transformation(pypdf.Transformation().scale(nup4_geometry.shrink_for_margin, nup4_geometry.shrink_for_margin))
    new_page.add_transformation(pypdf.Transformation().translate(tx=printable_offset_x, ty=printable_offset_y))
    x_left_spine_limit = A4_HALF_WIDTH * nup4_geometry.shrink_for_spine
    x_right_spine_limit = A4_WIDTH - x_left_spine_limit
    if args_analyze or is_front_page:
        packet = io.BytesIO()
        c = canvas_class(packet, pagesize=A4)
    if args_analyze:
        # spine limits
        c.line(x_left_spine_limit, A4_HEIGHT, x_left_spine_limit, 0)
        c.line(x_right_spine_limit, A4_HEIGHT, x_right_spine_limit, 0)
    if is_front_page:
        # binding cut stencils
        draw_cut(c, x_left_spine_limit, (1))
        draw_cut(c, x_right_spine_limit, (-1))
    if args_analyze or is_front_page:
        c.save()
        new_pdf = pypdf.PdfReader(packet)
        new_page.merge_page(new_pdf.pages[0])


def draw_cut(canvas, x_spine_limit, direction):
    outer_start_x = x_spine_limit - 0.5 * CUT_WIDTH * direction
    inner_start_x = x_spine_limit + 0.5 * CUT_WIDTH * direction
    middle_point_y = A4_HALF_HEIGHT + MIDDLE_POINT_DEPTH * direction
    end_point_y = A4_HALF_HEIGHT + CUT_DEPTH * direction
    canvas.line(inner_start_x, A4_HALF_HEIGHT, x_spine_limit, end_point_y)
    canvas.line(x_spine_limit, end_point_y, x_spine_limit, middle_point_y)
    canvas.line(x_spine_limit, middle_point_y, outer_start_x, A4_HALF_HEIGHT)


def main():
    args = parse_args()
    validate_inputs_first_pass(args)
    if args.nup4:
        try:
            from reportlab.pdfgen.canvas import Canvas
        except ImportError:
            raise HandledException("-n: need reportlab.pdfgen.canvas installed for --nup4")
    pages_to_add, opened_files = read_inputs_to_pagelist(args.input_file, args.page_range)
    validate_inputs_second_pass(args, pages_to_add)
    rotate_pages(args.rotate_page, pages_to_add)
    if args.nup4:
        pad_pages_to_multiple_of_8(pages_to_add)
    normalize_pages_to_A4(pages_to_add)
    crop_at_page = collect_per_page_crops_and_zooms(args.crops, args.symmetry, pages_to_add)
    writer = pypdf.PdfWriter()
    if args.nup4:
        build_nup4_output(writer, pages_to_add, crop_at_page, args.print_margin, args.analyze, Canvas)
    else:
        build_single_pages_output(writer, pages_to_add, crop_at_page)
    for file in opened_files:
        file.close()
    with open(args.output_file, 'wb') as output_file:
        writer.write(output_file)


if __name__ == "__main__":
    try:
        main()
    except HandledException as e:
        handled_error_exit(e)