bookmaker.py is a helper for optimizing PDFs of books for the production of small self-printed, self-bound physical books. Towards this goal it offers various PDF manipulation options that may also be used independently and for other purposes.

Concatenate two PDFs A.pdf and B.pdf to COMBINED.pdf:
bookmaker.py --input_file A.pdf --input_file B.pdf --output_file COMBINED.pdf

Produce OUTPUT.pdf containing all pages of (inclusive) page number range 3-7 from INPUT.pdf:
bookmaker.py -i INPUT.pdf --page_range 3-7 -o OUTPUT.pdf

Produce COMBINED.pdf from A.pdf's first 7 pages, B.pdf's pages except its first two, and all pages of C.pdf:
bookmaker.py -i A.pdf -p start-7 -i B.pdf -p 3-end -i C.pdf -o COMBINED.pdf

Crop each page 5cm from the left, 10cm from the bottom, 2cm from the right, and 0cm from the top:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --crops "5,10,2,0"

Include all pages from INPUT.pdf, but crop pages 10-20 by 5cm each from bottom and top:
bookmaker.py -i INPUT.pdf -c "10-20:0,5,0,5" -o OUTPUT.pdf

Same crops for pages 10-20, but also crop all pages 30 and later by 3cm each from left and right:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -c "10-20:0,5,0,5" -c "30-end:3,0,3,0"

Rotate pages 3, 5, 7 by 90°; rotate page 7 once more by 90° (i.e. 180° in total):
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --rotate_page 3 -r 5 -r 7 -r 7

Declare a 5cm crop from the left and a 1cm crop from the right for odd pages, and mirror them (1cm left, 5cm right) on even pages:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -c "5,0,1,0" -s

Quarter each OUTPUT.pdf page to carry 4 pages from INPUT.pdf, and draw stencils into the inner margins for cuts to carry binding strings:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --nup4

Same --nup4, but define a printable-region margin of 1.3cm to limit the space for the INPUT.pdf pages in OUTPUT.pdf page quarters:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -n --print_margin 1.3

Same --nup4, but draw lines marking printable-region margins, page quarters, spine margins:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -n --analyze
For arguments like -p, page numbers are assumed to start with 1 (not 0, which is treated as an invalid page number value).

The target page shape so far is assumed to be A4 in portrait orientation; bookmaker.py normalizes all pages to this format before applying crops, and removes any source PDF /Rotate commands (which would otherwise produce landscape orientations).

For --nup4, the -c cropping instructions do not so much erase content outside the cropped area as zoom into the page so that the cropped area fills as much as possible of the available per-page area between the printable-area margins and the borders to the other quartered pages. If the zoomed cropped area does not fit neatly into its per-page area, additional page content outside the crop is preserved.

The --nup4 quartering puts pages into a specific order optimized for no-tumble duplex print-outs that can easily be folded and cut into pages of a small A6 book. Each unit of 8 pages from the source PDF is mapped thus onto two subsequent pages (i.e. front and back of a printed A4 paper):

To facilitate this layout, --nup4 also pads the input PDF pages to a total number that is a multiple of 8, by adding empty pages if necessary.

(To turn the above double-sided example page into a tiny 8-page book: Cut the paper in two along its horizontal middle line. Fold the two halves along their vertical middle lines, with pages 3-2 and 7-6 on the folds' insides. This creates two 4-page books of pages 1-4 and pages 5-8. Fold them both closed and (counter-intuitively) put the book of pages 5-8 on top of the other one (creating a temporary page order of 5,6,7,8,1,2,3,4). A binding cut stencil should be visible on the top left of this stack – cut it out (with all pages folded together) to add the same inner-margin upper cut to each page. Turn your 8-page stack around to find the mirror image of the aforementioned stencil on the bottom of the stack's back, and cut that out too. Each page now has binding cuts at the top and bottom of its inner margin. Swap the order of both books (back to the final page order of 1,2,3,4,5,6,7,8), and you now have an 8-page book that can be "bound" in its binding cuts with a rubber band or the like. Repeat with the next 8-page double-sided sheet, et cetera. Actually, with just 8 pages, the paper may curl under the pressure of a rubber band – but go up to 32 pages or so, and the result will become quite stable.)
66 from collections import namedtuple
68 def handled_error_exit(msg):
69 print(f"ERROR: {msg}")
75 handled_error_exit("Can't run at all without pypdf installed.")
# some general paper geometry constants
POINTS_PER_CM = 10 * 72 / 25.4
A4_WIDTH = 21 * POINTS_PER_CM
A4_HEIGHT = 29.7 * POINTS_PER_CM
A4 = (A4_WIDTH, A4_HEIGHT)
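# A minimal sketch of the unit conversion behind the constants above, assuming
# the usual PDF convention of 72 points per inch (with 25.4 mm per inch). The
# helper name cm_to_points is hypothetical and not part of bookmaker.py.

```python
POINTS_PER_INCH = 72
MM_PER_INCH = 25.4


def cm_to_points(cm):
    """Convert a length in centimetres to PDF points."""
    return cm * 10 * POINTS_PER_INCH / MM_PER_INCH


# A4 is 21cm x 29.7cm; these should match A4_WIDTH and A4_HEIGHT above.
a4_width_pt = cm_to_points(21)
a4_height_pt = cm_to_points(29.7)
print(f"A4 in points: {a4_width_pt:.2f} x {a4_height_pt:.2f}")
```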
# constants specifically for --nup4
A4_HALF_WIDTH = A4_WIDTH / 2
A4_HALF_HEIGHT = A4_HEIGHT / 2
CUT_DEPTH = 1.95 * POINTS_PER_CM
CUT_WIDTH = 1.05 * POINTS_PER_CM
MIDDLE_POINT_DEPTH = 0.4 * POINTS_PER_CM
SPINE_LIMIT = 1 * POINTS_PER_CM
QUARTER_SCALE_FACTOR = 0.5
PAGE_ORDER_FOR_NUP4 = (3, 0, 7, 4, 1, 2, 5, 6)
class PageCrop:

    def __init__(self, left_cm=0, bottom_cm=0, right_cm=0, top_cm=0):
        # crop sizes as given by the user, in cm (kept for messages)
        self.left_cm = left_cm
        self.bottom_cm = bottom_cm
        self.right_cm = right_cm
        self.top_cm = top_cm
        # the same crops converted to PDF points
        self.left = float(self.left_cm) * POINTS_PER_CM
        self.bottom = float(self.bottom_cm) * POINTS_PER_CM
        self.right = float(self.right_cm) * POINTS_PER_CM
        self.top = float(self.top_cm) * POINTS_PER_CM
        # zoom factors that would stretch the remaining area to full A4 width / height
        zoom_horizontal = A4_WIDTH / (A4_WIDTH - self.left - self.right)
        zoom_vertical = A4_HEIGHT / (A4_HEIGHT - self.bottom - self.top)
        if (zoom_horizontal > 1 and zoom_vertical < 1) or (zoom_horizontal < 1 and zoom_vertical > 1):
            raise HandledException("-c: crops would create opposing zoom directions")
        elif zoom_horizontal + zoom_vertical > 2:
            # zooming in: use the smaller factor so the whole cropped area still fits
            self.zoom = min(zoom_horizontal, zoom_vertical)
        else:
            # no crop, or negative crops that shrink the page: use the larger factor
            self.zoom = max(zoom_horizontal, zoom_vertical)
115 return str(vars(self))
118 def format_in_cm(self):
119 return f"left {self.left_cm}cm, bottom {self.bottom_cm}cm, right {self.right_cm}cm, top {self.top_cm}cm"
122 def remaining_width(self):
123 return A4_WIDTH - self.left - self.right
126 def remaining_height(self):
127 return A4_HEIGHT - self.bottom - self.top
129 def give_mirror(self):
130 return PageCrop(left_cm=self.right_cm, bottom_cm=self.bottom_cm, right_cm=self.left_cm, top_cm=self.top_cm)
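# A minimal sketch of how the zoom factor chosen in PageCrop.__init__ behaves,
# re-stated as a standalone function. It works in cm rather than points (the
# ratios are the same), raises ValueError instead of HandledException, and the
# name crop_zoom is hypothetical, not part of bookmaker.py.

```python
A4_WIDTH_CM = 21.0
A4_HEIGHT_CM = 29.7


def crop_zoom(left=0.0, bottom=0.0, right=0.0, top=0.0):
    """Return the single zoom factor applied to both axes for the given crops (cm)."""
    zoom_horizontal = A4_WIDTH_CM / (A4_WIDTH_CM - left - right)
    zoom_vertical = A4_HEIGHT_CM / (A4_HEIGHT_CM - bottom - top)
    # One axis wants to grow while the other wants to shrink: no sensible zoom.
    if (zoom_horizontal > 1 > zoom_vertical) or (zoom_horizontal < 1 < zoom_vertical):
        raise ValueError("crops would create opposing zoom directions")
    if zoom_horizontal + zoom_vertical > 2:
        # Zooming in: the smaller factor keeps the whole cropped area visible.
        return min(zoom_horizontal, zoom_vertical)
    # No crop, or negative crops on both axes: take the larger factor.
    return max(zoom_horizontal, zoom_vertical)


print(crop_zoom(left=5, bottom=10, right=2, top=0))  # limited by the horizontal axis
print(crop_zoom(bottom=5, top=5))  # a purely vertical crop leaves the zoom at 1.0
```

# Because one factor is used for both axes, a crop on only one axis does not
# zoom at all; this is why content outside the cropped area can remain visible.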
133 class PrintableMargin:
135 def __init__(self, size_cm):
136 self.size = size_cm * POINTS_PER_CM
137 self.zoom = (A4_WIDTH - 2 * self.size)/A4_WIDTH
140 class HandledException(Exception):
145 parser = argparse.ArgumentParser(description=__doc__, epilog=help_epilogue, formatter_class=argparse.RawDescriptionHelpFormatter)
146 parser.add_argument("-i", "--input_file", action="append", required=True, help="input PDF file")
147 parser.add_argument("-o", "--output_file", required=True, help="output PDF file")
148 parser.add_argument("-p", "--page_range", action="append", help="page range, e.g., '2-9' or '3-end' or 'start-14'")
149 parser.add_argument("-c", "--crops", action="append", help="cm crops left, bottom, right, top – e.g., '10,10,10,10'; prefix with ':'-delimited page range to limit effect")
150 parser.add_argument("-r", "--rotate_page", type=int, action="append", help="rotate page of number by 90° (usable multiple times on same page!)")
151 parser.add_argument("-s", "--symmetry", action="store_true", help="alternate horizontal crops between odd and even pages")
152 parser.add_argument("-n", "--nup4", action='store_true', help="puts 4 input pages onto 1 output page, adds binding cut stencil")
153 parser.add_argument("-a", "--analyze", action="store_true", help="in --nup4, print lines identifying spine, page borders")
154 parser.add_argument("-m", "--print_margin", type=float, default=0.43, help="print margin for --nup4 in cm (default 0.43)")
155 return parser.parse_args()
158 def validate_inputs_first_pass(args):
159 for filename in args.input_file:
160 if not os.path.isfile(filename):
161 raise HandledException(f"-i: {filename} is not a file")
163 with open(filename, 'rb') as file:
164 pypdf.PdfReader(file)
165 except pypdf.errors.PdfStreamError:
166 raise HandledException(f"-i: cannot interpret {filename} as PDF file")
168 for p_string in args.page_range:
169 validate_page_range(p_string, "-p")
170 if len(args.page_range) > len(args.input_file):
171 raise HandledException("-p: more --page_range arguments than --input_file arguments")
173 for c_string in args.crops:
174 initial_split = c_string.split(':')
175 if len(initial_split) > 2:
176 raise HandledException(f"-c: cropping string has multiple ':': {c_string}")
177 page_range, crops = split_crops_string(c_string)
178 crops = crops.split(",")
180 validate_page_range(page_range, "-c")
182 raise HandledException(f"-c: cropping does not contain exactly three ',': {c_string}")
187 raise HandledException(f"-c: non-number crop in: {c_string}")
189 for r in args.rotate_page:
193 raise HandledException(f"-r: non-integer value: {r}")
195 raise HandledException(f"-r: value must not be <1: {r}")
197 float(args.print_margin)
        raise HandledException(f"-m: non-float value: {args.print_margin}")
202 def validate_page_range(p_string, err_msg_prefix):
203 prefix = f"{err_msg_prefix}: page range string"
204 if '-' not in p_string:
205 raise HandledException(f"{prefix} lacks '-': {p_string}")
206 tokens = p_string.split("-")
208 raise HandledException(f"{prefix} has too many '-': {p_string}")
209 for i, token in enumerate(tokens):
212 if i == 0 and token == "start":
214 if i == 1 and token == "end":
219 raise HandledException(f"{prefix} carries value neither integer, nor 'start', nor 'end': {p_string}")
221 raise HandledException(f"{prefix} carries page number <1: {p_string}")
225 start = int(tokens[0])
229 if start > 0 and end > 0 and start > end:
230 raise HandledException(f"{prefix} has higher start than end value: {p_string}")
233 def split_crops_string(c_string):
234 initial_split = c_string.split(':')
235 if len(initial_split) > 1:
236 page_range = initial_split[0]
237 crops = initial_split[1]
240 crops = initial_split[0]
241 return page_range, crops
244 def parse_page_range(range_string, pages):
246 end_page = len(pages)
248 start, end = range_string.split('-')
249 if not (len(start) == 0 or start == "start"):
250 start_page = int(start) - 1
251 if not (len(end) == 0 or end == "end"):
253 return start_page, end_page
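# A minimal sketch of the page range convention used by -p and -c: the user
# writes 1-based, inclusive ranges, which become a 0-based start index and an
# exclusive end index suitable for range() or slicing. The function name
# range_to_indices is hypothetical; validation is assumed to have happened
# beforehand, as in validate_page_range.

```python
def range_to_indices(range_string, page_count):
    """Map '3-7', 'start-7', '3-end' or None to (start_index, end_index)."""
    start_index, end_index = 0, page_count
    if range_string:
        start, end = range_string.split("-")
        if start not in ("", "start"):
            start_index = int(start) - 1  # 1-based page number to 0-based index
        if end not in ("", "end"):
            end_index = int(end)  # inclusive page number equals exclusive index
    return start_index, end_index


pages = [f"page {n}" for n in range(1, 11)]
start, end = range_to_indices("3-7", len(pages))
print(pages[start:end])  # pages 3 through 7
```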
256 def read_inputs_to_pagelist(args_input_file, args_page_range):
260 for i, input_file in enumerate(args_input_file):
261 file = open(input_file, 'rb')
262 opened_files += [file]
263 reader = pypdf.PdfReader(file)
265 if args_page_range and len(args_page_range) > i:
266 range_string = args_page_range[i]
267 start_page, end_page = parse_page_range(range_string, reader.pages)
            # no need to test start_page because start_page > end_page is checked above
            if end_page > len(reader.pages):
269 raise HandledException(f"-p: page range goes beyond pages of input file: {range_string}")
270 for old_page_num in range(start_page, end_page):
272 page = reader.pages[old_page_num]
273 pages_to_add += [page]
274 print(f"-i, -p: read in {input_file} page number {old_page_num+1} as new page {new_page_num}")
275 return pages_to_add, opened_files
278 def validate_inputs_second_pass(args, pages_to_add):
280 for c_string in args.crops:
281 page_range, _= split_crops_string(c_string)
283 start, end = parse_page_range(page_range, pages_to_add)
284 if end > len(pages_to_add):
285 raise HandledException(f"-c: page range goes beyond number of pages we're building: {page_range}")
287 for r in args.rotate_page:
288 if r > len(pages_to_add):
289 raise HandledException(f"-r: page number beyond number of pages we're building: {r}")
292 def rotate_pages(args_rotate_page, pages_to_add):
294 for rotate_page in args_rotate_page:
295 page = pages_to_add[rotate_page - 1]
296 page.add_transformation(pypdf.Transformation().translate(tx=-A4_WIDTH/2, ty=-A4_HEIGHT/2))
297 page.add_transformation(pypdf.Transformation().rotate(-90))
298 page.add_transformation(pypdf.Transformation().translate(tx=A4_WIDTH/2, ty=A4_HEIGHT/2))
299 print(f"-r: rotating (by 90°) page {rotate_page}")
def pad_pages_to_multiple_of_8(pages_to_add):
    mod_to_8 = len(pages_to_add) % 8
    if mod_to_8 > 0:
        old_len = len(pages_to_add)
        # append blank A4 pages until the page count is a multiple of 8
        for _ in range(8 - mod_to_8):
            new_page = pypdf.PageObject.create_blank_page(width=A4_WIDTH, height=A4_HEIGHT)
            pages_to_add += [new_page]
        print(f"-n: number of input pages {old_len} not required multiple of 8, padded to {len(pages_to_add)}")
def normalize_pages_to_A4(pages_to_add):
    for page in pages_to_add:
        if "/Rotate" in page:  # TODO: preserve rotation, but in canvas?
            page.rotate(360 - page["/Rotate"])
        page.mediabox.left = 0
        page.mediabox.bottom = 0
        page.mediabox.top = A4_HEIGHT
        page.mediabox.right = A4_WIDTH
        page.cropbox = page.mediabox
323 def collect_per_page_crops_and_zooms(args_crops, args_symmetry, pages_to_add):
324 crop_at_page = [PageCrop()] * len(pages_to_add)
326 for c_string in args_crops:
327 page_range, crops = split_crops_string(c_string)
328 start_page, end_page = parse_page_range(page_range, pages_to_add)
            prefix = "-c, -s" if args_symmetry else "-c"
            suffix = " (but alternating left and right crop between even and odd pages)" if args_symmetry else ""
            page_crop = PageCrop(*crops.split(','))
332 print(f"{prefix}: to pages {start_page + 1} to {end_page} applying crop: {page_crop.format_in_cm}{suffix}")
333 for page_num in range(start_page, end_page):
334 if args_symmetry and page_num % 2:
335 crop_at_page[page_num] = page_crop.give_mirror()
337 crop_at_page[page_num] = page_crop
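# A minimal sketch of what --symmetry does to the per-page crops, using plain
# (left, bottom, right, top) tuples in place of PageCrop objects. The function
# name per_page_crops is hypothetical. As in the loop above, the index is
# 0-based, so odd indices correspond to even page numbers.

```python
def per_page_crops(crop, page_count, symmetry=False):
    """Return one crop tuple per page, mirrored horizontally on even pages."""
    left, bottom, right, top = crop
    mirrored = (right, bottom, left, top)  # swap left and right only
    return [mirrored if symmetry and index % 2 else crop
            for index in range(page_count)]


for number, page_crop in enumerate(per_page_crops((5, 0, 1, 0), 4, symmetry=True), start=1):
    print(f"page {number}: {page_crop}")
```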
341 def build_single_pages_output(writer, pages_to_add, crop_at_page):
342 print("building 1-input-page-per-output-page book")
344 for i, page in enumerate(pages_to_add):
345 page.add_transformation(pypdf.Transformation().translate(tx=-crop_at_page[i].left, ty=-crop_at_page[i].bottom))
346 page.add_transformation(pypdf.Transformation().scale(crop_at_page[i].zoom, crop_at_page[i].zoom))
347 page.mediabox.right = crop_at_page[i].remaining_width * crop_at_page[i].zoom
348 page.mediabox.top = crop_at_page[i].remaining_height * crop_at_page[i].zoom
349 writer.add_page(page)
350 odd_page = not odd_page
351 print(f"built page number {i+1} (of {len(pages_to_add)})")
353 def build_nup4_output(writer, pages_to_add, crop_at_page, args_print_margin, args_analyze, canvas_class):
354 print("-n: building 4-input-pages-per-output-page book")
355 print(f"-m: applying printable-area margin of {args_print_margin}cm")
357 print("-a: drawing page borders, spine limits")
358 printable_margin = PrintableMargin(args_print_margin)
359 spine_part_of_page = (SPINE_LIMIT / A4_HALF_WIDTH) / printable_margin.zoom
360 bonus_shrink_factor = 1 - spine_part_of_page
361 pages_to_add, new_i_order = resort_pages_for_nup4(pages_to_add)
365 for i, page in enumerate(pages_to_add):
366 if nup4_position == 0:
367 new_page = pypdf.PageObject.create_blank_page(width=A4_WIDTH, height=A4_HEIGHT)
368 corrected_i = new_i_order[i]
369 nup4_inner_page_transform(page, crop_at_page[corrected_i], bonus_shrink_factor, printable_margin, nup4_position)
370 nup4_outer_page_transform(page, bonus_shrink_factor, nup4_position)
371 new_page.merge_page(page)
373 print(f"merged page number {page_count} (of {len(pages_to_add)})")
375 if nup4_position > 3:
376 ornate_nup4(writer, args_analyze, is_front_page, new_page, printable_margin, bonus_shrink_factor, canvas_class)
377 writer.add_page(new_page)
379 is_front_page = not is_front_page
382 def resort_pages_for_nup4(pages_to_add):
388 for page in pages_to_add:
395 for n in PAGE_ORDER_FOR_NUP4:
396 new_i_order += [8 * n_eights + n]
397 new_page_order += [eight_pack[n]]
399 return new_page_order, new_i_order
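# A minimal sketch of the --nup4 reordering, using page numbers in place of
# pypdf page objects. It pads to a multiple of 8 and then rearranges every
# pack of 8 according to PAGE_ORDER_FOR_NUP4. The function name nup4_order
# and the blank placeholder are hypothetical, not part of bookmaker.py.

```python
PAGE_ORDER_FOR_NUP4 = (3, 0, 7, 4, 1, 2, 5, 6)


def nup4_order(pages, blank=None):
    """Pad pages to a multiple of 8 and reorder each pack of 8 for printing."""
    pages = list(pages)
    padded = pages + [blank] * (-len(pages) % 8)
    ordered = []
    for pack_start in range(0, len(padded), 8):
        pack = padded[pack_start:pack_start + 8]
        ordered += [pack[n] for n in PAGE_ORDER_FOR_NUP4]
    return ordered


# The first four entries land on the front of the sheet, the last four on the back.
print(nup4_order(range(1, 9)))  # [4, 1, 8, 5, 2, 3, 6, 7]
```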
def nup4_inner_page_transform(page, crop, bonus_shrink_factor, printable_margin, nup4_position):
    page.add_transformation(pypdf.Transformation().translate(ty=(A4_HEIGHT / crop.zoom - (A4_HEIGHT - crop.top))))
    if nup4_position in (0, 2):  # left column
        page.add_transformation(pypdf.Transformation().translate(tx=-crop.left))
    elif nup4_position in (1, 3):  # right column
        page.add_transformation(pypdf.Transformation().translate(tx=(A4_WIDTH / crop.zoom - (A4_WIDTH - crop.right))))
    page.add_transformation(pypdf.Transformation().scale(crop.zoom * bonus_shrink_factor, crop.zoom * bonus_shrink_factor))
    if nup4_position in (2, 3):  # bottom row
        page.add_transformation(pypdf.Transformation().translate(ty=-2*printable_margin.size/printable_margin.zoom))
413 def nup4_outer_page_transform(page, bonus_shrink_factor, nup4_position):
414 page.add_transformation(pypdf.Transformation().translate(ty=(1-bonus_shrink_factor)*A4_HEIGHT))
415 if nup4_position == 0 or nup4_position == 1:
416 y_section = A4_HEIGHT
417 page.mediabox.bottom = A4_HALF_HEIGHT
418 page.mediabox.top = A4_HEIGHT
419 if nup4_position == 2 or nup4_position == 3:
421 page.mediabox.bottom = 0
422 page.mediabox.top = A4_HALF_HEIGHT
423 if nup4_position == 0 or nup4_position == 2:
425 page.mediabox.left = 0
426 page.mediabox.right = A4_HALF_WIDTH
427 if nup4_position == 1 or nup4_position == 3:
428 page.add_transformation(pypdf.Transformation().translate(tx=(1-bonus_shrink_factor)*A4_WIDTH))
430 page.mediabox.left = A4_HALF_WIDTH
431 page.mediabox.right = A4_WIDTH
432 page.add_transformation(pypdf.Transformation().translate(tx=x_section, ty=y_section))
433 page.add_transformation(pypdf.Transformation().scale(QUARTER_SCALE_FACTOR, QUARTER_SCALE_FACTOR))
436 def ornate_nup4(writer, args_analyze, is_front_page, new_page, printable_margin, bonus_shrink_factor, canvas_class):
439 packet = io.BytesIO()
440 c = canvas_class(packet, pagesize=A4)
442 c.line(0, A4_HEIGHT, A4_WIDTH, A4_HEIGHT)
443 c.line(0, A4_HALF_HEIGHT, A4_WIDTH, A4_HALF_HEIGHT)
444 c.line(0, 0, A4_WIDTH, 0)
445 c.line(0, A4_HEIGHT, 0, 0)
446 c.line(A4_HALF_WIDTH, A4_HEIGHT, A4_HALF_WIDTH, 0)
447 c.line(A4_WIDTH, A4_HEIGHT, A4_WIDTH, 0)
449 new_pdf = pypdf.PdfReader(packet)
450 new_page.merge_page(new_pdf.pages[0])
451 printable_offset_x = printable_margin.size
452 printable_offset_y = printable_margin.size * A4_HEIGHT / A4_WIDTH
453 new_page.add_transformation(pypdf.Transformation().scale(printable_margin.zoom, printable_margin.zoom))
454 new_page.add_transformation(pypdf.Transformation().translate(tx=printable_offset_x, ty=printable_offset_y))
455 x_left_spine_limit = A4_HALF_WIDTH * bonus_shrink_factor
456 x_right_spine_limit = A4_WIDTH - x_left_spine_limit
457 if args_analyze or is_front_page:
458 packet = io.BytesIO()
459 c = canvas_class(packet, pagesize=A4)
463 c.line(x_left_spine_limit, A4_HEIGHT, x_left_spine_limit, 0)
464 c.line(x_right_spine_limit, A4_HEIGHT, x_right_spine_limit, 0)
467 draw_cut(c, x_left_spine_limit, (1))
468 draw_cut(c, x_right_spine_limit, (-1))
469 if args_analyze or is_front_page:
471 new_pdf = pypdf.PdfReader(packet)
472 new_page.merge_page(new_pdf.pages[0])
def draw_cut(canvas, x_spine_limit, direction):
    outer_start_x = x_spine_limit - 0.5 * CUT_WIDTH * direction
    inner_start_x = x_spine_limit + 0.5 * CUT_WIDTH * direction
    middle_point_y = A4_HALF_HEIGHT + MIDDLE_POINT_DEPTH * direction
    end_point_y = A4_HALF_HEIGHT + CUT_DEPTH * direction
    canvas.line(inner_start_x, A4_HALF_HEIGHT, x_spine_limit, end_point_y)
    canvas.line(x_spine_limit, end_point_y, x_spine_limit, middle_point_y)
    canvas.line(x_spine_limit, middle_point_y, outer_start_x, A4_HALF_HEIGHT)
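# A minimal sketch of the stencil geometry drawn by draw_cut, returning the
# three line segments as coordinate pairs instead of drawing them on a canvas.
# The function name cut_segments is hypothetical. With direction 1 the stencil
# extends towards higher y values from the horizontal middle line, with
# direction -1 towards lower ones, mirrored around the spine limit.

```python
POINTS_PER_CM = 10 * 72 / 25.4
A4_HALF_HEIGHT = 29.7 * POINTS_PER_CM / 2
CUT_DEPTH = 1.95 * POINTS_PER_CM
CUT_WIDTH = 1.05 * POINTS_PER_CM
MIDDLE_POINT_DEPTH = 0.4 * POINTS_PER_CM


def cut_segments(x_spine_limit, direction):
    """Return the stencil as three ((x1, y1), (x2, y2)) segments."""
    outer_start_x = x_spine_limit - 0.5 * CUT_WIDTH * direction
    inner_start_x = x_spine_limit + 0.5 * CUT_WIDTH * direction
    middle_point_y = A4_HALF_HEIGHT + MIDDLE_POINT_DEPTH * direction
    end_point_y = A4_HALF_HEIGHT + CUT_DEPTH * direction
    return [
        ((inner_start_x, A4_HALF_HEIGHT), (x_spine_limit, end_point_y)),
        ((x_spine_limit, end_point_y), (x_spine_limit, middle_point_y)),
        ((x_spine_limit, middle_point_y), (outer_start_x, A4_HALF_HEIGHT)),
    ]


for segment in cut_segments(x_spine_limit=270.0, direction=1):
    print(segment)
```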
487 validate_inputs_first_pass(args)
490 from reportlab.pdfgen.canvas import Canvas
492 raise HandledException("-n: need reportlab.pdfgen.canvas installed for --nup4")
493 pages_to_add, opened_files = read_inputs_to_pagelist(args.input_file, args.page_range)
494 validate_inputs_second_pass(args, pages_to_add)
495 rotate_pages(args.rotate_page, pages_to_add)
497 pad_pages_to_multiple_of_8(pages_to_add)
498 normalize_pages_to_A4(pages_to_add)
499 crop_at_page = collect_per_page_crops_and_zooms(args.crops, args.symmetry, pages_to_add)
500 writer = pypdf.PdfWriter()
502 build_nup4_output(writer, pages_to_add, crop_at_page, args.print_margin, args.analyze, Canvas)
504 build_single_pages_output(writer, pages_to_add, crop_at_page)
505 for file in opened_files:
507 with open(args.output_file, 'wb') as output_file:
508 writer.write(output_file)
511 if __name__ == "__main__":
514 except HandledException as e:
515 handled_error_exit(e)